nxcals.api.extraction.data.builders.DataFrame.inputFiles

DataFrame.inputFiles() List[str]

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.

New in version 3.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns:

List of file paths.

Return type:

list

Examples

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     # Write a single-row DataFrame into a JSON file
...     spark.createDataFrame(
...         [{"age": 100, "name": "Hyukjin Kwon"}]
...     ).repartition(1).write.json(d, mode="overwrite")
...
...     # Read the JSON file as a DataFrame.
...     df = spark.read.format("json").load(d)
...
...     # Returns the number of input files.
...     len(df.inputFiles())
1