nxcals.api.extraction.data.builders.DataFrame.transform

DataFrame.transform(func: Callable[[...], DataFrame], *args: Any, **kwargs: Any) → DataFrame

Returns a new DataFrame produced by calling func on this DataFrame. This gives a concise syntax for chaining custom transformations.

Added in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.
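Because transform only hands the DataFrame to func, a chain of transform calls reads left to right yet computes the same result as nesting the function calls by hand, or as folding a list of steps with functools.reduce. The following pure-Python sketch illustrates this with a hypothetical Frame class standing in for a DataFrame; it assumes transform returns func(self, *args, **kwargs), as PySpark's implementation does, and it is not the NXCALS API itself.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Frame:
    """Hypothetical stand-in for a DataFrame: a list of integer rows."""

    rows: List[int]

    def transform(self, func: Callable[..., "Frame"], *args: Any, **kwargs: Any) -> "Frame":
        # Assumed semantics (as in PySpark): call func on this frame and return its result.
        return func(self, *args, **kwargs)


def double(frame: Frame) -> Frame:
    # Build a new frame instead of mutating the input one.
    return Frame([r * 2 for r in frame.rows])


def drop_small(frame: Frame) -> Frame:
    # Keep only rows greater than 2.
    return Frame([r for r in frame.rows if r > 2])


frame = Frame([1, 2, 3])

chained = frame.transform(double).transform(drop_small)  # reads left to right
nested = drop_small(double(frame))  # same computation, written inside out

# A list of steps can be folded into the equivalent chain of transform calls.
steps = [double, drop_small]
folded = functools.reduce(lambda acc, step: acc.transform(step), steps, frame)

print(chained.rows)  # [4, 6]
```

The chained form keeps each step's name in reading order, which is the main readability gain over the nested form.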

Parameters:
  • func (function) – a function that takes a DataFrame as its first argument and returns a DataFrame.

  • *args

    Positional arguments to pass to func.

    Added in version 3.3.0.

  • **kwargs

    Keyword arguments to pass to func.

    Added in version 3.3.0.
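The DataFrame is always passed to func as the first argument, followed by *args and **kwargs unchanged, so df.transform(add_n, 1) amounts to add_n(df, 1). Before version 3.3.0 the same effect required a closure, for example a lambda or functools.partial. The sketch below checks that these spellings agree, using a hypothetical Frame class (a dict of column lists) in place of a real DataFrame and an add_n helper modeled on the one in the examples; the forwarding behavior of transform is an assumption mirroring PySpark.

```python
import functools
from typing import Any, Callable, Dict, List


class Frame:
    """Hypothetical stand-in for a DataFrame: column name -> list of values."""

    def __init__(self, columns: Dict[str, List[float]]):
        self.columns = columns

    def transform(self, func: Callable[..., "Frame"], *args: Any, **kwargs: Any) -> "Frame":
        # Assumed semantics: the frame is always func's first argument;
        # *args and **kwargs follow unchanged.
        return func(self, *args, **kwargs)


def add_n(frame: Frame, n: float) -> Frame:
    # Add n to every value in every column, keeping the column names.
    return Frame({name: [v + n for v in values] for name, values in frame.columns.items()})


frame = Frame({"int": [1, 2], "float": [1.0, 2.0]})

# Version 3.3.0+ style: extra arguments passed straight through transform.
with_args = frame.transform(add_n, 1).transform(add_n, n=10)

# Pre-3.3.0 style: bind the extra argument in a closure first.
with_lambda = frame.transform(lambda f: add_n(f, 1)).transform(lambda f: add_n(f, n=10))
with_partial = frame.transform(functools.partial(add_n, n=1)).transform(
    functools.partial(add_n, n=10)
)

print(with_args.columns)  # {'int': [12, 13], 'float': [12.0, 13.0]}
```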

Returns:

Transformed DataFrame.

Return type:

DataFrame
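Whatever func returns becomes the result of transform, so func must return a DataFrame for further chaining to work. PySpark's implementation checks the result type with an assertion; the sketch below mimics that guard with a hypothetical Frame stand-in and raises TypeError, which is the sketch's own choice of exception rather than the library's documented behavior.

```python
from typing import Any, Callable, List


class Frame:
    """Hypothetical stand-in for a DataFrame holding a list of rows."""

    def __init__(self, rows: List[int]):
        self.rows = rows

    def transform(self, func: Callable[..., "Frame"], *args: Any, **kwargs: Any) -> "Frame":
        result = func(self, *args, **kwargs)
        # Assumed guard: reject anything that is not a Frame, so a bad
        # transformation fails here instead of later in the chain.
        if not isinstance(result, Frame):
            raise TypeError(
                f"func returned an instance of type {type(result).__name__}, "
                "should have been Frame"
            )
        return result


def count_rows(frame: Frame) -> int:
    # Returns a plain int, not a Frame, so it is not a valid transformation.
    return len(frame.rows)


frame = Frame([1, 2, 3])
first_two = frame.transform(lambda f: Frame(f.rows[:2]))  # valid: returns a Frame

error_message = ""
try:
    frame.transform(count_rows)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```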

Examples

>>> from pyspark.sql.functions import col
>>> df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
>>> def cast_all_to_int(input_df):
...     return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])
...
>>> def sort_columns_asc(input_df):
...     return input_df.select(*sorted(input_df.columns))
...
>>> df.transform(cast_all_to_int).transform(sort_columns_asc).show()
+-----+---+
|float|int|
+-----+---+
|    1|  1|
|    2|  2|
+-----+---+
>>> def add_n(input_df, n):
...     return input_df.select([(col(col_name) + n).alias(col_name)
...                             for col_name in input_df.columns])
...
>>> df.transform(add_n, 1).transform(add_n, n=10).show()
+---+-----+
|int|float|
+---+-----+
| 12| 12.0|
| 13| 13.0|
+---+-----+