nxcals.api.extraction.data.builders.DataFrame.repartition

DataFrame.repartition(numPartitions: int, *cols: ColumnOrName) → DataFrame

DataFrame.repartition(*cols: ColumnOrName) → DataFrame

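Returns a new DataFrame partitioned by the given partitioning expressions. When partitioning columns are given, the resulting DataFrame is hash partitioned on those columns. When only numPartitions is given, rows are redistributed round-robin across that number of partitions.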
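Conceptually, hash partitioning sends each row to partition `hash(key) mod numPartitions`, where the key is the tuple of partitioning-column values. Rows that share a key therefore always land in the same partition. The plain-Python model below is only a sketch of that rule, not Spark's implementation: Spark uses its own Murmur3-based hash, while this sketch substitutes `zlib.crc32`. `stable_hash` and `hash_partition` are hypothetical names.

```python
import zlib
from collections import defaultdict


def stable_hash(key):
    # Deterministic stand-in for Spark's Murmur3-based hash. Python's built-in
    # hash() is randomized for strings between interpreter runs, so avoid it.
    return zlib.crc32(repr(key).encode("utf-8"))


def hash_partition(rows, num_partitions, key_cols):
    """Assign each row (a dict) to partition stable_hash(key) % num_partitions."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        partitions[stable_hash(key) % num_partitions].append(row)
    # Return every partition in order, including empty ones.
    return [partitions[i] for i in range(num_partitions)]


rows = [
    {"age": 2, "name": "Alice"},
    {"age": 5, "name": "Bob"},
    {"age": 2, "name": "Alice"},
    {"age": 5, "name": "Bob"},
]
parts = hash_partition(rows, 7, ["age"])
for i, part in enumerate(parts):
    if part:
        print(i, [r["name"] for r in part])
```

With seven partitions and only two distinct ages, at most two partitions receive rows and the rest stay empty. The same effect shows up in Spark when numPartitions exceeds the number of distinct keys.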

New in version 1.3.0.

Parameters:

numPartitions (int or Column) – either an int giving the target number of partitions, or a column (a column name or a Column). If it is a column, it is used as the first partitioning column. If no int is given, the default number of shuffle partitions (the spark.sql.shuffle.partitions setting) is used.
cols (str or Column) – partitioning columns.

Changed in version 1.6: Added optional arguments to specify the partitioning columns. Also made numPartitions optional if partitioning columns are specified.
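Both signatures share a single positional parameter list, and this can be modelled as a small dispatch function. The following is a hypothetical sketch, not the PySpark source. `resolve_repartition_args`, the stand-in `Column` class, and the `scheme` labels are illustrative names. The default of 200 assumes `spark.sql.shuffle.partitions` is left at its default value.

```python
DEFAULT_SHUFFLE_PARTITIONS = 200  # assumed: default of spark.sql.shuffle.partitions


class Column:
    """Hypothetical stand-in for pyspark.sql.Column, used only for type checks."""

    def __init__(self, name):
        self.name = name


def resolve_repartition_args(num_partitions, *cols,
                             default=DEFAULT_SHUFFLE_PARTITIONS):
    """Return (partition_count, partitioning_columns, scheme)."""
    if isinstance(num_partitions, int):
        if cols:
            # Explicit count plus columns: hash partition on the columns.
            return num_partitions, cols, "hash"
        # Count only: no key to hash, rows are dealt out round-robin.
        return num_partitions, (), "round-robin"
    if isinstance(num_partitions, (str, Column)):
        # The first positional argument is really the first partitioning column.
        return default, (num_partitions,) + cols, "hash"
    raise TypeError("numPartitions should be an int or Column")


print(resolve_repartition_args(10))                # (10, (), 'round-robin')
print(resolve_repartition_args("age"))             # (200, ('age',), 'hash')
print(resolve_repartition_args(3, "name", "age"))  # (3, ('name', 'age'), 'hash')
```

Overloading the first parameter keeps `repartition("age")` and `repartition(7, "age")` both valid without keyword arguments. As a consequence, a non-int first argument is always treated as a column.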

Examples

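>>> # Sample data; assumes an active SparkSession bound to `spark`
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
>>> df.repartition(10).rdd.getNumPartitions()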
10
>>> data = df.union(df).repartition("age")
>>> data.show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
|  2|Alice|
|  5|  Bob|
+---+-----+
>>> data = data.repartition(7, "age")
>>> data.show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
|  2|Alice|
|  5|  Bob|
+---+-----+
>>> data.rdd.getNumPartitions()
7
>>> data = data.repartition(3, "name", "age")
>>> data.show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  5|  Bob|
|  2|Alice|
|  2|Alice|
+---+-----+
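The first example, `df.repartition(10)`, passes no partitioning columns. In that case Spark does not hash any column. It deals rows out to the target partitions in turn (round-robin), so `getNumPartitions()` reports exactly 10 and partition sizes stay close to equal. The plain-Python model below covers a single input stream and is a sketch under that assumption. `round_robin_partition` is a hypothetical name. Real Spark starts each input partition at its own pseudo-random offset, so actual sizes are only approximately balanced.

```python
def round_robin_partition(rows, num_partitions, start=0):
    """Deal rows to partitions in turn, beginning at partition `start`."""
    partitions = [[] for _ in range(num_partitions)]
    for offset, row in enumerate(rows):
        partitions[(start + offset) % num_partitions].append(row)
    return partitions


rows = list(range(23))
parts = round_robin_partition(rows, 10)
print([len(p) for p in parts])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

Unlike hash partitioning, this scheme ignores row contents. It balances partition sizes but gives no guarantee that equal keys end up together.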