nxcals.api.extraction.data.builders.DataFrame.groupBy

DataFrame.groupBy(*cols: ColumnOrName) → GroupedData
DataFrame.groupBy(__cols: Union[List[Column], List[str]]) → GroupedData

Groups the DataFrame by the specified columns so that aggregations can be run on the resulting groups. See GroupedData for all the available aggregate functions.

groupby() is an alias for groupBy().

New in version 1.3.0.

Parameters:

cols (list, str or Column) – columns to group by. Each element should be a column name (string) or an expression (Column).

Examples

>>> df.groupBy().avg().collect()
[Row(avg(age)=3.5)]
>>> sorted(df.groupBy('name').agg({'age': 'mean'}).collect())
[Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
>>> sorted(df.groupBy(df.name).avg().collect())
[Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
>>> sorted(df.groupBy(['name', df.age]).count().collect())
[Row(name='Alice', age=2, count=1), Row(name='Bob', age=5, count=1)]
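
To make the grouping semantics concrete, the following is a minimal pure-Python sketch of what "group, then aggregate" computes. It is not how Spark implements groupBy() and it uses no Spark API: the helper names group_by and avg and the sample rows are hypothetical, with the rows chosen to match the outputs shown in the examples above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, chosen to match the example outputs above.
rows = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]


def group_by(rows, *cols):
    """Bucket rows by the tuple of their values in the grouping columns.

    With no columns, every row gets the empty key (), so the whole
    input forms a single group, analogous to df.groupBy() with no arguments.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in cols)
        groups[key].append(row)
    return dict(groups)


def avg(groups, col):
    """Average one column within each group (illustrative helper)."""
    return {key: mean(r[col] for r in members) for key, members in groups.items()}


by_name = avg(group_by(rows, "name"), "age")  # one average per distinct name
overall = avg(group_by(rows), "age")          # a single global average
```

Here by_name holds one entry per distinct name, mirroring df.groupBy('name').avg(), and overall holds a single entry, mirroring df.groupBy().avg().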