nxcals.api.extraction.data.builders.DataFrame.groupBy
- DataFrame.groupBy(*cols: ColumnOrName) → GroupedData
- DataFrame.groupBy(__cols: Union[List[Column], List[str]]) → GroupedData
Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
groupby() is an alias for groupBy().
New in version 1.3.0.
- Parameters:
cols (list, str or Column) – columns to group by. Each element should be a column name (string) or an expression (Column).
Examples
>>> df.groupBy().avg().collect()
[Row(avg(age)=3.5)]
>>> sorted(df.groupBy('name').agg({'age': 'mean'}).collect())
[Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
>>> sorted(df.groupBy(df.name).avg().collect())
[Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
>>> sorted(df.groupBy(['name', df.age]).count().collect())
[Row(name='Alice', age=2, count=1), Row(name='Bob', age=5, count=1)]
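The examples above rely on a `df` that is not defined on this page. As a rough mental model only, the following plain-Python sketch mimics what grouping followed by an aggregate computes, without needing a Spark session. The sample rows are an assumption inferred from the outputs shown above, and `group_by` is a hypothetical helper written for illustration; it is not part of PySpark or NXCALS and says nothing about how Spark executes the query.

```python
from collections import defaultdict
from statistics import mean

# Assumed sample data, inferred from the outputs in the examples above.
rows = [
    {"name": "Alice", "age": 2},
    {"name": "Bob", "age": 5},
]


def group_by(rows, *keys):
    """Bucket rows by the values of the given key columns.

    Hypothetical helper for illustration; not a PySpark function.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


# No keys: every row lands in a single group, as with df.groupBy().avg()
overall = {
    key: mean(r["age"] for r in grp)
    for key, grp in group_by(rows).items()
}

# One key: one average per distinct name, as with df.groupBy('name').avg()
avg_by_name = {
    key[0]: mean(r["age"] for r in grp)
    for key, grp in group_by(rows, "name").items()
}

# Two keys: one count per (name, age) pair,
# as with df.groupBy(['name', df.age]).count()
count_by_name_age = {
    key: len(grp)
    for key, grp in group_by(rows, "name", "age").items()
}
```

Calling `groupBy()` with no columns treats the whole DataFrame as one group, which is why the first example returns a single row.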