nxcals.api.extraction.data.builders.DataFrame.freqItems

DataFrame.freqItems(cols: Union[List[str], Tuple[str]], support: Optional[float] = None) → DataFrame

Finds frequent items for columns, possibly with false positives. Uses the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou (https://doi.org/10.1145/762471.762473). DataFrame.freqItems() and DataFrameStatFunctions.freqItems() are aliases.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters:
  • cols (list or tuple) – Names of the columns to calculate frequent items for as a list or tuple of strings.

  • support (float, optional) – The minimum relative frequency at which an item is considered ‘frequent’. Defaults to 1% (0.01). The support must be greater than 1e-4.

Returns:

DataFrame with frequent items.

Return type:

DataFrame

Notes

This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
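
The false positives mentioned above follow from how counter-based frequent-item algorithms work. The block below is a minimal pure-Python sketch of that idea, assuming the classic formulation with a bounded set of counters; it is not Spark's implementation, and the name frequent_items_sketch is hypothetical. It only illustrates why an infrequent item can appear in the result.

```python
from typing import Dict, Hashable, Iterable, List


def frequent_items_sketch(items: Iterable[Hashable], support: float = 0.01) -> List[Hashable]:
    """Return candidate frequent items from a single pass over `items`.

    Illustrative only (hypothetical helper, not part of any library).
    Every item whose relative frequency exceeds `support` is returned,
    but some infrequent items may be returned as well.
    """
    if not 0.0 < support <= 1.0:
        raise ValueError("support must be in (0, 1]")

    # Keep at most floor(1 / support) counters, however long the input is.
    max_counters = int(1 / support)
    counters: Dict[Hashable, int] = {}

    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < max_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those
            # that reach zero. The incoming item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]

    # Survivors are candidates; a second pass would be needed to
    # confirm their true frequencies.
    return list(counters)


# "a" makes up 4 of 7 items (above 50%); "d" appears once but arrives
# last, so it survives as a false positive.
candidates = frequent_items_sketch(["a", "a", "a", "b", "c", "a", "d"], support=0.5)
print(candidates)
```

Because the result is only a set of candidates, an exact count (for example with groupBy and count on the column) is a reasonable follow-up when precise frequencies matter.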

Examples

>>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
>>> df.freqItems(["c1", "c2"]).show()
+------------+------------+
|c1_freqItems|c2_freqItems|
+------------+------------+
|   [4, 1, 3]| [8, 11, 10]|
+------------+------------+