nxcals.api.extraction.data.builders.DataFrame.corr

DataFrame.corr(col1: str, col2: str, method: Optional[str] = None) → float

Calculates the correlation between two columns of a DataFrame and returns it as a double value. Only the Pearson Correlation Coefficient is currently supported. DataFrame.corr() and DataFrameStatFunctions.corr() are aliases of each other.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters:
  • col1 (str) – The name of the first column

  • col2 (str) – The name of the second column

  • method (str, optional) – The correlation method. Only “pearson” is currently supported.

Returns:

Pearson Correlation Coefficient of the two columns.

Return type:

float

Examples

>>> df = spark.createDataFrame([(1, 12), (10, 1), (19, 8)], ["c1", "c2"])
>>> df.corr("c1", "c2")
-0.3592106040535498
>>> df = spark.createDataFrame([(11, 12), (10, 11), (9, 10)], ["small", "bigger"])
>>> df.corr("small", "bigger")
1.0
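
To make the returned value easier to interpret, the sketch below recomputes the Pearson coefficient for the same rows in plain Python, without Spark. The helper name pearson is hypothetical and not part of this API. Returning NaN for a constant column is a choice made for this sketch only; it is not a statement about what DataFrame.corr() does in that case.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Hypothetical helper for illustration; not part of the DataFrame API.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Sketch-only choice: correlation is undefined for a constant column.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Same rows as the "c1"/"c2" example above.
print(round(pearson([1, 10, 19], [12, 1, 8]), 6))   # -0.359211
# Same rows as the "small"/"bigger" example above.
print(pearson([11, 10, 9], [12, 11, 10]))           # 1.0
```

The second pair differs only by a constant offset, so the two columns are perfectly linearly related and the coefficient is exactly 1.0.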