nxcals.api.extraction.data.builders.DataFrame.sample

DataFrame.sample(fraction: float, seed: Optional[int] = None) → DataFrame

DataFrame.sample(withReplacement: Optional[bool], fraction: float, seed: Optional[int] = None) → DataFrame

Returns a sampled subset of this DataFrame.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters:

withReplacement (bool, optional) – Sample with replacement or not (default False).
fraction (float, optional) – Fraction of rows to generate, range [0.0, 1.0].
seed (int, optional) – Seed for sampling (default a random seed).

Returns:

Sampled rows from given DataFrame.

Return type:

DataFrame

Notes

This is not guaranteed to provide exactly the fraction specified of the total count of the given DataFrame.

fraction is required and, withReplacement and seed are optional.

Examples

>>> df = spark.range(10)
>>> df.sample(0.5, 3).count() 
7
>>> df.sample(fraction=0.5, seed=3).count() 
7
>>> df.sample(withReplacement=True, fraction=0.5, seed=3).count() 
1
>>> df.sample(1.0).count()
10
>>> df.sample(fraction=1.0).count()
10
>>> df.sample(False, fraction=1.0).count()
10