nxcals.api.extraction.data.builders.DataFrame.sample
- DataFrame.sample(fraction: float, seed: Optional[int] = None) → DataFrame
- DataFrame.sample(withReplacement: Optional[bool], fraction: float, seed: Optional[int] = None) → DataFrame
Returns a sampled subset of this DataFrame.
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters:
withReplacement (bool, optional) – Sample with replacement or not (default False).
fraction (float) – Fraction of rows to generate, range [0.0, 1.0].
seed (int, optional) – Seed for sampling (default a random seed).
- Returns:
Sampled rows from given DataFrame.
- Return type:
DataFrame
Notes
This is not guaranteed to provide exactly the fraction specified of the total count of the given DataFrame.
fraction is required; withReplacement and seed are optional.
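The note about inexact counts is easier to see with a small model. The following is a minimal pure-Python sketch, not Spark's implementation: it assumes that sampling without replacement keeps each row independently with probability `fraction`, so the result size varies from run to run. The helper name `bernoulli_sample` is hypothetical and exists only for this illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Hypothetical helper: keep each row independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0.0, 1.0]")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [row for row in rows if rng.random() < fraction]


rows = list(range(10))

# Different seeds give different sample sizes, scattered around 10 * 0.5 = 5.
counts = [len(bernoulli_sample(rows, 0.5, seed=s)) for s in range(5)]
print(counts)

# The same seed always selects the same rows.
same_a = bernoulli_sample(rows, 0.5, seed=3)
same_b = bernoulli_sample(rows, 0.5, seed=3)
print(same_a == same_b)
```

Under this assumption, only the boundary fractions give a fixed size: 0.0 keeps no rows and 1.0 keeps all of them. When an exact number of rows is needed, a sampled result is usually followed by an explicit limit or a count check.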
Examples
>>> df = spark.range(10)
>>> df.sample(0.5, 3).count()
7
>>> df.sample(fraction=0.5, seed=3).count()
7
>>> df.sample(withReplacement=True, fraction=0.5, seed=3).count()
1
>>> df.sample(1.0).count()
10
>>> df.sample(fraction=1.0).count()
10
>>> df.sample(False, fraction=1.0).count()
10