nxcals.api.extraction.data.builders.DataFrame.drop
- DataFrame.drop(cols: ColumnOrName) → DataFrame
- DataFrame.drop(*cols: str) → DataFrame
Returns a new DataFrame without specified columns. This is a no-op if the schema doesn’t contain the given column name(s).
Added in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters:
cols (str or Column) – the name of the column, or the Column to drop
- Returns:
DataFrame without given columns.
- Return type:
DataFrame
Notes
When an input is a column name, it is treated literally without further interpretation. Otherwise, drop tries to match the equivalent expression. As a result, dropping a column by its name, drop(colName), has different semantics from dropping the column directly, drop(col(colName)).
Examples
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import col, lit
>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
>>> df.drop('age').show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
>>> df.drop(df.age).show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
Drop the column that both DataFrames were joined on.
>>> df.join(df2, df.name == df2.name, 'inner').drop('name').sort('age').show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 16|    85|
+---+------+
>>> df3 = df.join(df2)
>>> df3.show()
+---+-----+------+----+
|age| name|height|name|
+---+-----+------+----+
| 14|  Tom|    80| Tom|
| 14|  Tom|    85| Bob|
| 23|Alice|    80| Tom|
| 23|Alice|    85| Bob|
| 16|  Bob|    80| Tom|
| 16|  Bob|    85| Bob|
+---+-----+------+----+
Drop two columns that share the same name.
>>> df3.drop("name").show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 14|    85|
| 23|    80|
| 23|    85|
| 16|    80|
| 16|    85|
+---+------+
Cannot drop col("name") because the reference is ambiguous.
>>> df3.drop(col("name")).show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [AMBIGUOUS_REFERENCE] Reference...
>>> df4 = df.withColumn("a.b.c", lit(1))
>>> df4.show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+
>>> df4.drop("a.b.c").show()
+---+-----+
|age| name|
+---+-----+
| 14|  Tom|
| 23|Alice|
| 16|  Bob|
+---+-----+
No column matches the expression “a.b.c”, so nothing is dropped.
>>> df4.drop(col("a.b.c")).show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+
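
Because drop is a no-op for names that are not in the schema, a misspelled column name passes silently. The following is a minimal sketch of one way to surface that, not part of the documented API: split_drop_targets and drop_strict are hypothetical helper names, and the comparison is an exact, case-sensitive string match, which may differ from Spark's own name resolution (for example when spark.sql.caseSensitive is left at its default). The helpers only assume an object that exposes .columns and .drop(*names), as a DataFrame does.

```python
from typing import Any, Iterable, List, Tuple


def split_drop_targets(
    columns: Iterable[str], requested: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split requested names into (present, missing) relative to `columns`.

    Assumption: exact, case-sensitive matching on the literal column name.
    """
    available = set(columns)
    present: List[str] = []
    missing: List[str] = []
    for name in requested:
        if name in available:
            present.append(name)
        else:
            missing.append(name)
    return present, missing


def drop_strict(df: Any, *names: str) -> Any:
    """Hypothetical wrapper: drop `names`, but raise if any name is unknown.

    `df` is anything with a `.columns` list and a `.drop(*names)` method.
    """
    present, missing = split_drop_targets(df.columns, names)
    if missing:
        # Fail loudly instead of relying on drop's silent no-op behaviour.
        raise KeyError(f"columns not in schema: {missing}")
    return df.drop(*present)
```

With the df from the examples above, drop_strict(df, "age") would behave like df.drop("age"), while drop_strict(df, "agee") would raise KeyError rather than return df unchanged.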