nxcals.api.extraction.data.builders.DataFrame.drop

DataFrame.drop(cols: ColumnOrName) DataFrame
DataFrame.drop(*cols: str) DataFrame

Returns a new DataFrame without specified columns. This is a no-op if the schema doesn’t contain the given column name(s).

Added in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters:

cols (str or Column) – a name of the column, or the Column to drop

Returns:

DataFrame without given columns.

Return type:

DataFrame

Notes

When an input is a column name, it is treated literally without further interpretation. Otherwise, will try to match the equivalent expression. So that dropping column by its name drop(colName) has different semantic with directly dropping the column drop(col(colName)).

Examples

>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import col, lit
>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
>>> df.drop('age').show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
>>> df.drop(df.age).show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+

Drop the column that joined both DataFrames on.

>>> df.join(df2, df.name == df2.name, 'inner').drop('name').sort('age').show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 16|    85|
+---+------+
>>> df3 = df.join(df2)
>>> df3.show()
+---+-----+------+----+
|age| name|height|name|
+---+-----+------+----+
| 14|  Tom|    80| Tom|
| 14|  Tom|    85| Bob|
| 23|Alice|    80| Tom|
| 23|Alice|    85| Bob|
| 16|  Bob|    80| Tom|
| 16|  Bob|    85| Bob|
+---+-----+------+----+

Drop two column by the same name.

>>> df3.drop("name").show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 14|    85|
| 23|    80|
| 23|    85|
| 16|    80|
| 16|    85|
+---+------+

Can not drop col(‘name’) due to ambiguous reference.

>>> df3.drop(col("name")).show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [AMBIGUOUS_REFERENCE] Reference...
>>> df4 = df.withColumn("a.b.c", lit(1))
>>> df4.show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+
>>> df4.drop("a.b.c").show()
+---+-----+
|age| name|
+---+-----+
| 14|  Tom|
| 23|Alice|
| 16|  Bob|
+---+-----+

Can not find a column matching the expression “a.b.c”.

>>> df4.drop(col("a.b.c")).show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+