pyspark.sql.DataFrame.fillna

DataFrame.fillna(value, subset=None)

Returns a new DataFrame in which null values are replaced with the given value. DataFrame.fillna() and DataFrameNaFunctions.fill() are aliases of each other.

New in version 1.3.1.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

value : int, float, string, bool or dict
    Value to replace null values with. If the value is a dict, then subset is ignored and value must be a mapping from column name (string) to replacement value. The replacement value must be an int, float, boolean, or string.
subset : str, tuple or list, optional
    Optional list of column names to consider. Columns specified in subset whose data type does not match the type of value are ignored. For example, if value is a string and subset contains a non-string column, the non-string column is simply ignored.

Returns

DataFrame
    DataFrame with null values replaced.

Examples

>>> df = spark.createDataFrame([
...     (10, 80.5, "Alice", None),
...     (5, None, "Bob", None),
...     (None, None, "Tom", None),
...     (None, None, None, True)],
...     schema=["age", "height", "name", "bool"])

Example 1: Fill all null values with 50 for numeric columns.

>>> df.na.fill(50).show()
+---+------+-----+----+
|age|height| name|bool|
+---+------+-----+----+
| 10|  80.5|Alice|NULL|
|  5|  50.0|  Bob|NULL|
| 50|  50.0|  Tom|NULL|
| 50|  50.0| NULL|true|
+---+------+-----+----+

Example 2: Fill all null values with False for boolean columns.

>>> df.na.fill(False).show()
+----+------+-----+-----+
| age|height| name| bool|
+----+------+-----+-----+
|  10|  80.5|Alice|false|
|   5|  NULL|  Bob|false|
|NULL|  NULL|  Tom|false|
|NULL|  NULL| NULL| true|
+----+------+-----+-----+

Example 3: Fill null values with 50 and “unknown” in the ‘age’ and ‘name’ columns, respectively.

>>> df.na.fill({'age': 50, 'name': 'unknown'}).show()
+---+------+-------+----+
|age|height|   name|bool|
+---+------+-------+----+
| 10|  80.5|  Alice|NULL|
|  5|  NULL|    Bob|NULL|
| 50|  NULL|    Tom|NULL|
| 50|  NULL|unknown|true|
+---+------+-------+----+

Example 4: Fill all null values with “Spark” for the ‘name’ column.

>>> df.na.fill(value='Spark', subset='name').show()
+----+------+-----+----+
| age|height| name|bool|
+----+------+-----+----+
|  10|  80.5|Alice|NULL|
|   5|  NULL|  Bob|NULL|
|NULL|  NULL|  Tom|NULL|
|NULL|  NULL|Spark|true|
+----+------+-----+----+
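
The type-matching rule described under Parameters (columns whose data type does not match a scalar value are skipped) can be modeled in plain Python. The sketch below is an illustration only and is not Spark's implementation: the helpers value_kind and columns_filled and the column_kinds mapping are hypothetical names. It assumes three coarse type families ("numeric", "string", "boolean") and covers only scalar values, not the dict form.

```python
from typing import Dict, List, Optional, Sequence, Union

Scalar = Union[int, float, str, bool]


def value_kind(value: Scalar) -> str:
    """Classify a scalar replacement value (hypothetical helper)."""
    # bool is a subclass of int in Python, so it must be tested first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    raise TypeError("value should be an int, float, string or bool")


def columns_filled(
    column_kinds: Dict[str, str],
    value: Scalar,
    subset: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Return the columns a scalar fill would touch under this simplified model."""
    kind = value_kind(value)
    if subset is None:
        candidates = list(column_kinds)   # consider every column
    elif isinstance(subset, str):
        candidates = [subset]             # a single column name is allowed
    else:
        candidates = list(subset)
    # Columns whose type family differs from the value's are skipped.
    return [name for name in candidates if column_kinds[name] == kind]


# Assumed type families for the example DataFrame used on this page.
column_kinds = {"age": "numeric", "height": "numeric",
                "name": "string", "bool": "boolean"}

print(columns_filled(column_kinds, 50))                       # ['age', 'height']
print(columns_filled(column_kinds, False))                    # ['bool']
print(columns_filled(column_kinds, "Spark", subset="name"))   # ['name']
print(columns_filled(column_kinds, "unknown", subset=["age", "name"]))  # ['name']
```

The first three calls agree with which columns change in Examples 1, 2, and 4. The last call shows a non-string column listed in subset being skipped when value is a string.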