pyspark.sql.DataFrame.fillna

DataFrame.fillna(value, subset=None)

Returns a new DataFrame in which null values are replaced with the given value. DataFrame.fillna() and DataFrameNaFunctions.fill() are aliases of each other.

New in version 1.3.1.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
value : int, float, string, bool or dict

The value to replace null values with.

If the value is a dict, then subset is ignored and value must be a mapping from column name (string) to replacement value. The replacement value must be an int, float, boolean, or string.

subset : str, tuple or list, optional

Column name or list of column names to consider. Columns in subset whose data type does not match the type of value are ignored. For example, if value is a string and subset contains a non-string column, the non-string column is ignored.

Returns
DataFrame

DataFrame with replaced null values.

Examples

>>> df = spark.createDataFrame([
...     (10, 80.5, "Alice", None),
...     (5, None, "Bob", None),
...     (None, None, "Tom", None),
...     (None, None, None, True)],
...     schema=["age", "height", "name", "bool"])

Example 1: Fill all null values with 50 for numeric columns.

>>> df.na.fill(50).show()
+---+------+-----+----+
|age|height| name|bool|
+---+------+-----+----+
| 10|  80.5|Alice|NULL|
|  5|  50.0|  Bob|NULL|
| 50|  50.0|  Tom|NULL|
| 50|  50.0| NULL|true|
+---+------+-----+----+

Example 2: Fill all null values with False for boolean columns.

>>> df.na.fill(False).show()
+----+------+-----+-----+
| age|height| name| bool|
+----+------+-----+-----+
|  10|  80.5|Alice|false|
|   5|  NULL|  Bob|false|
|NULL|  NULL|  Tom|false|
|NULL|  NULL| NULL| true|
+----+------+-----+-----+

Example 3: Fill null values with 50 in the ‘age’ column and “unknown” in the ‘name’ column.

>>> df.na.fill({'age': 50, 'name': 'unknown'}).show()
+---+------+-------+----+
|age|height|   name|bool|
+---+------+-------+----+
| 10|  80.5|  Alice|NULL|
|  5|  NULL|    Bob|NULL|
| 50|  NULL|    Tom|NULL|
| 50|  NULL|unknown|true|
+---+------+-------+----+

Example 4: Fill all null values in the ‘name’ column with “Spark”.

>>> df.na.fill(value='Spark', subset='name').show()
+----+------+-----+----+
| age|height| name|bool|
+----+------+-----+----+
|  10|  80.5|Alice|NULL|
|   5|  NULL|  Bob|NULL|
|NULL|  NULL|  Tom|NULL|
|NULL|  NULL|Spark|true|
+----+------+-----+----+