option (DataFrameWriter)

Adds an output option for the underlying data source. For some available options, see Options.

Syntax

option(key, value)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| key | str | The option key. |
| value | str, int, float, or bool | The option value. |
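
Because value accepts several Python types, it can help to see how a non-string value might end up as a string option. The sketch below is an illustration only: normalize_option_value is a hypothetical helper, written on the assumption that PySpark lowercases booleans and stringifies other values before handing them to the data source. It is not part of the API.

```python
from typing import Optional, Union

OptionValue = Union[str, int, float, bool, None]


def normalize_option_value(value: OptionValue) -> Optional[str]:
    """Hypothetical helper: approximate how an option value is stringified."""
    if isinstance(value, bool):
        # Check bool first: bool is a subclass of int in Python.
        return str(value).lower()
    if value is None:
        return None
    return str(value)


print(normalize_option_value(True))  # true
print(normalize_option_value(1024))  # 1024
print(normalize_option_value("|"))   # |
```

Under that assumption, option("header", True) and option("header", "true") configure the writer in the same way.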

Options

The following table contains some writer options:

| Key | Formats | Description |
| --- | --- | --- |
| arrayElementName | XML | The element name for array elements that have no explicit name. Default: item. Applies to xml (DataFrameWriter). |
| attributePrefix | XML | The prefix prepended to field names that correspond to XML attributes. Default: _. Applies to xml (DataFrameWriter). |
| avroSchema | Avro | The full Avro schema as a JSON string. Use this option to convert Spark SQL types to specific Avro types. See Avro file. |
| charToEscapeQuoteEscaping | CSV | The character used to escape the escape character when it differs from the quote character. Default: \0 (not enabled). Applies to csv (DataFrameWriter). |
| clusterByAuto | Delta Lake | Whether to enable automatic liquid clustering, where Azure Databricks selects clustering columns based on query patterns. Only valid with mode("overwrite"); cannot be used with append mode. Default: false. Available in Databricks Runtime 16.4 and above. See Use liquid clustering for tables. |
| compression | CSV, JSON, ORC, Parquet, Text, XML | The compression codec to use when writing. Valid values vary by format. Applies to csv (DataFrameWriter), json (DataFrameWriter), orc (DataFrameWriter), parquet (DataFrameWriter), text (DataFrameWriter), xml (DataFrameWriter). |
| dateFormat | CSV, JSON, XML | The format string for date column values. Default: yyyy-MM-dd. Applies to csv (DataFrameWriter), json (DataFrameWriter), xml (DataFrameWriter). |
| declaration | XML | The XML declaration string written at the top of each output file. Set to an empty string to suppress the declaration. Default: version="1.0" encoding="UTF-8" standalone="yes". Applies to xml (DataFrameWriter). |
| emptyValue | CSV | The string written for empty (non-null) values. Default: "". Applies to csv (DataFrameWriter). |
| encoding | CSV, JSON, XML | The character encoding for the output files. Default: UTF-8. Applies to csv (DataFrameWriter), json (DataFrameWriter), xml (DataFrameWriter). |
| escape | CSV | The character used to escape quoted values. Default: \. Applies to csv (DataFrameWriter). |
| escapeQuotes | CSV | Whether to escape quote characters inside quoted field values. Default: true. Applies to csv (DataFrameWriter). |
| header | CSV | Whether to write column names as the first line of the output. Default: false. Applies to csv (DataFrameWriter). |
| ignoreLeadingWhiteSpace | CSV | Whether to trim leading whitespace from values when writing. Default: true for writing. Applies to csv (DataFrameWriter). |
| ignoreNullFields | JSON | Whether to omit fields with null values from the JSON output. Default: value of spark.sql.jsonGenerator.ignoreNullFields. Applies to json (DataFrameWriter). |
| ignoreTrailingWhiteSpace | CSV | Whether to trim trailing whitespace from values when writing. Default: true for writing. Applies to csv (DataFrameWriter). |
| lineSep | CSV, JSON, Text | The line separator string used between records. Default: \n. Applies to csv (DataFrameWriter), json (DataFrameWriter), text (DataFrameWriter). |
| mergeSchema | Delta Lake | Whether to enable schema evolution for the write operation. New columns in the source DataFrame are added to the target table schema. Applies to batch and streaming appends. See Update table schema. |
| nullValue | CSV | The string written for null values. Default: "". Applies to csv (DataFrameWriter). |
| nullValue | XML | The string written for null values. Default: null. When set to null, attributes and child elements for null fields are omitted. Applies to xml (DataFrameWriter). |
| overwriteSchema | Delta Lake | Whether to replace the table schema and partitioning when overwriting. Requires mode("overwrite") without replaceWhere. Cannot be used with partitionOverwriteMode. See Update table schema. |
| partitionOverwriteMode | Delta Lake | The partition overwrite mode. Set this to dynamic to overwrite only partitions containing new data, leaving all other partitions unchanged. Legacy mode; not supported on serverless compute or Databricks SQL. See Selectively overwrite data with Delta Lake. |
| quote | CSV | The character used to quote field values that contain the separator. Default: ". Applies to csv (DataFrameWriter). |
| quoteAll | CSV | Whether to enclose all field values in quotes regardless of content. Default: false. Applies to csv (DataFrameWriter). |
| recordName | Avro | The top-level record name in the output Avro schema. Default: topLevelRecord. See Avro file. |
| recordNamespace | Avro | The namespace for the top-level record in the output Avro schema. Default: "". See Avro file. |
| replaceWhere | Delta Lake | A predicate expression. Atomically overwrites only the records that match the predicate. See Selectively overwrite data with Delta Lake. |
| rootTag | XML | The root element tag that wraps all row elements in the output. Default: ROWS. Applies to xml (DataFrameWriter). |
| rowTag | XML | The element tag that represents a row in the output. Default: ROW. Applies to xml (DataFrameWriter). |
| sep | CSV | The field delimiter character. Default: ,. Applies to csv (DataFrameWriter). |
| timestampFormat | CSV, JSON, XML | The format string for timestamp column values. Default: yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]. Applies to csv (DataFrameWriter), json (DataFrameWriter), xml (DataFrameWriter). |
| txnAppId | Delta Lake | A unique string identifying the application for idempotent writes in foreachBatch operations. Use together with txnVersion to ensure exactly-once writes to multiple Delta Lake tables. See Use foreachBatch for idempotent table writes. |
| txnVersion | Delta Lake | A monotonically increasing number used as the transaction version for idempotent writes in foreachBatch operations. Use together with txnAppId to ensure exactly-once writes to multiple Delta Lake tables. See Use foreachBatch for idempotent table writes. |
| userMetadata | Delta Lake, Apache Iceberg | A user-defined string appended to the commit metadata for the write operation. Visible in the output of DESCRIBE HISTORY. See Enrich tables with custom metadata. |
| validateName | XML | Whether to throw an exception if a column name is not a valid XML element identifier. Default: true. Applies to xml (DataFrameWriter). |
| valueTag | XML | The field name used for character data in XML elements that also have attributes or child elements. Default: _VALUE. Applies to xml (DataFrameWriter). |
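
As an illustration of combining two of the Delta Lake options listed above, the following sketch overwrites one date range and tags the commit. The names overwrite_date_range and date_range_predicate, the column event_date, and the table name passed in are all hypothetical. The df argument is assumed to be a DataFrame whose rows all fall inside the range.

```python
from datetime import date


def date_range_predicate(column: str, start: date, end: date) -> str:
    """Build a half-open [start, end) predicate string for replaceWhere."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return f"{column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"


def overwrite_date_range(df, table_name: str, start: date, end: date, note: str) -> str:
    """Hypothetical helper: overwrite rows in [start, end) and tag the commit."""
    predicate = date_range_predicate("event_date", start, end)
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)  # overwrite only the matching records
        .option("userMetadata", note)       # visible in DESCRIBE HISTORY
        .saveAsTable(table_name)
    )
    return predicate
```

Building the predicate in a separate function keeps the string easy to inspect before any data is overwritten.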

Returns

DataFrameWriter
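
Because option returns the DataFrameWriter, calls can be chained or applied in a loop. A minimal sketch, assuming a hypothetical helper apply_options and a hypothetical set of CSV settings; the writer is typed loosely because any object with a compatible option(key, value) method works.

```python
from typing import Any, Mapping


def apply_options(writer: Any, options: Mapping[str, Any]) -> Any:
    """Hypothetical helper: call writer.option(key, value) for each entry."""
    for key, value in options.items():
        # Rebind to the returned writer so each call builds on the previous one.
        writer = writer.option(key, value)
    return writer


# Hypothetical settings for a pipe-delimited, gzip-compressed CSV export.
CSV_EXPORT_OPTIONS = {"header": True, "sep": "|", "nullValue": "NA", "compression": "gzip"}

# With a DataFrame `df` (not defined here) and a hypothetical output path,
# usage would look like:
# apply_options(df.write.format("csv").mode("overwrite"), CSV_EXPORT_OPTIONS).save("/tmp/export")
```

PySpark also provides DataFrameWriter.options(**options), which sets several options in one call.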

Examples

Write a DataFrame into a CSV file with the nullValue option set.

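import tempfile

# Assumes an existing SparkSession named `spark`.
with tempfile.TemporaryDirectory(prefix="option") as d:
    # Write one row whose name is null; nulls are written as the string "Alice".
    df = spark.createDataFrame([(100, None)], "age INT, name STRING")
    df.write.option("nullValue", "Alice").mode("overwrite").format("csv").save(d)

    # Read the output back without nullValue, so "Alice" is returned as a plain string.
    spark.read.schema(df.schema).format("csv").load(d).show()
    # +---+-----+
    # |age| name|
    # +---+-----+
    # |100|Alice|
    # +---+-----+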