

from_avro

Converts a binary column of Avro format into its corresponding Catalyst value. The specified schema must match the data being read; otherwise the behavior is undefined: it may fail or return an arbitrary result.

If jsonFormatSchema is not provided but both subject and schemaRegistryAddress are, the function instead fetches the schema from Schema Registry and converts a binary column of Schema Registry-encoded Avro into its corresponding Catalyst value.

Syntax

from pyspark.sql.avro.functions import from_avro

from_avro(data, jsonFormatSchema=None, options=None, subject=None, schemaRegistryAddress=None)

Parameters

data (pyspark.sql.Column or str)
    The binary column containing Avro-encoded data.
jsonFormatSchema (str, optional)
    The Avro schema, as a JSON-format string.
options (dict, optional)
    Options controlling how the Avro record is parsed, plus configuration for the Schema Registry client.
subject (str, optional)
    The Schema Registry subject that the data belongs to.
schemaRegistryAddress (str, optional)
    The address (host and port) of the Schema Registry.

Returns

pyspark.sql.Column: A new column containing the deserialized Avro data as the corresponding Catalyst value.

Examples

Example 1: Deserializing an Avro binary column using a JSON schema

# Assumes an active SparkSession bound to the name `spark`.
from pyspark.sql import Row
from pyspark.sql.avro.functions import from_avro, to_avro

# Build a DataFrame with a struct column, then encode the struct to Avro binary.
data = [(1, Row(age=2, name='Alice'))]
df = spark.createDataFrame(data, ("key", "value"))
avro_df = df.select(to_avro(df.value).alias("avro"))

# The schema passed to from_avro must match the encoded data.
json_format_schema = '''{"type":"record","name":"topLevelRecord","fields":
    [{"name":"avro","type":[{"type":"record","name":"value",
    "namespace":"topLevelRecord","fields":[{"name":"age","type":["long","null"]},
    {"name":"name","type":["string","null"]}]},"null"]}]}'''

# Decode the Avro binary back into a struct column.
avro_df.select(from_avro(avro_df.avro, json_format_schema).alias("value")).show(truncate=False)
+------------------+
|value             |
+------------------+
|{{2, Alice}}      |
+------------------+
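Example 2: Deserializing using Schema Registry

When jsonFormatSchema is omitted, the schema is resolved from Schema Registry via the subject and schemaRegistryAddress parameters, as described above. The sketch below illustrates that call shape; the registry address, subject name, and the `df` column are placeholders for your own deployment, not values from the original, and running it requires an active SparkSession and a reachable Schema Registry.

```python
from pyspark.sql.avro.functions import from_avro

# Hypothetical registry endpoint and subject; substitute your own.
schema_registry_address = "https://schema-registry.example.com:8081"
subject = "my-topic-value"

# `df.avro` stands in for any binary column holding
# Schema Registry-encoded Avro payloads.
decoded = df.select(
    from_avro(
        df.avro,
        subject=subject,
        schemaRegistryAddress=schema_registry_address,
    ).alias("value")
)
decoded.show(truncate=False)
```

Because the schema is fetched at read time, the decoded column's type follows whatever schema the registry holds for that subject.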