

json (DataFrameReader)

Loads JSON files and returns the results as a DataFrame. JSON Lines (newline-delimited JSON) is supported by default. For JSON with one record per file, set the multiLine option to True.

If schema is not specified, this function reads the input once to determine the input schema.

Syntax

json(path, schema=None, **options)

Parameters

path : str, list, or RDD
    A path to the JSON dataset, a list of paths, or an RDD of strings storing JSON objects.
schema : StructType or str, optional
    An optional input schema as a StructType object or a DDL-formatted string (for example, 'col0 INT, col1 DOUBLE').

Returns

DataFrame

Examples

Write a DataFrame into a JSON file and read it back.

import tempfile
with tempfile.TemporaryDirectory(prefix="json") as d:
    spark.createDataFrame(
        [{"age": 100, "name": "Hyukjin"}]
    ).write.mode("overwrite").format("json").save(d)

    spark.read.json(d).show()
    # +---+-------+
    # |age|   name|
    # +---+-------+
    # |100|Hyukjin|
    # +---+-------+

Read JSON from multiple directories.

from tempfile import TemporaryDirectory
with TemporaryDirectory(prefix="json2") as d1, TemporaryDirectory(prefix="json3") as d2:
    spark.createDataFrame(
        [{"age": 30, "name": "Bob"}]
    ).write.mode("overwrite").format("json").save(d1)
    spark.createDataFrame(
        [{"age": 25, "name": "Alice"}]
    ).write.mode("overwrite").format("json").save(d2)

    spark.read.json([d1, d2]).show()
    # +---+-----+
    # |age| name|
    # +---+-----+
    # | 25|Alice|
    # | 30|  Bob|
    # +---+-----+

Read JSON with a custom schema.

import tempfile
with tempfile.TemporaryDirectory(prefix="json") as d:
    spark.createDataFrame(
       [{"age": 30, "name": "Bob"}]
    ).write.mode("overwrite").format("json").save(d)
    spark.read.json(d, schema="name STRING, age INT").show()
    # +----+---+
    # |name|age|
    # +----+---+
    # | Bob| 30|
    # +----+---+