Loads Parquet files and returns the result as a DataFrame.
Syntax
parquet(*paths, **options)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `*paths` | str | One or more file paths to read the Parquet files from. |
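As a rough illustration of how a `*paths, **options` signature is called, the sketch below uses a hypothetical stand-in function, `describe_parquet_call`, that only records its arguments. It is not part of PySpark and reads no files, and the paths are made-up placeholders.

```python
def describe_parquet_call(*paths, **options):
    # Hypothetical stand-in with the same shape as parquet(*paths, **options).
    # It only records what it was given; it does not touch Spark or the file system.
    return {"paths": list(paths), "options": dict(options)}


# Placeholder paths, assumed for illustration only.
input_paths = ["/tmp/events/2024", "/tmp/events/2025"]

# A list of paths is unpacked with * so each path becomes its own positional argument.
# Reader options are passed as keyword arguments.
call = describe_parquet_call(*input_paths, mergeSchema="true")

print(call["paths"])
print(call["options"])
```

With a live `SparkSession`, the same call shape would be `spark.read.parquet(*input_paths, mergeSchema="true")`, which should behave like setting the option beforehand with `.option("mergeSchema", "true")`.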
Returns
DataFrame
Examples
Write a DataFrame into a Parquet file and read it back.
import tempfile
df = spark.createDataFrame(
    [(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])
with tempfile.TemporaryDirectory(prefix="parquet") as d:
    df.write.mode("overwrite").format("parquet").save(d)
    spark.read.parquet(d).orderBy("name").show()
    # +---+-----+
    # |age| name|
    # +---+-----+
    # | 10|Alice|
    # | 15|  Bob|
    # | 20|  Tom|
    # +---+-----+
Read multiple Parquet files and merge schemas.
import tempfile
df = spark.createDataFrame(
    [(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])
df2 = spark.createDataFrame([(70, "Alice"), (80, "Bob")], schema=["height", "name"])
with tempfile.TemporaryDirectory(prefix="parquet1") as d1:
    with tempfile.TemporaryDirectory(prefix="parquet2") as d2:
        df.write.mode("overwrite").format("parquet").save(d1)
        df2.write.mode("overwrite").format("parquet").save(d2)
        spark.read.option(
            "mergeSchema", "true"
        ).parquet(d1, d2).select(
            "name", "age", "height"
        ).orderBy("name", "age").show()
        # +-----+----+------+
        # | name| age|height|
        # +-----+----+------+
        # |Alice|NULL|    70|
        # |Alice|  10|  NULL|
        # |  Bob|NULL|    80|
        # |  Bob|  15|  NULL|
        # |  Tom|  20|  NULL|
        # +-----+----+------+