Creates a DataFrame from an RDD, a list, a pandas.DataFrame, a numpy.ndarray, or a pyarrow.Table.
Syntax
createDataFrame(data, schema=None, samplingRatio=None, verifySchema=True)
Parameters
| Parameter | Type | Description |
|---|---|---|
| data | RDD or iterable | An RDD of any kind of SQL data representation (Row, tuple, int, bool, dict, etc.), or a list, pandas.DataFrame, numpy.ndarray, or pyarrow.Table. |
| schema | DataType, str, or list, optional | A DataType, a datatype string, or a list of column names. When a list of column names is provided, the type of each column is inferred from data. When None, the schema (column names and types) is inferred from data: Row, namedtuple, and dict rows supply column names, while plain tuples get the default names _1, _2, and so on. When a DataType or datatype string is provided, it must match the actual data, or an exception is raised at runtime. |
| samplingRatio | float, optional | The fraction of rows sampled for schema inference when data is an RDD. If None, the first few rows are used. |
| verifySchema | bool, optional | Whether to verify the data types of every row against the schema. Enabled by default. Has no effect for pyarrow.Table input or for pandas.DataFrame input converted with Arrow (spark.sql.execution.arrow.pyspark.enabled=True). |
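The interaction between schema inference and samplingRatio can be pictured with the stdlib-only model below. It is a simplified sketch, not PySpark's implementation: infer_types, FIRST_ROWS, and the type names it returns are hypothetical, and it handles only flat tuples of bool, int, float, and str. In this model, samplingRatio=None starts from the first row and reads further rows only while some column is still undetermined, whereas a ratio makes every sampled row contribute, so conflicting types surface as an error.

```python
import random
from typing import Any, List, Optional, Sequence

# Hypothetical stand-in for "the first few rows"; not a PySpark setting.
FIRST_ROWS = 100

# Simplified mapping from Python types to Spark SQL type names.
_TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def _type_of(value: Any) -> Optional[str]:
    """Return a type name, or None when the value carries no type information."""
    return None if value is None else _TYPE_NAMES[type(value)]


def infer_types(rows: Sequence[tuple], sampling_ratio: Optional[float] = None,
                seed: int = 42) -> List[str]:
    """Infer one type name per column from tuple rows (simplified model)."""
    if sampling_ratio is None:
        # Default path: read rows from the start, stopping once every
        # column has a known type.
        candidates = list(rows[:FIRST_ROWS])
        stop_early = True
    else:
        # Sampling path: keep each row with probability sampling_ratio;
        # every kept row contributes to the result.
        rng = random.Random(seed)
        candidates = [row for row in rows if rng.random() < sampling_ratio]
        stop_early = False
    if not candidates:
        raise ValueError("no rows available for schema inference")

    types: List[Optional[str]] = [None] * len(candidates[0])
    for row in candidates:
        for i, value in enumerate(row):
            t = _type_of(value)
            if types[i] is None:
                types[i] = t
            elif t is not None and t != types[i]:
                raise TypeError(f"column {i}: cannot merge {types[i]} and {t}")
        if stop_early and None not in types:
            break
    if None in types:
        raise ValueError("some column types could not be determined")
    return types  # every entry is a str at this point
```

In this model, infer_types([(1,), (1.5,)]) returns ['bigint'] because the default path never reads the second row, while infer_types([(1,), (1.5,)], sampling_ratio=1.0) raises TypeError.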
Returns
DataFrame
Notes
Usage with spark.sql.execution.arrow.pyspark.enabled=True is experimental.
Examples
# Create a DataFrame from a list of tuples.
spark.createDataFrame([('Alice', 1)]).show()
# +-----+---+
# |   _1| _2|
# +-----+---+
# |Alice|  1|
# +-----+---+
# Create a DataFrame from a list of dictionaries.
spark.createDataFrame([{'name': 'Alice', 'age': 1}]).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  1|Alice|
# +---+-----+
# Create a DataFrame with column names specified.
spark.createDataFrame([('Alice', 1)], ['name', 'age']).show()
# +-----+---+
# | name|age|
# +-----+---+
# |Alice|  1|
# +-----+---+
# Create a DataFrame with an explicit schema.
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)])
spark.createDataFrame([('Alice', 1)], schema).show()
# +-----+---+
# | name|age|
# +-----+---+
# |Alice|  1|
# +-----+---+
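With an explicit schema like the one above, verifySchema=True (the default) checks every row against the declared field types and nullability. The stdlib-only sketch below models that check. verify_row, the (name, type, nullable) tuples, and the two type names are hypothetical simplifications of StructField, StringType, and IntegerType, and the error messages do not match PySpark's.

```python
from typing import List, Tuple

# Hypothetical stand-in for StructField: (name, type name, nullable).
Field = Tuple[str, str, bool]

# Exact Python types accepted per type name. An exact type check is used,
# so bool (a subclass of int) is rejected for "int".
_ACCEPTS = {"string": (str,), "int": (int,)}
_INT_MIN, _INT_MAX = -(2**31), 2**31 - 1  # 32-bit range of IntegerType


def verify_row(row: tuple, schema: List[Field]) -> None:
    """Raise TypeError or ValueError if row does not fit schema."""
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(row)}")
    for value, (name, type_name, nullable) in zip(row, schema):
        if value is None:
            if not nullable:
                raise ValueError(f"field {name}: None in a non-nullable field")
            continue
        if type(value) not in _ACCEPTS[type_name]:
            raise TypeError(f"field {name}: {type_name} cannot accept {value!r}")
        if type_name == "int" and not _INT_MIN <= value <= _INT_MAX:
            raise ValueError(f"field {name}: {value} is out of range for int")


people_schema: List[Field] = [("name", "string", True), ("age", "int", True)]
verify_row(("Alice", 1), people_schema)  # passes silently
```

In this model, verify_row(("Alice", "1"), people_schema) raises TypeError. Passing verifySchema=False to createDataFrame skips the per-row check.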
# Create a DataFrame with a DDL-formatted schema string.
spark.createDataFrame([('Alice', 1)], "name: string, age: int").show()
# +-----+---+
# | name|age|
# +-----+---+
# |Alice|  1|
# +-----+---+
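The DDL-formatted string describes the same two fields as the StructType example. A toy parser makes that mapping explicit. parse_simple_ddl is hypothetical and handles only flat, comma-separated fields such as "name: string, age: int" or "name STRING, age INT"; it does not handle nested types or types that contain commas, such as decimal(10,2). Spark parses these strings with its own SQL parser.

```python
from typing import List, Tuple


def parse_simple_ddl(ddl: str) -> List[Tuple[str, str]]:
    """Split a flat DDL string into (column name, lowercase type name) pairs."""
    fields = []
    for part in ddl.split(","):
        part = part.strip()
        if ":" in part:
            name, type_name = part.split(":", 1)   # "name: string" form
        else:
            name, type_name = part.split(None, 1)  # "name STRING" form
        fields.append((name.strip(), type_name.strip().lower()))
    return fields


print(parse_simple_ddl("name: string, age: int"))
# [('name', 'string'), ('age', 'int')]
```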
# Create an empty DataFrame (schema is required when data is empty).
spark.createDataFrame([], "name: string, age: int").show()
# +----+---+
# |name|age|
# +----+---+
# +----+---+
# Create a DataFrame from Row objects.
from pyspark.sql import Row
Person = Row('name', 'age')
spark.createDataFrame([Person("Alice", 1)]).show()
# +-----+---+
# | name|age|
# +-----+---+
# |Alice|  1|
# +-----+---+
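The examples above take their column names from different places: plain tuples get positional names (_1, _2), dict keys come out sorted (age before name), Row and namedtuple rows keep their declared field order, and an explicit list of names renames columns by position. The stdlib-only sketch below restates those rules. column_names is hypothetical, and a namedtuple stands in for Row so that the sketch runs without PySpark.

```python
from collections import namedtuple
from typing import Any, List, Optional, Sequence


def column_names(first_row: Any, names: Optional[Sequence[str]] = None) -> List[str]:
    """Derive column names from a sample row, mirroring the examples above."""
    if isinstance(first_row, dict):
        inferred = sorted(first_row)          # dict keys, sorted
    elif hasattr(first_row, "_fields"):
        inferred = list(first_row._fields)    # namedtuple field order
    else:
        inferred = [f"_{i}" for i in range(1, len(first_row) + 1)]  # tuples
    if names is not None:
        inferred[: len(names)] = list(names)  # explicit names, by position
    return inferred


Person = namedtuple("Person", ["name", "age"])
print(column_names(("Alice", 1)))                   # ['_1', '_2']
print(column_names({"name": "Alice", "age": 1}))    # ['age', 'name']
print(column_names(Person("Alice", 1)))             # ['name', 'age']
print(column_names(("Alice", 1), ["name", "age"]))  # ['name', 'age']
```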