Computes a pair-wise frequency table of the given columns, also known as a contingency table. The first column of each row contains the distinct values of col1, and the remaining column names are the distinct values of col2. The first column is named $col1_$col2, for example c1_c2 when the inputs are c1 and c2. Pairs that never occur together have a count of zero. DataFrame.crosstab and DataFrameStatFunctions.crosstab are aliases of each other.
Syntax
crosstab(col1, col2)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col1 | str | The name of the first column. Distinct items make up the first column of each row. |
| col2 | str | The name of the second column. Distinct items make up the column names of the resulting DataFrame. |
Returns
DataFrame
Examples
df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
df.stat.crosstab("c1", "c2").sort("c1_c2").show()
# +-----+---+---+---+
# |c1_c2| 10| 11|  8|
# +-----+---+---+---+
# |    1|  0|  2|  0|
# |    3|  1|  0|  0|
# |    4|  0|  0|  2|
# +-----+---+---+---+
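
To make the counting rule concrete, here is a minimal plain-Python sketch of what a contingency table contains for the same data. It does not use Spark and is not how crosstab is implemented. The helper name `crosstab_rows` is hypothetical, and sorting the column names as strings is a choice made for this sketch that happens to match the output shown above.

```python
from collections import Counter


def crosstab_rows(pairs, col1="c1", col2="c2"):
    """Hypothetical helper: count (col1, col2) pairs into a contingency table."""
    pairs = list(pairs)
    counts = Counter(pairs)  # pairs that never occur simply yield 0 on lookup

    # Distinct col1 values become the first column of each row.
    row_keys = sorted({a for a, _ in pairs})
    # Distinct col2 values become column names (sorted as strings: an assumption).
    col_keys = sorted({b for _, b in pairs}, key=str)

    header = [f"{col1}_{col2}"] + [str(b) for b in col_keys]
    rows = [[a] + [counts[(a, b)] for b in col_keys] for a in row_keys]
    return header, rows


header, rows = crosstab_rows([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)])
print(header)
for row in rows:
    print(row)
```

Each row holds one distinct c1 value followed by the number of times it appears with each distinct c2 value, so the pair (1, 11), which occurs twice, produces a 2 in the row for 1 under the column 11.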