groupingSets

Creates a multi-dimensional aggregation for the current DataFrame using the specified grouping sets, so that aggregations can be run on them.

Syntax

groupingSets(groupingSets: Sequence[Sequence["ColumnOrName"]], *cols: "ColumnOrName")

Parameters

Parameter	Type	Description
`groupingSets`	sequence of sequences of Column or str	The grouping sets. Each inner sequence is one set of columns to group on.
`cols`	Column or str	Additional grouping columns specified by the user. These columns appear as output columns after aggregation.

Returns

GroupedData: The data grouped into grouping sets based on the specified columns.

Examples

from pyspark.sql import functions as sf
df = spark.createDataFrame([
    (100, 'Fremont', 'Honda Civic', 10),
    (100, 'Fremont', 'Honda Accord', 15),
    (100, 'Fremont', 'Honda CRV', 7),
    (200, 'Dublin', 'Honda Civic', 20),
    (200, 'Dublin', 'Honda Accord', 10),
    (200, 'Dublin', 'Honda CRV', 3),
    (300, 'San Jose', 'Honda Civic', 5),
    (300, 'San Jose', 'Honda Accord', 8)
], schema="id INT, city STRING, car_model STRING, quantity INT")

df.groupingSets(
    [("city", "car_model"), ("city",), ()],
    "city", "car_model"
).agg(sf.sum(sf.col("quantity")).alias("sum")).sort("city", "car_model").show()
# +--------+------------+---+
# |    city|   car_model|sum|
# +--------+------------+---+
# |    NULL|        NULL| 78|
# |  Dublin|        NULL| 33|
# |  Dublin|Honda Accord| 10|
# |  Dublin|   Honda CRV|  3|
# |  Dublin| Honda Civic| 20|
# | Fremont|        NULL| 32|
# | Fremont|Honda Accord| 15|
# | Fremont|   Honda CRV|  7|
# | Fremont| Honda Civic| 10|
# |San Jose|        NULL| 13|
# |San Jose|Honda Accord|  8|
# |San Jose| Honda Civic|  5|
# +--------+------------+---+
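
To make the result above easier to read, here is a minimal plain-Python sketch of what grouping sets compute: one group-by per grouping set, with every output column that is not part of the current set reported as NULL (`None` here). `grouping_sets_sum`, `COLUMNS`, and `ROWS` are hypothetical names introduced only for this illustration. They are not part of PySpark, and the sketch does not model how Spark actually executes the query.

```python
from collections import defaultdict

# Same sample rows as in the example above.
COLUMNS = ("id", "city", "car_model", "quantity")
ROWS = [
    (100, "Fremont", "Honda Civic", 10),
    (100, "Fremont", "Honda Accord", 15),
    (100, "Fremont", "Honda CRV", 7),
    (200, "Dublin", "Honda Civic", 20),
    (200, "Dublin", "Honda Accord", 10),
    (200, "Dublin", "Honda CRV", 3),
    (300, "San Jose", "Honda Civic", 5),
    (300, "San Jose", "Honda Accord", 8),
]


def grouping_sets_sum(rows, grouping_sets, output_cols, value_col):
    """Hypothetical helper: sum value_col once per grouping set.

    Returns a list of tuples (*output_cols, total). Output columns that are
    not in the current grouping set are filled with None.
    """
    result = []
    for grouping_set in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            record = dict(zip(COLUMNS, row))
            # Build the group key; columns outside this set collapse to None.
            key = tuple(
                record[c] if c in grouping_set else None for c in output_cols
            )
            totals[key] += record[value_col]
        result.extend((*key, total) for key, total in totals.items())
    return result


result = grouping_sets_sum(
    ROWS,
    grouping_sets=[("city", "car_model"), ("city",), ()],
    output_cols=("city", "car_model"),
    value_col="quantity",
)
for line in result:
    print(line)
```

The three grouping sets correspond to the three kinds of rows in the output table: per-model totals within each city, per-city subtotals (`car_model` is NULL), and a single grand total (both columns NULL).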


Last updated on 2026-04-19