How to unwrap nested Struct column into multiple columns?
I’m trying to expand a DataFrame column with a nested struct type (see below) into multiple columns. The Struct schema I’m working with looks something like {"foo": 3, "bar": {"baz": 2}}.
Ideally, I’d like to expand the above into two columns ("foo" and "bar.baz"). However, when I tried using .select("data.*") (where data is the Struct column), I only get columns foo and bar, where bar is still a struct.
Is there a way to expand both layers of the Struct?
-
You can select data.bar.baz as bar.baz:

    df.show()
    +-------+
    |   data|
    +-------+
    |[3,[2]]|
    +-------+

    df.printSchema()
    root
     |-- data: struct (nullable = false)
     |    |-- foo: long (nullable = true)
     |    |-- bar: struct (nullable = false)
     |    |    |-- baz: long (nullable = true)
In PySpark:

    import pyspark.sql.functions as F

    df.select(
        F.col("data.foo").alias("foo"),
        F.col("data.bar.baz").alias("bar.baz"),
    ).show()
    +---+-------+
    |foo|bar.baz|
    +---+-------+
    |  3|      2|
    +---+-------+
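If the struct has more than a couple of fields, listing every path by hand gets tedious. Below is a rough sketch of one way to generate the same select automatically by walking the schema. The helper names `leaf_paths` and `flatten` and the variable `example_schema` are made up for illustration, and the sketch assumes the schema dict has the shape returned by `df.schema.jsonValue()`. It only descends into structs (arrays and maps are treated as leaves) and assumes field names do not themselves contain dots.

```python
def leaf_paths(struct_json, prefix=""):
    """Return dotted paths to every non-struct leaf in a schema dict.

    `struct_json` is assumed to look like the output of
    df.schema.jsonValue(): {"type": "struct", "fields": [...]}.
    """
    paths = []
    for field in struct_json["fields"]:
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Nested struct: recurse and keep building the dotted path
            paths.extend(leaf_paths(ftype, path))
        else:
            # Primitive, array, or map: treat as a leaf column
            paths.append(path)
    return paths


def flatten(df, root="data"):
    """Select every leaf under `root`, aliased by its path relative to `root`."""
    # Imported lazily so leaf_paths can be used without pyspark installed
    from pyspark.sql import functions as F

    paths = [p for p in leaf_paths(df.schema.jsonValue()) if p.startswith(root + ".")]
    # "data.bar.baz" becomes a column named "bar.baz"
    return df.select([F.col(p).alias(p[len(root) + 1:]) for p in paths])


# Hand-written schema of the example DataFrame, in the jsonValue() shape
example_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "data",
            "nullable": False,
            "metadata": {},
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "foo", "type": "long", "nullable": True, "metadata": {}},
                    {
                        "name": "bar",
                        "nullable": False,
                        "metadata": {},
                        "type": {
                            "type": "struct",
                            "fields": [
                                {"name": "baz", "type": "long", "nullable": True, "metadata": {}}
                            ],
                        },
                    },
                ],
            },
        }
    ],
}

print(leaf_paths(example_schema))  # ['data.foo', 'data.bar.baz']
```

With a real DataFrame, `flatten(df).show()` should give the same two columns as the explicit select above. Note that a column whose name contains a dot, such as bar.baz, has to be wrapped in backticks when you refer to it later, e.g. F.col("`bar.baz`").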