In pandas, I can successfully run the following:
def car(t):
    if t in df_a:
        return df_a[t] / df_b[t]
    else:
        return 0
But how can I do the exact same thing with a Spark DataFrame? Many thanks!
The data is like this
df_a
a 20
b 40
c 60
df_b
a 80
b 50
e 100
The result should be 0.25 for car("a"), i.e. 20 / 80.
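For context, the pandas behaviour described above can be reproduced with a short sketch. The Series construction is an assumption based on the sample data shown (two Series keyed by letter):

```python
import pandas as pd

# Assumed representation of the sample data as pandas Series keyed by letter
df_a = pd.Series({"a": 20, "b": 40, "c": 60})
df_b = pd.Series({"a": 80, "b": 50, "e": 100})

def car(t):
    # "t in df_a" tests membership in the Series index
    if t in df_a:
        return df_a[t] / df_b[t]
    else:
        return 0

print(car("a"))  # 0.25
print(car("e"))  # 0, because "e" is not in df_a
```

Note that a key present in df_a but absent from df_b (such as "c") would still raise a KeyError in this version, since only membership in df_a is checked.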
First you have to join both DataFrames, then filter by the key you want and select the operation you need.
df_a = sc.parallelize([("a", 20), ("b", 40), ("c", 60)]).toDF(["key", "value"])
df_b = sc.parallelize([("a", 80), ("b", 50), ("e", 100)]).toDF(["key", "value"])
def car(c):
    return (df_a.join(df_b, on=["key"])
                .where(df_a["key"] == c)
                .select((df_a["value"] / df_b["value"]).alias("ratio"))
                .head())
car("a")
# Row(ratio=0.25)