
Spark Dataframe-Python

In pandas, I can successfully run the following:

def car(t):
    if t in df_a:
        return df_a[t] / df_b[t]
    else:
        return 0

But how can I do the exact same thing with a Spark DataFrame? Many thanks!
The data looks like this:

df_a
a 20
b 40
c 60

df_b
a 80
b 50
e 100

The result should be 0.25 for the input car("a").
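For reference, the pandas version can be written as a runnable sketch. This assumes df_a and df_b are pandas Series indexed by the letter keys, which is what the question's `t in df_a` / `df_a[t]` usage suggests; it also checks both Series so a key present in df_a but missing from df_b returns 0 instead of raising a KeyError:

```python
import pandas as pd

# Assumed setup: Series keyed by letter, values as in the question.
df_a = pd.Series({"a": 20, "b": 40, "c": 60})
df_b = pd.Series({"a": 80, "b": 50, "e": 100})

def car(t):
    # `t in series` tests membership in the index.
    if t in df_a and t in df_b:
        return df_a[t] / df_b[t]
    return 0

car("a")  # 0.25
```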

First join the two DataFrames, then filter by the key you want and select the computed ratio.

df_a = sc.parallelize([("a", 20), ("b", 40), ("c", 60)]).toDF(["key", "value"])
df_b = sc.parallelize([("a", 80), ("b", 50), ("e", 100)]).toDF(["key", "value"])

def car(c):
    return (df_a.join(df_b, on=["key"])
                .where(df_a["key"] == c)
                .select((df_a["value"] / df_b["value"]).alias("ratio"))
                .head())

car("a")
# Row(ratio=0.25)

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please indicate the site URL or the original address. Questions: yoyou2525@163.com.
