
Spark Dataframe - Python

In pandas, I can successfully run the following:

def car(t):
    if t in df_a:
        return df_a[t] / df_b[t]
    else:
        return 0

But how can I do the exact same thing with a Spark dataframe? Many thanks!
The data is like this:

df_a
a 20
b 40
c 60

df_b
a 80
b 50
e 100

The result should be 0.25 for input car("a").

First you have to join both dataframes, then filter by the key you want and select the operation you need.

df_a = sc.parallelize([("a", 20), ("b", 40), ("c", 60)]).toDF(["key", "value"])
df_b = sc.parallelize([("a", 80), ("b", 50), ("e", 100)]).toDF(["key", "value"])

def car(c):
    # head() returns the first Row of the result
    return (df_a.join(df_b, on=["key"])
                .where(df_a["key"] == c)
                .select((df_a["value"] / df_b["value"]).alias("ratio"))
                .head())

car("a")

# Row(ratio=0.25)

