
spark-shell (Scala): map the exchange rate value in a DataFrame

I have a DataFrame df:

+-----+--------+----------+-----+
|count|currency|      date|value|
+-----+--------+----------+-----+
|    3|     GBP|2021-01-14|    4|
|  102|     USD|2021-01-14|    3|
|  234|     EUR|2021-01-14|    5|
|   28|     GBP|2021-01-16|    5|
|   48|     USD|2021-01-16|    7|
|   68|     EUR|2021-01-15|    6|
|   20|     GBP|2021-01-15|    1|
|   33|     EUR|2021-01-17|    2|
|  106|     GBP|2021-01-17|   10|
+-----+--------+----------+-----+

I have a separate DataFrame, exchange_rate, holding the USD exchange rates:

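// header=true makes the first CSV line the column names (INSERTTIME, EXCAHNGERATE, CURRENCY)
val exchange_rate = spark.read.format("csv").option("header", "true").load("/Users/khan/data/exchange_rate.csv")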
exchange_rate.show()

INSERTTIME  EXCAHNGERATE    CURRENCY
2021-01-14  0.731422        GBP
2021-01-14  0.784125        EUR
2021-01-15  0.701922        GBP
2021-01-15  0.731422        EUR
2021-01-16  0.851422        GBP
2021-01-16  0.721128        EUR
2021-01-17  0.771621        GBP
2021-01-17  0.751426        EUR

I want to convert the GBP and EUR values in df to USD by looking up the rate for the corresponding date in the exchange_rate DataFrame.

My idea:

import com.currency_converter.CurrencyConverter from http://xavierguihot.com/currency_converter/#com.currency_converter.CurrencyConverter$

Is there a simpler way to do it?

You can join both DataFrames and then compute the converted value row by row:

import org.apache.spark.sql.functions.col

// df1 is assumed to be df and df2 to be exchange_rate from the question; "left" keeps the USD rows
val dfJoin = df1.join(df2, df1.col("date") === df2.col("INSERTTIME") &&
  df1.col("currency") === df2.col("CURRENCY"), "left")

dfJoin.withColumn("USD", col("value") * col("EXCAHNGERATE")).show()
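
One thing to watch with the left join: rows that have no matching rate (the USD rows here) end up with a null rate, so the product is null as well. A minimal sketch of one way around that, assuming a rate of 1.0 is acceptable for unmatched rows, is below. The sample rows are copied from the question; the names rates, joined and converted and the local SparkSession setup are only illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

// In spark-shell a session already exists; getOrCreate() simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("fx-join-sketch").getOrCreate()
import spark.implicits._

// Small sample copied from the question's tables.
val df = Seq(
  (3, "GBP", "2021-01-14", 4),
  (102, "USD", "2021-01-14", 3),
  (234, "EUR", "2021-01-14", 5)
).toDF("count", "currency", "date", "value")

val rates = Seq(
  ("2021-01-14", 0.731422, "GBP"),
  ("2021-01-14", 0.784125, "EUR")
).toDF("INSERTTIME", "EXCAHNGERATE", "CURRENCY")

val joined = df.join(rates,
  df("date") === rates("INSERTTIME") && df("currency") === rates("CURRENCY"),
  "left")

// Assumption: rows without a matching rate (USD here) use a factor of 1.0 instead of null.
val converted = joined
  .withColumn("USD", col("value") * coalesce(col("EXCAHNGERATE"), lit(1.0)))
  .select(df("count"), df("currency"), df("date"), df("value"), col("USD"))

converted.show()
```

Selecting through df("...") rather than col("...") avoids the ambiguity between currency and CURRENCY, since Spark resolves column names case-insensitively by default.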

You can use a correlated subquery (a fancy way of doing joins):

df.createOrReplaceTempView("df1")
exchange_rate.createOrReplaceTempView("df2")

val result = spark.sql("""
select count, 'USD' as currency, date, value,
    value * coalesce(
        (select min(df2.EXCAHNGERATE)
         from df2
         where df1.date = df2.INSERTTIME and df1.currency = df2.CURRENCY),
        1  -- use 1 as exchange rate if no exchange rate found
    ) as converted
from df1
""")

result.show
+-----+--------+----------+-----+---------+
|count|currency|      date|value|converted|
+-----+--------+----------+-----+---------+
|    3|     USD|2021-01-14|    4| 2.925688|
|  102|     USD|2021-01-14|    3|      3.0|
|  234|     USD|2021-01-14|    5| 3.920625|
|   28|     USD|2021-01-16|    5|  4.25711|
|   48|     USD|2021-01-16|    7|      7.0|
|   68|     USD|2021-01-15|    6| 4.388532|
|   20|     USD|2021-01-15|    1| 0.701922|
|   33|     USD|2021-01-17|    2| 1.502852|
|  106|     USD|2021-01-17|   10|  7.71621|
+-----+--------+----------+-----+---------+
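
The min() in the subquery guarantees a single rate per row of df1, even if the rate table holds more than one entry for the same date and currency. A rough DataFrame API counterpart, sketched under the assumption that taking the minimum rate is the desired tie-break, aggregates the rates first and then joins. The second GBP rate below is a hypothetical duplicate added only to show the effect, and the names rates, uniqueRates and viaJoin are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, min}

// In spark-shell a session already exists; getOrCreate() simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("fx-min-rate-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  (3, "GBP", "2021-01-14", 4),
  (102, "USD", "2021-01-14", 3)
).toDF("count", "currency", "date", "value")

// The second GBP row is a hypothetical duplicate, not taken from the question.
val rates = Seq(
  ("2021-01-14", 0.731422, "GBP"),
  ("2021-01-14", 0.740000, "GBP")
).toDF("INSERTTIME", "EXCAHNGERATE", "CURRENCY")

// Keep one rate per (date, currency), as min() does inside the subquery.
val uniqueRates = rates
  .groupBy("INSERTTIME", "CURRENCY")
  .agg(min("EXCAHNGERATE").as("rate"))

val viaJoin = df
  .join(uniqueRates,
    df("date") === uniqueRates("INSERTTIME") && df("currency") === uniqueRates("CURRENCY"),
    "left")
  .select(df("count"), lit("USD").as("currency"), df("date"), df("value"),
    (df("value") * coalesce(col("rate"), lit(1.0))).as("converted"))

viaJoin.show()
```

Without the aggregation step, the duplicate rate would produce two output rows for the single GBP input row.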
