
Python dictionary key value into dataframe where clause in PySpark

How can I pass a Python dictionary key value into a dataframe where clause in PySpark?

The Python dictionary is as below ...

column_dict= { 'email': 'customer_email_addr' ,
               'addr_bill': 'crq_st_addr' ,
               'addr_ship': 'ship_to_addr' ,
               'zip_bill': 'crq_zip_cd' ,
               'zip_ship':  'ship_to_zip' ,
               'phone_bill': 'crq_cm_phone' ,
               'phone_ship' : 'ship_to_phone'}

I have a Spark dataframe with around 3 billion records. The dataframe is as follows ...

source_sql = ("select cust_id, customer_email_addr, crq_st_addr, ship_to_addr, "
              "crq_zip_cd, ship_to_zip, crq_cm_phone, ship_to_phone from odl.cust_master "
              "where trans_dt >= '{}' and trans_dt <= '{}'").format('2017-11-01', '2018-10-31')

cust_id_m = hiveCtx.sql(source_sql)
cust_id_m.cache()

My intention is to find the distinct valid customers for email, address, zip, and phone, running in a loop over the dictionary keys above. To do this, I tested one key value in the Spark shell, as below ...

>>> cust_id_risk_m=cust_id_m.selectExpr("cust_id").where( 
("cust_id_m.'{}'").format(column_dict['email'])  != ''  ).distinct()

I'm getting the error below and would appreciate help resolving it.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/mapr/spark/spark-2.1.0/python/pyspark/sql/dataframe.py", line 1026, in filter
    raise TypeError("condition should be string or Column")
TypeError: condition should be string or Column
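The traceback is consistent with how Python evaluates the argument before Spark ever sees it: the `!= ''` sits outside the formatted string, so two Python strings get compared and the result is a plain `bool`, which is neither a string nor a Column. A minimal sketch in plain Python (no Spark needed; the variable name `condition` is only illustrative):

```python
column_dict = {'email': 'customer_email_addr'}

# Same expression as in the failing where() call: the comparison runs in
# Python, between a non-empty str and '', so it evaluates to True.
condition = ("cust_id_m.'{}'").format(column_dict['email']) != ''

print(type(condition).__name__, condition)  # bool True
```

Moving the comparison inside the string, so that the whole condition is one SQL expression, gives `where` the string it expects.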

Can you try using the get method on your dictionary? I tested this with the dataframe below:

df = spark.sql("select emp_id, emp_name, emp_city, emp_salary from udb.emp_table where emp_joining_date >= '{}'".format('2018-12-05'))

>>> df.show(truncate=False)
+------+----------------------+--------+----------+
|emp_id|emp_name              |emp_city|emp_salary|
+------+----------------------+--------+----------+
|1     |VIKRANT SINGH RANA    |NOIDA   |10000     |
|3     |GOVIND NIMBHAL        |DWARKA  |92000     |
|2     |RAGHVENDRA KUMAR GUPTA|GURGAON |50000     |
+------+----------------------+--------+----------+

thedict={"CITY":"NOIDA"}

>>> newdf = df.selectExpr("emp_id").where("emp_city ='{}'".format(thedict.get('CITY'))).distinct()
>>> newdf.show()
+------+
|emp_id|
+------+
|     1|
+------+

Or can you share some sample data for your dataframe?
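Applied to the dictionary from the question, the same idea might look like the sketch below. It assumes "valid" simply means the column is non-empty (in Spark SQL, `!= ''` also drops NULLs), and the helper names `build_condition` and `distinct_valid_customers` are hypothetical, not part of PySpark. The filter is applied before the projection so the filtered column is still in scope.

```python
column_dict = {
    'email': 'customer_email_addr',
    'addr_bill': 'crq_st_addr',
    'addr_ship': 'ship_to_addr',
    'zip_bill': 'crq_zip_cd',
    'zip_ship': 'ship_to_zip',
    'phone_bill': 'crq_cm_phone',
    'phone_ship': 'ship_to_phone',
}


def build_condition(column_name):
    """Return a SQL filter string that keeps rows where the column is non-empty."""
    return "{} != ''".format(column_name)


def distinct_valid_customers(df, column_name):
    """Filter on the given column, then keep the distinct cust_id values."""
    return df.where(build_condition(column_name)).select("cust_id").distinct()


# The condition strings can be inspected without a Spark session.
conditions = {key: build_condition(col) for key, col in column_dict.items()}
print(conditions['email'])  # customer_email_addr != ''

# With a live dataframe such as cust_id_m from the question:
# results = {key: distinct_valid_customers(cust_id_m, col)
#            for key, col in column_dict.items()}
```

Each entry in `results` would be a lazy dataframe, so nothing is computed until an action such as `count()` or `show()` is called on it.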
