
How to process a Table.column which contains SQL logic in PySpark

E.g.:

Table - MappingTable

Col1  Col2  MappingLogic
One   Two   SELECT * FROM TableX
One   Two   SELECT * FROM TableX X LEFT OUTER JOIN TableY Y ON X.id = Y.ID

Other Tables - TableX and TableY

How can I use this mapping table in a PySpark DataFrame and build my logic from the MappingLogic column?

Not sure what kind of answer you are expecting, but in general you can use SQL expressions in your PySpark code. You just have to create views on your tables first:

# Register the source table as a temp view so it can be referenced by name in SQL.
spark.read \
    .jdbc("jdbc:postgresql:dbserver", "tableX",
          properties={"user": "username", "password": "password"}) \
    .createOrReplaceTempView("tableX")

# Later, fetch the SQL expression from your mapping-logic table and execute it:
s = "SELECT * FROM TableX"
df = spark.sql(s)
