Rewriting a SQL query with pandas DataFrames: how to mix different sources from SELECT
I need to rewrite a SQL query with several joins in Python, using pandas DataFrames. I have no problem making the joins, but it gets complicated when I want to refer to data that does not come strictly from a join.
SQL to rewrite:
SELECT
table1.id,
field1,
field2,
count(DISTINCT x.apple, x.grape) AS fruits,
min(x.time) min_value
FROM table1
JOIN table2 x using(id)
LEFT JOIN table3 using(id)
GROUP BY 1,2,3
My current code:
mydf, df2, df3 = ...
mydf.merge(df2, on=['id'], how='inner')
mydf.merge(df3, on=['id'], how='left')
mydf = mydf[['id', 'field1', 'field2']] # problem, missing fruits, min_value and id should be table1.id
mydf.groupby(['id', 'field1', 'field2'])
I know how to join them, but I don't know how to then build a dataframe that contains the required elements from the SELECT, such as table1.id or min(x.time) min_value.
Tables
Table1 columns: [id, field1, field2, field1_2, field1_3]
Table2 columns: [id, field1, field2, apple, grape, time, field2_1, field2_2]
Table3 columns: [id, field1, field2, field3_1, field3_2]
EDIT: I added sample tables, corrected the error in line 4 of my code, and added a summary.
Try this:
mydf = mydf[['id', 'table1', 'table2']]
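For the aggregated columns, one possible approach is to merge first and then group. The following is only a sketch under several assumptions: the sample rows are made up for illustration, field1 and field2 are taken from table1 (the SQL does not say which table they come from), and count(DISTINCT x.apple, x.grape) is read MySQL-style, meaning distinct (apple, grape) pairs where neither value is NULL. Selecting only the needed columns before each merge avoids the _x/_y suffixes pandas adds to overlapping column names, so id, field1 and field2 stay those of table1.

```python
import pandas as pd

# Hypothetical sample data, only the columns the query needs
table1 = pd.DataFrame({
    "id": [1, 2, 3],
    "field1": ["a", "b", "c"],
    "field2": ["x", "y", "z"],
})
table2 = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "apple": ["red", "red", "green", "red"],
    "grape": ["white", "white", "white", "black"],
    "time": [5, 3, 9, 7],
})
table3 = pd.DataFrame({"id": [1, 1, 3], "field3_1": [10, 20, 30]})

# FROM table1 JOIN table2 x USING(id) LEFT JOIN table3 USING(id)
# merge returns a new dataframe, so the result has to be kept
joined = (
    table1[["id", "field1", "field2"]]
    .merge(table2[["id", "apple", "grape", "time"]], on="id", how="inner")
    .merge(table3[["id"]], on="id", how="left")
)

keys = ["id", "field1", "field2"]  # GROUP BY 1,2,3

# count(DISTINCT apple, grape): drop NULL pairs, dedupe per group, count rows
fruits = (
    joined.dropna(subset=["apple", "grape"])
    .drop_duplicates(subset=keys + ["apple", "grape"])
    .groupby(keys)
    .size()
    .rename("fruits")
)

# min(x.time) AS min_value
min_value = joined.groupby(keys)["time"].min().rename("min_value")

# Combine both aggregates; groups with no non-NULL pair get fruits = 0
result = (
    min_value.to_frame()
    .join(fruits)
    .fillna({"fruits": 0})
    .astype({"fruits": int})
    .reset_index()
)[keys + ["fruits", "min_value"]]

print(result)
```

The left join on table3 can duplicate rows, but a distinct count and a minimum are not affected by duplicates, so the aggregates should still match the SQL under the assumptions above.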