I need to rewrite an SQL query with several joins in Python using pandas DataFrames. I have no problem doing the joins, but it gets complicated when I want to refer to data that does not come directly from the join.
SQL to rewrite:
SELECT
table1.id,
field1,
field2,
count(DISTINCT x.apple, x.grape) AS fruits,
min(x.time) min_value
FROM table1
JOIN table2 x using(id)
LEFT JOIN table3 using(id)
GROUP BY 1,2,3
My current code:
mydf, df2, df3 = ...
mydf = mydf.merge(df2, on=['id'], how='inner')
mydf = mydf.merge(df3, on=['id'], how='left')
mydf = mydf[['id', 'field1', 'field2']] # problem, missing fruits, min_value and id should be table1.id
mydf.groupby(['id', 'field1', 'field2'])
I know how to join them, but I don't know how to then create a DataFrame that contains the required elements from the SELECT, e.g. table1.id or min(x.time) min_value.
Tables
Table1 columns: [id, field1, field2, field1_2, field1_3]
Table2 columns: [id, field1, field2, apple, grape, time, field2_1, field2_2]
Table3 columns: [id, field1, field2, field3_1, field3_2]
EDIT: I added sample tables, corrected the error in line 4 of my code, and added a summary.
Try this:
mydf = mydf[['id', 'table1', 'table2']]
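Selecting columns alone does not produce the two aggregates, so here is a rough sketch of how the whole query might translate. It assumes that field1 and field2 are meant to come from table1, and that count(DISTINCT apple, grape) should behave as in MySQL, where pairs containing a NULL are not counted. The sample data and the variable names (table1, joined, keys, result) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
table1 = pd.DataFrame({
    "id": [1, 2, 3],
    "field1": ["a", "b", "c"],
    "field2": ["x", "y", "z"],
})
table2 = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "field1": ["p", "p", "p", "q"],
    "field2": ["r", "r", "r", "s"],
    "apple": ["red", "red", "green", "red"],
    "grape": ["white", "white", "white", None],
    "time": [5, 3, 7, 2],
})
table3 = pd.DataFrame({"id": [1, 1], "field1": ["m", "n"], "field2": ["o", "o"]})

# Pass each merge only the columns it needs. That way field1/field2 remain
# table1's columns and pandas does not add _x/_y suffixes.
joined = (
    table1[["id", "field1", "field2"]]
    .merge(table2[["id", "apple", "grape", "time"]], on="id", how="inner")
    .merge(table3[["id"]], on="id", how="left")
)

keys = ["id", "field1", "field2"]  # GROUP BY 1, 2, 3

# min(x.time) AS min_value
min_value = joined.groupby(keys)["time"].min().rename("min_value")

# count(DISTINCT x.apple, x.grape) AS fruits:
# drop pairs with a missing value, keep one row per distinct pair, count per group.
fruits = (
    joined.dropna(subset=["apple", "grape"])
    .drop_duplicates(keys + ["apple", "grape"])
    .groupby(keys)
    .size()
    .rename("fruits")
)

# Groups with no complete pair get a count of 0 rather than disappearing.
result = (
    min_value.to_frame()
    .join(fruits)
    .fillna({"fruits": 0})
    .astype({"fruits": int})
    .reset_index()[keys + ["fruits", "min_value"]]
)
print(result)
```

The left join to table3 can duplicate rows, but a distinct count and a minimum are not affected by duplicates, so it is kept here only to mirror the SQL.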