Rewriting a SQL query with pandas DataFrames: how to mix columns from different sources in the SELECT

I need to rewrite a SQL query with several joins in Python using pandas DataFrames. I have no problem doing the joins themselves, but it gets complicated when the SELECT refers to things that are not plain columns of the join result, such as a column from one specific table or an aggregate.

SQL to rewrite:

SELECT 
    table1.id,
    field1,
    field2,
    count(DISTINCT x.apple, x.grape) AS fruits,
    min(x.time) min_value
FROM table1
    JOIN table2 x using(id)
    LEFT JOIN table3 using(id)
GROUP BY 1,2,3

My current code:

mydf, df2, df3 = ...
mydf.merge(df2, on=['id'], how='inner')
mydf.merge(df3, on=['id'], how='left')
mydf = mydf[['id', 'field1', 'field2']] # problem, missing fruits, min_value and id should be table1.id
mydf.groupby(['id', 'field1', 'field2'])

I know how to join them, but I don't know how to then build a DataFrame that contains the elements required by the SELECT, e.g. table1.id or min(x.time) AS min_value.

Tables

Table1 columns: [id, field1, field2, field1_2, field1_3]
Table2 columns: [id, field1, field2, apple, grape, time, field2_1, field2_2]
Table3 columns: [id, field1, field2, field3_1, field3_2]

EDIT: I added sample tables, corrected the error in line 4 of my code, and added a summary.

Try this:

    mydf = mydf[['id', 'table1', 'table2']]
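For the aggregates in the SELECT, something along these lines may be closer to what the query does. This is only a sketch under a few assumptions: field1 and field2 are meant to come from table1 (the SQL leaves that ambiguous), the sample rows are made up purely so the code runs, and count(DISTINCT x.apple, x.grape) follows MySQL semantics, i.e. it counts distinct (apple, grape) pairs and ignores pairs containing a NULL. Selecting only the needed columns from each frame before merging avoids the _x/_y suffixes, so id, field1 and field2 are unambiguously table1's.

```python
import pandas as pd

# Made-up sample rows (not from the question); column names follow the question.
table1 = pd.DataFrame({"id": [1, 2, 3],
                       "field1": ["a", "b", "c"],
                       "field2": ["x", "y", "z"]})
table2 = pd.DataFrame({"id": [1, 1, 1, 2],
                       "apple": ["red", "red", "green", "red"],
                       "grape": ["white", "white", "white", "black"],
                       "time": [5, 3, 9, 7]})
table3 = pd.DataFrame({"id": [1, 1], "field3_1": [10, 20]})

keys = ["id", "field1", "field2"]

# FROM table1 JOIN table2 x USING(id) LEFT JOIN table3 USING(id)
# Only the columns each table contributes are kept, so names never collide.
joined = (
    table1[keys]
    .merge(table2[["id", "apple", "grape", "time"]], on="id", how="inner")
    .merge(table3[["id"]], on="id", how="left")
)

# min(x.time) AS min_value, grouped by the three selected columns
min_value = joined.groupby(keys)["time"].min().rename("min_value")

# count(DISTINCT x.apple, x.grape) AS fruits:
# drop pairs with a missing value, deduplicate per group, then count rows
fruits = (
    joined.dropna(subset=["apple", "grape"])
    .drop_duplicates(keys + ["apple", "grape"])
    .groupby(keys)
    .size()
    .rename("fruits")
)

# Combine both aggregates; groups with no complete pair get fruits = 0
result = (
    min_value.to_frame()
    .join(fruits)
    .fillna({"fruits": 0})
    .astype({"fruits": "int64"})
    .reset_index()[keys + ["fruits", "min_value"]]
)
print(result)
```

Note that the LEFT JOIN with table3 can only duplicate rows here, which changes neither the distinct count nor the minimum, so it could be left out unless table3 columns are needed later.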
