
Python SQL to pandas DataFrame 2

pd.read_sql_query("""SELECT Tab1.Title, NewTab.NewCol1 FROM
                            (SELECT Col1 AS NewCol, COUNT(*) AS NewCol1
                            FROM Tab2 GROUP BY Col1) AS NewTab
                     JOIN Tab1 ON NewTab.NewCol=Tab1.Id
                     WHERE Tab1.Num=1
                     ORDER BY NewCol1 DESC""", conn)

My goal is to rewrite this query using only pandas methods and functions. First, I'd like to assign a new column NewCol that would also contain a new column PostId, but I highly doubt that I should do it in two steps. Could anyone please guide me toward a solution or provide full code I could analyze?

Do you want to rewrite this query in pandas as a single line? It can be done, but the result is hard to read. Splitting it into steps is much neater. Start with the aggregated subquery:

NewTab = Tab2.groupby('Col1').size().reset_index(name = 'NewCol1').rename(columns = {'Col1': 'NewCol'})
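To see what this step produces, here is a minimal sketch on made-up data. Only the names Tab2, Col1, NewCol and NewCol1 come from the question; the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column name Col1 comes from the question.
Tab2 = pd.DataFrame({"Col1": [1, 1, 2, 1, 3, 2]})

NewTab = (
    Tab2.groupby("Col1")
    .size()                              # number of rows per Col1 value, like COUNT(*)
    .reset_index(name="NewCol1")         # Series -> DataFrame, count column named NewCol1
    .rename(columns={"Col1": "NewCol"})  # mirrors "Col1 AS NewCol" in the subquery
)
print(NewTab)
```

The result has one row per distinct Col1 value, which is what the GROUP BY subquery returns.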

And now you can merge those two tables:

result_df = pd.merge(NewTab, Tab1, left_on='NewCol', right_on='Id')
result_df = result_df[result_df.Num == 1]  # WHERE Tab1.Num=1; filter only after result_df exists

You can now sort the data frame after merging and specify the columns:

result_df = result_df.sort_values(by='NewCol1', ascending=False)  # ORDER BY NewCol1 DESC
result_df = result_df[['Title','NewCol1']]
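If you want to check the pandas version against the original query, one option is to load some sample data into an in-memory SQLite database and run both. The sketch below assumes made-up contents for Tab1 and Tab2 (only the table and column names come from the question) and chains the steps above into one expression.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; only table and column names come from the question.
Tab1 = pd.DataFrame({"Id": [1, 2, 3], "Title": ["a", "b", "c"], "Num": [1, 1, 2]})
Tab2 = pd.DataFrame({"Col1": [1, 2, 2, 2, 3, 3]})

# Load the frames into an in-memory SQLite database so the SQL can run as written.
conn = sqlite3.connect(":memory:")
Tab1.to_sql("Tab1", conn, index=False)
Tab2.to_sql("Tab2", conn, index=False)

sql_df = pd.read_sql_query("""SELECT Tab1.Title, NewTab.NewCol1 FROM
                            (SELECT Col1 AS NewCol, COUNT(*) AS NewCol1
                            FROM Tab2 GROUP BY Col1) AS NewTab
                     JOIN Tab1 ON NewTab.NewCol=Tab1.Id
                     WHERE Tab1.Num=1
                     ORDER BY NewCol1 DESC""", conn)

pandas_df = (
    Tab2.groupby("Col1").size().reset_index(name="NewCol1")  # the GROUP BY subquery
    .rename(columns={"Col1": "NewCol"})
    .merge(Tab1, left_on="NewCol", right_on="Id")            # inner join, like JOIN ... ON
    .query("Num == 1")                                       # WHERE Tab1.Num=1
    .sort_values("NewCol1", ascending=False)                 # ORDER BY NewCol1 DESC
    .loc[:, ["Title", "NewCol1"]]                            # SELECT Title, NewCol1
    .reset_index(drop=True)
)

print(sql_df)
print(pandas_df)
conn.close()
```

Note that rows with equal NewCol1 values may come back in a different order from SQL and from pandas, since neither ordering is fully specified for ties.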

