How to combine rows into a single row in a DataFrame using SQL or Python
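I want to aggregate the rows of some columns based on their relationship with other columns, and create new columns that hold the aggregated data in JSON format.
Here is an example.
Original data table: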
Child Name  Child Age  Father Name  Father Age
Peter       5          Richard      40
James       15         Doug         45
Liz         2          Doug         45
Paul        6          Richard      40
Shirly      11         Charles      33
Eva         9          Chris        29
The transformed table would be:
Father Name  Father Age  Children
Richard      40          {"Peter": "5", "Paul": "6"}
Doug         45          {"James": "15", "Liz": "2"}
Charles      33          {"Shirly": "11"}
Chris        29          {"Eva": "9"}
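For the first layout, one possible sketch (not taken from the thread) groups by both father columns and, for each group, zips the child names with their ages into a dict that `json.dumps` turns into a JSON string. It assumes ages stay as strings, as in the sample data; `children_json` is a hypothetical helper name. `groupby(...).apply` still calls Python once per father, but there is no explicit loop over rows.

```python
import json

import pandas as pd

# Sample data from the question (ages kept as strings)
df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})


def children_json(group):
    # Hypothetical helper: map child name -> child age for one father,
    # then serialise that mapping as a JSON string
    return json.dumps(dict(zip(group["Child Name"], group["Child Age"])))


# sort=False keeps fathers in order of first appearance
result = (
    df.groupby(["Father Name", "Father Age"], sort=False)[["Child Name", "Child Age"]]
    .apply(children_json)
    .reset_index(name="Children")
)
print(result)
```

If a real dict is preferred over a JSON string, dropping the `json.dumps` call leaves a dict in each cell.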
Or:
Father Name  Father Age  Children Name      Children Age
Richard      40          {"Peter", "Paul"}  {"5", "6"}
Doug         45          {"James", "Liz"}   {"15", "2"}
Charles      33          {"Shirly"}         {"11"}
Chris        29          {"Eva"}            {"9"}
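Since the title also mentions SQL, here is a sketch of the second layout in plain SQL, run through Python's built-in `sqlite3` module with comma-separated strings instead of braces. The table name `family` is hypothetical. SQLite does not guarantee the order in which `GROUP_CONCAT` visits rows, so names and ages are expected, but not guaranteed, to line up; SQLite builds that include the JSON functions also offer `json_group_object(name, value)`, which could build the first layout directly.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Load the sample rows into an in-memory SQLite table
conn = sqlite3.connect(":memory:")
df.to_sql("family", conn, index=False)

# Column names contain spaces, so they are written as double-quoted identifiers
query = """
SELECT "Father Name",
       "Father Age",
       GROUP_CONCAT("Child Name", ', ') AS "Children Name",
       GROUP_CONCAT("Child Age", ', ') AS "Children Age"
FROM family
GROUP BY "Father Name", "Father Age"
ORDER BY "Father Name"
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```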
My code is:
import pandas as pd

df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })
print(df)

# Join each father's children into one comma-separated string
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ['Father Name','Children Name']
print(g1)
And the output is:
  Father Name Children Name
0     Charles        Shirly
1       Chris           Eva
2        Doug    James, Liz
3     Richard   Peter, Paul
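I don't know how to also get "Father Age" and "Children Age" into the columns. What is the most efficient way to do this transformation in a DataFrame? I want to avoid Python loops, because they would take a long time to process.
Thanks,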
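One way to extend the `", ".join` code above, sketched here rather than taken from the thread, is to group by both `Father Name` and `Father Age` so the age becomes part of the group key instead of being dropped, and to join `Child Age` the same way as `Child Name`. This assumes each father has a single age; the join is still called once per group, but there is no explicit Python loop over rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# "Father Age" is part of the group key, so it survives the aggregation;
# both child columns are joined into comma-separated strings
g1 = (
    df.groupby(["Father Name", "Father Age"])
    .agg({"Child Name": ", ".join, "Child Age": ", ".join})
    .reset_index()
)
g1.columns = ["Father Name", "Father Age", "Children Name", "Children Age"]
print(g1)
```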
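A quick, dirty and inefficient hack, but it avoids loops. I hope there is a better solution; I assume the multiple DataFrame copies and multiple merges could be simplified.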
import pandas as pd

df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

# Collect each father's children names and ages into lists
g2 = df.groupby(['Father Name'])["Child Name"].apply(list).reset_index()
g3 = df.groupby(['Father Name'])["Child Age"].apply(list).reset_index()
# One row per father with his age
g4 = df[["Father Name", "Father Age"]].drop_duplicates()
# Join the pieces back together on "Father Name"
df2 = g2.merge(g4)
df2 = df2.merge(g3)
print(df2)
Output:
  Father Name     Child Name Father Age Child Age
0     Charles       [Shirly]         33      [11]
1       Chris          [Eva]         29       [9]
2        Doug   [James, Liz]         45   [15, 2]
3     Richard  [Peter, Paul]         40    [5, 6]
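The guess that the copies and merges can be simplified appears to hold: a single `groupby` with named aggregation can build the same frame. This is a sketch, not part of the original answer; it assumes each father appears with exactly one age (`"first"` silently takes the first one otherwise) and compares the result with the merge-based `df2` on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# The answer's approach: two list groupbys, a de-duplicated age table, two merges
g2 = df.groupby(["Father Name"])["Child Name"].apply(list).reset_index()
g3 = df.groupby(["Father Name"])["Child Age"].apply(list).reset_index()
g4 = df[["Father Name", "Father Age"]].drop_duplicates()
df2 = g2.merge(g4).merge(g3)

# Single pass with named aggregation; "first" assumes one age per father
single = (
    df.groupby("Father Name")
    .agg(**{
        "Child Name": ("Child Name", list),
        "Father Age": ("Father Age", "first"),
        "Child Age": ("Child Age", list),
    })
    .reset_index()
)

print(single)
# Compare row by row with the merge-based result
print(single.to_dict("records") == df2.to_dict("records"))
```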