How to combine rows into a single row in a DataFrame using SQL or Python
I'd like to aggregate the rows of certain columns based on their relationship with another column, and create a new column that contains the aggregated data in JSON format.
Here is an example.
Original data table:
Child Name  Child Age  Father Name  Father Age
Peter       5          Richard      40
James       15         Doug         45
Liz         2          Doug         45
Paul        6          Richard      40
Shirly      11         Charles      33
Eva         9          Chris        29
The converted data table should be either
Father Name  Father Age  Children
Richard      40          {"Peter":"5", "Paul":"6"}
Doug         45          {"James":"15","Liz":"2"}
Charles      33          {"Shirly" : "11"}
Chris        29          {"Eva" : "9"}
or
Father Name  Father Age  Children Name     Children Age
Richard      40          {"Peter", "Paul"}  {"5","6"}
Doug         45          {"James", "Liz"}   {"15","2"}
Charles      33          {"Shirly"}         {"11"}
Chris        29          {"Eva"}            {"9"}
My code is:
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})
print(df)

# Join the names of each father's children into one comma-separated string
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ["Father Name", "Children Name"]
print(g1)
and the output is:
   Father Name  Children Name
0  Charles      Shirly
1  Chris        Eva
2  Doug         James, Liz
3  Richard      Peter, Paul
I can't figure out how to add "Father Age" and "Children Age" to the columns. How can I do this conversion on the DataFrame in the most efficient way? I'd like to avoid looping in Python, as that would take a long time to process.
Thanks,
A quick, dirty, and inefficient hack, but it avoids for loops. I would love to see a better solution; I assume the multiple DataFrame copies and multiple merges could be simplified.
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Collect each father's children names and ages into lists
g2 = df.groupby(["Father Name"])["Child Name"].apply(list).reset_index()
g3 = df.groupby(["Father Name"])["Child Age"].apply(list).reset_index()
# One row per father with his age
g4 = df[["Father Name", "Father Age"]].drop_duplicates()

# Stitch the three pieces back together on the father's name
df2 = g2.merge(g4, on="Father Name")
df2 = df2.merge(g3, on="Father Name")
print(df2)
Output:
   Father Name  Child Name     Father Age  Child Age
0  Charles      [Shirly]       33          [11]
1  Chris        [Eva]          29          [9]
2  Doug         [James, Liz]   45          [15, 2]
3  Richard      [Peter, Paul]  40          [5, 6]
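One possible simplification of the copies and merges, offered as a sketch rather than a definitive answer: group on both father columns at once and use named aggregation, so "Father Age" survives without any merge. This assumes pandas 0.25 or newer (for named aggregation) and assumes each father name always appears with the same age. The output column names "Children Name", "Children Age" and "Children" are taken from the question; the JSON column is built with one iteration per father, not per child row, and groupby with a Python callable such as list still runs Python code per group.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Grouping on both father columns keeps "Father Age" without a merge.
# Assumption: a given father name always carries the same age.
# sort=False keeps the fathers in order of first appearance.
result = (
    df.groupby(["Father Name", "Father Age"], sort=False)
      .agg(**{
          "Children Name": ("Child Name", list),
          "Children Age": ("Child Age", list),
      })
      .reset_index()
)

# Optional: the first layout from the question, a JSON object per father
# mapping each child's name to that child's age.
result["Children"] = [
    json.dumps(dict(zip(names, ages)))
    for names, ages in zip(result["Children Name"], result["Children Age"])
]

print(result)
```

If only the JSON layout is wanted, the two list columns can be dropped afterwards with result.drop(columns=["Children Name", "Children Age"]).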