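How to combine rows into a single row in a DataFrame using SQL or Python
I want to aggregate the rows of some columns based on their relationship to other columns, and create columns that hold the aggregated data in JSON format.
Here is an example.
Original table: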
Child Name  Child Age  Father Name  Father Age
Peter       5          Richard      40
James       15         Doug         45
Liz         2          Doug         45
Paul        6          Richard      40
Shirly      11         Charles      33
Eva         9          Chris        29
The transformed table would be:
Father Name  Father Age  Children
Richard      40          {"Peter":"5", "Paul":"6"}
Doug         45          {"James":"15", "Liz":"2"}
Charles      33          {"Shirly":"11"}
Chris        29          {"Eva":"9"}
or:
Father Name  Father Age  Children Name      Children Age
Richard      40          {"Peter", "Paul"}  {"5", "6"}
Doug         45          {"James", "Liz"}   {"15", "2"}
Charles      33          {"Shirly"}         {"11"}
Chris        29          {"Eva"}            {"9"}
My code is:
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"]})
print(df)

# Join the child names of each father into one comma-separated string
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ["Father Name", "Children Name"]
print(g1)
and the output is:
  Father Name Children Name
0     Charles        Shirly
1       Chris           Eva
2        Doug    James, Liz
3     Richard   Peter, Paul
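I don't know how to add "Father Age" and "Child Age" as columns. What is the most efficient way to do this transformation on a DataFrame? I would like to avoid Python loops, because they would take a long time to run.
Thanks,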
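A quick, dirty, and inefficient hack, but it avoids loops. Hopefully there is a better solution; I assume the multiple copies of df and the multiple merges could be simplified.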
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"]})

# Collect each father's child names and child ages into lists
g2 = df.groupby(["Father Name"])["Child Name"].apply(list).reset_index()
g3 = df.groupby(["Father Name"])["Child Age"].apply(list).reset_index()
# One row per father with his age
g4 = df[["Father Name", "Father Age"]].drop_duplicates()
# Merge the three pieces on the shared "Father Name" column
df2 = g2.merge(g4)
df2 = df2.merge(g3)
print(df2)
Output:
  Father Name     Child Name Father Age Child Age
0     Charles       [Shirly]         33      [11]
1       Chris          [Eva]         29       [9]
2        Doug   [James, Liz]         45   [15, 2]
3     Richard  [Peter, Paul]         40    [5, 6]
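One possible way to cut down the copies and merges is to group on both father columns at once, so that "Father Age" stays in the result without a separate merge. The sketch below is only an illustration under two assumptions that the question does not state: each father name always appears with the same age, and child names are unique within one father. The variable names `keys`, `lists` and `children` are made up for this example.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Grouping on both father columns keeps "Father Age" without a merge.
# Assumption: a given father name always has the same age.
keys = ["Father Name", "Father Age"]

# Variant 1: one list column per child attribute.
# sort=False keeps fathers in order of first appearance.
lists = (
    df.groupby(keys, sort=False)[["Child Name", "Child Age"]]
    .agg(list)
    .reset_index()
)

# Variant 2: a single "Children" column holding a JSON string that maps
# child name to child age. Assumption: child names are unique per father,
# otherwise later entries overwrite earlier ones in the dict.
children = (
    df.set_index("Child Name")
    .groupby(keys, sort=False)["Child Age"]
    .apply(lambda ages: json.dumps(ages.to_dict()))
    .reset_index(name="Children")
)

print(lists)
print(children)
```

`lists` corresponds to the second layout asked for in the question (with Python lists instead of braces), and `children` to the first. The `apply` call still runs a Python function once per group, so this avoids a loop over rows but not over fathers.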