
How to combine rows into a single row in a DataFrame with SQL or Python

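I want to aggregate the rows of certain columns based on their relationship to other columns, and create new columns that hold the aggregated data in JSON format.

Here is an example.

Original table: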

Child Name     Child Age    Father Name    Father Age
     Peter             5        Richard            40
     James            15           Doug            45
       Liz             2           Doug            45
      Paul             6        Richard            40
    Shirly            11        Charles            33
       Eva             9          Chris            29

The transformed table would be:

Father Name    Father Age     Children
    Richard            40     {"Peter": "5", "Paul": "6"}
       Doug            45     {"James": "15", "Liz": "2"}
    Charles            33     {"Shirly": "11"}
      Chris            29     {"Eva": "9"}
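A minimal pandas sketch of this first layout, assuming each father has exactly one age (so grouping on both father columns loses nothing) and that a JSON *string* per row is acceptable; drop `json.dumps` to keep a plain Python `dict` instead. There is no explicit loop over rows, although the lambda is still called once per group.

```python
import json

import pandas as pd

# Sample data from the question (ages are strings, as in the original code)
df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Group on both father columns so "Father Age" survives the aggregation;
# sort=False keeps fathers in order of first appearance.
# For each group, pair child names with child ages and serialize to JSON.
result = (
    df.groupby(["Father Name", "Father Age"], sort=False)[["Child Name", "Child Age"]]
      .apply(lambda g: json.dumps(dict(zip(g["Child Name"], g["Child Age"]))))
      .reset_index(name="Children")
)
print(result)
```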

or

Father Name    Father Age     Children Name       Children Age
    Richard            40     {"Peter", "Paul"}      {"5","6"}
       Doug            45     {"James", "Liz"}      {"15","2"}
    Charles            33     {"Shirly"}                {"11"}
      Chris            29     {"Eva"}                    {"9"}
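For the second layout, pandas named aggregation (available since pandas 0.25) can build both list columns in a single `groupby`. This is a sketch that again assumes one age per father; the output column names contain spaces, so they are passed through `**` unpacking rather than as keyword arguments.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Named aggregation: output column -> (input column, aggregation function).
# Using the builtin `list` collects each group's values into a Python list.
result = (
    df.groupby(["Father Name", "Father Age"], sort=False)
      .agg(**{
          "Children Name": ("Child Name", list),
          "Children Age": ("Child Age", list),
      })
      .reset_index()
)
print(result)
```

Because both lists are built from the same group rows, the i-th name and the i-th age in a row belong to the same child.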

My code is:

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

print(df)

# Join each father's child names into one comma-separated string
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ['Father Name','Children Name']
print(g1)

The output is:

  Father Name   Children Name
0     Charles          Shirly
1       Chris             Eva
2        Doug      James, Liz
3     Richard     Peter, Paul

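I don't know how to add "Father Age" and "Children Age" to the result. What is the most efficient way to do this transformation in a DataFrame? I'd like to avoid Python loops, because they would take too long to process.

Thanks,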
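Since the title also mentions SQL, the same grouping can be pushed into a database so no Python loop is involved at all. A minimal sketch using an in-memory SQLite database through the standard `sqlite3` module; the table name `family` is hypothetical. Note that SQLite does not guarantee the order in which `GROUP_CONCAT` visits rows, so treat the order inside each string as unspecified.

```python
import sqlite3

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Copy the frame into an in-memory database (table name "family" is hypothetical)
conn = sqlite3.connect(":memory:")
df.to_sql("family", conn, index=False)

# GROUP BY both father columns; GROUP_CONCAT collapses each group's children
# into one comma-separated string. Identifiers with spaces need double quotes,
# and explicit aliases are used because SQLite leaves un-aliased result
# column names unspecified.
query = """
SELECT "Father Name"                      AS "Father Name",
       "Father Age"                       AS "Father Age",
       GROUP_CONCAT("Child Name", ', ')   AS "Children Name",
       GROUP_CONCAT("Child Age", ', ')    AS "Children Age"
FROM family
GROUP BY "Father Name", "Father Age"
ORDER BY "Father Name"
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```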

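Quick, dirty, and inefficient hack, but it avoids loops. Hopefully there is a better solution; I assume the multiple DataFrame copies and the multiple merges could be simplified.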

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

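# Collect each father's child names and child ages into lists
g2 = df.groupby(['Father Name'])["Child Name"].apply(list).reset_index()
g3 = df.groupby(['Father Name'])["Child Age"].apply(list).reset_index()
# One (Father Name, Father Age) row per father
g4 = df[["Father Name", "Father Age"]].drop_duplicates()

# Merge the pieces back together on the shared "Father Name" column
df2 = g2.merge(g4)
df2 = df2.merge(g3)
print(df2)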

Output:

  Father Name     Child Name Father Age Child Age
0     Charles       [Shirly]         33      [11]
1       Chris          [Eva]         29       [9]
2        Doug   [James, Liz]         45   [15, 2]
3     Richard  [Peter, Paul]         40    [5, 6]
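The three `groupby`/`drop_duplicates` steps and two merges above can probably be collapsed into a single `groupby` with a per-column aggregation dict. This is a sketch that assumes each father name maps to exactly one age, so taking the `"first"` age per group loses nothing; the `assert` guards that assumption. It produces the same columns, in the same order, as the output above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Guard the assumption that makes "first" safe: one age per father
assert df.groupby("Father Name")["Father Age"].nunique().eq(1).all(), \
    "some father has more than one age"

# One pass: child columns become lists, the father's age is taken once
df2 = df.groupby("Father Name").agg({
    "Child Name": list,
    "Father Age": "first",
    "Child Age": list,
}).reset_index()
print(df2)
```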

