How to combine rows into a single row in a dataframe with SQL or Python

I'd like to aggregate the rows of certain columns based on their relationship to another column, and create a new column that holds the aggregated data in JSON format.

Here is an example.

Original data table:

Child Name     Child Age    Father Name    Father Age
     Peter             5        Richard            40
     James            15           Doug            45
       Liz             2           Doug            45
      Paul             6        Richard            40
    Shirly            11        Charles            33
       Eva             9          Chris            29

The converted data table should be either

Father Name    Father Age     Children 
    Richard            40     {"Peter":"5", "Paul":"6"}
       Doug            45     {"James":"15","Liz":"2"}
    Charles            33     {"Shirly" : "11"}
      Chris            29     {"Eva" : "9"}

Or

Father Name    Father Age     Children Name       Children Age
    Richard            40     {"Peter", "Paul"}      {"5","6"}
       Doug            45     {"James", "Liz"}      {"15","2"}
    Charles            33     {"Shirly"}                {"11"}
      Chris            29     {"Eva"}                    {"9"}

My code is

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

print(df)

# Join each father's children's names into one comma-separated string
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ['Father Name','Children Name']
print(g1)

and the output is

  Father Name   Children Name
0     Charles          Shirly
1       Chris             Eva
2        Doug      James, Liz
3     Richard     Peter, Paul

I can't figure out how to add "Father Age" and "Children Age" as columns. How can I do this conversion on the dataframe in the most efficient way? I'd like to avoid looping in Python, as it would take a long time to process.

Thanks.

A quick, dirty, and inefficient hack, but it avoids for loops. I would love to see a better solution; I assume the multiple df copies and multiple merges could be simplified.

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

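# Collect each father's children's names and ages into lists
g2 = df.groupby(['Father Name'])["Child Name"].apply(list).reset_index()
g3 = df.groupby(['Father Name'])["Child Age"].apply(list).reset_index()
# One row per father with his age
g4 = df[["Father Name", "Father Age"]].drop_duplicates()

# Merge the three frames on their shared "Father Name" column
df2 = g2.merge(g4)
df2 = df2.merge(g3)
print(df2)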

Output:

  Father Name     Child Name Father Age Child Age
0     Charles       [Shirly]         33      [11]
1       Chris          [Eva]         29       [9]
2        Doug   [James, Liz]         45   [15, 2]
3     Richard  [Peter, Paul]         40    [5, 6]
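
The copies and merges above can probably be collapsed into a single groupby. The following is only a sketch, assuming each father always appears with the same "Father Age" so that both columns can serve as grouping keys. The names keys, lists and children are made up for illustration. The JSON step still calls a Python function once per father (not once per row), so it may not be the fastest option on very large data.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Assumption: grouping on both father columns carries "Father Age" along,
# so no separate drop_duplicates() and merge are needed.
keys = ["Father Name", "Father Age"]

# Second requested layout: one list column per child attribute.
lists = df.groupby(keys, as_index=False).agg(
    {"Child Name": list, "Child Age": list}
)
print(lists)

# First requested layout: one JSON object per father (child name -> age).
children = (
    df.groupby(keys)[["Child Name", "Child Age"]]
      .apply(lambda g: json.dumps(dict(zip(g["Child Name"], g["Child Age"]))))
      .reset_index(name="Children")
)
print(children)
```

Both results are sorted by the grouping keys, as in the output above. If two children of the same father share a name, the dict-based JSON keeps only the last one, so the list layout is the safer choice in that case.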
