
Python merge values from column-row (cell type is list)

I am using Python to aggregate data as a test. For every columnA value, I want one row containing the values from columnB and columnC. After some work and suggestions from Stack Overflow, this works fine:

import pandas as pd

df = pd.DataFrame({'columnA':[1111,1111,2222,3333,4444,4444,5555,6666],
                   'columnB':['AAAA','AAAA','BBBB','BBBB','CCCC','CCCC','BBBB','CCCC'],
                   'columnC':['one','two','one','one','one','one','two','one'],
                   'NUM1':[1,3,5,7,1,0,4,5],
                   'NUM2':[5,3,6,9,2,4,1,1],
                   'W':list('aaabbbbb')})

d = {'columnB':lambda x: x.tolist(), 'columnC':lambda x: x.tolist() }
df1 = df.groupby('columnA').agg(d)
print(df1)

[image: enter image description here]

What I am trying to do now is merge the values within each cell when the list contains identical values, as indicated by the green arrow in my image.

So, I tried this:

d = {'columnB':lambda x: set(x.tolist()), 'columnC':lambda x: x.tolist() }
df1 = df.groupby('columnA').agg(d)
print(df1)

But I am not sure about the resulting format of the column values, since each cell is now a set. I am thinking of converting each cell back into a list:

d = {'columnB':lambda x: list(set(x.tolist())), 'columnC':lambda x: x.tolist() }
df1 = df.groupby('columnA').agg(d)
print(df1)

Is this good practice? I am trying to learn more about aggregation techniques.
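One caveat with list(set(...)) is that a set is unordered, so the deduplicated values are not guaranteed to come back in order of first appearance. Below is a minimal sketch of an order-preserving alternative built on dict.fromkeys. The helper name dedupe_keep_order is hypothetical and not part of pandas; the DataFrame is a trimmed copy of the one above.

```python
import pandas as pd

df = pd.DataFrame({
    'columnA': [1111, 1111, 2222, 3333, 4444, 4444, 5555, 6666],
    'columnB': ['AAAA', 'AAAA', 'BBBB', 'BBBB', 'CCCC', 'CCCC', 'BBBB', 'CCCC'],
    'columnC': ['one', 'two', 'one', 'one', 'one', 'one', 'two', 'one'],
})

def dedupe_keep_order(values):
    """Hypothetical helper: drop duplicates from a Series, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(values.tolist()))

# Deduplicate columnB per group, keep every columnC value as in the question
d = {'columnB': dedupe_keep_order, 'columnC': lambda x: x.tolist()}
df1 = df.groupby('columnA').agg(d)
print(df1)
```

Each columnB cell here is a plain Python list with no repeated values, while columnC still holds every value of the group.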

What I would do is use unique:

d = {'columnB':'unique', 'columnC':'unique' }
df1 = df.groupby('columnA').agg(d)
df1
Out[573]: 
        columnB     columnC
columnA                    
1111     [AAAA]  [one, two]
2222     [BBBB]       [one]
3333     [BBBB]       [one]
4444     [CCCC]       [one]
5555     [BBBB]       [two]
6666     [CCCC]       [one]
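Note that the cells produced by 'unique' are NumPy arrays rather than Python lists. If plain lists are needed downstream, they can be converted afterwards. The following is only a sketch under that assumption; the name df_lists is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'columnA': [1111, 1111, 2222, 3333, 4444, 4444, 5555, 6666],
    'columnB': ['AAAA', 'AAAA', 'BBBB', 'BBBB', 'CCCC', 'CCCC', 'BBBB', 'CCCC'],
    'columnC': ['one', 'two', 'one', 'one', 'one', 'one', 'two', 'one'],
})

df1 = df.groupby('columnA').agg({'columnB': 'unique', 'columnC': 'unique'})

# Each cell is a numpy.ndarray of the group's distinct values
print(type(df1.loc[1111, 'columnC']))

# Convert every cell to a plain Python list (df_lists is an illustrative name)
df_lists = df1.copy()
for col in df_lists.columns:
    df_lists[col] = df_lists[col].map(lambda arr: arr.tolist())
print(df_lists)
```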
