
Pandas DataFrame: merge rows on a column to form a list of dictionaries

I have a DataFrame that looks like this:

DATA

*id*,             *name*,                      *URL*,                 *Type*  
    2,             birth_france_by_region,    http://abc. com,       T1 
    2,             birth_france_by_region,    http://pt. python,     T2 
    3,             long_lat,                  http://abc. com,       T3 
    3,             long_lat,                  http://pqur. com,      T1 
    4,             random_time_series,        http://sadsdc. com,    T2 
    4,             random_time_series,        http://sadcadf. com,   T3
    5,             birth_names,               http://google. com,    T1 
    5,             birth_names,               http://helloworld. com,T2 
    5,             birth_names,               http://hu. com,        T3

I want to merge the rows of this DataFrame where id is equal, producing a list of dictionaries in which Type is the key and URL is the value. The final output should look like this:

*id*, *name*,                  *URL*
2,    birth_france_by_region,  [{T1:http://abc. com},{T2:http://pt. python}]
3,    long_lat,                [{T3:http://abc. com},{T1:http://pqur. com}]
4,    random_time_series,      [{T2:http://sadsdc. com},{T3:http://sadcadf. com}]
5,    birth_names,             [{T1:http://google. com},{T2:http://helloworld. com},
                               {T3:http://hu. com}]

Use groupby with a custom function:

df = (df.groupby([df['id'],df['name']])
       .apply(lambda x: [{k:v} for k, v in zip(x['Type'], x['URL'])])
       .reset_index(name='URL'))
print (df)
   id                    name  \
0   2  birth_france_by_region   
1   3                long_lat   
2   4      random_time_series   
3   5             birth_names   

                                                 URL  
0  [{'T1': 'http://abc. com'}, {'T2': 'http://pt....  
1  [{'T3': 'http://abc. com'}, {'T1': 'http://pqu...  
2  [{'T2': 'http://sadsdc. com'}, {'T3': 'http://...  
3  [{'T1': 'http://google. com'}, {'T2': 'http://...  
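For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It assumes the columns are literally named id, name, URL and Type (no asterisks) and rebuilds only the rows for id 2 and id 5 from the question. The helper name to_records is made up for illustration.

```python
import pandas as pd

# Subset of the question's data; column names without asterisks are an assumption
df = pd.DataFrame({
    "id": [2, 2, 5, 5, 5],
    "name": ["birth_france_by_region", "birth_france_by_region",
             "birth_names", "birth_names", "birth_names"],
    "URL": ["http://abc. com", "http://pt. python",
            "http://google. com", "http://helloworld. com", "http://hu. com"],
    "Type": ["T1", "T2", "T1", "T2", "T3"],
})

def to_records(group):
    # Hypothetical helper: one single-key dict {Type: URL} per row of the group
    return [{t: u} for t, u in zip(group["Type"], group["URL"])]

result = (
    df.groupby(["id", "name"])[["Type", "URL"]]  # pass only the needed columns to apply
      .apply(to_records)
      .reset_index(name="URL")
)
print(result)
```

Selecting [["Type", "URL"]] before apply is a design choice here: the function only sees the two columns it needs, and the grouping keys come back through reset_index.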

Another way to get the needed result is to build the dictionaries first and then group:

df["temp"] = [{x: y} for x, y in zip(df["*Type*"], df["*URL*"])]
df.groupby("*name*")["temp"].apply(list)

For a toy example:

import pandas as pd

df = pd.DataFrame({'b': ["100","1","2","4","6","-55"], 
                   'a': ['a','b','c','d','e','f'],
                   'c': ["A","A","B","B","C","C"]})

# one single-key dict per row, then collect them per group
df["temp"] = [{x: y} for x, y in zip(df["a"], df["b"])]
df.groupby("c")["temp"].apply(list)

The output is:

c
A    [{'a': '100'}, {'b': '1'}]
B      [{'c': '2'}, {'d': '4'}]
C    [{'e': '6'}, {'f': '-55'}]
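Applied to the question's data, the same idea might look like the sketch below. It assumes column names without asterisks and uses only the rows for id 3 and id 4. Grouping on both id and name (rather than name alone) is a small change from the snippet above so that id survives in the result.

```python
import pandas as pd

# Subset of the question's data; column names without asterisks are an assumption
df = pd.DataFrame({
    "id": [3, 3, 4, 4],
    "name": ["long_lat", "long_lat", "random_time_series", "random_time_series"],
    "URL": ["http://abc. com", "http://pqur. com",
            "http://sadsdc. com", "http://sadcadf. com"],
    "Type": ["T3", "T1", "T2", "T3"],
})

# Step 1: one single-key dict {Type: URL} per row
df["temp"] = [{t: u} for t, u in zip(df["Type"], df["URL"])]

# Step 2: collect the dicts of each (id, name) group into a list
result = (
    df.groupby(["id", "name"])["temp"]
      .apply(list)
      .reset_index(name="URL")
)
print(result)
```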
