简体   繁体   中英

In a dataframe column that is a list of dictionaries, how to join values with a comma having each nested key as a new column?

I have a dataframe that looks like this

╔══════╦══════╦══════════════════════════════════╦═══════════════════════╗
║ ID   ║ Name ║ Pets                             ║ CareCenter            ║
╠══════╬══════╬══════════════════════════════════╬═══════════════════════╣
║ 1    ║ John ║ [{'Name':'Alvin','Breed':'Dog'}, ║ {'Name':'PET SHOP 1'} ║
║      ║      ║ {'Name':'Rex','Type':'Cat'}]     ║                       ║
╠══════╬══════╬══════════════════════════════════╬═══════════════════════╣
║ 10   ║ Mary ║ {'Name':'Rose','Type':'Cat'}     ║ {'Name':'PET SHOP 2'} ║
╠══════╬══════╬══════════════════════════════════╬═══════════════════════╣
║ 1234 ║ Joe  ║ {'Name':'Max','Type':'Bird'}     ║ {'Name':'PET SHOP 3'} ║
╚══════╩══════╩══════════════════════════════════╩═══════════════════════╝
  1. I iterate over each column to check its type.

  2. if it's not an object (eg string, int, etc) i leave it untouched.

  3. If it's an object, then:

    3.1) I get the first row and get the types for each column

    3.2) if the column is an object and a dictionary (just one item) I get this df

╔══════╦═══════════════════════╗
║ ID   ║ CareCenter            ║
╠══════╬═══════════════════════╣
║ 1    ║ {'Name':'PET SHOP 1'} ║
╠══════╬═══════════════════════╣
║ 10   ║ {'Name':'PET SHOP 2'} ║
╠══════╬═══════════════════════╣
║ 1234 ║ {'Name':'PET SHOP 3'} ║
╚══════╩═══════════════════════╝

I then apply a json_normalize on the single item-dict column.

╔══════╦═══════════════════════╦═══════════════════════╗
║ ID   ║ CareCenter            ║ CareCenter_Name       ║
╠══════╬═══════════════════════╬═══════════════════════╣
║ 1    ║ {'Name':'PET SHOP 1'} ║ PET SHOP 1            ║
╠══════╬═══════════════════════╬═══════════════════════╣
║ 10   ║ {'Name':'PET SHOP 2'} ║ PET SHOP 2            ║
╠══════╬═══════════════════════╬═══════════════════════╣
║ 1234 ║ {'Name':'PET SHOP 3'} ║ PET SHOP 3            ║
╚══════╩═══════════════════════╩═══════════════════════╝

And then join it back to the original df on the ID column. Using the key name concatenated with the Original Column Name and dropping the original column.

╔══════╦══════╦══════════════════════════════════╦═════════════════╗
║ ID   ║ Name ║ Pets                             ║ CareCenter_Name ║
╠══════╬══════╬══════════════════════════════════╬═════════════════╣
║ 1    ║ John ║ [{'Name':'Alvin','Breed':'Dog'}, ║ PET SHOP 1      ║
║      ║      ║ {'Name':'Rex','Type':'Cat'}]     ║                 ║
╠══════╬══════╬══════════════════════════════════╬═════════════════╣
║ 10   ║ Mary ║ {'Name':'Rose','Type':'Cat'}     ║ PET SHOP 2      ║
╠══════╬══════╬══════════════════════════════════╬═════════════════╣
║ 1234 ║ Joe  ║ {'Name':'Max','Type':'Bird'}     ║ PET SHOP 3      ║
╚══════╩══════╩══════════════════════════════════╩═════════════════╝

3.3) If the column is a list of dictionaries, I create a new df like this, just taking the ID and the incumbent column:

╔══════╦══════════════════════════════════╗
║ ID   ║ Pets                             ║
╠══════╬══════════════════════════════════╣
║ 1    ║ [{'Name':'Alvin','Breed':'Dog'}, ║
║      ║ {'Name':'Rex','Type':'Cat'}]     ║
╠══════╬══════════════════════════════════╣
║ 10   ║ {'Name':'Rose','Type':'Cat'}     ║
╠══════╬══════════════════════════════════╣
║ 1234 ║ {'Name':'Max','Type':'Bird'}     ║
╚══════╩══════════════════════════════════╝

However I'm stuck as to how to concatenate the values of each nested keys into new columns, I'm looking for something like this, after that I can take it from there and join it back to the original df . Order of concatenated strings is not important, as long as they're in their correct column

╔══════╦═══════════╦═══════════╗
║ ID   ║ Pets_Name ║ Pets_Type ║
╠══════╬═══════════╬═══════════╣
║ 1    ║ Alvin,Rex ║ Dog,Cat   ║
╠══════╬═══════════╬═══════════╣
║ 10   ║ Rose      ║ Cat       ║
╠══════╬═══════════╬═══════════╣
║ 1234 ║ Name      ║ Bird      ║
╚══════╩═══════════╩═══════════╝

Here is my code so far:

for column in data_df:
        if data_df.dtypes[column] == "object":
            new = data_df[['Id', column]].copy()
            new = new.dropna(subset = [column])
            a = (new.sample(1).applymap(type) == list).all()
            islist = False
            for i,v in a.items():
                if(v==True and i==column):
                    islist = True
            if(islist==True):
                for c in(json_normalize(new[column].sample(1).iloc[0])):
******STUCK*****    new = new.join(','.join({v for x in new[column] for y in x for k,v in y.items() if k==c}))  ****** STUCK ****
            else:
                new = new.join(json_normalize(new[column]))
            new = new.drop(column,axis=1)
            new = new.add_prefix(column + '_')
            #data_df=data_df.join(new, on='Id',how='left')
            data_df=pd.merge(data_df, new, how='left', left_on='Id', right_on=column + '_Id' )
            data_df = data_df.drop(column, 1)
            data_df = data_df.drop(column + '_Id', 1)

EDIT: This would be the final table I'm looking for

╔══════╦══════╦═══════════╦═══════════╦═════════════════╗
║ ID   ║ Name ║ Pets_Name ║ Pets_Type ║ CareCenter_Name ║
╠══════╬══════╬═══════════╬═══════════╬═════════════════╣
║ 1    ║ John ║ Alvin,Rex ║ Dog,Cat   ║ PET SHOP 1      ║
╠══════╬══════╬═══════════╬═══════════╬═════════════════╣
║ 10   ║ Mary ║ Rose      ║ Cat       ║ PET SHOP 2      ║
╠══════╬══════╬═══════════╬═══════════╬═════════════════╣
║ 1234 ║ Joe  ║ Max       ║ Bird      ║ PET SHOP 3      ║
╚══════╩══════╩═══════════╩═══════════╩═════════════════╝

Thanks in advance for any pointer you could give me,

FINAL EDIT:

This is my working code with the help from @BEN_YO

for column in data_df:
        #mytype = data_df.dtypes[column]
        mn = data_df.sample(1).applymap(type)
        mytype = mn[column].values[0]
        if mytype is dict or mytype is list:
            new = data_df[['Id', column]].copy()
            new = new.dropna(subset = [column])
            a = (new.sample(1).applymap(type) == list).all()
            islist = False
            for i,v in a.items():
                if(v==True and i==column):
                    islist = True
            if(islist==True):
                for c in(json_normalize(new[column].sample(1).iloc[0])):
                    #new = new.join(','.join({v for x in new[column] for y in x for k,v in y.items() if k==c}))
                    new = new.join(new[column].explode().apply(pd.Series).groupby(level=0)[[c]].agg(','.join))
                    #print(column)
            else:
                new = new.join(json_normalize(new[column]))
            new = new.drop(column,axis=1)
            new = new.add_prefix(column + '_')
            #data_df=data_df.join(new, on='Id',how='left')
            data_df=pd.merge(data_df, new, how='left', left_on='Id', right_on=column + '_Id' )
            data_df = data_df.drop(column, 1)
            data_df = data_df.drop(column + '_Id', 1)

Try with explode

out = df.join(df['Pets'].explode().apply(pd.Series).groupby(level=0)[['Name']].agg(','.join))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM