
Converting a single column of dictionary-like text to multiple columns, with keys as column names, using Pandas

I have the following data to parse:

                                                                                 Data
0    {"key0":"rand_val","key1":"rand_val","key2":"rand_val", ..., "keyn":"rand_val_n"}
1    {"key0":"rand_val","key1":"rand_val","key2":"rand_val", ..., "keyn":"rand_val_n"}
2    {"key0":"rand_val","key1":"rand_val","key2":"rand_val", ..., "keyn":"rand_val_n"}
3    {"key0":"rand_val","key1":"rand_val","key2":"rand_val", ..., "keyn":"rand_val_n"}
4    {"key0":"rand_val","key1":"rand_val","key2":"rand_val", ..., "keyn":"rand_val_n"}

It needs to be converted to:

     key0      key1      key2      keyn      
0    rand_val  rand_val  rand_val  rand_val
1    rand_val  rand_val  rand_val  rand_val
2    rand_val  rand_val  rand_val  rand_val
3    rand_val  rand_val  rand_val  rand_val
4    rand_val  rand_val  rand_val  rand_val

I was able to extract the keys and convert them to column labels the hard way, but I am stuck on getting the final result.

# `data` is the DataFrame that holds the raw 'Data' column
attr_data = data.loc[:, ['Data']]
print(attr_data.iloc[0])
# Strip braces and quotes as literal characters (regex=False), then split on commas
new_attr1 = pd.DataFrame(attr_data.Data.str.replace('{', '', regex=False))
new_attr2 = pd.DataFrame(new_attr1.Data.str.replace('}', '', regex=False))
new_attr3 = pd.DataFrame(new_attr2.Data.str.replace('"', '', regex=False))
new_attr4 = pd.DataFrame(new_attr3.Data.str.split(','))

print(new_attr4.iloc[0])
column_names = []
for label, content in new_attr4.iloc[0].items():
    print(label)
    for item in content:
        # The text before the first ':' in each "key:value" piece is the key
        column_names.append(item.split(':')[0])

print(column_names)

If the Data column already holds dict objects rather than strings, the DataFrame constructor can expand them directly:

yourdf = pd.DataFrame(df.Data.tolist())
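
When the cells are JSON strings, as in the question, the constructor would receive plain strings and would not split them into columns, so they need to be parsed first. A minimal sketch, assuming every cell is valid JSON; the sample values and the names `parsed` and `expanded` are made up for illustration:

```python
import json

import pandas as pd

# Hypothetical sample data: each cell is a JSON string, as in the question.
df = pd.DataFrame({
    "Data": [
        '{"key0": "a0", "key1": "b0", "key2": "c0"}',
        '{"key0": "a1", "key1": "b1", "key2": "c1"}',
    ]
})

# Parse each JSON string into a dict.
parsed = df["Data"].apply(json.loads)

# Build one column per key, keeping the original row index.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)
print(expanded)
```

Passing `index=df.index` keeps the rows aligned with the original frame, which helps if the new columns are later joined back onto it.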

Try this:

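import json
import pandas as pd

# A '|' separator presumably keeps the commas inside the JSON text from splitting the column
df = pd.read_csv('test.csv', sep='|')
dfs = []
for i in range(df.shape[0]):
    json_string = df.iloc[i, 0]      # raw JSON text of row i
    res = json.loads(json_string)    # parse it into a dict
    d = pd.json_normalize(res)       # one-row DataFrame, one column per key
    dfs.append(d)

# Stack the one-row frames and renumber the index from 0
df = pd.concat(dfs, ignore_index=True)
print(df)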

Output:

       key0      key1      key2        keyn
0  rand_val  rand_val  rand_val  rand_val_n
1  rand_val  rand_val  rand_val  rand_val_n
2  rand_val  rand_val  rand_val  rand_val_n
3  rand_val  rand_val  rand_val  rand_val_n
4  rand_val  rand_val  rand_val  rand_val_n
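
The per-row loop can probably be avoided, since `pd.json_normalize` also accepts a list of dicts. A hedged sketch with made-up in-memory data standing in for `test.csv`; the names `raw`, `records`, and `flat` are illustrative only:

```python
import json

import pandas as pd

# Hypothetical stand-in for the first column read from the CSV file.
raw = pd.Series(
    [
        '{"key0": "a0", "key1": "b0", "keyn": "n0"}',
        '{"key0": "a1", "key1": "b1", "keyn": "n1"}',
        '{"key0": "a2", "key1": "b2", "keyn": "n2"}',
    ],
    name="Data",
)

# Parse every JSON string once, collecting plain dicts.
records = [json.loads(text) for text in raw]

# Normalize the whole list in one call; the result gets a fresh 0..n-1 index.
flat = pd.json_normalize(records)
print(flat)
```

This builds a single DataFrame instead of one small frame per row, so no `pd.concat` step is needed.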

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 