
How to extract key value from deep dictionary in pandas || Python || dataframe

I am making a request call, storing the response as JSON, and loading that JSON into a pandas DataFrame, which works well. However, a few columns of the DataFrame contain deeply nested dictionaries, and I am unable to extract key values from them. I am attaching a CSV file with a few of the columns; the important one is the "guest" column.

I have been searching the internet and have tried so many things that I am now confused about what is correct and what is not. Below is a snapshot of my code and attempts.

Adata = response.json()

## Loading the Json Data to DataFrame
df = pd.DataFrame(Adata)
df = df.astype(str)

## Exporting the Dataframe to csv file.
df.to_csv('Appointments.csv')

## Trying to create a new column with key values that I want out of guest column.
AB = df[['guest']]
print(AB)

BA = df['guest'].str.strip().to_frame()
print(BA)

BA.to_csv('BA_sheet.csv')

##Loaded single row and tried to check if I can do something about it.
test = {'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', 'first_name': 'ASHISH ', 'last_name': 'PATEL', 'gender': 1, 'mobile': {'country_id': 0, 'number': None, 'display_number': None}, 'email': None, 'indicator': '0@0@0@0@0@0@0@x@0@0@0@0@2#0@0@0@0', 'lp_tier_info': '0@x', 'is_virtual_user': False, 'GuestIndicatorValue': {'HighSpender': None, 'Member': 0, 'LowFeedback': None, 'RegularGuest': None, 'FirstTimer': None, 'ReturningCustomer': None, 'NoShow': None, 'HasActivePackages': None, 'HasProfileAlerts': None, 'OtherCenterGuest': None, 'HasCTA': None, 'Dues': None, 'CardOnFile': None, 'AutoPayEnabled': None, 'RecurrenceAppointment': None, 'RebookedAppointment': None, 'hasAddOns': None, 'LpTier': None, 'IsSurpriseVisit': None, 'CustomDataIndicator': None, 'IsGuestBirthday': None}}
df3 = pd.DataFrame(test)
#print (df3)
df3.to_csv('df3_testsheet.csv')


## Trying to lambda function to extract the data that I want.
AB = AB.map(lambda x: (x.guest['id'], x.guest['first_name'], x.guest['last_name'])).toDF(['id', 'first_name', 'last_name'])
print(AB)

## Trying regex to get the desired data.
pp = re.findall(r"'first_name'.*?'(.*?)'", str(AB))
print(pp)

All I want is to extract id, first_name and last_name from the dictionaries in the guest column. Use this link to access the CSV file that holds the DataFrame result.

Your code calls df.astype(str), so the guest column holds the str representation of each dict rather than the dict itself, and you are trying to extract the first_name, last_name and id keys from that string. You can convert it back to a dict with the eval builtin (not recommended unless you trust the source of the data) or, more safely, with the ast.literal_eval function from the ast module.

import ast

df['guest'] = df['guest'].apply(ast.literal_eval)
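As a minimal sketch of what that conversion does, the snippet below builds a tiny stand-in DataFrame (df_demo is a hypothetical name, and the rows are shortened from the sample record in the question) and parses it with a small wrapper. The wrapper, parse_guest, is an assumption added here: it returns an empty dict for cells that are not dict literals, such as the 'nan' strings that astype(str) produces for missing values, which plain ast.literal_eval would reject with a ValueError.

```python
import ast

import pandas as pd


def parse_guest(value):
    """Hypothetical helper: turn a dict-like string into a dict, else return {}."""
    if isinstance(value, dict):
        return value  # already parsed, nothing to do
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return {}  # e.g. the string 'nan' left behind by astype(str)
    return parsed if isinstance(parsed, dict) else {}


# Stand-in data: guest cells stored as strings, as df.astype(str) leaves them.
df_demo = pd.DataFrame({
    'guest': [
        "{'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', "
        "'first_name': 'ASHISH ', 'last_name': 'PATEL', 'email': None}",
        'nan',
    ]
})

print(type(df_demo.loc[0, 'guest']))  # still a str at this point

df_demo['guest'] = df_demo['guest'].apply(parse_guest)
print(type(df_demo.loc[0, 'guest']))  # now a dict

# Unlike eval, literal_eval refuses anything that is not a plain literal.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Returning an empty dict for unparsable cells is one possible design choice; raising or returning None may suit other pipelines better.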

Once the guest values are dict objects again, you can apply pd.Series to expand them into a separate DataFrame with one column per key:

guest_df = df['guest'].apply(pd.Series)

guest_df['id'] # => gives you id
guest_df['first_name'] # => gives you first name
guest_df['last_name'] # => gives you last name
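If the raw JSON records are still available, another option is to skip the string round trip entirely and flatten them before any astype(str) call. The sketch below uses pd.json_normalize for that; it assumes the response is a list of records that each carry a nested guest dict, and the records list and the appointment_id field are hypothetical stand-ins for the real response.

```python
import pandas as pd

# Hypothetical stand-in for response.json(): a list of records with a nested guest dict.
records = [
    {
        'appointment_id': 'a1',  # hypothetical field, not from the question
        'guest': {
            'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0',
            'first_name': 'ASHISH ',
            'last_name': 'PATEL',
            'mobile': {'country_id': 0, 'number': None},
        },
    },
]

# json_normalize flattens nested dicts into dotted column names,
# for example 'guest.id' and 'guest.mobile.country_id'.
flat = pd.json_normalize(records)

# Keep only the three guest fields asked for in the question.
names = flat[['guest.id', 'guest.first_name', 'guest.last_name']].copy()

# The sample first name has a trailing space, so strip whitespace.
names['guest.first_name'] = names['guest.first_name'].str.strip()

print(names)
```

This keeps the nested values as real Python objects, so no parsing step is needed; the ast.literal_eval route remains the way to go when only the exported CSV is available.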

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please cite the site URL or the original address. For questions, contact yoyou2525@163.com.

 