
How to extract key value from deep dictionary in pandas || Python || dataframe

I am making a request call and storing the response as JSON, then loading that JSON into a pandas DataFrame, which works well. Unfortunately, a few columns in the DataFrame contain deeply nested dictionaries, and I am unable to extract key values from them. I am attaching a CSV file with a few of the columns; the important one is the "guest" column.

I have been searching the internet and have tried so many things that I am now confused about what is correct and what is not. Below is a snapshot of my code and my attempts.

Adata = response.json()

## Loading the Json Data to DataFrame
df = pd.DataFrame(Adata)
df = df.astype(str)

## Exporting the Dataframe to csv file.
df.to_csv('Appointments.csv')

## Trying to create a new column with key values that I want out of guest column.
AB = df[['guest']]
print(AB)

BA = df['guest'].str.strip().to_frame()
print(BA)

BA.to_csv('BA_sheet.csv')

##Loaded single row and tried to check if I can do something about it.
test = {'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', 'first_name': 'ASHISH ', 'last_name': 'PATEL', 'gender': 1, 'mobile': {'country_id': 0, 'number': None, 'display_number': None}, 'email': None, 'indicator': '0@0@0@0@0@0@0@x@0@0@0@0@2#0@0@0@0', 'lp_tier_info': '0@x', 'is_virtual_user': False, 'GuestIndicatorValue': {'HighSpender': None, 'Member': 0, 'LowFeedback': None, 'RegularGuest': None, 'FirstTimer': None, 'ReturningCustomer': None, 'NoShow': None, 'HasActivePackages': None, 'HasProfileAlerts': None, 'OtherCenterGuest': None, 'HasCTA': None, 'Dues': None, 'CardOnFile': None, 'AutoPayEnabled': None, 'RecurrenceAppointment': None, 'RebookedAppointment': None, 'hasAddOns': None, 'LpTier': None, 'IsSurpriseVisit': None, 'CustomDataIndicator': None, 'IsGuestBirthday': None}}
df3 = pd.DataFrame(test)
#print (df3)
df3.to_csv('df3_testsheet.csv')


## Trying to lambda function to extract the data that I want.
AB = AB.map(lambda x: (x.guest['id'], x.guest['first_name'], x.guest['last_name'])).toDF(['id', 'first_name', 'last_name'])
print(AB)

## Trying regex to get the desired data.
pp = re.findall(r"'first_name'.*?'(.*?)'", str(AB))
print(pp)

All I want is to extract id, first_name and last_name from the dictionaries in the guest column. Use this link to access the CSV file that contains the DataFrame result.

The way you're doing it, you're trying to extract the first_name, last_name and id keys from a str representation of a dict, because df = df.astype(str) converted every cell, including the guest dicts, to strings. You can convert each string back to a dict using the eval builtin (not recommended if you're not sure where the data is coming from), or the ast.literal_eval function from the ast module.

import ast

df['guest'] = df['guest'].apply(ast.literal_eval)
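One caveat with the line above: ast.literal_eval raises on cells that are not valid Python literals (for example the string 'nan'), and a cell such as 'None' parses to None rather than a dict. Here is a minimal sketch of a more forgiving version. The sample records and the parse_guest helper are hypothetical, made up for illustration, and returning an empty dict for bad cells is just one possible choice.

```python
import ast

import pandas as pd

# Hypothetical sample data standing in for the real API response.
records = [
    {"appointment_id": "a1",
     "guest": {"id": "g1", "first_name": "ASHISH ", "last_name": "PATEL"}},
    {"appointment_id": "a2", "guest": None},
]

# Same step as in the question: the guest dicts become plain strings here.
df = pd.DataFrame(records).astype(str)


def parse_guest(cell):
    """Hypothetical helper: turn a stringified dict back into a dict.

    Cells that cannot be parsed, or that parse to something other than
    a dict, become an empty dict instead of raising.
    """
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return {}
    return value if isinstance(value, dict) else {}


df["guest"] = df["guest"].apply(parse_guest)
```

If you control the loading code, skipping df.astype(str) on the guest column avoids the round trip through strings altogether.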

Once you have the guest dictionaries as dict objects, you can simply apply pd.Series to expand them into a separate DataFrame:

guest_df = df['guest'].apply(pd.Series)

guest_df['id'] # => gives you id
guest_df['first_name'] # => gives you first name
guest_df['last_name'] # => gives you last name
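As an alternative to apply(pd.Series) (not part of the approach above, just another option), pd.json_normalize also flattens nested dicts such as mobile into dotted column names. A rough sketch follows, assuming the guest values are already dict objects. The two sample rows are hypothetical and only loosely modelled on the dict in the question.

```python
import pandas as pd

# Hypothetical rows shaped like the sample guest dict in the question.
guests = pd.Series([
    {"id": "guest-1", "first_name": "ASHISH ", "last_name": "PATEL",
     "mobile": {"country_id": 0, "number": None}},
    {"id": "guest-2", "first_name": "JANE", "last_name": "DOE",
     "mobile": {"country_id": 1, "number": "555"}},
])

# json_normalize takes a list of dicts and flattens nested dicts into
# columns named with a dot separator, e.g. "mobile.country_id".
flat = pd.json_normalize(guests.tolist())

# Keep only the three requested keys.
names = flat[["id", "first_name", "last_name"]].copy()

# Strip stray whitespace such as the trailing space in 'ASHISH '.
names["first_name"] = names["first_name"].str.strip()

print(names)
```

To put these columns back next to the original data, something like df.join(names) should work as long as both frames share the same index.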
