How to extract key values from a deep dictionary in pandas || Python || dataframe
I am making a request call and storing the data as JSON, and from there I load the JSON into a pandas DataFrame. The good thing is that it works like magic. Unfortunately, however, a few columns in the DataFrame contain deeply nested dictionaries, and I am unable to extract key values from them. I am attaching the CSV file with a few columns; the important one is the "guest" column.
I have been searching the internet and have tried so many things that by now I am confused about what is correct and what is incorrect. Below is a snapshot of my code and trials.
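import re
import pandas as pd

## response is the requests response object from the API call (not shown)
Adata = response.json()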
## Loading the Json Data to DataFrame
df = pd.DataFrame(Adata)
df = df.astype(str)
## Exporting the Dataframe to csv file.
df.to_csv('Appointments.csv')
## Trying to create a new column with the key values I want from the guest column.
AB = df[['guest']]
print(AB)
BA = df['guest'].str.strip().to_frame()
print(BA)
BA.to_csv('BA_sheet.csv')
## Loaded a single row to check whether I can do something with it.
test = {'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', 'first_name': 'ASHISH ', 'last_name': 'PATEL', 'gender': 1, 'mobile': {'country_id': 0, 'number': None, 'display_number': None}, 'email': None, 'indicator': '0@0@0@0@0@0@0@x@0@0@0@0@2#0@0@0@0', 'lp_tier_info': '0@x', 'is_virtual_user': False, 'GuestIndicatorValue': {'HighSpender': None, 'Member': 0, 'LowFeedback': None, 'RegularGuest': None, 'FirstTimer': None, 'ReturningCustomer': None, 'NoShow': None, 'HasActivePackages': None, 'HasProfileAlerts': None, 'OtherCenterGuest': None, 'HasCTA': None, 'Dues': None, 'CardOnFile': None, 'AutoPayEnabled': None, 'RecurrenceAppointment': None, 'RebookedAppointment': None, 'hasAddOns': None, 'LpTier': None, 'IsSurpriseVisit': None, 'CustomDataIndicator': None, 'IsGuestBirthday': None}}
df3 = pd.DataFrame(test)
#print (df3)
df3.to_csv('df3_testsheet.csv')
## Trying a lambda function to extract the data that I want.
AB = AB.map(lambda x: (x.guest['id'], x.guest['first_name'], x.guest['last_name'])).toDF(['id', 'first_name', 'last_name'])
print(AB)
## Trying regex to get the desired data.
pp = re.findall(r"'first_name'.*?'(.*?)'", str(AB))
print(pp)
All I want is to extract id, first_name and last_name from the dictionaries in that guest column. Use this link to access the CSV file that contains the DataFrame result.
The way you're doing it, you're trying to extract your first_name, last_name and id keys from a str representation of a dict (the df.astype(str) call is what turned each dict into a string). You can convert it back to a dict using the eval builtin (not recommended if you're not sure where the data is coming from), or the ast.literal_eval function from the ast module.
import ast
df['guest'] = df['guest'].apply(ast.literal_eval)
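A minimal sketch of why this step is needed, using a made-up one-row frame (the guest values below are hypothetical and trimmed to a few keys). astype(str) stores each dict as its Python repr, which uses single quotes, None and False, so json.loads would reject it, while ast.literal_eval parses it back into a dict. The same parser can be applied when reading the exported CSV back in, via the converters argument of pd.read_csv.

```python
import ast
import io

import pandas as pd

# Hypothetical one-row frame mimicking the API result (keys trimmed for brevity)
df = pd.DataFrame({
    "guest": [{"id": "4b75bc9a", "first_name": "ASHISH ", "last_name": "PATEL",
               "email": None, "is_virtual_user": False}]
})

# astype(str) replaces each dict with its Python repr, i.e. a plain string
df = df.astype(str)
print(type(df.loc[0, "guest"]))  # <class 'str'>

# literal_eval turns the repr back into a dict without executing arbitrary code
df["guest"] = df["guest"].apply(ast.literal_eval)
print(df.loc[0, "guest"]["last_name"])  # PATEL

# Same idea when reloading an exported CSV: parse the column while reading it
buf = io.StringIO()
df.astype(str).to_csv(buf)
buf.seek(0)
reloaded = pd.read_csv(buf, index_col=0, converters={"guest": ast.literal_eval})
print(reloaded.loc[0, "guest"]["id"])  # 4b75bc9a
```

Reading with converters means the attached CSV can be loaded straight into dicts, instead of converting the column after the fact.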
Once you have the guest dictionaries as dict objects, you can simply apply pd.Series to convert them into a separate DataFrame:
guest_df = df['guest'].apply(pd.Series)
guest_df['id'] # => gives you id
guest_df['first_name'] # => gives you first name
guest_df['last_name'] # => gives you last name
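If only those three keys are needed, a variant of the same idea (a sketch, assuming df['guest'] already holds dicts) is to build the frame from a list of dicts, which avoids constructing one pd.Series per row and is usually faster on large frames. Alternatively, if you skip df.astype(str) entirely, pd.json_normalize can flatten the nested records directly from response.json(). The records list below is hypothetical and only stands in for Adata.

```python
import pandas as pd

# Hypothetical stand-in for Adata = response.json(): a list of appointment records
records = [
    {"appointment": "A1",
     "guest": {"id": "4b75bc9a", "first_name": "ASHISH ", "last_name": "PATEL",
               "mobile": {"country_id": 0, "number": None}}},
    {"appointment": "A2",
     "guest": {"id": "9f0c1d2e", "first_name": "JANE", "last_name": "DOE",
               "mobile": {"country_id": 1, "number": "555"}}},
]
df = pd.DataFrame(records)

# Option 1: build a frame from the list of guest dicts and keep only the wanted keys
guest_df = pd.DataFrame(df["guest"].tolist(), index=df.index)[["id", "first_name", "last_name"]]
print(guest_df)

# Join the extracted columns back onto the original frame if needed
df = df.join(guest_df)

# Option 2: flatten every nested level straight from the raw JSON records;
# nested keys become dotted column names such as "guest.first_name"
flat = pd.json_normalize(records)
print(flat[["guest.id", "guest.first_name", "guest.last_name"]])
```

Option 2 avoids the string round trip altogether, since the nested dicts are never converted to text.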