How to extract a nested dictionary from a STRING column in a Python pandas DataFrame?
There's a table where one data point of its event column looks like this:
THE 'event' COLUMN IS A STRING COLUMN!
df['event']
RETURNS:
"{'eventData': {'type': 'page', 'name': "WHAT'S UP"}, 'eventId': '1003', 'deviceType': 'kk', 'pageUrl': '/chick 2/whats sup', 'version': '1.0.0.888-10_7_2020__4_18_30', 'sessionGUID': '1b312346a-cd26-4ce6-888-f25143030e02', 'locationid': 'locakdi-3b0c-49e3-ab64-741f07fd4cb3', 'eventDescription': 'Page Load'}"
I'm trying to extract the nested dictionary eventData from the event dictionary into a new column, and remove it from event, like below:
df['event']
RETURNS:
{'eventId': '1003', 'deviceType': 'kk', 'pageUrl': '/chick 2/whats sup', 'version': '1.0.0.888-10_7_2020__4_18_30', 'sessionGUID': '1b312346a-cd26-4ce6-888-f25143030e02', 'locationid': 'locakdi-3b0c-49e3-ab64-741f07fd4cb3', 'eventDescription': 'Page Load'}
df['eventData']
RETURNS:
{'type': 'page', 'name': "WHAT'S UP"}
How do I do this?
I would look at using the pandas apply method on the event column.
If the eventData key is expected to be present in the event dictionary for every row of the data frame, something like the following may suffice:
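import ast
import numpy as np

def get_event_data_from_event(event_str):
    """
    Convert event string to dict and return event_data
    """
    try:
        # The sample value uses Python-style single quotes, which json.loads
        # rejects, so parse it as a Python literal instead.
        event_as_dict = ast.literal_eval(event_str)
    except (ValueError, SyntaxError):
        return np.nan
    # Guard against values that parse but are not dicts, or lack the key.
    if not isinstance(event_as_dict, dict) or "eventData" not in event_as_dict:
        return np.nan
    return event_as_dict["eventData"]

df["eventData"] = df["event"].apply(get_event_data_from_event)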
This returns NaN in the eventData column for any row whose event string cannot be parsed into a dictionary or does not contain the eventData key.
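The question also asks for eventData to be removed from event itself. A minimal sketch of doing both in one pass, assuming every cell is a Python-literal string like the sample (single-quoted keys). The helper name split_event and the shortened one-row frame are made up for illustration:

```python
import ast

import pandas as pd


def split_event(event_str):
    """Parse one event string; return (event without eventData, eventData).

    Hypothetical helper, not part of the original answer. Returns
    (None, None) when the value cannot be parsed into a dict.
    """
    try:
        event = ast.literal_eval(event_str)
    except (ValueError, SyntaxError):
        return None, None
    if not isinstance(event, dict):
        return None, None
    # pop removes the nested dict from event; None if the key is missing.
    event_data = event.pop("eventData", None)
    return event, event_data


# One-row frame with a shortened version of the sample value (assumption).
df = pd.DataFrame({
    "event": [
        "{'eventData': {'type': 'page', 'name': \"WHAT'S UP\"}, "
        "'eventId': '1003', 'eventDescription': 'Page Load'}"
    ]
})

parsed = [split_event(s) for s in df["event"]]
df["event"] = [event for event, _ in parsed]
df["eventData"] = [event_data for _, event_data in parsed]
```

After this, both columns hold dict objects rather than strings, so later steps can index into them directly.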
You could then drop those non-conforming rows with dropna, like so:
df_subset = df.dropna(subset=["eventData"])
I finally got the answer from another post: Python flatten multilevel/nested JSON
How to use: json_col = pd.DataFrame([flatten_json(x) for x in df['json_column']])
def flatten_json(nested_json, exclude=['']):
    out = {}

    def flatten(x, name='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
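Because event holds strings rather than dicts, each value has to be parsed before it can be flattened. As an alternative sketch (this swaps in pandas' built-in pd.json_normalize rather than the flatten_json helper above), the following assumes the Python-literal format from the question and uses a made-up two-row frame:

```python
import ast

import pandas as pd

# Made-up sample rows in the same single-quoted format as the question.
df = pd.DataFrame({
    "event": [
        "{'eventData': {'type': 'page', 'name': \"WHAT'S UP\"}, 'eventId': '1003'}",
        "{'eventData': {'type': 'click', 'name': 'buy'}, 'eventId': '1004'}",
    ]
})

# Parse each string into a dict, then flatten nested keys into columns,
# joining parent and child key names with "_".
records = [ast.literal_eval(s) for s in df["event"]]
flat = pd.json_normalize(records, sep="_")
```

Here flat gets one column per leaf key, such as eventData_type and eventData_name. Unlike flatten_json, json_normalize leaves list values as they are instead of expanding them into indexed columns.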