Extract an element from a dictionary inside a list in a DataFrame
Suppose we have a DataFrame that looks like this:
col1
[{'overall_prop': '0.812'}, {'overall_prop': '0.125'}, {'overall_prop': '0.062'}]
{}
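The original data is in JSON format. I want to extract the value of 'overall_prop' from the first element of the list in each row. This is what I tried in order to get the first element: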
df['col1'].str[0]
That works fine. The next step extracts 'overall_prop':
df['col1'].str[0].map(lambda x: x.get('overall_prop'))
But it complains:
{AttributeError}'float' object has no attribute 'get'
because {} (a Python dict object) has turned into NaN.
Then I tried this:
df['col1'].where(df['col1'].notna(), lambda x: [{}]).str[0].map(lambda x: x.get('overall_prop'))
But this time:
{TypeError}argument of type 'NoneType' is not iterable
In short, I am looking for a way to extract an element from a dictionary inside a list that can also handle null values.
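For reference, the first error can be reproduced without pandas: NaN is a float, and a float has no .get method. The sketch below shows that, plus a small guard. The name safe_get is hypothetical and used only for illustration.

```python
# NaN is an ordinary Python float, so dict methods are not available on it.
nan_value = float('nan')

try:
    nan_value.get('overall_prop')
except AttributeError as exc:
    # Same message as in the traceback from the question
    error_message = str(exc)

def safe_get(value, key='overall_prop'):
    """Hypothetical helper: look up key only when value really is a dict."""
    return value.get(key) if isinstance(value, dict) else None
```

Any fix therefore has to check the type of each value before calling dict methods on it.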
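You can do it like this. Use df.col1.apply(lambda x: x[0]['overall_prop']) to take the first element of the list and read the overall_prop value from that dictionary.
The assumption here is that every row in col1 is a non-empty list whose first element is a dictionary containing overall_prop.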
import pandas as pd
df = pd.DataFrame({'col1': [
    [{'overall_prop': '0.001'},
     {'overall_prop': '0.002'},
     {'overall_prop': '0.003'}],
    [{'overall_prop': '0.004'},
     {'overall_prop': '0.005'},
     {'overall_prop': '0.006'}],
    [{'overall_prop': '0.007'},
     {'overall_prop': '0.008'},
     {'overall_prop': '0.009'}],
    [{'overall_prop': '0.010'},
     {'overall_prop': '0.011'},
     {'overall_prop': '0.012'}],
    [{'overall_prop': '0.013'},
     {'overall_prop': '0.014'},
     {'overall_prop': '0.015'}]]})
print(df)
df['overall_prop'] = df['col1'].apply(lambda x: x[0]['overall_prop'])
print(df)
The output will be:
col1 overall_prop
0 [{'overall_prop': '0.001'}, {'overall_prop': '... 0.001
1 [{'overall_prop': '0.004'}, {'overall_prop': '... 0.004
2 [{'overall_prop': '0.007'}, {'overall_prop': '... 0.007
3 [{'overall_prop': '0.010'}, {'overall_prop': '... 0.010
4 [{'overall_prop': '0.013'}, {'overall_prop': '... 0.013
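Note that the extracted values are still strings such as '0.001', because that is how the source dictionaries store them. If numbers are needed (an assumption, since the question does not say), a minimal sketch with pd.to_numeric could look like this. The sample Series is made up for illustration.

```python
import pandas as pd

# Illustrative stand-in for an extracted column: strings, a missing value,
# and one entry that cannot be parsed as a number.
extracted = pd.Series(['0.001', '0.004', None, 'n/a'])

# errors='coerce' turns anything unparseable into NaN instead of raising
numeric = pd.to_numeric(extracted, errors='coerce')
```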
If some rows do not have overall_prop as a key, you can use this:
import numpy as np
df = pd.DataFrame({'col1': [
    [{'overall_prop': '0.001'},
     {'overall_prop': '0.002'},
     {'overall_prop': '0.003'}],
    [{}],
    [{'incorrect_key': '0.004'},
     {'overall_prop': '0.005'},
     {'overall_prop': '0.006'}],
    [{'overall_prop': '0.007'},
     {'overall_prop': '0.008'},
     {'overall_prop': '0.009'}],
    [{'overall_prop': '0.010'},
     {'overall_prop': '0.011'},
     {'overall_prop': '0.012'}],
    [{'overall_prop': '0.013'},
     {'overall_prop': '0.014'},
     {'overall_prop': '0.015'}]]})
# np.nan rather than np.NaN: the capitalized alias was removed in NumPy 2.0
df['overall_prop'] = df['col1'].apply(lambda x: x[0]['overall_prop'] if 'overall_prop' in x[0] else np.nan)
print(df)
The output will be:
col1 overall_prop
0 [{'overall_prop': '0.001'}, {'overall_prop': '... 0.001
1 [{}] NaN
2 [{'incorrect_key': '0.004'}, {'overall_prop': ... NaN
3 [{'overall_prop': '0.007'}, {'overall_prop': '... 0.007
4 [{'overall_prop': '0.010'}, {'overall_prop': '... 0.010
5 [{'overall_prop': '0.013'}, {'overall_prop': '... 0.013
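The lambda above still assumes every cell is a non-empty list: an empty list raises IndexError on x[0], and a NaN cell raises TypeError. One possible way to cover those cases is to attempt the lookup and catch the failures. This is only a sketch; first_prop is a hypothetical name and the sample rows are made up.

```python
import numpy as np
import pandas as pd

def first_prop(cell, key='overall_prop'):
    """Hypothetical helper: return cell[0][key], or NaN if that lookup fails."""
    try:
        return cell[0][key]
    except (TypeError, IndexError, KeyError):
        # TypeError: cell is NaN, a number, or cell[0] is not a mapping
        # IndexError: cell is an empty list
        # KeyError: the first dict lacks the key (or cell is a dict without key 0)
        return np.nan

df = pd.DataFrame({'col1': [
    [{'overall_prop': '0.001'}, {'overall_prop': '0.002'}],
    [{}],    # first dict has no overall_prop
    [],      # empty list
    np.nan,  # missing value
]})
df['overall_prop'] = df['col1'].apply(first_prop)
```

Unlike an explicit isinstance check, this accepts any indexable container, so a tuple of dictionaries would also yield a value.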
If the column can also contain values that are not lists of dictionaries at all, check the types first:
df = pd.DataFrame({'col1': [
    [{'overall_prop': '0.001'},
     {'overall_prop': '0.002'},
     {'overall_prop': '0.003'}],
    [{}],
    {'bad': '0.999'},
    {},
    'just a bad string',
    250,
    35.25,
    True,
    False,
    (10, 20),
    [{'incorrect_key': '0.004'},
     {'overall_prop': '0.005'},
     {'overall_prop': '0.006'}],
    [{'overall_prop': '0.007'},
     {'overall_prop': '0.008'},
     {'overall_prop': '0.009'}],
    [{'overall_prop': '0.010'},
     {'overall_prop': '0.011'},
     {'overall_prop': '0.012'}],
    [{'overall_prop': '0.013'},
     {'overall_prop': '0.014'},
     {'overall_prop': '0.015'}]]})
def prop_check(x):
    # Accept only a non-empty list whose first element is a dict with the key
    if isinstance(x, list) and x and isinstance(x[0], dict) and 'overall_prop' in x[0]:
        return x[0]['overall_prop']
    return np.nan
df['overall_prop'] = df['col1'].apply(prop_check)
print (df)
The output will be:
col1 overall_prop
0 [{'overall_prop': '0.001'}, {'overall_prop': '... 0.001
1 [{}] NaN
2 {'bad': '0.999'} NaN
3 {} NaN
4 just a bad string NaN
5 250 NaN
6 35.25 NaN
7 True NaN
8 False NaN
9 (10, 20) NaN
10 [{'incorrect_key': '0.004'}, {'overall_prop': ... NaN
11 [{'overall_prop': '0.007'}, {'overall_prop': '... 0.007
12 [{'overall_prop': '0.010'}, {'overall_prop': '... 0.010
13 [{'overall_prop': '0.013'}, {'overall_prop': '... 0.013
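A variant that stays closer to the attempt in the question keeps .str[0] and only guards the .get call. This is a sketch, not a definitive fix: it relies on the .str accessor accepting an object column of lists, and what .str[0] returns for a bare dict or a missing value may differ between pandas versions, so the isinstance check is what makes it safe. The sample rows are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [
    [{'overall_prop': '0.812'}, {'overall_prop': '0.125'}],
    {},      # bare dict, as in the question's second row
    np.nan,  # missing value
    [],      # empty list
]})

# First element of each list; rows without a usable first element come back null
first = df['col1'].str[0]

# Call .get only on real dicts, so null rows no longer raise AttributeError
df['overall_prop'] = first.map(
    lambda d: d.get('overall_prop', np.nan) if isinstance(d, dict) else np.nan
)
```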