Extracting dictionary values from a pandas dataframe
I need to extract a feature from a dataset I imported from a .json file.
This is what it looks like:
import pandas as pd
f1 = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/short_desc.json')
print(f1.head())
short_desc
1 [{'when': 1002742486, 'what': 'Usability issue...
10 [{'when': 1002742495, 'what': 'API - VCM event...
100 [{'when': 1002742586, 'what': 'Would like a wa...
10000 [{'when': 1014113227, 'what': 'getter/setter c...
100001 [{'when': 1118743999, 'what': 'Create Help Ind...
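If the URL is unreachable, the same shape can be reproduced offline. The sketch below is a hedged stand-in, not the real file: it feeds `pd.read_json` a tiny hand-written JSON string (the `SAMPLE_JSON` name and its two trimmed records are hypothetical, copied from the output above) and checks that each `short_desc` cell comes back as a Python list of dicts. That nesting is why the strings are not directly accessible as column values.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for short_desc.json: two records copied from the
# output above, each trimmed to a single history entry.
SAMPLE_JSON = """
{"short_desc": {
    "1":  [{"when": 1002742486, "what": "Usability issue with external editors (1GE6IRL)", "who": 21}],
    "10": [{"when": 1002742495, "what": "API - VCM event notification (1G8G6RR)", "who": 10}]
}}
"""

# With the default orient for DataFrames ("columns"), the outer key becomes
# the column name and the inner keys ("1", "10") become the row labels.
f1 = pd.read_json(StringIO(SAMPLE_JSON))

# Each cell is a list of dicts, not a string.
first_cell = f1["short_desc"].iloc[0]
print(f1.columns.tolist())
print(type(first_cell), first_cell[0]["what"])
```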
In essence, I need to keep 'short_desc' as the column name and populate it with the string values nested inside each cell (the 'what' entries, e.g. 'Usability issue...').
So far, I've tried the following:
f1['desc'] = pd.DataFrame([x for x in f1['short_desc']])
Wrong number of items passed 19, placement implies 1
Is there an easy way to accomplish this without using loops? Could someone point this newbie in the right direction?
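Don't initialise a DataFrame and try to assign it to a column; columns are meant to be pd.Series. pd.DataFrame([x for x in f1['short_desc']]) spreads each list across one column per entry, so the result is as wide as the longest list (19 columns here), which is why pandas reports 19 items where it expected 1.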
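To see where a number like 19 comes from, here is a small sketch on a hypothetical two-row Series: `pd.DataFrame` turns a list of lists into one column per list position, so the frame is as wide as the longest list, and a multi-column frame cannot be stored in a single column. The exact error text varies between pandas versions, so the sketch only checks that a `ValueError` is raised.

```python
import pandas as pd

# Hypothetical miniature of f1["short_desc"]: list lengths differ per row,
# as they appear to in the real file (the error suggests up to 19 entries).
short_desc = pd.Series(
    [
        [{"when": 1, "what": "only title"}],
        [{"when": 2, "what": "old title"}, {"when": 3, "what": "new title"}],
    ],
    index=[1, 10],
    name="short_desc",
)

# A list of lists becomes one column per list position; the shorter row
# gets a missing value in the second column.
wide = pd.DataFrame([x for x in short_desc])
print(wide.shape)  # (2, 2)

# Storing that two-column frame in a single new column fails.
# (Its index is also 0..n-1, which would not line up with f1's labels anyway.)
frame = short_desc.to_frame()
raised = False
try:
    frame["desc"] = wide
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```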
You can assign the list comprehension directly instead; x[0]['what'] takes the 'what' string from the first dict in each list:
f1['desc'] = [x[0]['what'] for x in f1['short_desc']]
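A runnable sketch of that assignment on a hypothetical two-row `f1` (titles copied from the output in the question). Because the right-hand side is a plain list, pandas assigns it by position, so it only has to contain `len(f1)` items. The `desc_safe` variant, which yields `None` for an empty list, is an added assumption rather than part of the answer.

```python
import pandas as pd

# Hypothetical miniature of f1 (titles copied from the question's output).
f1 = pd.DataFrame(
    {
        "short_desc": [
            [{"when": 1002742486, "what": "Usability issue with external editors (1GE6IRL)", "who": 21}],
            [{"when": 1002742495, "what": "API - VCM event notification (1G8G6RR)", "who": 10}],
        ]
    },
    index=[1, 10],
)

# A plain Python list is assigned positionally, row by row.
f1["desc"] = [x[0]["what"] for x in f1["short_desc"]]

# Assumed variant: fall back to None if some list happens to be empty.
f1["desc_safe"] = [x[0]["what"] if x else None for x in f1["short_desc"]]

print(f1["desc"].tolist())
```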
As an alternative, here is a solution that avoids lambda functions, using operator.itemgetter and pd.Series.apply:
import operator
f1['desc'] = f1.short_desc.apply(operator.itemgetter(0))\
.apply(operator.itemgetter('what'))
print(f1.desc.head())
1 Usability issue with external editors (1GE6IRL)
10 API - VCM event notification (1G8G6RR)
100 Would like a way to take a write lock on a tea...
10000 getter/setter code generation drops "F" in ".....
100001 Create Help Index Fails with seemingly incorre...
Name: desc, dtype: object
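For comparison, a sketch on a hypothetical two-row Series that runs the `itemgetter` chain next to `Series.str.get`. The `.str.get` route is a swapped-in alternative, not part of the answer: it relies on `str.get` indexing lists by position and dicts by key, element by element, on object columns.

```python
import operator

import pandas as pd

# Hypothetical two-row stand-in for f1["short_desc"].
s = pd.Series(
    [
        [{"when": 1002742486, "what": "Usability issue with external editors (1GE6IRL)", "who": 21}],
        [{"when": 1002742495, "what": "API - VCM event notification (1G8G6RR)", "who": 10}],
    ],
    index=[1, 10],
    name="short_desc",
)

# The answer's approach: take element 0 of each list, then the "what" key.
via_itemgetter = s.apply(operator.itemgetter(0)).apply(operator.itemgetter("what"))

# Swapped-in alternative: Series.str.get indexes lists by position and
# dicts by key, one element at a time.
via_str_get = s.str.get(0).str.get("what")

print(via_str_get.tolist())
```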
Or you can try apply(pd.Series), which expands the first dict of each list into separate what, when and who columns (note: apply is comparatively slow, so this can be costly on large data):
f1['short_desc'].apply(pd.Series)[0].apply(pd.Series)
Out[864]:
what when who
1 Usability issue with external editors (1GE6IRL) 1002742486 21
10 API - VCM event notification (1G8G6RR) 1002742495 10
100 Would like a way to take a write lock on a tea... 1002742586 24
10000 getter/setter code generation drops "F" in "..... 1014113227 331
100001 Create Help Index Fails with seemingly incorre... 1118743999 9571
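Since `apply(pd.Series)` builds one Series per row, a cheaper route (an alternative, not the answerer's code) is to hand the extracted dicts to a single `pd.DataFrame` call. The same idea combined with `Series.explode` keeps every history entry instead of only the first. The data is hypothetical; the second entry for report 10 is invented purely to show the extra row.

```python
import pandas as pd

s = pd.Series(
    [
        [{"when": 1002742486, "what": "Usability issue with external editors (1GE6IRL)", "who": 21}],
        [
            {"when": 1002742495, "what": "API - VCM event notification (1G8G6RR)", "who": 10},
            # Hypothetical later entry, added only to show the effect of explode().
            {"when": 1002742600, "what": "(hypothetical later title)", "who": 10},
        ],
    ],
    index=[1, 10],
    name="short_desc",
)

# First entry only, like apply(pd.Series)[0].apply(pd.Series), but built with
# one DataFrame constructor call instead of one Series per row.
first = pd.DataFrame([x[0] for x in s], index=s.index)
print(first)

# Every entry: explode() yields one row per dict (the index label repeats),
# and the DataFrame constructor turns each dict's keys into columns.
exploded = s.explode()
history = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(history)
```

Column order follows the key order of the dicts, so it may differ from the what, when, who order shown above.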
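Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.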