Convert nested dictionary of lists into pandas dataframe efficiently
I have a JSON object such that:
{
  "hits": {
    "hits": [
      {
        "_source": {
          "TYPES": [
            {
              "_ID": 130,
              "_NM": "ARB-130"
            },
            {
              "_ID": 131,
              "_NM": "ARB-131"
            },
            {
              "_ID": 132,
              "_NM": "ARB-132"
            }
          ]
        }
      },
      {
        "_source": {
          "TYPES": [
            {
              "_ID": 902,
              "_NM": "ARB-902"
            },
            {
              "_ID": 903,
              "_NM": "ARB-903"
            },
            {
              "_ID": 904,
              "_NM": "ARB-904"
            }
          ]
        }
      }
    ]
  }
}
I need to unpack it into a pandas DataFrame so that I get all the unique _ID and _NM pairs from the TYPES lists:
_ID _NM
0 130 ARB-130
1 131 ARB-131
2 132 ARB-132
3 902 ARB-902
4 903 ARB-903
5 904 ARB-904
I am looking for the fastest possible solution, since the number of hits and the number of pairs within TYPES can be in the hundreds of thousands. My current unpacking with pd.Series and apply is slow, and I would like to avoid it if possible. Any ideas would be appreciated.
I would also be interested in how to explode dictionaries or lists in a column into separate columns without using pd.Series, as I encounter this use case regularly.
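On that last point: a common pattern that avoids apply(pd.Series) is to pass the column's values straight to the DataFrame constructor, which expands a list of dicts in a single vectorised pass. A minimal sketch with made-up column names (`key`, `data` are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical frame with a column of dicts
df = pd.DataFrame({
    "key": [1, 2],
    "data": [{"a": 10, "b": 20}, {"a": 30, "b": 40}],
})

# Build a frame directly from the list of dicts instead of
# df["data"].apply(pd.Series); keep the original index so concat aligns
expanded = pd.DataFrame(df["data"].tolist(), index=df.index)
out = pd.concat([df.drop(columns="data"), expanded], axis=1)
print(out)
```

The constructor call runs once over a plain Python list, whereas apply(pd.Series) builds one Series object per row, which is where the slowdown usually comes from.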
One way is to restructure your dictionary and flatten it using itertools.chain.
For performance, you should benchmark with your own data.
from itertools import chain

import pandas as pd

# Flatten every hit's TYPES list into one sequence of {_ID, _NM} dicts
res = list(chain.from_iterable(i['_source']['TYPES'] for i in d['hits']['hits']))
df = pd.DataFrame(res)
print(df)
_ID _NM
0 130 ARB-130
1 131 ARB-131
2 132 ARB-132
3 902 ARB-902
4 903 ARB-903
5 904 ARB-904
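For comparison, pandas also ships a built-in for exactly this shape: pd.json_normalize with record_path, which walks each hit down to its TYPES list and concatenates the rows. It may not beat the chain approach on very large inputs, so benchmark both. A sketch on a trimmed copy of the data from the question:

```python
import pandas as pd

d = {"hits": {"hits": [
    {"_source": {"TYPES": [{"_ID": 130, "_NM": "ARB-130"},
                           {"_ID": 131, "_NM": "ARB-131"}]}},
    {"_source": {"TYPES": [{"_ID": 902, "_NM": "ARB-902"}]}},
]}}

# record_path descends each hit via _source -> TYPES and
# emits one row per dict in the inner list
df = pd.json_normalize(d["hits"]["hits"], record_path=["_source", "TYPES"])
print(df)
```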