I have a JSON object like this:
{
"hits": {
"hits": [
{
"_source": {
"TYPES": [
{
"_ID": 130,
"_NM": "ARB-130"
},
{
"_ID": 131,
"_NM": "ARB-131"
},
{
"_ID": 132,
"_NM": "ARB-132"
}
]
}
},
{
"_source": {
"TYPES": [
{
"_ID": 902,
"_NM": "ARB-902"
},
{
"_ID": 903,
"_NM": "ARB-903"
},
{
"_ID": 904,
"_NM": "ARB-904"
}
]
}
}
]
}
}
I need to unpack it into a pandas DataFrame containing all of the unique `_ID`/`_NM` pairs found under the `TYPES` keys:
_ID _NM
0 130 ARB-130
1 131 ARB-131
2 132 ARB-132
3 902 ARB-902
4 903 ARB-903
5 904 ARB-904
I am looking for the fastest possible solution, since the number of `TYPES` entries, and the number of pairs within each, can be in the hundreds of thousands. My current unpacking with `pd.Series` and `apply` is slow, and I would like to avoid it if possible. Any ideas would be appreciated. I would also welcome a general way to explode dictionaries or lists in a column into separate columns without using `pd.Series`, as I encounter this use case regularly.
One way is to flatten the nested lists with `itertools.chain` and build the DataFrame in a single call. For performance, you should benchmark with your own data.
import pandas as pd
from itertools import chain

# Flatten the TYPES list of every hit into one list of {'_ID': ..., '_NM': ...} records
res = list(chain.from_iterable(i['_source']['TYPES'] for i in d['hits']['hits']))
df = pd.DataFrame(res)
print(df)
_ID _NM
0 130 ARB-130
1 131 ARB-131
2 132 ARB-132
3 902 ARB-902
4 903 ARB-903
5 904 ARB-904
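An alternative worth benchmarking against the above is `pd.json_normalize`, whose `record_path` argument walks nested keys and iterates any lists it meets along the way. A minimal sketch, using a small hypothetical `d` that mirrors the structure in your question:

```python
import pandas as pd

# Hypothetical sample mirroring the structure in the question
d = {
    "hits": {
        "hits": [
            {"_source": {"TYPES": [{"_ID": 130, "_NM": "ARB-130"},
                                   {"_ID": 131, "_NM": "ARB-131"}]}},
            {"_source": {"TYPES": [{"_ID": 902, "_NM": "ARB-902"}]}},
        ]
    }
}

# record_path follows the keys and iterates intermediate lists,
# so each element of every TYPES list becomes one row
df = pd.json_normalize(d, record_path=["hits", "hits", "_source", "TYPES"])
```

This yields the same `_ID`/`_NM` columns as the `chain` approach; for very large inputs, `chain` plus a single `pd.DataFrame` call is often faster, but measure both on your data.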
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
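On your second question, expanding a column of dictionaries into separate columns: the same idea applies. Instead of `df["data"].apply(pd.Series)`, which constructs a `pd.Series` once per row, pass the column's values directly to the `pd.DataFrame` constructor. A sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with a column of dicts
df = pd.DataFrame({"key": [1, 2],
                   "data": [{"a": 10, "b": 20}, {"a": 30, "b": 40}]})

# Build the expanded columns from the raw list of dicts in one constructor
# call instead of one pd.Series per row via apply
expanded = pd.DataFrame(df["data"].tolist(), index=df.index)
out = df.drop(columns="data").join(expanded)
```

For a column of lists, the identical pattern works; pass `columns=[...]` to the constructor if you want to name the resulting columns.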