Efficiently convert a nested dictionary of lists into a pandas DataFrame

Question

I have a JSON object like this:

{
   "hits": {
      "hits": [
         {
            "_source": {
               "TYPES": [
                  {
                     "_ID": 130,
                     "_NM": "ARB-130"
                  },
                  {
                     "_ID": 131,
                     "_NM": "ARB-131"
                  },
                  {
                     "_ID": 132,
                     "_NM": "ARB-132"
                  }
               ]
            }
         },
         {
            "_source": {
               "TYPES": [
                  {
                     "_ID": 902,
                     "_NM": "ARB-902"
                  },
                  {
                     "_ID": 903,
                     "_NM": "ARB-903"
                  },
                  {
                     "_ID": 904,
                     "_NM": "ARB-904"
                  }
               ]
            }
         }
      ]
   }
}

I need to unpack it into a pandas DataFrame so that I get all the unique _ID and _NM pairs under the TYPES key:

           _ID          _NM
0          130          ARB-130
1          131          ARB-131
2          132          ARB-132
3          902          ARB-902
4          903          ARB-903
5          904          ARB-904

I am looking for the fastest possible solution, since the number of types and the number of pairs within types can be in the hundreds of thousands. My current unpacking, which uses pd.Series and apply, is slow, and I would like to avoid it if possible. Any ideas would be appreciated. I am also interested in ways to expand dictionaries or lists in a column into separate columns without using pd.Series, as I run into this use case regularly.

Answer 1

One way is to restructure your dictionary and flatten it using itertools.chain.

For performance, you should benchmark with your data.

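from itertools import chain

import pandas as pd

# d is the parsed JSON object (a dict) from the question.
# Flatten every TYPES list into a single list of {_ID, _NM} records.
res = list(chain.from_iterable(i['_source']['TYPES'] for i in d['hits']['hits']))

df = pd.DataFrame(res)

print(df)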

   _ID      _NM
0  130  ARB-130
1  131  ARB-131
2  132  ARB-132
3  902  ARB-902
4  903  ARB-903
5  904  ARB-904
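
The question asks for unique pairs, and the flattening above keeps any pair that appears under more than one hit. A minimal sketch of one way to handle that, assuming a reasonably recent pandas (1.0 or later for the top-level pd.json_normalize): the sample dict below is a shortened stand-in for the question's data with one repeated pair added on purpose, and the names records, unique_df and normalized are only illustrative. The pd.json_normalize call is an alternative to itertools.chain, not part of the original answer, and it is worth timing both on real data.

```python
from itertools import chain

import pandas as pd

# Shortened stand-in for the question's JSON, with one repeated pair
# (131 / ARB-131) added on purpose so de-duplication has something to do.
d = {
    "hits": {
        "hits": [
            {"_source": {"TYPES": [{"_ID": 130, "_NM": "ARB-130"},
                                   {"_ID": 131, "_NM": "ARB-131"}]}},
            {"_source": {"TYPES": [{"_ID": 131, "_NM": "ARB-131"},
                                   {"_ID": 902, "_NM": "ARB-902"}]}},
        ]
    }
}

# Flatten every TYPES list into one list of records, as in the answer.
records = list(chain.from_iterable(
    hit["_source"]["TYPES"] for hit in d["hits"]["hits"]
))

# Keep the first occurrence of each (_ID, _NM) pair and renumber the rows.
unique_df = pd.DataFrame(records).drop_duplicates().reset_index(drop=True)

# Alternative flattening: let pandas follow the path to each TYPES list.
# This keeps duplicates too, so drop_duplicates would still be needed.
normalized = pd.json_normalize(d["hits"]["hits"], record_path=["_source", "TYPES"])

print(unique_df)
```

Dropping duplicates on the DataFrame keeps the flattening step unchanged; if the duplicates are very numerous, de-duplicating before building the DataFrame may be worth measuring as well.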

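On the second part of the question, expanding dictionaries or lists held in a column without going through pd.Series and apply, a common pattern is to hand the column's plain Python objects straight to the DataFrame constructor. The sketch below uses a hypothetical frame (the column names key, info and tags are made up for illustration) and assumes pandas 0.25 or later for DataFrame.explode.

```python
import pandas as pd

# Hypothetical frame: "info" holds one dict per row, "tags" one list per row.
df = pd.DataFrame({
    "key": ["a", "b"],
    "info": [{"_ID": 130, "_NM": "ARB-130"}, {"_ID": 902, "_NM": "ARB-902"}],
    "tags": [["x", "y"], ["z"]],
})

# Dicts -> columns: build one frame from the list of dicts in a single call,
# reusing the original index so the join lines up row for row.
info_cols = pd.DataFrame(df["info"].tolist(), index=df.index)
wide = df.drop(columns="info").join(info_cols)

# Lists -> rows: explode repeats the other columns for each list element.
long = df[["key", "tags"]].explode("tags").reset_index(drop=True)

print(wide)
print(long)
```

Building the frame once from a list of dicts avoids creating a Series object per row, which is usually where the apply(pd.Series) approach spends its time.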
Answer 1
2 2018-05-04 16:03:58