
Extract data from a complicated data structure in Python

I have a data structure like this:

[ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
  {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

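It is a list containing many dictionaries, each with 3 pairs: 'uid': 'test_subject145', 'class':'?', 'data':[]. In the last pair, 'data', the value is a list that in turn contains a dictionary with 2 pairs: 'chunk':1, 'writing':[]. In the 'writing' pair, the value is a list that again contains many lists. What I want to extract is the content of all those sentences, such as 'this is exciting', 'you are good', and so on, and put them into one simple list. Its final form should be list_final = ['this is exciting', 'you are good', 'he died',... ]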

Given that your original list is named input, just use a list comprehension:

[elem for dic in input
      for dat in dic.get('data',())
      for writing in dat.get('writing',())
      for elem in writing]

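Using .get(.., ()) means the comprehension still works when a key is absent: for a missing key we get back the empty tuple (), so there is nothing to iterate over at that level.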
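As a small sketch of that fallback behaviour, the snippet below runs the same comprehension over records where 'data' or 'writing' is missing. The list `records` and its contents are made up for illustration and do not come from the question.

```python
# Hypothetical records: the second has no 'data' key, and the third has a
# chunk without a 'writing' key.
records = [
    {'uid': 'a', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting']]}]},
    {'uid': 'b', 'class': '?'},
    {'uid': 'c', 'class': '?', 'data': [{'chunk': 3}]},
]

# .get(key, ()) yields an empty tuple for a missing key, so the inner
# loops simply do not run for that record.
sentences = [elem for dic in records
                  for dat in dic.get('data', ())
                  for writing in dat.get('writing', ())
                  for elem in writing]

print(sentences)  # ['this is exciting']
```

With plain indexing (dic['data']) the second record would raise a KeyError instead of being skipped.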

For your sample input, we get:

>>> input = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]}  ]  },
...       {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] }]
>>> 
>>> [elem for dic in input
...       for dat in dic.get('data',())
...       for writing in dat.get('writing',())
...       for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']

TL;DR

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]

Take it slowly and work through one layer at a time. Then refactor the code to make it smaller.

data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
                   'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

for d in data:
    print(d)
# {'class': '?', 'uid': 'test_subject145', 'data': [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]}
# {'class': '?', 'uid': 'test_subject166', 'data': [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]}

for d in data:
    data_list = d['data']
    print(data_list)
# [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]
# [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        print(d2)
# {'writing': [['this is exciting'], ['you are good']], 'chunk': 1}
# {'writing': [['he died'], ['go ahead']], 'chunk': 2}

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        print(writing_list)
# [['this is exciting'], ['you are good']]
# [['he died'], ['go ahead']]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            print(writing_sub_list)
# ['this is exciting']
# ['you are good']
# ['he died']
# ['go ahead']

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            for s in writing_sub_list:  # `s`, so the built-in `str` is not shadowed
                print(s)
# this is exciting
# you are good
# he died
# go ahead

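Then convert it into something smaller (but harder to read) by rewriting the code above as a comprehension. It should be easy to see how to get from one to the other: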

strings = [s for d in data for d2 in d['data'] for wsl in d2['writing'] for s in wsl]
# ['this is exciting', 'you are good', 'he died', 'go ahead']
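To make the correspondence concrete, here is a minimal sketch that builds the list both ways and compares them. The names `from_loops` and `from_comprehension` are introduced only for this illustration.

```python
data = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

# Nested-loop form: one loop per layer of the structure.
from_loops = []
for d in data:
    for d2 in d['data']:
        for wsl in d2['writing']:
            for s in wsl:
                from_loops.append(s)

# Comprehension form: the same `for` clauses in the same order, with the
# appended value moved to the front.
from_comprehension = [s for d in data
                        for d2 in d['data']
                        for wsl in d2['writing']
                        for s in wsl]

print(from_loops == from_comprehension)  # True
```

The rule of thumb is that the `for` clauses of a comprehension read top to bottom exactly like the nested loops they replace.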

Then give the variables better names, as in Willem's answer:

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]

So I believe the following will work:

lista = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
          {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

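list_of_final_products = []

for itema in lista:
  try:
    for data_item in itema['data']:
      for writa in data_item['writing']:
        for writa_itema in writa:
          # append the sentence itself, not the enclosing sub-list
          list_of_final_products.append(writa_itema)
  except KeyError:
    # skip entries that lack a 'data' or 'writing' key
    pass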
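Because the sample lista above contains elided entries (...), it cannot be run as written. The following is a minimal sketch using only the two complete records from the question, with the loop wrapped in a hypothetical helper, `collect_sentences`, which is not a name from the original answer. Catching only KeyError is an assumption about which failures should be skipped.

```python
def collect_sentences(records):
    """Return every sentence found under record['data'][i]['writing'][j]."""
    sentences = []
    for record in records:
        try:
            for data_item in record['data']:
                for sub_list in data_item['writing']:
                    # Each sub-list holds sentence strings; add them all.
                    sentences.extend(sub_list)
        except KeyError:
            # Abandon the rest of a record that lacks 'data' or 'writing'.
            continue
    return sentences


lista = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

print(collect_sentences(lista))
# ['this is exciting', 'you are good', 'he died', 'go ahead']
```

Note that the try block covers a whole record, so sentences collected before a missing key is hit are kept and only the remainder of that record is skipped.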

I think this related thread, mentioned above, helps with understanding: "Python: getting a list of values from a list of dicts" (thanks to McGrady)


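Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.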

 