Extract data from a complicated data structure in Python
I have a data structure like this:
[ {'uid': 'test_subject145', 'class':'?', 'data':[ {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]} ] },
{'uid': 'test_subject166', 'class':'?', 'data':[ {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]} ] }, ...]
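It is a list containing many dictionaries, each with 3 pairs: 'uid': 'test_subject145', 'class':'?', 'data':[].
In the last pair, 'data', the value is a list that again contains a dictionary with 2 pairs: 'chunk':1, 'writing':[].
In the 'writing' pair, the value is a list that in turn contains many lists. What I want to extract is the content of all these sentences, such as 'this is exciting' and 'you are good' and so on, and put them into one simple flat list. Its final form should be list_final = ['this is exciting', 'you are good', 'he died',... ]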
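Given that your original list is named input (in real code a different name is better, since input shadows the built-in input() function), just use a list comprehension: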
[elem for dic in input
for dat in dic.get('data',())
for writing in dat.get('writing',())
for elem in writing]
You can use .get(..,()) so that it still works when a key is missing: in that case we get back the empty tuple (), so there is nothing to iterate over.
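To see the fallback in action, here is a small sketch with hypothetical records (invented for illustration, not taken from the question) where some levels are missing. The .get version simply skips them, while plain indexing raises KeyError at the first missing key.

```python
# Hypothetical records (not from the question): 'b' has no 'data' key,
# and the dict inside 'c' has no 'writing' key.
records = [
    {'uid': 'a', 'class': '?', 'data': [{'chunk': 1, 'writing': [['first'], ['second']]}]},
    {'uid': 'b', 'class': '?'},
    {'uid': 'c', 'class': '?', 'data': [{'chunk': 3}]},
]

# .get(key, ()) returns an empty tuple for a missing key, so that level
# of the comprehension iterates zero times instead of failing.
sentences = [elem for dic in records
             for dat in dic.get('data', ())
             for writing in dat.get('writing', ())
             for elem in writing]
print(sentences)  # ['first', 'second']

# Plain indexing stops at the first missing key instead.
missing_key = None
try:
    [elem for dic in records
     for dat in dic['data']
     for writing in dat['writing']
     for elem in writing]
except KeyError as exc:
    missing_key = exc.args[0]
print(missing_key)  # data
```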
Given your sample input, we get:
>>> input = [ {'uid': 'test_subject145', 'class':'?', 'data':[ {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]} ] },
... {'uid': 'test_subject166', 'class':'?', 'data':[ {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]} ] }]
>>>
>>> [elem for dic in input
... for dat in dic.get('data',())
... for writing in dat.get('writing',())
... for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']
TL;DR
[sentence for dic in data
 for data_dict in dic['data']
 for writing_sub_list in data_dict['writing']
 for sentence in writing_sub_list]
Take it slowly, one layer at a time. Then refactor the code to make it smaller.
data = [{'class': '?',
'data': [{'chunk': 1,
'writing': [['this is exciting'], ['you are good']]}],
'uid': 'test_subject145'},
{'class': '?',
'data': [{'chunk': 2,
'writing': [['he died'], ['go ahead']]}],
'uid': 'test_subject166'}]
for d in data:
    print(d)
# {'class': '?', 'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}], 'uid': 'test_subject145'}
# {'class': '?', 'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}], 'uid': 'test_subject166'}
for d in data:
    data_list = d['data']
    print(data_list)
# [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]
# [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]
for d in data:
    data_list = d['data']
    for d2 in data_list:
        print(d2)
# {'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}
# {'chunk': 2, 'writing': [['he died'], ['go ahead']]}
for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        print(writing_list)
# [['this is exciting'], ['you are good']]
# [['he died'], ['go ahead']]
for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            print(writing_sub_list)
# ['this is exciting']
# ['you are good']
# ['he died']
# ['go ahead']
for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            for sentence in writing_sub_list:
                print(sentence)
# this is exciting
# you are good
# he died
# go ahead
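The last step prints each sentence; replacing the print with an append collects them into the flat list the question asks for. This is a minimal sketch using the same sample data; note that the for lines appear in the same order as the for clauses of the comprehension, which is what makes the rewrite into one line mechanical.

```python
data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
                   'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

sentences = []
for d in data:                      # each top-level record
    for d2 in d['data']:            # each dict in the 'data' list
        for wsl in d2['writing']:   # each inner list in 'writing'
            for sentence in wsl:    # each string in that inner list
                sentences.append(sentence)

print(sentences)  # ['this is exciting', 'you are good', 'he died', 'go ahead']
```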
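Then convert it into something smaller (but harder to read) by rewriting the code above like this. It should be easy to see how to get from one to the other: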
strings = [sentence for d in data for d2 in d['data'] for wsl in d2['writing'] for sentence in wsl]
# ['this is exciting', 'you are good', 'he died', 'go ahead']
Then, with the better naming from Willem's answer:
[sentence for dic in data
 for data_dict in dic['data']
 for writing_sub_list in data_dict['writing']
 for sentence in writing_sub_list]
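If the extraction is needed in more than one place, the same nested loops can be wrapped in a generator function. This is a sketch, not part of either answer: the name iter_sentences is hypothetical, and it borrows the .get(..., ()) fallback from the first answer so missing keys are skipped. Because it yields lazily, callers can loop over it without building the whole list; wrap it in list() when a list is needed.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_sentences(records: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield each sentence string, walking one nesting level per loop.

    iter_sentences is a hypothetical helper name used for illustration.
    """
    for record in records:
        for data_dict in record.get('data', ()):
            for writing_sub_list in data_dict.get('writing', ()):
                # each inner list holds the sentence strings themselves
                yield from writing_sub_list


data = [{'class': '?', 'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}], 'uid': 'test_subject145'},
        {'class': '?', 'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}], 'uid': 'test_subject166'}]

list_final = list(iter_sentences(data))
print(list_final)  # ['this is exciting', 'you are good', 'he died', 'go ahead']
```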
So I believe the following would work:
lista = [ {'uid': 'test_subject145', 'class':'?', 'data':[ {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]} ] },
{'uid': 'test_subject166', 'class':'?', 'data':[ {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]} ] }, ...]
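list_of_final_products = []
for itema in lista:
    try:
        for data_item in itema['data']:
            for writa in data_item['writing']:
                for writa_itema in writa:
                    # append the sentence string, not the inner list that holds it
                    list_of_final_products.append(writa_itema)
    except KeyError:
        # skip entries that lack a 'data' or 'writing' key
        pass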
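The innermost append decides whether the result is flat. A minimal comparison on one 'data' entry from the question's sample: appending writa (the inner list) keeps the nesting, and would also repeat that list once for every string it contains, whereas appending writa_itema gives the flat list of strings.

```python
# One 'data' entry, taken from the question's sample.
data_item = {'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}

appended_lists = []
appended_strings = []
for writa in data_item['writing']:
    for writa_itema in writa:
        appended_lists.append(writa)          # the inner list, e.g. ['this is exciting']
        appended_strings.append(writa_itema)  # the string inside it

print(appended_lists)    # [['this is exciting'], ['you are good']]
print(appended_strings)  # ['this is exciting', 'you are good']
```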
This question, as mentioned above, I think helps with understanding: python getting a list of values from a list of dicts (thanks to McGrady)