Extract lists within lists containing a string in Python
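I am trying to split a nested list into two nested lists using list comprehensions. I can't do it without converting the inner lists to strings, which in turn ruins my ability to access/print/control the values later on.
I tried this: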
paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived: This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ...]
derived = [k for k in paragraphs3 if 'Derived:' in k]
therest = [k for k in paragraphs3 if 'Derived:' not in k]
What happens is that all of paragraphs3 ends up in therest, unless I do something like this:
paragraphs4 = []
for i in paragraphs3:
    i = str(i)
    paragraphs4.append(i)
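If I then feed paragraphs4 to the list comprehensions, I get two lists, just like I want. But they are no longer nested lists, so this:
for i in therest:
    g.write('\n'.join(i))
    g.write('\n\n')
writes each !character! of therest on a separate line: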
'
P
a
g
e
:
2
'
So I'm looking for a better way to split paragraphs3 ... or maybe the solution lies elsewhere? The end result/output I'm looking for is:
Page: 2
Bib: Something
Derived: This n that
Page: 3
Bib: Something
.
.
.
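As background for the answers below, here is a minimal sketch of why the original comprehensions put everything into therest, and why the str() workaround ends up writing one character per line. The sample data is copied from the question (without the trailing `...`); the variable names `first`, `element_match`, `substring_match`, `as_text` and `per_char` are only for illustration.

```python
paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived: This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

first = paragraphs3[0]

# `in` on a list compares against whole elements, so a prefix such as
# 'Derived:' never equals 'Derived: This n that'.
element_match = 'Derived:' in first

# `in` on a single string is a substring test, which is what was intended.
substring_match = any('Derived:' in item for item in first)

# str() turns the inner list into one long string. Joining a string
# iterates over its characters, hence one character per line.
as_text = str(first)
per_char = '\n'.join(as_text)

print(element_match, substring_match)
print(per_char[:7])
```

So the condition has to look inside each string of the inner list, rather than at the inner list as a whole.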
This code separates the sublists according to whether they contain a string that starts with 'Derived:'.
paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived: This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ]
def show(paragraphs):
    for para in paragraphs:
        print('\n'.join(para), '\n')
derived = []
therest = []
print('---input---')
show(paragraphs3)
for para in paragraphs3:
    if any(item.startswith('Derived:') for item in para):
        derived.append(para)
    else:
        therest.append(para)
print('---derived---')
show(derived)
print('---therest---')
show(therest)
Output:
---input---
Page: 2
Bib: Something
Derived: This n that
Page: 3
Bib: Something
Argument: Wouldn't you like to know?
---derived---
Page: 2
Bib: Something
Derived: This n that
---therest---
Page: 3
Bib: Something
Argument: Wouldn't you like to know?
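The most important part of this code is
`any(item.startswith('Derived:') for item in para)`
This iterates over the individual strings in para (the current paragraph) and returns True as soon as it finds a string that starts with 'Derived:'.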
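If you prefer to stay with list comprehensions, as in the question, the same test can be wrapped in a small function. This is only a sketch: `has_derived` is a name made up here, and the data is the two-paragraph sample from the question.

```python
paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived: This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

def has_derived(para):
    """Return True if any string in the paragraph starts with 'Derived:'."""
    # any() stops at the first match, so the rest of para is not scanned.
    return any(item.startswith('Derived:') for item in para)

# The inner lists are kept as lists, so they can still be joined later.
derived = [para for para in paragraphs3 if has_derived(para)]
therest = [para for para in paragraphs3 if not has_derived(para)]

for para in derived:
    print('\n'.join(para))
```

This walks paragraphs3 twice, which should not matter for small inputs; the explicit loop above does it in one pass.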
FWIW, the for loop can be reduced to:
for para in paragraphs3:
    (therest, derived)[any(item.startswith('Derived:') for item in para)].append(para)
Because False and True evaluate to 0 and 1, they can be used to index the (therest, derived) tuple. However, many people would consider this unreadable. :)
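For anyone puzzled by the indexing trick, here is the same loop unpacked into steps. It is just a sketch; the names `buckets` and `is_derived` are introduced here for illustration.

```python
paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived: This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

therest, derived = [], []

# Index 0 (False) is therest, index 1 (True) is derived.
buckets = (therest, derived)

for para in paragraphs3:
    # any() returns a real bool; bool is a subclass of int in Python,
    # so it is accepted as a tuple index.
    is_derived = any(item.startswith('Derived:') for item in para)
    buckets[is_derived].append(para)

print(len(derived), len(therest))
```

The order of the tuple matters: swapping therest and derived would silently swap the two results.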
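The code you wrote is almost correct. You need to check whether 'Derived:' is present in the 3rd element of each inner list. k basically holds one element of paragraphs3: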
>>> paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived: This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?']]
>>> paragraphs3[0]
['Page: 2', 'Bib: Something', 'Derived: This n that']
>>> paragraphs3[0][2] # Here is where you want to check the condition
'Derived: This n that'
So all you have to do is change the condition to if 'Derived:' in k[2].
>>> [k for k in paragraphs3 if 'Derived:' in k[2]]
[['Page: 2', 'Bib: Something', 'Derived: This n that']]
>>> [k for k in paragraphs3 if 'Derived:' not in k[2]]
[['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]
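One caveat: k[2] assumes that every inner list has at least three items and that the 'Derived:' entry is always the third one. The question does not say whether that holds. Below is a sketch that treats shorter lists as non-matching instead of raising IndexError; the helper `third_item_has` and the short 'Page: 4' paragraph are made up for illustration.

```python
paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived: This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
    ['Page: 4'],  # hypothetical short paragraph, not in the question
]

def third_item_has(para, needle='Derived:'):
    """Hypothetical helper: True if a third item exists and contains needle."""
    # The length check runs first, so para[2] is only read when it exists.
    return len(para) > 2 and needle in para[2]

derived = [k for k in paragraphs3 if third_item_has(k)]
therest = [k for k in paragraphs3 if not third_item_has(k)]

print(derived)
print(therest)
```

If the position of the 'Derived:' line can vary, checking every item of the inner list is the safer choice.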
derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
therest = [l for l in paragraphs3 if not any(filter(lambda k: 'Derived: ' in k, l))]
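Copy the whole list:
[l for l in paragraphs3]
Copy the list with a condition:
[l for l in paragraphs3 if sublist_contains('Derived: ', l)]
The function sublist_contains is not implemented yet, so let's implement it.
Retrieve only the items that match condition_check:
filter(condition_check, l)
Since condition_check can be expressed as a lambda function:
filter(lambda k: 'Derived: ' in k, l)
Convert the result to a boolean (True if at least one item matches the condition):
any(filter(lambda k: 'Derived: ' in k, l))
And replace sublist_contains with the resulting inline code: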
derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
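If the inline version is hard to read, sublist_contains can also be kept as a real function. This is a sketch of one possible implementation, using a generator expression instead of filter and lambda; the argument order follows the call shown above.

```python
paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived: This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

def sublist_contains(needle, sublist):
    """Return True if needle occurs inside at least one string of sublist."""
    return any(needle in item for item in sublist)

derived = [l for l in paragraphs3 if sublist_contains('Derived: ', l)]
# The complement is "no item matches", so the whole test is negated
# rather than the condition inside it.
therest = [l for l in paragraphs3 if not sublist_contains('Derived: ', l)]

print(derived)
print(therest)
```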
To me, this seems like the most straightforward way:
[p for p in paragraphs3 if 'Derived:' in '\n'.join(p)]
[p for p in paragraphs3 if 'Derived:' not in '\n'.join(p)]
But if you like, you could get fancier and do it all in one line (though it is more complicated than necessary).
result = {k:[p for p in paragraphs3 if ('Derived:' in '\n'.join(p)) == test] for k,test in {'derived': True, 'therest': False}.items()}
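This gives you a dict with 'derived' and 'therest' as keys. Now you can do this:
for k, p in result.items():
    print(k)
    for i in p:
        # join with newlines so each item of the inner list gets its own line
        print('\n'.join(i))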
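It looks like your inner lists have structure; each list is itself a value, not just a list of unrelated values. With that in mind, you could write a class to represent that data.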
paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived: This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?'], ...]
class Paragraph(object):
    def __init__(self, page, bib, extra):
        self.page = page
        self.bib = bib
        self.extra = extra

    @property
    def is_derived(self):
        return 'Derived: ' in self.extra

# unpack each inner list into page, bib, extra
paras = [Paragraph(*p) for p in paragraphs3]
Then you can use the partition recipe from the itertools documentation to split that list into two iterators.
from itertools import filterfalse, tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

(not_derived_paras, derived_paras) = partition(lambda p: p.is_derived, paras)
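Put together end to end, the class and the recipe might look like the sketch below. It assumes every inner list has exactly three items (page, bib, extra), which is true for the two sample paragraphs but is not stated in the question. The `_iter` names are introduced here only to show that partition returns lazy iterators.

```python
from itertools import filterfalse, tee

paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived: This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

class Paragraph:
    def __init__(self, page, bib, extra):
        self.page = page
        self.bib = bib
        self.extra = extra

    @property
    def is_derived(self):
        return 'Derived: ' in self.extra

def partition(pred, iterable):
    """Split iterable into (entries where pred is false, entries where it is true)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

# Unpack each inner list into the three constructor arguments.
paras = [Paragraph(*p) for p in paragraphs3]

not_derived_iter, derived_iter = partition(lambda p: p.is_derived, paras)

# Both results are one-shot iterators; turn them into lists to reuse them.
not_derived_paras = list(not_derived_iter)
derived_paras = list(derived_iter)

for p in derived_paras:
    print(p.page, p.bib, p.extra, sep='\n')
```

Note the order of the returned pair: the false entries come first, so the non-derived paragraphs are on the left.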