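Removing words from list, only first if statement is being executed
I have a long, hard-to-decipher block of text in which every line ends with a closing parenthesis (only one line is included here, since I can't even get the program to work on a single line):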
"Thyroid Disorders Understanding Concepts Kaplan Endocrine Focused Review Tests n/a 88% (35/40)"
I'm trying to format it like this and append it to a file:
"Thyroid Disorders Understanding Concepts 88% (35/40)"
So I need to remove the strings "Kaplan", "Endocrine", "Focused", "Review", "Tests", and "n/a" from each line, and strip out the tabs/newlines.
Here is my code:
text = """Thyroid Disorders Understanding Concepts Kaplan Endocrine A Focused Review Tests n/a 88% (35/40)
"""
line = ''
for character in text:
    line = line + character  # append every character to string
    if character == ')':  # closing parenthesis signals end of one line
        print('Original line: ' + line)  # sanity check
        line_as_list = line.split()  # removes tabs/newlines and makes it easier to remove certain strings
        for word in line_as_list:  # loop through each list item, remove if needed
            if word == 'Kaplan':
                line_as_list.remove(word)
                print(line_as_list)  # another sanity check, 'Kaplan' is gone
            if word == 'Endocrine':  # never runs
                line_as_list.remove(word)
                print(line_as_list)
            # Intentionally left out the rest of the words that need to be removed
This prints the following:
Original line: Thyroid Disorders Understanding Concepts Kaplan Endocrine A Focused Review Tests n/a 88% (35/40)
['Thyroid', 'Disorders', 'Understanding', 'Concepts', 'Endocrine', 'A', 'Focused', 'Review', 'Tests', 'n/a', '88%', '(35/40)']
The code under the first if statement executes as I intended, but the block under if word == 'Endocrine' never runs.
I tried
if word == 'Kaplan' or word == 'Endocrine':
    line_as_list.remove(word)
and
if word == 'Kaplan':
    line_as_list.remove(word)
elif word == 'Endocrine':
    line_as_list.remove(word)
Neither worked; "Kaplan" is the only word that gets removed. Thanks for any help with this.
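The problem is that you are mutating the list you are currently iterating over. Kaplan and Endocrine are adjacent, so when Kaplan is removed, Endocrine shifts into Kaplan's index. The loop then moves on to the next index (Endocrine's old index), and Endocrine is never examined. This is easy to demonstrate: add another string between Kaplan and Endocrine in your own code and you will see that both are removed, because the word in the middle is the one that gets skipped instead.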
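To make the skipping visible, here is a minimal sketch that records which words the loop actually visits. The helper name `remove_while_iterating` and the word `filler` are made up for illustration; the function is deliberately buggy in the same way as the code in the question.

```python
def remove_while_iterating(words, unwanted):
    """Deliberately buggy: removes items from the list it is iterating over.

    Returns the words the loop actually visited; `words` is modified in place.
    """
    visited = []
    for word in words:
        visited.append(word)
        if word in unwanted:
            words.remove(word)  # everything after this item shifts left by one
    return visited


unwanted = {'Kaplan', 'Endocrine'}

# Case 1: the two unwanted words are adjacent, as in the question.
adjacent = ['Concepts', 'Kaplan', 'Endocrine', 'A']
visited_adjacent = remove_while_iterating(adjacent, unwanted)
print(visited_adjacent)  # ['Concepts', 'Kaplan', 'A'] -> 'Endocrine' was never visited
print(adjacent)          # ['Concepts', 'Endocrine', 'A']

# Case 2: a filler word sits between them, so the filler is skipped instead.
separated = ['Concepts', 'Kaplan', 'filler', 'Endocrine', 'A']
visited_separated = remove_while_iterating(separated, unwanted)
print(visited_separated)  # ['Concepts', 'Kaplan', 'Endocrine']
print(separated)          # ['Concepts', 'filler', 'A']
```

In the second case both words are removed, but only because the loop skipped `filler` (and later `A`) rather than `Endocrine`.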
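The best practice is to build a new list without the items you want to remove, rather than mutating the input list.
I suggest solving it with a list comprehension, which creates a new list.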
text = """Thyroid Disorders Understanding Concepts Kaplan Endocrine A Focused Review Tests n/a 88% (35/40)
"""
line = ''
for character in text:
    line += character  # append every character to string
    if character == ')':  # closing parenthesis signals end of one line
        print('Original line: ' + line)  # sanity check
        new_list = [word for word in line.split() if word not in ["Kaplan", "Endocrine"]]  # loop through each list item, remove if needed
        print(new_list)
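The reason for the error is that remove shifts every following element back by one position while the iterator is not updated, so after Kaplan is removed, Endocrine sits in Kaplan's old position and is never checked. A simple fix is: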
text = """Thyroid Disorders Understanding Concepts Kaplan Endocrine A Focused Review Tests n/a 88% (35/40)
"""
line = ''
print([char for char in text.split()])
for character in text:
    line = line + character  # append every character to string
    if character == ')':  # ')' signals end of one line
        print('Original line: ' + line)  # sanity check
        line_as_list = line.split()
        if "Kaplan" in line_as_list:
            line_as_list.remove("Kaplan")
        if "Endocrine" in line_as_list:
            line_as_list.remove("Endocrine")
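One caveat with this approach, shown in a small sketch below: `list.remove` deletes only the first matching item and raises `ValueError` when the item is absent, which is why the `in` check is needed. The sample list with a repeated word is made up for illustration.

```python
# Made-up sample in which 'Kaplan' appears twice.
words = ['Kaplan', 'Review', 'Kaplan', 'Tests']

words.remove('Kaplan')  # removes only the first occurrence
after_one_remove = list(words)
print(after_one_remove)  # ['Review', 'Kaplan', 'Tests']

# Removing a value that is not in the list raises ValueError.
try:
    words.remove('Endocrine')
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# To drop every occurrence with remove(), keep going while the word is present.
while 'Kaplan' in words:
    words.remove('Kaplan')
print(words)  # ['Review', 'Tests']
```

If a word can repeat within a line, filtering into a new list handles every occurrence in one pass.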
Change this line:
for word in line_as_list:
to:
for word in line_as_list.copy():
This way the loop iterates over a copy, so removing "Kaplan" from the original list does not affect the iteration.
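As a quick check of the copy-based fix, here is a minimal sketch applied to the full set of words from the question. Collecting the words in a set named `unwanted` is my own choice here, not part of the original code.

```python
line = "Thyroid Disorders Understanding Concepts Kaplan Endocrine A Focused Review Tests n/a 88% (35/40)"
unwanted = {'Kaplan', 'Endocrine', 'Focused', 'Review', 'Tests', 'n/a'}

line_as_list = line.split()
for word in line_as_list.copy():   # iterate over a snapshot of the list
    if word in unwanted:
        line_as_list.remove(word)  # mutate only the original list

print(line_as_list)
# ['Thyroid', 'Disorders', 'Understanding', 'Concepts', 'A', '88%', '(35/40)']
```

Each `remove` call still scans the list from the start, so for long lists building a new filtered list is usually the cheaper option.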
Try the following:
text = "Thyroid Disorders Understanding Concepts Kaplan Endocrine A Focused Review Tests n/a 88% (35/40)"
words_to_remove = {'Kaplan', 'Endocrine', 'Focused', 'Review', 'Tests', 'n/a'}
print(' '.join([w for w in text.split() if w not in words_to_remove]))
Output:
Thyroid Disorders Understanding Concepts A 88% (35/40)
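Putting it together for the multi-line case described in the question, here is a rough sketch that splits the text on `)`, filters each record, and appends the results to a file. It assumes every record contains exactly one `)` at its end. The function name `clean_records`, the tab-separated sample, and the output file name `cleaned.txt` are all hypothetical.

```python
import os
import tempfile

WORDS_TO_REMOVE = {'Kaplan', 'Endocrine', 'Focused', 'Review', 'Tests', 'n/a'}


def clean_records(text, words_to_remove=WORDS_TO_REMOVE):
    """Split text into records ending in ')' and drop unwanted words from each.

    Assumes each record contains exactly one ')' and that it ends the record.
    """
    cleaned = []
    for chunk in text.split(')'):
        # split() with no arguments also discards tabs and newlines
        words = [w for w in chunk.split() if w not in words_to_remove]
        if words:  # skip the empty tail after the last ')'
            cleaned.append(' '.join(words) + ')')
    return cleaned


# Two records built from the sample line in the question, with assumed tabs.
sample = (
    "Thyroid Disorders Understanding Concepts\tKaplan Endocrine Focused Review Tests\tn/a\t88% (35/40)\n"
    "Thyroid Disorders Understanding Concepts\tKaplan Endocrine A Focused Review Tests\tn/a\t88% (35/40)\n"
)

records = clean_records(sample)

# Hypothetical output location; mode 'a' appends instead of overwriting.
out_path = os.path.join(tempfile.mkdtemp(), 'cleaned.txt')
with open(out_path, 'a', encoding='utf-8') as out_file:
    for record in records:
        out_file.write(record + '\n')

with open(out_path, encoding='utf-8') as in_file:
    written = in_file.read()
print(written)
```

Using a set for the unwanted words keeps each membership test constant-time, and because a new list is built per record, nothing is mutated while it is being iterated.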