I currently have a handful of for loops that each filter something out of a list, and for some reason whichever loop comes last in the code is always ignored. I can rearrange the loops however I like, but the last one never takes effect, and I have no idea why. There are two more loops before this snippet, but they only create docsplit.
docsplit is a list of words taken from a document:
for word in docsplit:
    if "http" in word or "HTTP" in word or "Http" in word:
        #if it isn't already in the list
        docsplit.remove(word)
for word in docsplit:
    if "@" in word:
        docsplit.remove(word)
for word in docsplit:
    if "+1" in word:
        docsplit.remove(word)
#find any websites in docsplit
#remove any strings that have a regex like XXX-XXXX or XXX-XXXX-XXXX
for word in docsplit:
    if re.search(r'\d{3}-\d{4}', word):
        docsplit.remove(word)
#remove any strings that have a regex like (XXX)
for word in docsplit:
    if re.search(r"\(\d{3}\)", word):
        docsplit.remove(word)
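Use a list comprehension instead of removing items from the list while iterating over it. Each removal shifts the later items down one position, so the loop skips the item right after every removed one. A comprehension builds a new list and avoids that: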
import re

docsplit = ['http', '123-4567', 'foo']
# keep only the words that match none of the unwanted patterns
docsplit = [word for word in docsplit if
            not ("http" in word or "HTTP" in word or "Http" in word)
            and "@" not in word
            and "+1" not in word
            and not re.search(r'\d{3}-\d{4}', word)
            and not re.search(r"\(\d{3}\)", word)
            ]
print(docsplit)
# ['foo']
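To see why removing inside the loop can leave matching words behind, here is a minimal sketch that compares the two approaches on a small made-up list. The function names and the sample data are hypothetical and only serve as illustration; whether this is exactly what happens with your real docsplit depends on the order of the words in it.

```python
def remove_while_iterating(words, needle):
    """Mimic the loops in the question: mutate the list being iterated."""
    words = list(words)  # work on a copy so the caller's list is untouched
    for word in words:
        if needle in word:
            # removing shifts later items left, so the next one is skipped
            words.remove(word)
    return words


def filter_with_comprehension(words, needle):
    """Build a new list instead of mutating the one being iterated."""
    return [word for word in words if needle not in word]


# hypothetical sample data: two matching words sit next to each other
sample = ["a@x.com", "b@y.com", "foo", "c@z.com"]

print(remove_while_iterating(sample, "@"))
# ['b@y.com', 'foo']  -> 'b@y.com' was skipped and survived

print(filter_with_comprehension(sample, "@"))
# ['foo']
```

If you really need to modify the same list object, iterating over a copy (for word in docsplit[:]) also avoids the skipping, though the comprehension is usually easier to read.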