Python: How can I check if an item in a list contains a string within an elif statement where two conditions must be met?
I'm making a scraper in Python that executes a search, then opens each link in the results and makes a list of everything inside a strong tag. It then appends the list to a dataset. Not all of the pages are the same, so I organize them according to how many strong tags they have and, in some cases, whether a particular tag contains one or more words. I need both conditions to be met for the contents of the strong tags to go into the right columns.
The code works but is bulky, and I'm trying to write cleaner code.
for a in addr:
    driver.get(a)
    print(a)
    WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, "_errorElement_")))
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    columns = ['Business Name', 'Control Number', 'Business Type', 'Business Status', 'NAICS Code', 'NAICS Sub Code',
               'Principal Office Address', 'Date of Formation/ Registration Date', 'State of Formation/ Jurisdiction',
               'Last Registration Year', 'Dissolved Date', 'Registered Agent', 'Registered Agent Address', 'County']
    df = pd.DataFrame(columns=columns)
    strong = []
    for strong_tag in soup.find_all('strong'):
        strong.append(str(strong_tag.text))
    if len(strong) == 14:
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], strong[5], strong[6], strong[7], strong[8],
                  strong[9], strong[10], strong[11], strong[12], strong[13]]
    elif len(strong) == 6:
        values = [strong[0], '', '', 'Name Reservation', '', '', strong[3], strong[1], '', '', '', strong[2], '', '']
    elif len(strong) == 13 and "Active" in str(strong[3]):
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], strong[5], strong[6], strong[7], strong[8],
                  strong[9], '', strong[10], strong[11], strong[12]]
        # the above code appears to be correct for 13 length active compliance Domestic LLC (and possibly active owes current year)
The following five elif statements are the ones I'm trying to combine. I'm not sure how to check whether an item in a list contains any of five words while also checking the length of the list.
    elif len(strong) == 13 and "Admin" in str(strong[3]):
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                  strong[9], strong[10], strong[11], strong[12]]
    elif len(strong) == 13 and "Abandoned" in str(strong[3]):
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                  strong[9], strong[10], strong[11], strong[12]]
    elif len(strong) == 13 and "Withdrawn" in str(strong[3]):
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                  strong[9], strong[10], strong[11], strong[12]]
    elif len(strong) == 13 and "Dissolved" in str(strong[3]):
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                  strong[9], strong[10], strong[11], strong[12]]
    elif len(strong) == 13 and "Terminated" in str(strong[3]):
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                  strong[9], strong[10], strong[11], strong[12]]
    elif len(strong) == 12:
        values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                  '', strong[9], strong[10], strong[11]]
    else:
        values = [strong[0], '', '', '', '', '', '', '', '', '', '', '', '', '']
        print("WARNING! New values length...")
    df = df.append(pd.Series(values, index=columns), ignore_index=True)
    df2 = df2.append(df)
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
Just use the in operator the other way around: check whether strong[3] is in the list ['Admin', 'Abandoned', ...]:
l = ['Admin', 'Abandoned', 'Withdrawn', 'Dissolved', 'Terminated']
if len(strong) == 13 and strong[3] in l:
    values = strong[:5] + [''] + strong[5:]
elif len(strong) == 12:
    values = strong[:5] + [''] + strong[5:9] + [''] + strong[9:]
else:
    # one value plus 13 empty strings, so the row still has 14 columns
    values = [strong[0]] + [''] * 13
PS: You can also combine list slices when assigning to values, as above, to make the code more concise.
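One caveat worth noting: strong[3] in l is an exact-match test, whereas the conditions in the question test for a substring ("Admin" in str(strong[3])). If the status text can carry extra words, checking each keyword as a substring with any() keeps the original behaviour. A minimal sketch, assuming the hypothetical names KEYWORDS and has_status_keyword and a made-up sample row:

```python
# Hypothetical names; the keywords come from the question's elif branches.
KEYWORDS = ("Admin", "Abandoned", "Withdrawn", "Dissolved", "Terminated")


def has_status_keyword(status, keywords=KEYWORDS):
    """Return True if any keyword occurs as a substring of `status`."""
    return any(word in status for word in keywords)


# Made-up sample row of 13 strings; index 3 holds the status text.
strong = ["Acme LLC", "123", "Domestic LLC", "Admin. Dissolved"] + ["x"] * 9

if len(strong) == 13 and has_status_keyword(strong[3]):
    # Insert one empty cell after the fifth item, giving 14 columns.
    values = strong[:5] + [""] + strong[5:]
else:
    values = None
```

Because any() stops at the first match, the check costs no more than the chain of separate elif conditions it replaces.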
The repeated length checks are redundant. I suggest checking the length once in an outer condition and, when it holds, testing the remaining requirements inside it. For example:
if len(strong) == 13:
    # Every list that reaches this point has length 13
    if "Dissolved" in strong[3]:
        # Do whatever
        pass
    elif ...:
        ...
elif len(strong) == 12:
    ...
This way it's more understandable.
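Putting the suggestions together, the whole chain from the question could be written as one function that checks each length only once. This is only a sketch: build_row and SHIFT_WORDS are hypothetical names, the column positions are copied from the question's branches, and a 13-item list that matches no keyword falls through to the warning branch, as it does in the original code.

```python
SHIFT_WORDS = ("Admin", "Abandoned", "Withdrawn", "Dissolved", "Terminated")


def build_row(strong):
    """Map a list of <strong> texts onto the 14 output columns."""
    n = len(strong)
    if n == 14:
        return list(strong)
    if n == 6:
        return [strong[0], "", "", "Name Reservation", "", "",
                strong[3], strong[1], "", "", "", strong[2], "", ""]
    if n == 13:
        status = strong[3]
        if "Active" in status:
            # Eleventh column ('Dissolved Date') is left empty.
            return strong[:10] + [""] + strong[10:]
        if any(word in status for word in SHIFT_WORDS):
            # Sixth column ('NAICS Sub Code') is left empty.
            return strong[:5] + [""] + strong[5:]
    elif n == 12:
        # Both 'NAICS Sub Code' and 'Dissolved Date' are left empty.
        return strong[:5] + [""] + strong[5:9] + [""] + strong[9:]
    # Unknown layout: keep only the first item, as the question's else branch does.
    print("WARNING! New values length...")
    return [strong[0]] + [""] * 13


# Made-up sample input of 13 strings with a matching status at index 3.
row = build_row(["Acme LLC", "123", "LLC", "Dissolved", "54"] + ["x"] * 8)
```

Inside the scraping loop, values = build_row(strong) would then replace the entire if/elif chain.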