Python Web Scraping error - Reading from JSON - IndexError: list index out of range - how do I ignore
I am performing web scraping via Python \ Selenium \ Chrome headless driver. I am reading the results from JSON - here is my code:
CustId=500
while (CustId<=510):
    print(CustId)
    # Part 1: Customer REST call:
    urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
    driver.get(urlg)
    soup = BeautifulSoup(driver.page_source,"lxml")
    dict_from_json = json.loads(soup.find("body").text)
    # print(dict_from_json)
    #try:
    CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])
    # Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName'])
    writefunction()
    CustId = CustId+1
The issue is that 'addressDisplayName' is sometimes present in the result set and sometimes not. If it's not, the code fails with the error:
IndexError: list index out of range
Which makes sense, as it doesn't exist. How do I ignore this, though, so that if 'addressDisplayName' doesn't exist the loop just continues? I've tried using a try block, but the code still stops executing.
If you get an IndexError (with an index of 0), it means that your list is empty. So the problem is one step earlier in the path: 'customerAddressShowCommandSet' has no elements. If 'addressDisplayName' were missing from the dict instead, you'd get a KeyError.
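To see the difference concretely, here is a small sketch using two made-up payloads. The dictionaries and the error_name helper are illustrative stand-ins, not real responses from the REST endpoint:

```python
# Hypothetical payloads shaped like the response in the question.
empty_set = {"customerShowCommand": {"customerAddressShowCommandSet": []}}
missing_key = {"customerShowCommand": {"customerAddressShowCommandSet": [{}]}}


def error_name(payload):
    """Return the name of the exception raised by the lookup, or None."""
    try:
        payload["customerShowCommand"]["customerAddressShowCommandSet"][0]["addressDisplayName"]
    except (IndexError, KeyError) as exc:
        return type(exc).__name__
    return None


print(error_name(empty_set))    # the list is empty, so [0] fails
print(error_name(missing_key))  # the list has a dict, but the key is absent
```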
You can check if the list has elements:
if dict_from_json['customerShowCommand']['customerAddressShowCommandSet']:
    # get the data
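One way to package that check is a small helper that returns a default when the address is absent. This is only a sketch: first_address_name is a hypothetical name, and it assumes that 'customerShowCommand', when present, is a dict and that the list holds dicts:

```python
def first_address_name(payload, default=None):
    """Return the first address display name, or `default` if it is absent."""
    command = payload.get("customerShowCommand") or {}
    addresses = command.get("customerAddressShowCommandSet")
    if not addresses:  # key missing, None, or empty list
        return default
    return addresses[0].get("addressDisplayName", default)
```

In the loop from the question this could be called as Addr = first_address_name(dict_from_json, "NaN"), so no exception is raised when the list is empty.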
Otherwise you can indeed use try..except:
try:
    # get the data
except (IndexError, KeyError):
    # handle missing data
A try..except block should resolve your issue.
CustId=500
while (CustId<=510):
    print(CustId)
    # Part 1: Customer REST call:
    urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
    driver.get(urlg)
    soup = BeautifulSoup(driver.page_source,"lxml")
    dict_from_json = json.loads(soup.find("body").text)
    # print(dict_from_json)
    CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])
    try:
        Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName'])
    except (IndexError, KeyError):
        # No address for this customer: fall back to a placeholder
        Addr ="NaN"
    CustId = CustId+1
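If the goal is to skip customers without an address entirely, rather than record a placeholder, `continue` can be used in the except branch. In a while loop, a `continue` placed before the increment would repeat the same id forever, so the sketch below iterates with range instead. fetch_customer and SAMPLE are hypothetical stand-ins for the Selenium and json.loads steps, included only so the example runs on its own:

```python
# Hypothetical stand-in data for the REST responses in the question.
SAMPLE = {
    500: {"customerShowCommand": {"customerAddressShowCommandSet": [{"addressDisplayName": "Head office"}]}},
    501: {"customerShowCommand": {"customerAddressShowCommandSet": []}},
    502: {"customerShowCommand": {"customerAddressShowCommandSet": [{"addressDisplayName": "Warehouse"}]}},
}


def fetch_customer(cust_id):
    """Placeholder for driver.get(...) followed by json.loads(...)."""
    return SAMPLE[cust_id]


def collect_addresses(first_id, last_id):
    rows = []
    # range advances the id on every pass, even when an iteration is skipped
    for cust_id in range(first_id, last_id + 1):
        data = fetch_customer(cust_id)
        try:
            addr = data["customerShowCommand"]["customerAddressShowCommandSet"][0]["addressDisplayName"]
        except (IndexError, KeyError):
            continue  # no address for this customer: move on to the next id
        rows.append((cust_id, addr))
    return rows


print(collect_addresses(500, 502))  # [(500, 'Head office'), (502, 'Warehouse')]
```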