Python web scraping error - reading from JSON - IndexError: list index out of range - how do I ignore it

Question

I am performing web scraping with Python, Selenium, and the headless Chrome driver. I am reading the results from JSON. Here is my code:

CustId=500
while (CustId<=510):
  
  print(CustId)

  # Part 1: Customer REST call:
  urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
  driver.get(urlg)

  soup = BeautifulSoup(driver.page_source,"lxml")

  dict_from_json = json.loads(soup.find("body").text)
  # print(dict_from_json)

  #try:
 
  CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])

  # Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName'])

  writefunction()

  CustId = CustId+1

The issue is that 'addressDisplayName' is sometimes present in the result set and sometimes not. If it's not, the code fails with this error:

IndexError: list index out of range

Which makes sense, as it doesn't exist. How do I ignore this, though, so that if 'addressDisplayName' doesn't exist the loop just continues? I've tried using a try block, but the code still stops executing.

Answer 1

If you get an IndexError (with an index of 0), it means that your list is empty. So the problem is one step earlier in the path: if 'addressDisplayName' were missing from the dict, you'd get a KeyError instead.

You can check if the list has elements:

if dict_from_json['customerShowCommand']['customerAddressShowCommandSet']:
    # get the data
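
As a minimal sketch of that check, the lookup could be wrapped in a small helper. The function name first_address_name and the default parameter are hypothetical, and the sketch assumes the JSON has the nesting shown in the question:

```python
def first_address_name(data, default=None):
    """Return the first address display name, or `default` if it is absent."""
    # .get() with a fallback avoids a KeyError when a level is missing.
    show = data.get('customerShowCommand') or {}
    addresses = show.get('customerAddressShowCommandSet') or []
    # An empty list is falsy, so [0] is only attempted when it is safe.
    if not addresses:
        return default
    return addresses[0].get('addressDisplayName', default)
```

Calling first_address_name(dict_from_json, "NaN") would then give either the name or the placeholder, without raising.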

Otherwise you can indeed use try..except:

try:
    # get the data
except (IndexError, KeyError):
    # handle missing data
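
To illustrate why it can be worth catching both exception types, here is a small sketch. The helper name get_addr is hypothetical, and the returned strings are only there to show which exception fired:

```python
def get_addr(data):
    """Look up the first address name, reporting which lookup step failed."""
    try:
        return data['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName']
    except (IndexError, KeyError) as exc:
        # IndexError: the address list is empty.
        # KeyError: one of the dictionary keys is missing.
        return f'missing ({type(exc).__name__})'
```

Note that in Python 3 the exception types must be grouped in a tuple; writing them separated by a bare comma is a syntax error.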

Answer 2

A try..except block should resolve your issue.

CustId=500
while (CustId<=510):
  
  print(CustId)

  # Part 1: Customer REST call:
  urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
  driver.get(urlg)

  soup = BeautifulSoup(driver.page_source,"lxml")

  dict_from_json = json.loads(soup.find("body").text)
  # print(dict_from_json)

  
 
  CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])
  try:
      Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName'])

  except (IndexError, KeyError):
      # Empty address list or missing key: fall back to a placeholder
      Addr ="NaN"

  CustId = CustId+1
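
If the goal is to skip such customers entirely rather than record a placeholder, the same idea can be sketched with continue. This is only an illustration: collect_addresses and its fetch parameter are hypothetical names, with fetch standing in for the driver.get and json.loads steps above so the example runs without Selenium:

```python
def collect_addresses(fetch, first_id=500, last_id=510):
    """Collect {customer id: address name}, skipping customers with no address.

    `fetch` is a hypothetical callable returning the parsed JSON dict
    for one customer id.
    """
    results = {}
    # range() replaces the manual counter from the while loop.
    for cust_id in range(first_id, last_id + 1):
        data = fetch(cust_id)
        try:
            addr = data['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName']
        except (IndexError, KeyError):
            continue  # no address for this customer: move on to the next id
        results[cust_id] = addr
    return results
```

Catching only IndexError and KeyError, instead of using a bare except, keeps unrelated failures such as a network error visible.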

Answer 1
1 2022-04-21 11:33:50

Answer 2
1 Accepted 2022-04-21 11:38:09
