Web scraping with Python ('NoneType' object has no attribute 'get_text')

Question

I would like to extract drug information from multiple pages, such as https://www.medindia.net/doctors/drug_information/abacavir.htm and https://www.medindia.net/doctors/drug_information/talimogene_laherparepvec.htm.

On each page, I would like to extract the following information: General, Brands, Prescription, Contraindications, Side effects, Dosage, How to Take, Warning, and Storage.

Using Beautiful Soup, I was able to identify the class needed for extraction. However, when I try to extract the information and store it in a variable, I get the error 'NoneType' object has no attribute 'get_text'. This suggests that no element with the class 'drug-content' was found, yet the class does appear when I print the items. Please help. Below is my code:

import pandas as pd
import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = 'https://www.medindia.net/doctors/drug_information/abacavir.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
drug = soup.find(class_='mi-container__fluid')
print(drug)

# whole page contain drug content
items = drug.find_all(class_='drug-content')
print(items)

# extract drug information from drug content into individual variable
general = items[0].find(class_='drug-content').get_text(strip=True).replace("\n", "")
brand = items[1].find(class_='report-content').get_text(strip=True).replace("\n", "")
prescription = items[1].find(class_='drug-content').get_text(strip=True).replace("\n", "")
contraindications = items[2].find(class_='drug-content').get_text(strip=True).replace("\n", "")
side_effect = items[2].find(class_='drug-content').get_text(strip=True).replace("\n", "")
dosage = items[3].find(class_='drug-content').get_text(strip=True).replace("\n", "")
how_to_use = items[4].find(class_='drug-content').get_text(strip=True).replace("\n", "")
warnings = items[5].find(class_='drug-content').get_text(strip=True).replace("\n", "")
storage = items[7].find(class_='drug-content').get_text(strip=True).replace("\n", "")

I have tried changing the class to 'report-content drug-widget'. However, with that class I am unable to extract the general information. In addition, the side-effect information is not available for this drug. How can I put NA into the variable when the information is not available for a drug?

# whole page contain drug content
items = drug.find_all(class_='report-content drug-widget')
print(items)

# extract drug information from drug content into individual variable
general = items.find(class_='drug-content').get_text(strip=True).replace("\n", "")
brand = items[0].find(class_='drug-content').get_text(strip=True).replace("\n", "")

Please advise how to extract the information, and how I can put NA where the information I need is not available.

Answer 1

I can help you with the first one. It should get you started on how to deal with elements that are not found and how to search for the pattern you're looking for:

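try:
    # find() returns None when nothing matches, so .get_text raises AttributeError;
    # items[0] raises IndexError when items is empty
    general = items[0].find('h3', attrs={'style': 'margin:0px!important'}).get_text(strip=True).replace("\n", "").replace("\xa0", " ")
except (AttributeError, IndexError):
    general = "N/A"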
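The same "fall back to N/A" idea can be wrapped in a small helper so it does not have to be repeated for every field. The sketch below is only an illustration: text_or_na is a hypothetical helper name, and SAMPLE_HTML is made-up stand-in markup, so the class names may not match what the real medindia pages contain.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML; the real page's markup may differ.
SAMPLE_HTML = """
<div class="report-content drug-widget">
  <h3>General</h3>
  <div class="drug-content">Abacavir is an antiviral drug.</div>
</div>
<div class="report-content drug-widget">
  <h3>Storage</h3>
</div>
"""


def text_or_na(parent, default="N/A", **find_kwargs):
    """Return the cleaned text of the first match under parent, or default."""
    if parent is None:
        return default
    element = parent.find(**find_kwargs)
    if element is None:
        # find() returns None when nothing matches; avoid calling get_text on it
        return default
    return element.get_text(strip=True).replace("\xa0", " ")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
widgets = soup.find_all(class_="drug-widget")

general = text_or_na(widgets[0], class_="drug-content")
storage = text_or_na(widgets[1], class_="drug-content")  # no match in this widget
print(general)
print(storage)
```

Checking for None explicitly, instead of catching an exception, keeps unrelated errors visible while still giving every missing field the same placeholder.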

You can slice off the "Generic Name:" prefix, since it is probably the same length on every page:

general = general[15:]
print(general)
# 'Abacavir'
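A fixed slice depends on the label having exactly the same length on every page, and it would also cut into an "N/A" placeholder. As a possible alternative, the sketch below splits on the first colon instead. strip_label is a hypothetical helper, and the sample strings are assumptions about what the heading text looks like.

```python
def strip_label(text, default="N/A"):
    """Return the part after the first colon, or default if nothing usable remains."""
    label, sep, value = text.partition(":")
    value = value.replace("\xa0", " ").strip()
    if not sep or not value:
        # no colon at all (e.g. the "N/A" placeholder) or nothing after it
        return default
    return value


print(strip_label("Generic Name :\xa0Abacavir"))
print(strip_label("Generic Name : Talimogene Laherparepvec"))
print(strip_label("N/A"))
```

Because the split is on the colon rather than a character count, small differences in spacing around the label do not change the result.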

Answer 1
0 2019-11-25 19:25:21
