Web scraping with Python ('NoneType' object has no attribute 'get_text')

I would like to extract drug information from multiple pages, such as https://www.medindia.net/doctors/drug_information/abacavir.htm and https://www.medindia.net/doctors/drug_information/talimogene_laherparepvec.htm.

On each page, I would like to extract the following information: General, Brands, Prescription, Contraindications, Side effects, Dosage, How to Take, Warning, and Storage.

Using Beautiful Soup, I was able to identify the class needed for extraction. However, when I try to extract the information and store it in a variable, I get the error 'NoneType' object has no attribute 'get_text'. It seems that there is no element with the class 'drug-content', yet when I print the items, the class is there. Please help me. Below is my code:

import pandas as pd
import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = 'https://www.medindia.net/doctors/drug_information/abacavir.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
drug = soup.find(class_='mi-container__fluid')
print(drug)

# all elements on the page that contain drug content
items = drug.find_all(class_='drug-content')
print(items)

# extract each piece of drug information into its own variable
general = items[0].find(class_='drug-content').get_text(strip=True).replace("\n", "")
brand = items[1].find(class_='report-content').get_text(strip=True).replace("\n", "")
prescription = items[1].find(class_='drug-content').get_text(strip=True).replace("\n", "")
contraindications = items[2].find(class_='drug-content').get_text(strip=True).replace("\n", "")
side_effect = items[2].find(class_='drug-content').get_text(strip=True).replace("\n", "")
dosage = items[3].find(class_='drug-content').get_text(strip=True).replace("\n", "")
how_to_use = items[4].find(class_='drug-content').get_text(strip=True).replace("\n", "")
warnings = items[5].find(class_='drug-content').get_text(strip=True).replace("\n", "")
storage = items[7].find(class_='drug-content').get_text(strip=True).replace("\n", "")

I have tried changing the class to 'report-content drug-widget'. However, with that class I am unable to extract the general information. Also, side-effect information is unavailable for this drug. How can I put NA into the variable when the information is not available for a drug?

# all elements on the page that contain drug content
items = drug.find_all(class_='report-content drug-widget')
print(items)

# extract each piece of drug information into its own variable
general = items.find(class_='drug-content').get_text(strip=True).replace("\n", "")
brand = items[0].find(class_='drug-content').get_text(strip=True).replace("\n", "")

Please advise how to extract the information, and how to put NA where the information I need is not available.

I can help you with the first one. It should help you get started on how to deal with elements that are not found, and how to search for the pattern you're looking for:

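try:
    general = items[0].find('h3', attrs={'style': 'margin:0px!important'}).get_text(strip=True).replace("\n", "").replace("\xa0", " ")
except (AttributeError, IndexError):
    # find() returned None (no matching <h3>), or items has no first element
    general = "N/A"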
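One likely reason for the original error: `find()` only searches the descendants of the element it is called on, not the element itself. Each entry returned by `find_all(class_='drug-content')` already is a `drug-content` element, so looking for the same class inside it returns `None`, and calling `.get_text()` on `None` raises the `AttributeError`. The try/except above can also be written as a small helper that checks for `None` explicitly. This is only a sketch: `safe_text` is a made-up name, and `SAMPLE_HTML` is invented markup that borrows the class names from the question. It is not the real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="mi-container__fluid">
  <div class="report-content drug-widget">
    <h3>Generic Name : Abacavir</h3>
    <div class="drug-content">Abacavir is an antiviral drug.</div>
  </div>
  <div class="report-content drug-widget">
    <h3>Storage</h3>
  </div>
</div>
"""


def safe_text(parent, default="NA", **find_kwargs):
    """Return the text of the first match under parent, or default if missing."""
    if parent is None:
        return default
    node = parent.find(**find_kwargs)  # find() returns None when nothing matches
    if node is None:
        return default
    return node.get_text(" ", strip=True).replace("\xa0", " ")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
widgets = soup.find_all(class_="drug-widget")

general = safe_text(widgets[0], class_="drug-content")
storage = safe_text(widgets[1], class_="drug-content")  # no drug-content here
print(general)
print(storage)
```

Here the second widget has no `drug-content` child, so `storage` falls back to `"NA"` instead of raising. You still need to guard the index itself (for example by checking `len(widgets)`) if a page can have fewer sections than expected.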

You can slice the "Generic Name:" label off, since the label is probably the same length on every page:

general = general[15:]
print(general)
# Abacavir
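A fixed offset like `[15:]` depends on the exact spacing of the label, and it turns the "N/A" fallback into an empty string. A slightly more forgiving variant splits on the first colon instead. This is a sketch, and `strip_label` is a hypothetical helper name, not part of Beautiful Soup:

```python
def strip_label(text, default="NA"):
    """Return what follows the first colon, or the text itself if there is none."""
    _label, sep, value = text.partition(":")
    if not sep:
        # No colon found: keep the text as-is, or fall back to the default.
        return text.strip() or default
    return value.strip() or default


print(strip_label("Generic Name : Abacavir"))
print(strip_label("Generic Name:Talimogene Laherparepvec"))
print(strip_label("N/A"))
```

Because it does not rely on the label length, the same function works whether or not there are spaces around the colon, and a value without a label passes through unchanged.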
