How would I use Beautiful Soup to web-scrape data from this website?
[![Problem][1]][1]
Above is the HTML, what the website looks like, and my code. I am trying to extract this information into a dictionary, for example {"Official Symbol": "ELF4"} and so on. I have already watched a few tutorials but I'm still confused. Can anyone help me out?
import requests
from bs4 import BeautifulSoup

url = "https://www.ncbi.nlm.nih.gov/gene/2000"
r = requests.get(url)
data = r.content
soup = BeautifulSoup(data, 'html.parser')

#text_found = soup.find("dd", attrs={"class": "noline"}).text

dd_data = soup.find_all("dd")
for dditem in dd_data:
    # note: `dditem != "None"` compared a Tag to the string "None" and was always true
    if dditem.string is not None:
        print(dditem.string)

dt_data = soup.find_all("dt")
for dtitem in dt_data:
    if dtitem.string is not None:
        print(dtitem.string)
To scrape the data as a dict, see the following example:
import requests
from bs4 import BeautifulSoup

URL = "https://www.ncbi.nlm.nih.gov/gene/2000"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# zip() pairs each <dt> label with the <dd> value at the same position;
# a nested comprehension over both selects would cross every dt with every dd.
result = {
    " ".join(k.text.split()): " ".join(v.text.split())
    for k, v in zip(soup.select("dt.noline"), soup.select("dd.noline"))
}
print(result)
Output:
{'Official Symbol': 'ELF4'}
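The same dt/dd pairing can be checked without a network call by parsing a small inline snippet. The HTML below is a simplified stand-in for the NCBI summary list, not the page's actual markup:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the gene summary markup (an assumption for illustration).
html = """
<dl id="summaryDl">
  <dt class="noline">Official Symbol</dt>
  <dd class="noline">ELF4</dd>
  <dt class="noline">Official Full Name</dt>
  <dd class="noline">E74 like ETS transcription factor 4</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")
# Pair each label with the value at the same position, normalizing whitespace.
result = {
    " ".join(k.text.split()): " ".join(v.text.split())
    for k, v in zip(soup.select("dt.noline"), soup.select("dd.noline"))
}
print(result)
```

This relies on the page emitting one `dd.noline` for every `dt.noline`, in order; if the counts ever differ, `zip` silently drops the extras.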
I think you can just create two lists and fill them in your loops, or all at once like this:
dl_data = soup.find("dl", {"id": "summaryDl"})
labels = []
values = []
# Append them sequentially, assuming the order is correct
for item in dl_data.find_all():
    if item.name == "dt":
        labels.append(item.text)
    if item.name == "dd":
        values.append(item.text)
# zip the two lists together into a list of pairs, then make a dictionary out of the pairs
contents = dict(zip(labels, values))
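A variant that does not depend on keeping two lists in sync: walk each `<dt>` and take its immediately following `<dd>` sibling. This is a sketch assuming each label is directly followed by its value, as in the page's summary list; the inline HTML is a minimal stand-in, not the real markup:

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the summary <dl> (an assumption for illustration).
html = """
<dl id="summaryDl">
  <dt>Official Symbol</dt><dd>ELF4</dd>
  <dt>Gene type</dt><dd>protein coding</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")
summary = soup.find("dl", id="summaryDl")
result = {}
for dt in summary.find_all("dt"):
    dd = dt.find_next_sibling("dd")  # the value element right after the label
    if dd is not None:
        result[dt.get_text(strip=True)] = dd.get_text(strip=True)
print(result)
```

Because each value is looked up relative to its own label, a stray `<dt>` with no matching `<dd>` is skipped instead of shifting every later pair.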