
Can't scrape some “div” tags from a site

I am trying to scrape job posts from this page: https://www.fl.ru

This is probably a newbie problem, but I can get certain tags while others seem to be unreachable, e.g.:

from urllib.request import urlopen
from bs4 import BeautifulSoup

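# Download the project listing and parse it
html = urlopen("https://www.fl.ru/projects/")
bsObj = BeautifulSoup(html, "lxml")

# These are the tags that seem unreachable
# (class name written without the trailing space, which would never match)
textTags = bsObj.find_all("div", class_="b-post__txt")
print(textTags)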

Thanks

If you download the page HTML with a downloader such as wget or curl, you will see that the elements are not in the page at all. They are generated by JavaScript.

For example (snippet from the page source):

<script type="text/javascript">document.write('<div class="b-post__body b-post__body_padtop_15 b-post__body_overflow_hidden b-layuot_width_full"> <div class="b-post__txt "> У нас есть для вас вакансия Full-stack PHP-разработчика на удаленную работу (полный рабочий день) или в офис (г. Москва).&nbsp; Работать нужно будет над нашими проектами, в том... </div> <div id="project-reason-3728923" style="display: none"> </div> </div>');</script>

You have two options: execute the JavaScript (with a browser and something like Selenium to drive it), or parse it manually by using Beautiful Soup to get the <script> tag contents, then extracting the text inside document.write() and reparsing it with Beautiful Soup.
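The manual option could look roughly like the sketch below. The helper names (extract_written_html, find_post_texts) and the sample markup are made up for illustration, and the regex assumes each document.write() call takes a single single-quoted string, as in the snippet above. It uses the built-in "html.parser" so it runs without lxml; the real page may need more careful unescaping.

```python
import re
from bs4 import BeautifulSoup

# Assumption: the argument of document.write() is one single-quoted string literal.
DOC_WRITE_RE = re.compile(r"document\.write\('((?:[^'\\]|\\.)*)'\)", re.DOTALL)

def extract_written_html(page_html):
    """Return the HTML fragments passed to document.write() in inline scripts."""
    soup = BeautifulSoup(page_html, "html.parser")
    fragments = []
    for script in soup.find_all("script"):
        source = script.string or ""
        for match in DOC_WRITE_RE.finditer(source):
            # Naive unescape: drop the backslash from pairs such as \' and \/
            # (escapes like \n are not interpreted).
            fragments.append(re.sub(r"\\(.)", r"\1", match.group(1)))
    return fragments

def find_post_texts(page_html):
    """Re-parse each written fragment and collect the post text blocks."""
    texts = []
    for fragment in extract_written_html(page_html):
        inner = BeautifulSoup(fragment, "html.parser")
        for div in inner.find_all("div", class_="b-post__txt"):
            texts.append(div.get_text(strip=True))
    return texts

# Made-up sample shaped like the snippet from the page source
sample = """<html><body>
<script type="text/javascript">document.write('<div class="b-post__body"> <div class="b-post__txt "> Example post text </div> </div>');</script>
</body></html>"""

print(find_post_texts(sample))
```

To try it on the real page, pass the downloaded HTML (for example urlopen(...).read().decode(), assuming the right encoding) to find_post_texts instead of the sample string.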

Many modern web pages build the DOM dynamically in the browser using JavaScript, so the parts you are looking for do not exist until the browser has finished building the page.

If you are not using a browser or a library that can run JavaScript, the page elements you are looking for will simply not exist.
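A quick way to check whether this is what is happening is to look at where the class name shows up in the downloaded HTML. The sketch below is only an illustration: locate_class is a made-up helper, and the two sample pages are invented stand-ins for a static page and a script-generated one.

```python
from bs4 import BeautifulSoup

def locate_class(page_html, class_name):
    """Report where class_name shows up in a downloaded page.

    Returns "dom" if a parsed element carries the class, "script" if the
    name only occurs inside <script> text, and "absent" otherwise.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    if soup.find(class_=class_name) is not None:
        return "dom"
    for script in soup.find_all("script"):
        if class_name in (script.string or ""):
            return "script"
    return "absent"

# Invented samples: one static element, one written by JavaScript
static_page = '<div class="b-post__txt">hello</div>'
generated_page = "<script>document.write('<div class=\"b-post__txt\">hello</div>');</script>"

print(locate_class(static_page, "b-post__txt"))
print(locate_class(generated_page, "b-post__txt"))
```

A result of "script" means the markup is sitting in the page as JavaScript text and can be pulled out without a browser. A result of "absent" suggests the content is loaded some other way, in which case driving a real browser is probably the simpler route.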


