
How to get the result of a JavaScript function from Python code using Beautiful Soup?

I want to scrape data from a website using Beautiful Soup in Python. The site changes the values of a drop-down menu based on the user's selection. No API call is made when the drop-down values change. On closer inspection, I found that a JavaScript function is called internally to get the values of the drop-down menu. My problem is that those values are not in the page source. They are produced by calling that JS function, but since there is no API call, I can't request them. Can anyone tell me how to call a JavaScript function from Python code? I'm using Beautiful Soup for web scraping.

Thanks

You can't. BeautifulSoup is an HTML parser.

You want to do more than parse HTML; you want to evaluate JavaScript.

Perhaps you are looking for a way to drive a JavaScript-capable browser, such as Selenium.
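As a rough sketch of the Selenium route, the function below opens the page in headless Chrome, picks a value in the first menu so the site's own JavaScript runs, and then reads the options that appear in the dependent menu. The function name, the element ids, and the URL in the usage note are hypothetical placeholders; it also assumes the selenium package and a local Chrome install, and that the dependent menu starts out empty.

```python
def dependent_options(url, parent_id, choice, child_id, timeout=10):
    """Select `choice` in one menu and return the option texts of the menu it fills."""
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select, WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Choosing a value fires the page's own change handler, so the
        # JavaScript function is called by the browser, not by Python.
        Select(driver.find_element(By.ID, parent_id)).select_by_visible_text(choice)
        # Assumption: the dependent menu has no options until the script fills it.
        WebDriverWait(driver, timeout).until(
            lambda d: len(Select(d.find_element(By.ID, child_id)).options) > 0
        )
        return [o.text for o in Select(driver.find_element(By.ID, child_id)).options]
    finally:
        driver.quit()  # always release the browser process
```

A call might look like dependent_options("https://example.com/form", "country", "India", "city"). If you would rather keep parsing with Beautiful Soup, driver.page_source gives the HTML after the script has run, and driver.execute_script("return someFunction();") can call a page function directly when you know its name.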

You might be interested in the PyV8 module; it lets you embed a JavaScript interpreter in Python code, but does not include a browser DOM. I give a short example in "Why is BeautifulSoup not finding a specific table class?"

For JavaScript that makes more extensive use of browser features, you may prefer ghost.py, a headless WebKit-based browser with a Python API.

Failing that, if you give the page URL, we could take a look at the JavaScript and see if there's a quick way to duplicate the call in Python.
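To illustrate what duplicating the call can look like: when no request is made, the values usually already sit in the page's script as a literal, and the function only copies them into the menu. The sketch below uses an invented page (the CITIES variable, fillCities function, and menu ids are all made up for the example) and assumes the literal happens to be valid JSON, which real pages do not guarantee.

```python
import json
import re

# Hypothetical page source: the dependent menu is empty in the HTML and
# is filled by fillCities() from data embedded in the script itself.
PAGE_SOURCE = """
<select id="country" onchange="fillCities(this.value)">
  <option>India</option>
  <option>France</option>
</select>
<select id="city"></select>
<script>
var CITIES = {"India": ["Delhi", "Mumbai"], "France": ["Paris", "Lyon"]};
function fillCities(country) {
  var menu = document.getElementById("city");
  menu.innerHTML = "";
  CITIES[country].forEach(function (name) { menu.add(new Option(name)); });
}
</script>
"""


def extract_js_object(html, var_name):
    """Find `var <var_name> = {...};` in the page and parse the literal as JSON."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\});"
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        raise ValueError(f"no literal found for {var_name!r}")
    return json.loads(match.group(1))


def fill_cities(country, cities):
    """Python stand-in for the page's fillCities(): return the menu values."""
    return list(cities[country])


cities = extract_js_object(PAGE_SOURCE, "CITIES")
print(fill_cities("France", cities))  # ['Paris', 'Lyon']
```

The script text could equally be pulled out with Beautiful Soup before applying the regular expression. If the literal is not valid JSON (single quotes, unquoted keys, computed values), this shortcut stops working and evaluating the JavaScript becomes the more reliable option.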

Beautiful Soup can't parse content that is loaded by JavaScript. You should use something like Selenium.
