How do I scrape text from <script> tags with bs4?

Question

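I'm trying to scrape some text from a <script> tag using BS4, but I keep getting a TypeError every time I run the script.

I have tried several different parsers, but they all raise the same TypeError.

My Python code is: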

s = requests.Session()
r = (s.get(url, headers=headers))
soup = BeautifulSoup(r.content, 'html5lib')
profile = soup.find('script', attrs={'name': 'window.profile'})['value']

The HTML I want to scrape is:

<script>
// Profile helper.
window.profile = 'PROFILEIDHERE';
</script>

I expect my code to assign the value of 'window.profile' to the variable 'profile', but I get a TypeError every time I run the script.

Answer 1

You can use get_text() to get the text content of the tag:

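allScripts = soup.find_all("script")
for script in allScripts:
    scriptText = script.get_text()
    # Skip scripts that do not assign window.profile (avoids an IndexError below).
    if "window.profile" not in scriptText:
        continue
    # The value is the first single-quoted string in the script text.
    scriptTextValue = scriptText.split("'")[1]
    print(scriptTextValue)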
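
For context on the error in the question: the <script> tag there has no `name` attribute, so `soup.find('script', attrs={'name': 'window.profile'})` returns None, and indexing None with `['value']` raises the TypeError. The value lives in the script's text, not in an attribute. Below is a minimal, more defensive sketch of the same idea. The helper name `extract_profile`, the regular expression, and the choice of the built-in `html.parser` are my own assumptions for illustration, not something from the original post; the pattern only covers a simple quoted assignment like the one shown in the question.

```python
import re
from bs4 import BeautifulSoup

# Sample HTML copied from the question.
html = """
<script>
// Profile helper.
window.profile = 'PROFILEIDHERE';
</script>
"""

# Assumed pattern: window.profile = '<value>' or "<value>".
PROFILE_RE = re.compile(r"window\.profile\s*=\s*['\"]([^'\"]+)['\"]")


def extract_profile(html_text):
    """Return the value assigned to window.profile, or None if not found."""
    soup = BeautifulSoup(html_text, "html.parser")
    for script in soup.find_all("script"):
        text = script.string  # None for empty or external (src=...) scripts
        if not text:
            continue
        match = PROFILE_RE.search(text)
        if match:
            return match.group(1)
    return None


profile = extract_profile(html)
print(profile)
```

Returning None when nothing matches lets the caller check for a missing value explicitly instead of hitting another exception.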
