Searching for a string in JavaScript using Python

Question

Following up on my previous question: how to fetch javascript contents in python

I tried to write another script that fetches data from the JavaScript, after getting the webpage contents of course.

But it just isn't returning the content I want. I want to find "content_id" in the JavaScript of the page. This is the page: http://www.hulu.com/watch/815743

Here's what I have right now.

import re
import requests
from bs4 import BeautifulSoup
import os
import fileinput


Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
#print soup
subtitles = soup.findAll('script',{'type':'text/javascript'})
pattern = re.compile(r'"content_id":"(.*?)"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print pattern.search(script.text).group(1)

I get this error:

AttributeError: 'NoneType' object has no attribute 'text'

Any idea how to solve this issue?

Answer 1

There are two problems in your regular expression pattern:

the quotes are escaped with backslashes in the script contents, so the pattern has to account for that
there is whitespace after the colon

Here is the fixed version:

pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)

This works for me and returns 60585710.

FYI, here is the complete code that I'm executing:
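To see why the original pattern finds nothing, here is a minimal offline sketch. The `sample` string is a made-up fragment that only imitates the escaped-quote layout described above; it is not copied from the actual page, and the variable names are illustrative.

```python
import re

# Hypothetical fragment imitating the page's inline script: the JSON sits
# inside a JavaScript string literal, so its quotes are backslash-escaped
# and there is a space after the colon.
sample = r'var config = "{\"content_id\": \"60585710\", \"other\": \"x\"}";'

# Pattern from the question: expects bare quotes and no whitespace.
original = re.compile(r'"content_id":"(.*?)"')

# Fixed pattern: matches the literal backslashes and optional whitespace.
fixed = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')

original_match = original.search(sample)
fixed_match = fixed.search(sample)

print(original_match)        # None
print(fixed_match.group(1))  # 60585710
```

Because the original pattern never matches, soup.find(...) has no script to return, which is where the None in the error comes from.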

import re

import requests
from bs4 import BeautifulSoup

Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
# Naming the parser explicitly avoids bs4's "no parser specified" warning
soup = BeautifulSoup(q.text, "html.parser")

pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print(pattern.search(script.text).group(1))
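If the page layout changes, the lookup can come back empty again and raise the same AttributeError. One possible guard, sketched below, swaps in a different approach: it searches the raw page text directly with the same regular expression instead of going through BeautifulSoup, and returns None when nothing is found. The helper name `find_content_id` and both sample pages are hypothetical, used only for illustration.

```python
import re

# Same fixed pattern as above; the flags are omitted because the pattern
# uses neither ^/$ nor a dot that needs to cross newlines.
CONTENT_ID = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')


def find_content_id(html):
    """Return the first content_id found in html, or None if absent."""
    match = CONTENT_ID.search(html)
    return match.group(1) if match else None


# Made-up pages standing in for q.text from requests.get(...).
page_with_id = r'<script type="text/javascript">init("{\"content_id\": \"60585710\"}");</script>'
page_without_id = '<script type="text/javascript">init("{}");</script>'

print(find_content_id(page_with_id))     # 60585710
print(find_content_id(page_without_id))  # None
```

With a real response, the call would be find_content_id(q.text), followed by a check for None before using the result.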
