Search a string in JavaScript using Python
Following up on my previous question, "how to fetch javascript contents in python", I tried to write another script that extracts data from the JavaScript on a page, after fetching the page contents, of course.
But it does not return the content I want. I want to find "content_id" in the page's JavaScript. This is the page: http://www.hulu.com/watch/815743
Here's what I have right now:
import re
import requests
from bs4 import BeautifulSoup
import os
import fileinput
Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
#print soup
subtitles = soup.findAll('script',{'type':'text/javascript'})
pattern = re.compile(r'"content_id":"(.*?)"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print pattern.search(script.text).group(1)
I get this error:
AttributeError: 'NoneType' object has no attribute 'text'
Any idea how to solve this issue?
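There are two problems in your regular expression pattern: it does not account for the backslashes that escape the quotes in the page source, and it does not allow whitespace after the colon. Because the pattern matches no script element, soup.find() returns None, which is why accessing .text raises the AttributeError.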
Here is the fixed version:
pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)
Works for me, getting 60585710 as a result.
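To see why the original pattern finds nothing, the two patterns can be compared offline. This is only a minimal sketch in Python 3: the `sample` string is a made-up fragment that imitates backslash-escaped JSON inside a script, not the actual page source, and the variable names are illustrative.

```python
import re

# Hypothetical fragment imitating escaped JSON embedded in a <script> tag.
# The real page source may differ; this only illustrates the escaping issue.
sample = r'var config = {\"content_id\": \"60585710\", \"title\": \"x\"};'

# Pattern from the question: expects bare quotes and no space after the colon.
original = re.compile(r'"content_id":"(.*?)"')

# Fixed pattern: matches a literal backslash before each quote
# and tolerates optional whitespace after the colon.
fixed = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')

original_match = original.search(sample)
fixed_match = fixed.search(sample)

print(original_match)        # None, the pattern never matches
print(fixed_match.group(1))  # 60585710
```

In a raw string, `\\` in the pattern matches one literal backslash in the text, which is what sits in front of every quote in the escaped JSON.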
FYI, here is the complete code that I'm executing:
import re
import requests
from bs4 import BeautifulSoup
Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print pattern.search(script.text).group(1)
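If the page layout changes, the code above would fail again with the same AttributeError. A more defensive variant could look like the sketch below (Python 3). Note that it swaps in a different approach: it skips BeautifulSoup and searches the raw HTML text directly. The helper name `find_content_id` and the sample `page` string are hypothetical, chosen only for illustration.

```python
import re

# Same fixed pattern as above; flags omitted since the pattern has no ^, $ or
# multi-line span that would need them.
CONTENT_ID_RE = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')


def find_content_id(html):
    """Return the first content_id found in html, or None if there is none."""
    match = CONTENT_ID_RE.search(html)
    # Guard against a failed search instead of calling .group() on None.
    return match.group(1) if match else None


# Hypothetical page fragment, not the real page source.
page = r'<script type="text/javascript">init({\"content_id\": \"60585710\"});</script>'

print(find_content_id(page))             # 60585710
print(find_content_id("<html></html>"))  # None
```

With requests, the call would be `find_content_id(requests.get(Link).text)`; checking the result for None before using it avoids the original error.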