Get the parameters of a JavaScript function with Scrapy
I was wondering whether it is possible to extract the parameters passed to a JavaScript function with Scrapy, from code similar to this:
<script type="text/javascript">
var map;
function initialize() {
var fenway = new google.maps.LatLng(43.2640611,2.9388228);
}
</script>
I would like to extract the coordinates 43.2640611 and 2.9388228.
This is where the re() method helps.
The idea is to locate the script tag via xpath() and use re() to extract the lat and lng from the script tag's contents. Demo from the scrapy shell:
$ scrapy shell index.html
>>> response.xpath('//script').re(r'new google\.maps\.LatLng\(([0-9.]+),([0-9.]+)\);')
[u'43.2640611', u'2.9388228']
where index.html contains:
<script type="text/javascript">
var map;
function initialize() {
var fenway = new google.maps.LatLng(43.2640611,2.9388228);
}
</script>
Of course, on a real page the XPath would need to be more specific than just //script.
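As a rough sketch of a more specific XPath, assuming the target script is the only one on the page that mentions google.maps.LatLng, a helper like the one below could be called from a spider callback. The name extract_latlng is hypothetical; xpath() and re() are the same selector methods used in the shell demo.

```python
# Same pattern as in the shell demo: one capturing group per coordinate.
LATLNG_RE = r'new google\.maps\.LatLng\(([0-9.]+),([0-9.]+)\);'


def extract_latlng(response):
    """Return (lat, lng) as floats, or None if no LatLng call is found.

    `response` is assumed to be a Scrapy response (or anything exposing
    the same xpath(...).re(...) interface).
    """
    # Only look at script elements whose text mentions the constructor.
    values = response.xpath(
        '//script[contains(., "google.maps.LatLng")]/text()'
    ).re(LATLNG_RE)
    if len(values) < 2:
        return None
    return float(values[0]), float(values[1])
```

Inside a spider this would typically be used as `coords = extract_latlng(response)` in the parse callback, with a check for None before using the result.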
FYI, the regular expression new google\.maps\.LatLng\(([0-9.]+),([0-9.]+)\); uses the capturing groups ([0-9.]+) to extract the coordinate values.
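To see the capturing groups at work outside Scrapy, here is a minimal sketch using Python's standard re module on the script text from the question. It is a stand-in for what the selector's re() does with the pattern, not Scrapy's own code path; the variable names are illustrative.

```python
import re

# Same pattern as in the shell demo; each ([0-9.]+) group captures one coordinate.
LATLNG_RE = re.compile(r'new google\.maps\.LatLng\(([0-9.]+),([0-9.]+)\);')

# Contents of the script tag from the question.
script_text = """
var map;
function initialize() {
  var fenway = new google.maps.LatLng(43.2640611,2.9388228);
}
"""

match = LATLNG_RE.search(script_text)
if match:
    # groups() returns the two captured strings; convert them to floats.
    lat, lng = (float(group) for group in match.groups())
    print(lat, lng)
```

Note that [0-9.]+ does not match a leading minus sign or spaces around the comma. If the real page can contain negative coordinates or extra whitespace, a pattern along the lines of (-?[0-9.]+)\s*,\s*(-?[0-9.]+) might be needed; that variant is an assumption about the page, not something taken from the question.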
Also see Using selectors with regular expressions.
Disclaimer: I haven't tried this approach, but here's how I would think about it if I were constrained to using Scrapy and didn't want to parse the JavaScript the way alecxe suggested above. This is a finicky, fragile hack :-)
You can try using scrapyjs to execute the JavaScript code from your Scrapy crawler. To capture those parameters, you'd need to do the following:
More on step 2: make your fake LatLng function modify the HTML page to expose the lat and lng values so that you can parse them out with Scrapy. Here is some crude code to illustrate:
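// Fake LatLng: writes each argument into its own div so it ends up in the rendered HTML.
var LatLng = function LatLng(lat, lng) {
    var latDiv = document.createElement("div");
    latDiv.id = "extractedLat";
    latDiv.innerHTML = lat;  // innerHTML is case-sensitive
    document.body.appendChild(latDiv);
    var lngDiv = document.createElement("div");
    lngDiv.id = "extractedLng";
    lngDiv.innerHTML = lng;
    document.body.appendChild(lngDiv);
};
// The page calls google.maps.LatLng, so the fake must live under "maps".
window.google = {
    maps: {
        LatLng: LatLng
    }
};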
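Once the page has been rendered and contains the two divs with ids extractedLat and extractedLng, reading them back on the Scrapy side could look roughly like this. It is an untested sketch: read_exposed_coords is a hypothetical helper, and it assumes the response passed in holds the JavaScript-rendered HTML and supports the usual xpath(...).get() selector calls.

```python
def read_exposed_coords(response):
    """Return (lat, lng) as floats from the divs written by the fake LatLng.

    Returns None if either div is missing from the rendered page.
    """
    lat = response.xpath('//div[@id="extractedLat"]/text()').get()
    lng = response.xpath('//div[@id="extractedLng"]/text()').get()
    if lat is None or lng is None:
        return None
    return float(lat), float(lng)
```

Returning None instead of raising keeps the spider running when the fake function never fired, which seems likely to happen often with a hack this fragile.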
Overall, this approach sounds a bit painful, but could be fun to try.