How do I go about scraping a URL from inline JavaScript?
The script block below is repeated 240 times on the page. Each time, the last two numbers in the URL are different. I would like a list of all the URLs.
So I suppose I need to find each script and then find the first "commtArr" in each script, assuming it's always the first.
Where do I even start?
<script type="text/javascript">
commArr[commArr.length] = "http://example.com/index.php?option==down&pid=123&id=389";
commtArr[commtArr.length] = "mp3";
commnArr[commnArr.length] = "john doe.mp3";
</script>
The URL is actually being inserted into commArr, not commtArr. It seems commArr will only ever hold the URL.
Assuming the script is repeated X times on the same page, you already end up with a single variable, commArr, containing all the URLs once the page has run its scripts. It's just a simple case of listing it out, for example from the browser console.
for (let i = 0; i < commArr.length; i++) { console.log(commArr[i]); }
If the scripts are spread across various pages, then you may need some kind of spider script that visits every page, grabs commArr on each one, and saves the results somewhere persistent. I'm afraid I can't suggest anything for that aside from doing it manually.
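If you only have the raw HTML of a page rather than a running page, one possible approach is to pull the URLs out of the source text with a regular expression. This is only a minimal sketch: the function name extractCommArrUrls is made up for illustration, the second script block in the sample is invented test data, and it assumes every assignment has exactly the form shown in the question. How the HTML gets fetched for each page is left out.

```javascript
// Hypothetical helper: collect every URL assigned to commArr in raw HTML text.
// Assumes each assignment looks like: commArr[commArr.length] = "...";
function extractCommArrUrls(html) {
  // Matches commArr only; commtArr and commnArr do not contain "commArr[".
  const pattern = /\bcommArr\[commArr\.length\]\s*=\s*"([^"]*)"/g;
  const urls = [];
  for (const match of html.matchAll(pattern)) {
    urls.push(match[1]); // capture group 1 is the quoted URL
  }
  return urls;
}

// Sample input: the first block is from the question, the second is invented.
const sample = `
<script type="text/javascript">
commArr[commArr.length] = "http://example.com/index.php?option==down&pid=123&id=389";
commtArr[commtArr.length] = "mp3";
commnArr[commnArr.length] = "john doe.mp3";
</script>
<script type="text/javascript">
commArr[commArr.length] = "http://example.com/index.php?option==down&pid=124&id=390";
commtArr[commtArr.length] = "mp3";
commnArr[commnArr.length] = "jane doe.mp3";
</script>`;

console.log(extractCommArrUrls(sample));
```

A regular expression is brittle if the page ever changes its quoting or spacing, so treat this as a starting point rather than a robust parser.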