
Extract JS data from a web page using Scrapy

I am crawling a web page using Scrapy.

Some of the data I need is inside a script tag. I extracted the full contents of the script tag with XPath, and it looks like this:

 <script>
 some data

 abc.xyz=[["mohit","gupta","456123"]];

 some data
 </script>

I want the data assigned to abc.xyz, but I can't work out how to extract it.

You can use the regular expression abc\.xyz=(.*?); to extract the variable's value (the dot is escaped so that it matches a literal period rather than any character). If you also want to turn the result into a Python list, you can use literal_eval():

from ast import literal_eval
import re

text = """<script>
 some data

 abc.xyz=[["mohit","gupta","456123"]];

 some data
 </script>"""

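# Escape the dot so it matches a literal "." rather than any character.
value = re.search(r'abc\.xyz=(.*?);', text).group(1)
print(value, type(value))

# literal_eval safely parses the captured string as a Python literal.
value = literal_eval(value)
print(value, type(value))

prints:

[["mohit","gupta","456123"]] <class 'str'>
[['mohit', 'gupta', '456123']] <class 'list'>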
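
literal_eval() only accepts Python literals, so it would fail if the JavaScript array contained values such as true, false, or null. As a rough sketch of an alternative, assuming the assigned value is also valid JSON, the same extraction could be done with json.loads(). The helper name extract_abc_xyz and the constant ASSIGNMENT are made up for illustration, and the pattern assumes that no string inside the array contains "];".

```python
import json
import re

# Hypothetical pattern: capture the array assigned to abc.xyz.
# re.DOTALL lets the value span several lines; \s* tolerates optional spaces.
ASSIGNMENT = re.compile(r'abc\.xyz\s*=\s*(\[.*?\])\s*;', re.DOTALL)


def extract_abc_xyz(script_text):
    """Return the value assigned to abc.xyz as Python data, or None if absent."""
    match = ASSIGNMENT.search(script_text)
    if match is None:
        return None
    # Assumption: the JavaScript array literal is also valid JSON.
    return json.loads(match.group(1))


script_text = """
 some data

 abc.xyz=[["mohit","gupta","456123"]];

 some data
"""

rows = extract_abc_xyz(script_text)
print(rows)  # [['mohit', 'gupta', '456123']]
```

Inside a Scrapy callback, script_text would be whatever string the XPath query on the script tag returned. Returning None when the assignment is missing avoids the AttributeError that re.search(...).group(1) raises when there is no match.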


 