The scrapy error I get is:
  File "/anaconda/lib/python2.7/site-packages/scrapy/http/response/text.py", line 82, in urljoin
    return urljoin(get_base_url(self), url)
  File "/anaconda/lib/python2.7/urlparse.py", line 261, in urljoin
    urlparse(url, bscheme, allow_fragments)
  File "/anaconda/lib/python2.7/urlparse.py", line 143, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/anaconda/lib/python2.7/urlparse.py", line 182, in urlsplit
    i = url.find(':')
AttributeError: 'Selector' object has no attribute 'find'
Scrapy traced the call back to this line in my spider:
for url in links:
    link_url = response.urljoin(url)
This line is in a generic parse() method. I have run the exact same syntax many times before and never encountered an error, and wading through the documentation and source code for urlparse did not yield anything.
Any advice would be greatly appreciated!
Activate the anaconda Python 2.7 environment, then open a scrapy shell with the target URL www.bing.com:

scrapy shell www.bing.com
Import Selector:

from scrapy.selector import Selector
Create a Selector object from your response:

selector_obj = Selector(response=response)
Pass the Selector object to response.urljoin; this reproduces the error:

response.urljoin(selector_obj)
Check the url variable using type() or another technique to make sure you have properly extracted the string you want before calling:

for url in links:
    link_url = response.urljoin(url)
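As a minimal, Scrapy-free sketch of the fix: FakeSelector below is a hypothetical stand-in for Scrapy's Selector, which exposes its matched text via extract() (or get() in newer versions). The point is to join the extracted string, not the Selector itself.

```python
from urllib.parse import urljoin

class FakeSelector:
    """Hypothetical stand-in for scrapy.Selector; holds the matched text."""
    def __init__(self, value):
        self._value = value

    def extract(self):
        # Scrapy Selectors return their underlying string via extract()/get()
        return self._value

base_url = "http://www.bing.com"
links = [FakeSelector("/images"), FakeSelector("/maps")]

# Extract the string from each Selector *before* joining it to the base URL.
for sel in links:
    link_url = urljoin(base_url, sel.extract())
    print(link_url)
# http://www.bing.com/images
# http://www.bing.com/maps
```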
Use Python 3.x instead of Python 2.7. When Scrapy runs under Python 3.x, the error message is much clearer and easier to understand. (Here is the same error in a python36 environment.)
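To illustrate (a minimal stdlib sketch, not Scrapy itself): Python 3's urljoin rejects a truthy non-string argument up front with a descriptive TypeError, instead of failing deep inside urlsplit as Python 2 does.

```python
from urllib.parse import urljoin

try:
    # Any truthy non-str argument (such as a Selector) trips the type check
    urljoin("http://www.bing.com", object())
except TypeError as exc:
    print(exc)  # Cannot mix str and non-str arguments
```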