I wrote a bot in Python that uses Selenium and ChromeDriver to request a web page and return the results through a Flask web service. It works on Windows without any problems, but it crashes on CentOS and Ubuntu. After checking, I found that on Linux (both CentOS and Ubuntu) I cannot fetch the target site with wget, curl, or similar commands either. The site is displayed in the browser but is not reachable from the terminal. The website URL is: "https://www.prodirectrunning.com/p/nike-air-zoom-tempo-next-percent-barely-volt-black-volt-hyper-orange-mens-shoes-243713/"
I tested it on Linux Mint 20 (based on Ubuntu 20.04).
The Python module requests doesn't work with its default headers, but if I send the header 'User-Agent': 'Mozilla/5.0' then it works.
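import requests
import lxml.html

# Browser-like User-Agent; the request does not work with the default headers
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.prodirectrunning.com/p/nike-air-zoom-tempo-next-percent-barely-volt-black-volt-hyper-orange-mens-shoes-243713/'

r = requests.get(url, headers=headers)
print(r.status_code)  # HTTP status code
print(r.url)          # final URL after any redirects
print(r.history)      # redirect responses, if any

# Parse the HTML and print the src attribute of every <img> element
soup = lxml.html.fromstring(r.text)
items = soup.xpath('//img/@src')
for i in items:
    print(i)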
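To see what a Python client identifies itself as when no User-Agent is set, here is a minimal sketch using only the standard library's urllib instead of requests. The helper names `build_request` and `default_user_agent` are made up for this illustration, and it is only my assumption that the site rejects such default identifiers; I only observed that a browser-like value works.

```python
import urllib.request

URL = ("https://www.prodirectrunning.com/p/nike-air-zoom-tempo-next-percent-"
       "barely-volt-black-volt-hyper-orange-mens-shoes-243713/")


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header.

    Hypothetical helper for illustration; nothing is sent over the network here.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def default_user_agent():
    """Return the User-Agent urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs; urllib spells the key 'User-agent'
    return dict(opener.addheaders).get("User-agent")


if __name__ == "__main__":
    # Compare the default identifier with the overridden one (no network access)
    print("default:", default_user_agent())
    print("custom: ", build_request(URL).get_header("User-agent"))
```

To actually fetch the page, pass the result of `build_request(URL)` to `urllib.request.urlopen()`. Whether that succeeds depends on the site's filtering at the time, so treat this as a way to inspect the headers rather than a guaranteed fix.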
The same goes for curl: it needs the header User-Agent: Mozilla/5.0 to work.
curl -H 'User-Agent: Mozilla/5.0' https://www.prodirectrunning.com/p/nike-air-zoom-tempo-next-percent-barely-volt-black-volt-hyper-orange-mens-shoes-243713/
The same goes for wget: it needs the header User-Agent: Mozilla/5.0 to work.
wget --header='User-Agent: Mozilla/5.0' https://www.prodirectrunning.com/p/nike-air-zoom-tempo-next-percent-barely-volt-black-volt-hyper-orange-mens-shoes-243713/