
Using Python to use a website's search function

I am trying to use a search function of a website with this code structure:

<div class='search'>
<div class='inner'>
<form accept-charset="UTF-8" action="/gr/el/products" method="get"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="&#x2713;" /></div>
<label for='query'>Ενδιαφέρομαι για...</label>
<fieldset>
<input class="search-input" data-search-url="/gr/el/products/autocomplete.json" id="text_search" name="query" placeholder="Αναζητήστε προϊόν" type="text" />
<button type='submit'>Αναζήτηση</button>
</fieldset>
</form>
</div>
</div>

with this Python script:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'}



payload = {
    'query':'test'
}

r = requests.get('http://www.pharmacy295.gr',data = payload ,headers = headers)

soup = BeautifulSoup(r.text,'lxml')
products = soup.findAll('span', {'class':'name'})
print(products)

I arrived at this code after searching this website extensively for how to do this, but I never manage to get any search results, just the main page of the website.

Add products to your URL and it will work fine: the form's method is GET, and the form's action attribute also shows the URL to request. If you are unsure, open the developer console in Firefox or Chrome, where you can see exactly how the request is made.

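payload = {
    'query': 'neutrogena',
}

# Send the search term as a query-string parameter to the form's action URL.
r = requests.get('http://www.pharmacy295.gr/products', params=payload, headers=headers)

soup = BeautifulSoup(r.text, 'lxml')
# Each product name on the results page sits in a <span class="name">.
products = soup.find_all('span', {'class': 'name'})
print(products)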

Output:

[<span class="name">NEUTROGENA - Hand &amp; Nail Cream - 75ml</span>, <span class="name">NEUTROGENA - Hand Cream (Unscented) - 75ml</span>, <span class="name">NEUTROGENA - PROMO PACK 1+1 \u0394\u03a9\u03a1\u039f  Lip Moisturizer - 4,8gr</span>, <span class="name">NEUTROGENA - Lip Moisturizer with Nordic Berry - 4.9gr</span>]
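The URL and method can also be read from the form markup itself rather than by eye. The following is a minimal sketch, assuming a trimmed copy of the form from the question and the site root used there. describe_form is a hypothetical helper name, the button label is replaced with an English placeholder, and the built-in 'html.parser' stands in for 'lxml' so no extra parser is needed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# The search form from the question, trimmed to the parts that matter.
FORM_HTML = """
<form accept-charset="UTF-8" action="/gr/el/products" method="get">
  <input name="utf8" type="hidden" value="&#x2713;" />
  <input class="search-input" id="text_search" name="query" type="text" />
  <button type="submit">Search</button>
</form>
"""

BASE_URL = 'http://www.pharmacy295.gr'  # site root used in the question


def describe_form(html, base_url):
    """Return the (method, absolute URL, field names) a form would submit."""
    form = BeautifulSoup(html, 'html.parser').find('form')
    # HTML forms default to GET when no method attribute is given.
    method = form.get('method', 'get').upper()
    # Resolve the relative action against the site root.
    url = urljoin(base_url, form.get('action', ''))
    fields = [tag['name'] for tag in form.find_all('input') if tag.has_attr('name')]
    return method, url, fields


method, url, fields = describe_form(FORM_HTML, BASE_URL)
print(method, url, fields)
```

The field names tell you which keys belong in the payload; here the one carrying the search term is query.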

If you prefer, you can also get the data as JSON:

In [13]: r = requests.get('http://www.pharmacy295.gr/el/products/autocomplete.json', params=payload, headers=headers)

In [14]: print(r.json())
[{u'title': u'NEUTROGENA - Hand & Nail Cream - 75ml', u'discounted_price': u'5,31 \u20ac', u'photo': u'/system/uploads/asset/data/12584/tiny_108511.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/7547', u'price': u'8,17 \u20ac'}, {u'title': u'NEUTROGENA - Hand Cream (Unscented) - 75ml', u'discounted_price': u'4,03 \u20ac', u'photo': u'/system/uploads/asset/data/4689/tiny_102953.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/3958', u'price': u'6,20 \u20ac'}, {u'title': u'NEUTROGENA - PROMO PACK 1+1 \u0394\u03a9\u03a1\u039f  Lip Moisturizer - 4,8gr', u'discounted_price': u'3,91 \u20ac', u'photo': u'/system/uploads/asset/data/5510/tiny_118843.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/4644', u'price': u'4,60 \u20ac'}, {u'title': u'NEUTROGENA - Lip Moisturizer with Nordic Berry - 4.9gr', u'discounted_price': u'2,91 \u20ac', u'photo': u'/system/uploads/asset/data/12761/tiny_126088.jpg', u'brand': u'NEUTROGENA ', u'path': u'/products/7548', u'price': u'4,48 \u20ac'}]
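Once the JSON is decoded, each result is a plain dictionary, so the price strings can be turned into numbers. This is only a sketch: the two entries are copied from the response above, parse_price is a hypothetical helper, and it assumes prices always look like '5,31 €' (decimal comma, no thousands separator).

```python
from decimal import Decimal

# Two entries copied from the autocomplete response above.
results = [
    {'title': 'NEUTROGENA - Hand & Nail Cream - 75ml',
     'discounted_price': '5,31 \u20ac', 'price': '8,17 \u20ac',
     'path': '/products/7547'},
    {'title': 'NEUTROGENA - Hand Cream (Unscented) - 75ml',
     'discounted_price': '4,03 \u20ac', 'price': '6,20 \u20ac',
     'path': '/products/3958'},
]


def parse_price(text):
    """Convert a string such as '5,31 \u20ac' to Decimal('5.31')."""
    # Assumption: decimal comma and no thousands separator.
    number = text.replace('\u20ac', '').strip()
    return Decimal(number.replace(',', '.'))


for item in results:
    full = parse_price(item['price'])
    reduced = parse_price(item['discounted_price'])
    print(item['title'], reduced, 'saving', full - reduced)
```

Decimal is used instead of float so that price differences stay exact.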

First, the URL is wrong. You are requesting http://www.pharmacy295.gr, but the form submits to http://www.pharmacy295.gr/gr/el/products. That URL can actually be simplified to http://www.pharmacy295.gr/products.

Also, you are making a GET request, so use params=payload rather than data=payload.

data sets the request body, which is what POST requests use; params is encoded into the URL's query string.

See the documentation for requests.get().
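The difference between the two arguments can be inspected without contacting the site. The sketch below uses requests.Request(...).prepare(), which builds the request locally, with the URL and payload taken from the question. The variable names are illustrative only.

```python
import requests

url = 'http://www.pharmacy295.gr/products'
payload = {'query': 'test'}

# prepare() builds the request locally; nothing is sent over the network.
with_params = requests.Request('GET', url, params=payload).prepare()
with_data = requests.Request('GET', url, data=payload).prepare()

print(with_params.url)   # query string is appended to the URL
print(with_params.body)  # no request body
print(with_data.url)     # URL is left unchanged
print(with_data.body)    # payload is form-encoded into the body
```

With params the search term ends up in the URL, which is where a form with method="get" puts it. With data it travels only in the body.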

Try r = requests.post('http://www.pharmacy295.gr', data=payload, headers=headers)

Servers also generally ignore the data of a GET request, since it is sent in the request body rather than in the URL...

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'}

payload = {
    'query': 'test',
}

# For a GET request, pass the payload as query-string parameters.
r = requests.get('http://www.pharmacy295.gr/products', params=payload, headers=headers)

soup = BeautifulSoup(r.text, 'lxml')
products = soup.find_all('span', {'class': 'name'})
print(products)
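Putting the advice from the answers together, the fetch and the parsing can be split so that the parsing is checkable offline. This is a rough sketch, not a definitive implementation: search_products and extract_product_names are hypothetical names, the timeout value is arbitrary, and the built-in 'html.parser' is swapped in for 'lxml'. The sample markup is copied from the output shown in the first answer.

```python
import requests
from bs4 import BeautifulSoup

SEARCH_URL = 'http://www.pharmacy295.gr/products'  # URL used in the answers
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'}


def extract_product_names(html):
    """Return the text of every <span class="name"> in the page."""
    soup = BeautifulSoup(html, 'html.parser')
    return [span.get_text(strip=True) for span in soup.find_all('span', class_='name')]


def search_products(query, timeout=10):
    """Fetch the search results page for `query` and return the product names."""
    r = requests.get(SEARCH_URL, params={'query': query}, headers=HEADERS, timeout=timeout)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return extract_product_names(r.text)


# Offline check against markup copied from the output shown earlier.
sample = ('<span class="name">NEUTROGENA - Hand &amp; Nail Cream - 75ml</span>'
          '<span class="name">NEUTROGENA - Hand Cream (Unscented) - 75ml</span>')
print(extract_product_names(sample))
```

Calling search_products('neutrogena') would perform the live request. It is left out of the block so the example runs without network access.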


 