Python3: nothing happens when submitting a form via MechanicalSoup

I need to scrape a website after submitting a search form. When I do this in the browser, the page doesn't reload and I'm not redirected anywhere: the results are displayed below the search form without any change to the URL, and I can see them in the "new" page's HTML. But when I use the following code, I can't see the "new" page HTML that should be in the response (the URL in the code is the one I'm actually trying to work with):

import mechanicalsoup

def fetchfile(query):

    url = "http://www.italgiure.giustizia.it/sncass/"

    browser = mechanicalsoup.Browser()
    page = browser.get(url)
    search_form = page.soup.find("form", {"id": "z-form"})
    search_form.find("input", {"id":"searchterm"})["value"] = query
    response = browser.submit(search_form, page.url)

    print(response) # the response is 200, so it should be a good sign

    # actual parsing will come later...
    print("1235" in response.text) # quick-check to see if there is what I'm looking for, but I get False

    # in fact this...
    print(page.text == response.text) # ...gives me True

fetchfile("1235/2012")

I can't understand what I'm missing. I'd rather not use Selenium. Any clues?

I just finished struggling with the same problem. I'm also fairly new to Python, so let me attempt to explain.

You're "finding" the elements on the page, but you need to take the result of your form search and turn it into a Form object; then you can set values on that Form object and submit it. The reason you're not getting anything back after submitting is that none of your form values actually get set: you're just doing the search. I know this question is old, but hopefully this will help others too. I don't know what the actual value of "query" is supposed to be, so I can't verify that it works, but this is the method I used in my program.

import mechanicalsoup

def fetchfile(query):

    url = "http://www.italgiure.giustizia.it/sncass/"

    browser = mechanicalsoup.Browser()
    page = browser.get(url)

    # page is a requests Response; the parsed HTML lives in page.soup.
    # Looking the form up by its attributes is also useful
    # for forms without names
    FORM = mechanicalsoup.Form(page.soup.find('form', attrs={'id': 'z-form'}))

    # Form looks fields up by their name attribute
    # (assumed here to be "searchterm", like the id)
    FORM["searchterm"] = query

    # You can verify the form values are set by doing this:
    print("Form values: ", vars(FORM))

    response = browser.submit(FORM, url)

    print(response) # the response is 200, so it should be a good sign
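    # Browser has no get_current_page() (that belongs to StatefulBrowser);
    # the parsed result page is available as response.soup
    Results = response.soup
    print("Results: ", Results)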

    # actual parsing will come later...
    # quick-check to see if there is what I'm looking for, but I get False
    # print("1235" in response.text) 

    # in fact this...
    print(page.text == response.text) # ...gives me True

# fetchfile("1235/2012")
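
To double-check which values a submission would actually carry, it can help to collect the form's fields by hand, independently of MechanicalSoup. The following is only a minimal sketch using the standard library. The HTML snippet, the name="searchterm" attribute, the hidden "page" field, and the helper names FormFieldCollector and build_payload are all assumptions made for illustration, not the real markup of the site. It relies on the fact that only inputs with a name attribute are sent to the server.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside the form with a given id."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a name attribute are never sent to the server.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


# Hypothetical markup: the real page may use different field names.
SAMPLE_HTML = """
<form id="z-form" action="/sncass/" method="get">
  <input id="searchterm" name="searchterm" value="">
  <input id="unnamed-helper" value="ignored">
  <input type="hidden" name="page" value="1">
</form>
"""


def build_payload(html, form_id, **overrides):
    """Return the fields the form would submit, with some values overridden."""
    collector = FormFieldCollector(form_id)
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(overrides)
    return payload


payload = build_payload(SAMPLE_HTML, "z-form", searchterm="1235/2012")
print(payload)
print(urlencode(payload))
```

If the field you are trying to set does not show up in the payload, the input probably has no name attribute, or the page fills in the results with JavaScript rather than through a plain form submission.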
