
How can I scrape information from HowLongToBeat.com? It doesn't use a variable in the URL

I'm trying to scrape information from How Long to Beat. How can I make a search request without having to put the search term in the URL?

EDIT for clarity:

The problem I face is that the site doesn't use something like http://www.howlongtobeat.com/search.php?s=search-term, so I cannot do something like this:

import requests

url         = 'http://www.howlongtobeat.com/search.php?s='
search_term = input("Search: ")  # raw_input() on Python 2

r = requests.get(url + search_term)

In other words, when you type the search term into the search box, the site neither refreshes nor changes the URL, so I can't find a way to search from outside the site.

I'm sorry if I made grammar mistakes; English is not my first language.

This is because the page is driven by AJAX requests: it updates itself in place without redirecting you to a new, visible URL.

If you open the developer tools in your browser (F12) and switch to the Network panel, you will see that requests are indeed being sent to the server. I typed "test2" and got the following:

[Screenshot of the developer tools in Firefox]

As you can see, the request is sent to a URL that looks like this: http://www.howlongtobeat.com/search_main.php?t=games&page=1&sorthead=popular&sortd=Normal%20Order&plat=&detail=0 . I typed "test2", but it's nowhere to be seen in that URL.

That's because it was sent using a POST request, i.e. the parameters were carried in the body of the HTTP request, not in the URL. When I switched to the "Params" tab in the developer tools, I could indeed see my input:

queryString: "test2"

So in order to use this search form, you should send a POST request to that URL with a form field named "queryString" set to whatever value you want to search for.
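As a rough sketch of what that could look like in Python, the snippet below builds such a POST request using only the standard library. The URL and the "queryString" field name are the ones observed in the Network panel above; the function names build_search_request and search are made up for illustration, and the site may have changed its endpoint since, so treat this as a starting point rather than a guaranteed-working client.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint and query string as observed in the browser's Network panel;
# this is an assumption about the site and may no longer be accurate.
SEARCH_URL = (
    "http://www.howlongtobeat.com/search_main.php"
    "?t=games&page=1&sorthead=popular&sortd=Normal%20Order&plat=&detail=0"
)


def build_search_request(term):
    """Build (but do not send) a POST request carrying the search term."""
    # The form field goes into the request body, not into the URL.
    body = urlencode({"queryString": term}).encode("ascii")
    return Request(SEARCH_URL, data=body, method="POST")


def search(term):
    """Send the request and return the response body as text."""
    with urlopen(build_search_request(term)) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling search("test2") should then return whatever markup the server sends back for that term, which you would still need to parse. If you prefer to stay with requests, as in the question, the equivalent call is requests.post(SEARCH_URL, data={"queryString": term}).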

I strongly encourage asking the site owners about an API, though. Using publicly available forms that were designed for human end users in an automated fashion is considered unethical.


 