Script fails to get all the available names from a link

Question

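I'm trying to get the names of all the hostels available at the following link. The names are generated dynamically, which is why I can't fetch them with a GET request. However, when I send a POST request with the appropriate payload, I can get the names from its landing page. The problem arises when the show more records button is clicked: I can see that an additional field, 'lr': '87', is added to the payload, and I can't figure out how to use it correctly.

Website link

The number keeps increasing each time I click the show more records button, e.g. 87, 227, 384, 457, and so on.

This is what I've tried (it works for the first few names):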

import requests
from bs4 import BeautifulSoup

url = 'http://hosteldunia.com/controller/search2.php'

payload={
    'address': 'hyderabad',
    'forWhom': 'Men',
    'accomodationType': 'undefined',
    'min': '2000',
    'max': '20000',
    'filter': 'single|doubleShare|tripleShare|fourShare|fiveShare'
}
session = requests.Session()
r = session.post(url,data=payload)
soup = BeautifulSoup(r.text,'lxml')
for item in soup.select("h5.hover-title-top"):
    print(item.text)

How can I get all the names from that link using requests?

Answer 1

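This turned out to be a challenge; I had to look at the JavaScript code to find the answer.

The response contains a div with the class "more", and its id is the next lr value. I bet they don't do code reviews :)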

import requests
from bs4 import BeautifulSoup



def get_next_batch(lr):
    url = 'http://hosteldunia.com/controller/search2.php'

    payload = {
        'address': 'hyderabad',
        'forWhom': 'Men',
        'accomodationType': 'undefined',
        'min': '2000',
        'max': '20000',
        'filter': 'single|doubleShare|tripleShare|fourShare|fiveShare',
        'lr': lr
    }
    session = requests.Session()
    r = session.post(url, data=payload)
    soup = BeautifulSoup(r.text, 'html.parser')

    for item in soup.select("h5.hover-title-top"):
        print(item.text)

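    # The div with class "more" carries the next lr as its id. It is assumed
    # to be missing on the last batch, in which case None is returned.
    more = soup.select(".more")
    return more[0]['id'] if more else None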

lr = None
#loads next batches
lr = get_next_batch(lr)
lr = get_next_batch(lr)
lr = get_next_batch(lr)
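
The three calls above fetch only the first three batches. To keep going until the site stops returning a new lr, the same idea can be wrapped in a loop. What follows is only a sketch under assumptions: `iter_all_names`, `fetch_batch`, `max_batches`, and the fake data are hypothetical names made up for illustration. It also assumes the last batch either has no "more" div or repeats an lr already seen. The fetching step is passed in as a function so the loop can be tried offline.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: takes the current lr (None for the first batch) and
# returns (names in that batch, next lr or None when there is nothing more).
FetchBatch = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iter_all_names(fetch_batch: FetchBatch, max_batches: int = 1000) -> Iterator[str]:
    """Yield names batch by batch until no new lr is handed out."""
    lr: Optional[str] = None
    seen_lrs = set()
    for _ in range(max_batches):  # hard cap as a guard against endless loops
        names, next_lr = fetch_batch(lr)
        yield from names
        # Stop when there is no next lr, or when it repeats
        # (assumed to mean that there are no further batches).
        if not next_lr or next_lr in seen_lrs:
            break
        seen_lrs.add(next_lr)
        lr = next_lr


# Offline demonstration: made-up data standing in for the server responses.
FAKE_PAGES = {
    None: (["Hostel A", "Hostel B"], "87"),
    "87": (["Hostel C"], "227"),
    "227": (["Hostel D"], None),
}


def fake_fetch(lr: Optional[str]) -> Tuple[List[str], Optional[str]]:
    return FAKE_PAGES[lr]


all_names = list(iter_all_names(fake_fetch))
print(all_names)
```

To use this against the real site, get_next_batch would need to return the list of names together with the id of the "more" div (or None when that div is absent) instead of printing the names, and would then be passed in place of fake_fetch.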

Answer 1
2 · accepted · 2019-08-03 11:12:00