Entering a value in a search bar and downloading the output from a web page

Question

I am trying to search a web page (http://www.phillyhistory.org/historicstreets/). I think the relevant source HTML is this:

<input name="txtStreetName" type="text" id="txtStreetName">

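You can view the rest of the source HTML on the site. I want to enter a street name into that text box and download the output (i.e., type "Jefferson" into the page's search box and see the historic street names for Jefferson). I have tried using requests.post, and I tried appending ?get=Jefferson to the URL to test whether that works. Does anyone have any ideas on how to get this page? Thanks,

Cameron

The code I have tried so far (some imports are unused because I plan to parse the results, etc.):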

import requests
from bs4 import BeautifulSoup
import csv
from string import ascii_lowercase
import codecs
import os.path
import time


arrayofstreets = []



arrayofstreets = ['Jefferson']

for each in arrayofstreets:
    url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
    payload = {'txtStreetName': each}
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"
    with open(outfile, "w") as code:
        code.write(r)
    time.sleep(2)

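This does not work: it only gives me the default web page (i.e., Jefferson is not entered into the search bar and no results are retrieved).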

Answer 1

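I guess your reference to 'requests.post' relates to the requests module for Python.

Since you have not specified what you want to scrape from the search results, I will simply give you a snippet that gets the HTML for a given search query: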

import requests

query = 'Jefferson'

url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
post_data = {'txtStreetName': query}

html_result = requests.post(url, data=post_data).content

print(html_result)

If you need to process the HTML further to extract some data, I suggest using the Beautiful Soup module for that.
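As a rough illustration of that suggestion, here is a minimal sketch of pulling table rows out of a result page with Beautiful Soup. The markup in SAMPLE_RESULT is a made-up placeholder, not the real structure of the phillyhistory.org results, and extract_rows is a hypothetical helper name; inspect the HTML you actually downloaded and adjust the tag names accordingly.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a saved result page; the real markup may differ.
SAMPLE_RESULT = """
<table id="results">
  <tr><th>Historic name</th><th>Current name</th></tr>
  <tr><td>Example Street A</td><td>Jefferson Street</td></tr>
  <tr><td>Example Street B</td><td>Jefferson Street</td></tr>
</table>
"""


def extract_rows(html):
    """Return the text of every <td> cell, grouped by table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows contain only <th> cells and are skipped
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_RESULT)
for row in rows:
    print(row)
```

The built-in "html.parser" backend is used here only so the sketch needs nothing beyond bs4 itself; 'lxml' should work the same way if it is installed.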

Updated version:

#!/usr/bin/python
import time

import requests
from bs4 import BeautifulSoup


def get_post_data(html_soup, query):
    # The form expects these hidden ASP.NET state fields alongside the query.
    view_state = html_soup.find('input', {'name': '__VIEWSTATE'})['value']
    event_validation = html_soup.find('input', {'name': '__EVENTVALIDATION'})['value']
    return {'__VIEWSTATE': view_state,
            '__EVENTVALIDATION': event_validation,
            'Textbox1': '',
            'txtStreetName': query,
            'btnSearch': 'Find'
            }


arrayofstreets = ['Jefferson']

url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
# Fetch the form page once to read its hidden fields.
html = requests.get(url).content
for each in arrayofstreets:
    payload = get_post_data(BeautifulSoup(html, 'lxml'), each)
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"  # the raw/ directory must already exist
    # .content is bytes, so write in binary mode.
    with open(outfile, "wb") as code:
        code.write(r)
    time.sleep(2)  # pause between requests

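The problem with my/your first version was that we did not post all of the required parameters. To find out what you need to send, open the network monitor in your browser (Ctrl+Shift+Q in Firefox) and run the search as usual. If you select the POST request in the network log, you should see a "Params" tab on the right showing the POST parameters your browser sent.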
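If you would rather not copy each hidden field by hand, one possible approach is to collect every hidden input from the form page and then add the visible fields on top. The sketch below uses only the standard library's html.parser so it runs without extra installs. HiddenInputCollector, build_payload, and the SAMPLE_HTML values are illustrative assumptions, not the site's real markup, and the field names come from the updated version above.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_payload(page_html, extra_fields):
    """Merge the page's hidden fields with the fields you fill in yourself."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(extra_fields)
    return payload


# Made-up form for demonstration; the real page's values are much longer.
SAMPLE_HTML = """
<form method="post" action="default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input name="txtStreetName" type="text" id="txtStreetName">
  <input type="submit" name="btnSearch" value="Find" />
</form>
"""

payload = build_payload(
    SAMPLE_HTML,
    {"txtStreetName": "Jefferson", "btnSearch": "Find"},
)
print(payload)
```

In real use you would pass the text of the GET response as page_html and hand the resulting dictionary to requests.post(url, data=payload). It is still worth comparing the result against the parameters shown in the network monitor, since a form may need fields that are not hidden inputs.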

Solution 1
2 Accepted 2016-06-20 15:59:56
