Logging in to a website and scraping it with Python

Question

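I am trying to find a way to scrape the real estate website https://www.brickz.my/ for my research project. I have been going back and forth between Selenium and Beautiful Soup, and decided that Beautiful Soup is the best approach for me, because the URL structure of each property lets my code navigate the site easily and quickly.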

我正在嘗試為每個房地產建立一個數據庫交易。 如果沒有登錄，則只會顯示特定物業的 10 筆最新交易。 通過登錄，我可以訪問特定類型財產的整個交易。 這是例子

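I tried to log in using requests in Python, but it keeps serving me the page as if I were not logged in, so I only manage to scrape the 10 latest transactions instead of all of them. Here is my login code in Python: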

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 5.1.1; SM-G928X Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36'}

# Login attempt (this is the part that does not work)
page = requests.get("https://www.brickz.my/login/", auth=('email', 'password'))
soup = BeautifulSoup(page.content, 'html.parser')

# One of the property URLs to be scraped
response = requests.get(
    "https://www.brickz.my/transactions/residential/kuala-lumpur/titiwangsa/titiwangsa-sentral-condo/non-landed/?range=2012+Oct-",
    headers=headers)

This is what I use to scrape the table:

table = BeautifulSoup(response.text, 'html.parser')
table_rows = table.find_all('tr')

# Collect the cell text of every table row
names = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    names.append(row)

How can I log in successfully and access the full set of transactions? I have heard of the Mechanize library, but it does not work with Python 3.

I apologize if my question is unclear; this is my first post, and I only learned to use Python a few months ago.

Answer 1

A simple HTTP trace shows that the login is a POST to https://www.brickz.my/login/ with email and pw as form parameters.

That translates to this requests code:

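import requests

# A Session keeps the cookies set by the login response for later requests
session = requests.Session()
resp = session.post('https://www.brickz.my/login/', data={'email': '<youremail>', 'pw': '<yourpassword>'})
# resp.ok only means the HTTP status is below 400; it does not prove the login succeeded
if resp.ok:
    print("You should now be logged in")

# then use session to request the site, like
# resp = session.get("https://www.brickz.my/whatever")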

Caveat: untested, since I don't have an account there.

Answer 2

Try the code below. What do you see when it prints (after changing the email and password)? Does the result not print Logout?

import requests
from bs4 import BeautifulSoup

URL = "https://www.brickz.my/login/"

payload = {
    'email': 'your_email',
    'pw': 'your_password',
    'submit': 'Submit'
}

with requests.Session() as s:
    s.headers = {"User-Agent":"Mozilla/5.0"}
    s.post(URL,data=payload)
    res = s.get("https://www.brickz.my/")
    soup = BeautifulSoup(res.text,"lxml")
    for items in soup.select("select#menu_select .nav2"):
        data = [' '.join(item.text.split()) for item in items.select("option")[-1:]]
        print(data)
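
The "does it print Logout" check above can also be wrapped in a small helper, so a scraper can stop early when the login did not take effect. The following is only a sketch: the function name looks_logged_in is hypothetical, and it assumes that the word "Logout" appears in the visible page text only for logged-in users, which has not been verified against the site. It uses only the standard library and strips tags with a crude regular expression rather than a real HTML parser.

```python
import re


def looks_logged_in(html: str, marker: str = "Logout") -> bool:
    """Return True if the visible page text contains the marker (case-insensitive).

    Assumption: the marker (here "Logout") is shown only to logged-in users.
    """
    # Crudely drop tags so that attribute values such as href="/logout/" do not count.
    text = re.sub(r"<[^>]+>", " ", html)
    return marker.lower() in text.lower()
```

Inside the with block above, this could be called as looks_logged_in(res.text) right after s.get(...), before requesting any transaction pages.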
