
Python web-scraping through a login website

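I'm looking for some help scraping a website that requires a login. Essentially, the site provides trading card prices (I believe sourced from eBay), but in a format that lets you search further back than the 90 days available on eBay's own site. The login URL is https://members.pwccmarketplace.com/login. The URL I search from is https://members.pwccmarketplace.com/. I searched previous posts and found one I thought I could replicate, but with no success. The code is below; any help would be greatly appreciated, whether or not it can be made to work.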

#https://stackoverflow.com/questions/47438699/scraping-a-website-with-python-3-that-requires-login
import requests
from lxml import html
from bs4 import BeautifulSoup
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
from datetime import datetime
from datetime import date
import pandas as pd
import numpy as np
from time import sleep
from random import randint
from urllib.parse import quote

Product_name = []
Price = []
Date_sold = []

url = "https://www.pwccmarketplace.com/login"
values = {"email": "xyz@abc.com",
          "password": "password"}

session = requests.Session()

r = session.post(url, data=values)

Search_name = input("Search for: ")
Exclude_terms = input("Exclude these terms (- infront of all, no spaces): ")
qstr = quote(Search_name)
qstrr = quote(Exclude_terms)
Number_pages = int(input("Number of pages you want searched (Number -1): "))

pages = np.arange(1, Number_pages)

for page in pages:

    params = {"Category": 6, "deltreeid": 6, "do": "Delete Tree"}
    url = "https://www.pwccmarketplace.com/market-price-research?q=" + qstr + "+" + qstrr + "&year_min=2004&year_max=2020&price_min=0&price_max=10000&sort_by=date_desc&sale_type=auction&items_per_page=250&page=" + str(page)

    result = session.get(url, data=params)

    soup = BeautifulSoup(result.text, "lxml")

    search = soup.find_all('tr')

    sleep(randint(2,10))

    for container in search:

The code continues, but the rest is not relevant to this question.

When you POST to https://members.pwccmarketplace.com/login, a token is sent in the payload. This token is in an input tag on the login page and can be scraped with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

email = "your@email.com"
password = "your_password"

# Load the login page with the same session that will submit the form
r = session.get("https://members.pwccmarketplace.com/login")

# The token is the value of the <input name="_token"> tag on that page
soup = BeautifulSoup(r.text, "html.parser")
token = soup.find("input", {"name": "_token"})["value"]

# Send the token along with the credentials
r = session.post(
    "https://members.pwccmarketplace.com/login",
    data={
        "_token": token,
        "redirect": "",
        "email": email,
        "password": password,
        "remember": "true"
    }
)
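
Once the login succeeds, the same session can be reused for the search pages. Below is a minimal sketch of that step, not a tested client for this site. The search path and parameter names are copied from the question's code and may not match what the site expects today. The helper names (build_search_params, build_search_url, fetch_pages) are made up for illustration. It builds the query string with urlencode instead of string concatenation, and uses an inclusive page range so that asking for 3 pages fetches pages 1 to 3.

```python
from urllib.parse import urlencode

# Assumption: path and host are taken from the question's code.
SEARCH_URL = "https://www.pwccmarketplace.com/market-price-research"


def build_search_params(query, exclude="", page=1):
    """Return the query-string parameters for one page of results.

    The parameter names and fixed values mirror the URL in the question.
    """
    return {
        "q": f"{query} {exclude}".strip(),
        "year_min": 2004,
        "year_max": 2020,
        "price_min": 0,
        "price_max": 10000,
        "sort_by": "date_desc",
        "sale_type": "auction",
        "items_per_page": 250,
        "page": page,
    }


def build_search_url(query, exclude="", page=1, base_url=SEARCH_URL):
    """Build the full search URL; urlencode handles the escaping."""
    return base_url + "?" + urlencode(build_search_params(query, exclude, page))


def fetch_pages(session, query, exclude="", number_pages=1):
    """Yield (page_number, html) for pages 1..number_pages inclusive.

    `session` is expected to be the logged-in requests.Session from above.
    """
    for page in range(1, number_pages + 1):
        response = session.get(build_search_url(query, exclude, page))
        response.raise_for_status()  # stop early on HTTP errors
        yield page, response.text
```

Because fetch_pages is a generator, the caller can still pause between pages (for example with sleep(randint(2, 10)) as in the question) and parse each page with BeautifulSoup(html, "lxml") as before.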

