
Web Scraping Multiple Sites - Python

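I have the following code, and with everyone's help here it works well. I searched for a related thread that answers my question but could not find one, so here I am.

How can I add multiple sites to this code so that the results are written to the CSV file properly?

Here are some of the sites I want to add (there will be more than these three). Thank you for your help.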

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL'

Here is the code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


#setting my_url to the website
my_url = 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1'

#Opening up connection, grabbing the page
uClient = uReq(my_url)

#naming uClient to page_html
page_html = uClient.read()

#closing uClient
uClient.close()

#this does my html parsing
page_soup = soup(page_html, "html.parser")

#setting container to capture where the actual info is using inspect element
#grabs each product
containers = page_soup.findAll("li",{"class":"srp_res_row plp"})
store_locator = page_soup.findAll("div", {"itemprop":"address"})

filename = "product.csv"
f = open(filename, "w")

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n"

f.write(headers)

for container in containers:
    for store_location in store_locator:
        street_address = store_location.findAll("span", {"itemprop":"streetAddress"})
        store_city = store_location.findAll("span", {"itemprop":"addressLocality"})
    title_container = container.div.div
    unit_size = title_container.text
    size_dim = container.findAll("div", {"class":"srp_label srp_font_14"})
    unit_container = container.li
    unit_type = unit_container.text
    online_price = container.findAll("div", {"class":"srp_label alt-price"})
    reg_price = container.findAll("div", {"class":"reg-price"})


    for item in zip(unit_size, size_dim, unit_container, online_price, reg_price, street_address, store_city):
        csv = item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
        f.write(csv)

#closing the output file so every row is flushed to disk
f.close()

Here is the HTML:

 <li class="srp_res_row plp"> <div class="srp_res_clm srp_clm160"> <div class="srp_label plp">Small</div> <div class="srp_v-space_3"></div> <div class="srp_label srp_font_14" style="padding-left: 5px;">5' x 10'</div> <div class="srp_v-space_3"></div> </div> <div class="srp_res_clm srp_clm120"> <ul class="srp_list"> <li>Outside unit/Drive-up access</li> </ul> </div> <div class="srp_res_clm srp_clm90"> <div class="srp_label">$1<span class="srp_label_symbol">†</span></div> <div class="srp_v-space_10">1st Month</div> </div> <div class="srp_res_clm srp_clm90"> <div class="srp_label alt-price">$56/mo.</div> <div class="online-special">Online Special<span class="srp_label_symbol">†</span></div> <div class="srp_v-space_15"></div> <div class="reg-price">$70 In-store</div> </div> <div class="srp_res_clm srp_clm100 srp_vcenter"><a class="srp_continue unit-no-deposit" data-deposit-amount="0" data-deposit-days="0" data-features="Outside unit/Drive-up access" data-marketing-size="5x10" data-ppk="altproduct_price" data-promotionid="132" data-siteid="2334" data-size-description="5' x 10'" data-sizeid="613573" data-wc2-unit="false" href="/ReservationDetails.aspx?st=2334&amp;sz=613573&amp;key=[rnd]&amp;location=&amp;plp=1&amp;rk=&amp;ismi=1&amp;sp=Charlotte%7c35.2270869%7c-80.8431267&amp;clp=1"><img alt="Continue" src="/images/srp-cont-new-80.png" style="width: 80px; height: 32px"/></a></div> </li> 

Code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# list of the store pages to scrape
urls = ['https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1'
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL'
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL'
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL']

filename = "product.csv"
# "w" mode already truncates the file, so a single open is enough
f = open(filename, "w")
num = 0

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n"

f.write(headers)

for my_url in urls:
    # Opening up connection, grabbing the page
    uClient = uReq(my_url)

    # naming uClient to page_html
    page_html = uClient.read()

    # closing uClient
    uClient.close()

    # this does my html parsing
    page_soup = soup(page_html, "html.parser")

    # setting container to capture where the actual info is using inspect element
    # grabs each product
    containers = page_soup.findAll("li", {"class": "srp_res_row plp"})
    store_locator = page_soup.findAll("div", {"itemprop": "address"})

    f.write("website " + str(num) + ": \n")
    for container in containers:
        for store_location in store_locator:
            street_address = store_location.findAll("span", {"itemprop": "streetAddress"})
            store_city = store_location.findAll("span", {"itemprop": "addressLocality"})
            title_container = container.div.div
            unit_size = title_container.text
            size_dim = container.findAll("div", {"class": "srp_label srp_font_14"})
            unit_container = container.li
            unit_type = unit_container.text
            online_price = container.findAll("div", {"class": "srp_label alt-price"})
            reg_price = container.findAll("div", {"class": "reg-price"})

        for item in zip(unit_size, size_dim, unit_container, online_price, reg_price, street_address, store_city):
            csv = item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
            f.write(csv)
    num += 1

# closing the output file so every row is flushed to disk
f.close()
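One detail of the loop above is worth spelling out, because it affects the first column of the output. `unit_size` is a plain string, and `zip()` walks a string one character at a time and stops at its shortest input. A minimal sketch of that behavior, using stand-in values whose shapes are assumed (a string for `unit_size`, one-element lists in place of the `findAll` results):

```python
# Stand-ins for the values scraped from one <li> row. The shapes are an
# assumption: a plain string for unit_size and one-element lists in place
# of the findAll() results.
unit_size = "Small"
size_dim = ["5' x 10'"]
online_price = ["$56/mo."]

# zip() stops at its shortest input and iterates a string by character,
# so only the first letter of unit_size ends up in the tuple.
zipped = list(zip(unit_size, size_dim, online_price))
print(zipped)

# Building the row directly keeps the full label instead.
row = [unit_size, size_dim[0], online_price[0]]
print(row)
```

If the full label (for example "Small") is wanted in the CSV, building the row directly as in the last two lines is probably the simpler route.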

Output (contents of product.csv):

unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city
website 0: 
S,5' x 10',Outside unit/Drive-up access,$55/mo.,$68 In-store,1001 N Tryon St,Charlotte
M,5' x 15',Outside unit/Drive-up access,$68/mo.,$84 In-store,1001 N Tryon St,Charlotte
M,10' x 10',Outside unit/Drive-up access,$101/mo.,$126 In-store,1001 N Tryon St,Charlotte
L,10' x 15',Outside unit/Drive-up access,$154/mo.,$187 In-store,1001 N Tryon St,Charlotte
L,10' x 25',Outside unit/Drive-up access,$167/mo.,$208 In-store,1001 N Tryon St,Charlotte
L,10' x 20',Outside unit/Drive-up access,$172/mo.,$209 In-store,1001 N Tryon St,Charlotte
L,15' x 20',Outside unit/Drive-up access,$193/mo.,$241 In-store,1001 N Tryon St,Charlotte
website 1: 
S,5' x 5',Outside unit/Drive-up access,$50/mo.,$60 In-store,3710 Monroe Road,Charlotte
S,5' x 10',Outside unit/Drive-up access,$53/mo.,$66 In-store,3710 Monroe Road,Charlotte
S,10' x 5',Outside unit/Drive-up access,$55/mo.,$68 In-store,3710 Monroe Road,Charlotte
M,10' x 10',Outside unit/Drive-up access,$97/mo.,$118 In-store,3710 Monroe Road,Charlotte
L,10' x 15',Outside unit/Drive-up access,$100/mo.,$124 In-store,3710 Monroe Road,Charlotte
L,10' x 20',Outside unit/Drive-up access,$128/mo.,$159 In-store,3710 Monroe Road,Charlotte
M,10' x 10',Climate Controlled,$129/mo.,$157 In-store,3710 Monroe Road,Charlotte
L,20' x 30',Outside unit/Drive-up access,$292/mo.,$356 In-store,3710 Monroe Road,Charlotte
website 2: 
S,5' x 10',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte
S,10' x 5',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte
S,5' x 5',Outside unit/Drive-up access,$42/mo.,$53 In-store,5301 N Sharon Amity Rd,Charlotte
M,10' x 10',Outside unit/Drive-up access,$80/mo.,$99 In-store,5301 N Sharon Amity Rd,Charlotte
L,10' x 15',Outside unit/Drive-up access,$87/mo.,$108 In-store,5301 N Sharon Amity Rd,Charlotte
L,10' x 20',Outside unit/Drive-up access,$100/mo.,$124 In-store,5301 N Sharon Amity Rd,Charlotte
L,20' x 10',Outside unit/Drive-up access,$100/mo.,$125 In-store,5301 N Sharon Amity Rd,Charlotte
M,10' x 10',Climate Controlled,$112/mo.,$139 In-store,5301 N Sharon Amity Rd,Charlotte
L,10' x 25',Outside unit/Drive-up access,$121/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte
L,20' x 10',Climate Controlled,$123/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte
L,20' x 20',Outside unit/Drive-up access,$135/mo.,$168 In-store,5301 N Sharon Amity Rd,Charlotte
website 3: 
S,3' x 3',Inside unit/1st Floor,$17/mo.,$22 In-store,4730 N Tryon St,Charlotte
S,5' x 5',Outside unit/Drive-up access,$35/mo.,$43 In-store,4730 N Tryon St,Charlotte
S,5' x 10',Outside unit/Drive-up access,$39/mo.,$49 In-store,4730 N Tryon St,Charlotte
S,10' x 5',Outside unit/Drive-up access,$40/mo.,$50 In-store,4730 N Tryon St,Charlotte
M,5' x 15',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte
M,20' x 5',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte
M,10' x 10',Outside unit/Drive-up access,$66/mo.,$82 In-store,4730 N Tryon St,Charlotte
L,10' x 15',Outside unit/Drive-up access,$84/mo.,$105 In-store,4730 N Tryon St,Charlotte
L,10' x 20',Outside unit/Drive-up access,$136/mo.,$169 In-store,4730 N Tryon St,Charlotte
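
The "website N:" marker lines make the file awkward to load as a single table, since they have one field while every other row has seven. One possible alternative, sketched here under the assumption that an extra `source` column is acceptable, is to tag each row with its source and let the standard `csv` module handle the quoting. The `write_rows` helper and the `source` column name are hypothetical, and the sample rows are copied from the output above.

```python
import csv
import io

HEADER = ["source", "unit_size", "size_dim", "unit_type",
          "online_price", "reg_price", "street_address", "store_city"]


def write_rows(out, rows_by_source):
    """Write one flat CSV table; rows_by_source maps a label to its rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for source, rows in rows_by_source.items():
        for row in rows:
            # Prefix every row with the label of the page it came from.
            writer.writerow([source, *row])


# One row per site, copied from the output above; the labels are hypothetical.
sample = {
    "website 0": [["S", "5' x 10'", "Outside unit/Drive-up access",
                   "$55/mo.", "$68 In-store", "1001 N Tryon St", "Charlotte"]],
    "website 1": [["S", "5' x 5'", "Outside unit/Drive-up access",
                   "$50/mo.", "$60 In-store", "3710 Monroe Road", "Charlotte"]],
}

buffer = io.StringIO()  # stands in for open("product.csv", "w", newline="")
write_rows(buffer, sample)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```

In the scraping loop, the label could just as well be the URL itself, and `csv.writer` will quote any field that happens to contain a comma.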
