
How to extract data using Beautiful Soup

import requests
from bs4 import BeautifulSoup
import pandas as pd
baseurl='https://locations.atipt.com/'
headers ={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r =requests.get('https://locations.atipt.com/al')
soup=BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('ul',class_='list-unstyled')
productlinks=[]
for links in tra:
    for link in links.find_all('a',href=True):
        comp=baseurl+link['href']
        productlinks.append(comp)

for link in productlinks:
    r =requests.get(link,headers=headers)
    soup=BeautifulSoup(r.content, 'html.parser')
    tag=soup.find_all('div',class_='listing content-card')
    for pro in tag:
        tup=pro.find('a',class_='name').find_all('p')
        for i in tup:
            print(i.get_text())

I am trying to extract data, but this code gives me nothing. I want the text of the p tags on each location page, for example https://locations.atipt.com/al/alabaster
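
One possible reason the loop prints nothing is that the p tags are not descendants of the a.name element, so find_all('p') on that anchor returns an empty list. The sketch below reproduces this on a small, hypothetical HTML fragment. The markup is an assumption made for illustration only and may not match the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration; the real page may differ.
SAMPLE_HTML = """
<div class="listing content-card">
  <div><a class="name" href="#">Location name</a></div>
  <div>
    <p>634 1st Street N</p>
    <p>Ste 100</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
card = soup.find("div", class_="listing content-card")

# Searching inside the anchor finds no <p>, because the paragraphs
# live in a sibling <div>, not inside <a class="name">.
inside_anchor = card.find("a", class_="name").find_all("p")

# Selecting from the card itself reaches the paragraphs.
beside_anchor = card.select("div:nth-child(2) > p")

print(inside_anchor)
print([p.get_text() for p in beside_anchor])
```

If the real page is laid out like this, the search has to start from the card, or from the div that actually holds the paragraphs, rather than from the anchor.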

A working solution so far, which uses a CSS selector to get the text from the p tags, is as follows:

import requests
from bs4 import BeautifulSoup
import pandas as pd
baseurl = 'https://locations.atipt.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get('https://locations.atipt.com/al')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('ul', class_='list-unstyled')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = baseurl+link['href']
        productlinks.append(comp)

for link in productlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    tag = ''.join([x.get_text(strip=True).replace('\xa0','') for x in soup.select('div.listing.content-card div:nth-child(2)>p')])
    print(tag)

Output:

634 1st Street NSte 100Alabaster, AL35007
9256 Parkway ESte ABirmingham, AL352061940 28th Ave SBirmingham, AL352095431 Patrick WaySte 101Birmingham, AL35235833 St. Vincent's DrSte 100Birmingham, AL352051401 Doug Baker BlvdSte 104Birmingham, AL35242
1877 Cherokee Ave SWCullman, AL350551301-A Bridge Creek Dr NECullman, AL35055
1821 Beltline Rd SWSte BDecatur, AL35601
4825 Montgomery HwySte 103Dothan, AL36303
550 Fieldstown RdGardendale, AL35071323 Fieldstown Rd, Ste 105Gardendale, AL35071
2804 John Hawkins PkwySte 104Hoover, AL35244
700 Pelham Rd NorthJacksonville, AL36265
1811 Hwy 78 ESte 108 & 109Jasper, AL35501-4081
76359 AL-77Ste CLincoln, AL35096
1 College DriveStation #14Livingston, AL35470
106 6th Street SouthSte AOneonta, AL35121-1823
50 Commons WaySte DOxford, AL36203
301 Huntley PkwyPelham, AL35124
41 Eminence WaySte BPell City, AL35128
124 W Grand AveSte A-4Rainbow City, AL35906
1147 US-231Ste 9 & 10Troy, AL36081
7201 Happy Hollow RdTrussville, AL35173
100 Rice Mine Road LoopSte 102Tuscaloosa, AL354061451 Dr. Edward Hillard DrSte 130Tuscaloosa, AL35401
3735 Corporate Woods DrSte 109Vestavia, AL35242-2296
636 Montgomery HwyVestavia Hills, AL352161539 Montgomery HwySte 111Vestavia Hills, AL35216
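
In the output above, the parts of each address run together, and pages with several locations end up on a single line, because every paragraph on the page is joined with an empty string. A minimal sketch of one way to keep the parts separate is to build one string per card and join its paragraphs with a separator. The function name extract_addresses and the sample markup are hypothetical, and the page structure is assumed from the selector used in the solution.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration; the real page may differ.
SAMPLE_HTML = """
<div class="listing content-card">
  <div><a class="name" href="#">First location</a></div>
  <div><p>634 1st Street N</p><p>Ste 100</p><p>Alabaster, AL&nbsp;35007</p></div>
</div>
<div class="listing content-card">
  <div><a class="name" href="#">Second location</a></div>
  <div><p>301 Huntley Pkwy</p><p>Pelham, AL&nbsp;35124</p></div>
</div>
"""


def extract_addresses(html):
    """Return one comma-separated address string per listing card."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = []
    for card in soup.select("div.listing.content-card"):
        # Collect the text of each <p>, turning non-breaking spaces
        # into ordinary spaces instead of dropping them.
        parts = [
            p.get_text(" ", strip=True).replace("\xa0", " ")
            for p in card.select("div:nth-child(2) > p")
        ]
        # Skip empty paragraphs and join the rest with a visible separator.
        addresses.append(", ".join(part for part in parts if part))
    return addresses


for address in extract_addresses(SAMPLE_HTML):
    print(address)
```

In the scraping loop, the same function could be called as extract_addresses(r.content) for each link, which would give a list with one entry per location instead of a single joined string.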

