
Python webscraping find_all("a") in specific class

I am new to web scraping and am working on a small, simple project.

The task is to get the names of "cameras", their "prices" and their "quick specs"

(from: https://www.dpreview.com/products/cameras/all?page=1 ).

The last of these I can only get when I 'click' on a camera, which routes me to a new URL.

When I inspected the page I saw that I have to get that URL from there; however, with just:

for link in soup.find_all("a"):
    print(link.get("href"))

I get all the links on the page (including extras such as logins, social media, etc.).

What I would like to do is get the links mentioned above only from a specific class.

Can you help me with it (or at least point me to a tutorial where this is discussed)?

I am using BeautifulSoup in Python 3.

You can find the necessary information by first anchoring your search to the td product listings with class="product":

import typing
import requests
from bs4 import BeautifulSoup

class Camera(typing.NamedTuple):
    info: typing.List[str]  # [name, product page URL]
    quicklook: str
    price: str

def text_or_na(tag):
    # find() returns None when the element is missing; fall back to 'N/A'
    return getattr(tag, 'text', 'N/A')

d = BeautifulSoup(requests.get('https://www.dpreview.com/products/cameras/all?page=1').text, 'html.parser')
final_result = []
# Each camera sits in a <td class="product">; search only inside those cells
for cell in d.find_all('td', {'class': 'product'}):
    name, specs, prices = (text_or_na(cell.find('div', {'class': c})) for c in ('name', 'specs', 'prices'))
    final_result.append(Camera([name, cell.a['href']], specs, prices))

Output:

[Camera(info=['Fujifilm X-T3', 'https://www.dpreview.com/products/fujifilm/slrs/fujifilm_xt3'], quicklook='26 megapixels | 3″ screen | APS-C sensor', price='$1,499.00 - $2,898.00'), Camera(info=['Canon EOS R', 'https://www.dpreview.com/products/canon/slrs/canon_eos_r'], quicklook='30 megapixels | 3.2″ screen | Full frame sensor', price='Check prices'), Camera(info=['Sony Cyber-shot DSC-HX99', 'https://www.dpreview.com/products/sony/compacts/sony_dschx99'], quicklook='18 megapixels | 3″ screen | 24 – 720 mm (30×)', price='Check prices'), Camera(info=['Sony Cyber-shot DSC-HX95', 'https://www.dpreview.com/products/sony/compacts/sony_dschx95'], quicklook='18 megapixels | 3″ screen | 24 – 720 mm (30×)', price='Check prices'), Camera(info=['Nikon D3500', 'https://www.dpreview.com/products/nikon/slrs/nikon_d3500'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$496.95 - $596.95'), Camera(info=['Nikon Z6', 'https://www.dpreview.com/products/nikon/slrs/nikon_z6'], quicklook='25 megapixels | 3.2″ screen | Full frame sensor', price='$1,996.95 - $3,443.90'), Camera(info=['Nikon Z7', 'https://www.dpreview.com/products/nikon/slrs/nikon_z7'], quicklook='46 megapixels | 3.2″ screen | Full frame sensor', price='$3,396.95 - $4,843.90'), Camera(info=['Panasonic Lumix DC-LX100 II', 'https://www.dpreview.com/products/panasonic/compacts/panasonic_dclx100ii'], quicklook='17 megapixels | 3″ screen | 24 – 75 mm (3.1×)', price='$997.99 - $1,095.98'), Camera(info=['Leica M10-P', 'https://www.dpreview.com/products/leica/slrs/leica_m10_p'], quicklook='24 megapixels | 3″ screen | Full frame sensor', price='Check prices'), Camera(info=['Canon PowerShot SX740 HS', 'https://www.dpreview.com/products/canon/compacts/canon_sx740hs'], quicklook='21 megapixels | 3″ screen | 24 – 960 mm (40×)', price='$399.00'), Camera(info=['Fujifilm XF10', 'https://www.dpreview.com/products/fujifilm/compacts/fujifilm_xf10'], quicklook='24 megapixels | 3″ screen', price='$499.00'), Camera(info=['Sony 
Cyber-shot DSC-RX100 V(A)', 'https://www.dpreview.com/products/sony/compacts/sony_dscrx100m5a'], quicklook='20 megapixels | 3″ screen | 24 – 70 mm (2.9×)', price='$898.00 - $1,096.00'), Camera(info=['Nikon Coolpix P1000', 'https://www.dpreview.com/products/nikon/compacts/nikon_cpp1000'], quicklook='16 megapixels | 3.2″ screen | 24 – 3000 mm (125×)', price='$996.95 - $1,041.90'), Camera(info=['Fujifilm instax mini 90 NEO CLASSIC', 'https://www.dpreview.com/products/fujifilm/compacts/fujifilm_instax_mini_90'], quicklook='N/A', price='$112.00 - $121.30'), Camera(info=['Leica C-Lux', 'https://www.dpreview.com/products/leica/compacts/leica_c-lux_2018'], quicklook='20 megapixels | 3″ screen | 24 – 360 mm (15×)', price='Check prices'), Camera(info=['Sony Cyber-shot DSC-RX100 VI', 'https://www.dpreview.com/products/sony/compacts/sony_dscrx100m6'], quicklook='20 megapixels | 3″ screen | 24 – 200 mm (8.3×)', price='$1,198.00 - $1,265.16'), Camera(info=['Fujifilm X-T100', 'https://www.dpreview.com/products/fujifilm/slrs/fujifilm_xt100'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$599.00 - $899.00'), Camera(info=['Panasonic Lumix DC-TS7 (Lumix DC-FT7)', 'https://www.dpreview.com/products/panasonic/compacts/panasonic_dcts7'], quicklook='20 megapixels | 3″ screen | 28 – 128 mm (4.6×)', price='$447.99'), Camera(info=['GoPro Hero (2018)', 'https://www.dpreview.com/products/gopro/actioncams/gopro_hero_2018'], quicklook='10 megapixels | Compact sensor', price='$189.90 - $233.39'), Camera(info=['Sony Alpha a7 III', 'https://www.dpreview.com/products/sony/slrs/sony_a7iii'], quicklook='24 megapixels | 3″ screen | Full frame sensor', price='$1,998.00 - $4,396.00'), Camera(info=['Canon EOS M50 (EOS Kiss M)', 'https://www.dpreview.com/products/canon/slrs/canon_eosm50'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$629.00 - $948.48'), Camera(info=['Canon EOS Rebel T7 (EOS 2000D)', 'https://www.dpreview.com/products/canon/slrs/canon_eos2000d'], 
quicklook='24 megapixels | 3″ screen | APS-C sensor', price='Check prices'), Camera(info=['Canon EOS 4000D', 'https://www.dpreview.com/products/canon/slrs/canon_eos4000d'], quicklook='18 megapixels | 2.7″ screen | APS-C sensor', price='Check prices'), Camera(info=['Pentax K-1 Mark II', 'https://www.dpreview.com/products/pentax/slrs/pentax_k1ii'], quicklook='36 megapixels | 3.2″ screen | Full frame sensor', price='$1,896.95 - $2,296.95'), Camera(info=['Fujifilm X-H1', 'https://www.dpreview.com/products/fujifilm/slrs/fujifilm_xh1'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$2,648.00 - $3,548.00'), Camera(info=['Panasonic Lumix DC-GX9', 'https://www.dpreview.com/products/panasonic/slrs/panasonic_dcgx9'], quicklook='20 megapixels | 3″ screen | Four Thirds sensor', price='Check prices'), Camera(info=['Panasonic Lumix DC-ZS200 (Lumix DC-TZ200)', 'https://www.dpreview.com/products/panasonic/compacts/panasonic_dczs200'], quicklook='20 megapixels | 3″ screen | 24 – 360 mm (15×)', price='Check prices'), Camera(info=['Olympus PEN E-PL9', 'https://www.dpreview.com/products/olympus/slrs/olympus_epl9'], quicklook='16 megapixels | 3″ screen | Four Thirds sensor', price='$599.00 - $699.00'), Camera(info=['Panasonic Lumix DC-GF10 (GF90)', 'https://www.dpreview.com/products/panasonic/slrs/panasonic_dcgf10'], quicklook='16 megapixels | 3″ screen | Four Thirds sensor', price='Check prices'), Camera(info=['Fujifilm X-A5', 'https://www.dpreview.com/products/fujifilm/slrs/fujifilm_xa5'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$599.00 - $799.00'), Camera(info=['Fujifilm FinePix XP130', 'https://www.dpreview.com/products/fujifilm/compacts/fujifilm_xp130'], quicklook='16 megapixels | 3″ screen | 28 – 140 mm (5×)', price='$169.00 - $179.00'), Camera(info=['Panasonic Lumix DC-GH5S', 'https://www.dpreview.com/products/panasonic/slrs/panasonic_dcgh5s'], quicklook='10 megapixels | 3.2″ screen | Four Thirds sensor', price='$2,297.99 - $3,395.98'), 
Camera(info=['Leica CL', 'https://www.dpreview.com/products/leica/slrs/leica_cl'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$3,995.00'), Camera(info=['Panasonic Lumix DC-G9', 'https://www.dpreview.com/products/panasonic/slrs/panasonic_dcg9'], quicklook='20 megapixels | 3″ screen | Four Thirds sensor', price='$1,497.99 - $3,995.98'), Camera(info=['Rylo Camera', 'https://www.dpreview.com/products/rylo/actioncams/rylo_camera'], quicklook='N/A', price='$497.85 - $499.00'), Camera(info=['Xiaomi Mi Sphere 3.5K', 'https://www.dpreview.com/products/xiaomi/actioncams/xiaomi_mi_sphere_3p5k'], quicklook='16 megapixels | 2 lens(es) | Compact sensor', price='Check prices'), Camera(info=['Sony Alpha a7R III', 'https://www.dpreview.com/products/sony/slrs/sony_a7riii'], quicklook='42 megapixels | 3″ screen | Full frame sensor', price='$3,213.00 - $3,997.00'), Camera(info=['Canon PowerShot G1 X Mark III', 'https://www.dpreview.com/products/canon/compacts/canon_g1xiii'], quicklook='24 megapixels | 3″ screen | 24 – 72 mm (3×)', price='$1,099.00'), Camera(info=['GoPro Hero6 Black', 'https://www.dpreview.com/products/gopro/actioncams/gopro_hero6_black'], quicklook='12 megapixels', price='$401.97 - $412.98'), Camera(info=['Sony Cyber-shot DSC-RX10 IV', 'https://www.dpreview.com/products/sony/compacts/sony_dscrx10iv'], quicklook='20 megapixels | 3″ screen | 24 – 600 mm (25×)', price='$1,698.00 - $1,788.93'), Camera(info=['Fujifilm X-E3', 'https://www.dpreview.com/products/fujifilm/slrs/fujifilm_xe3'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$899.00 - $1,449.23'), Camera(info=['Ricoh Theta V', 'https://www.dpreview.com/products/ricoh/actioncams/ricoh_theta_v'], quicklook='12 megapixels | 2 lens(es) | Compact sensor', price='$396.99 - $616.98'), Camera(info=['Olympus OM-D E-M10 III', 'https://www.dpreview.com/products/olympus/slrs/oly_em10iii'], quicklook='16 megapixels | 3″ screen | Four Thirds sensor', price='$549.00 - $799.00'), 
Camera(info=['Sony DSC-RX0', 'https://www.dpreview.com/products/sony/compacts/sony_dscrx0'], quicklook='15 megapixels | 1.5″ screen', price='$598.00 - $646.00'), Camera(info=['Canon EOS M100', 'https://www.dpreview.com/products/canon/slrs/canon_eosm100'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$449.00 - $679.00'), Camera(info=['Nikon D850', 'https://www.dpreview.com/products/nikon/slrs/nikon_d850'], quicklook='45 megapixels | 3.2″ screen | Full frame sensor', price='$3,296.95 - $6,896.90'), Camera(info=['Leica TL2', 'https://www.dpreview.com/products/leica/slrs/leica_tl2'], quicklook='24 megapixels | 3.7″ screen | APS-C sensor', price='$2,195.00'), Camera(info=['Canon EOS 6D Mark II', 'https://www.dpreview.com/products/canon/slrs/canon_eos6dmkii'], quicklook='26 megapixels | 3″ screen | Full frame sensor', price='$1,599.00 - $2,848.00'), Camera(info=['Canon EOS Rebel SL2 (EOS 200D / Kiss X9)', 'https://www.dpreview.com/products/canon/slrs/canon_eos200d'], quicklook='24 megapixels | 3″ screen | APS-C sensor', price='$549.00 - $948.00'), Camera(info=['Nikon Coolpix W300', 'https://www.dpreview.com/products/nikon/compacts/nikon_cpw300'], quicklook='16 megapixels | 3″ screen | 24 – 120 mm (5×)', price='$386.95')]
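
If only the links are needed, the same anchoring idea can be sketched in a smaller form. The snippet below is a minimal illustration, not the answer's own code: SAMPLE_HTML is made-up markup that only imitates the td class="product" cells described above, and product_links and product_links_css are hypothetical helper names. It shows two equivalent ways to restrict find_all("a") to one class: searching inside each matching cell, or using a CSS selector with select().

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real listing page (assumption).
SAMPLE_HTML = """
<a href="/login">Log in</a>
<table>
  <tr>
    <td class="product">
      <div class="name"><a href="/products/fujifilm/slrs/fujifilm_xt3">Fujifilm X-T3</a></div>
    </td>
    <td class="product">
      <div class="name"><a href="/products/nikon/slrs/nikon_z6">Nikon Z6</a></div>
    </td>
  </tr>
</table>
<a href="/social">Follow us</a>
"""


def product_links(html):
    """Collect hrefs only from <a> tags inside <td class="product"> cells."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for cell in soup.find_all("td", class_="product"):
        # href=True skips anchors that have no href attribute
        for a in cell.find_all("a", href=True):
            links.append(a["href"])
    return links


def product_links_css(html):
    """Same result, expressed as a single CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("td.product a[href]")]


print(product_links(SAMPLE_HTML))
# ['/products/fujifilm/slrs/fujifilm_xt3', '/products/nikon/slrs/nikon_z6']
```

The login and social media links are skipped because they sit outside any td with class "product". Against the live page, the class names would need to be checked in the browser inspector, since the markup may differ from this sample.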


Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original source.
