
How can I extract two items from a BeautifulSoup result?

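I scraped some data with BeautifulSoup in Python.

How can I extract the following two things:

  1. "data-asin="
  2. "data-index="

I would like to have the values:

  1. data-asin = B07F7XYMNN
  2. data-index = "1"

Ideally I would store them in an Excel file:

DataAsin         DataIndex
B07F7XYMNN         1

I tried two things:

soup.select_one('data-asin=')

soup.select_all('data-asin=')

But I got no results... any hints? Thanks!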

<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 s-result-item sg-col-4-of-28 sg-col-4-of-16 AdHolder sg-col sg-col-4-of-20 sg-col-4-of-32" data-asin="B07F7XYMNN" data-index="1"><div class="sg-col-inner">
<div class="rush-component s-expand-height" data-component-props='{"percentageShownToFire":"50","batchable":true,"requiredElementSelector":".s-image","url":"https://www.amazon.com/gp/sponsored-products/logging/log-action.html?qualifier=1563190090&amp;id=7171083254752902&amp;widgetName=sp_atf&amp;adId=200011353751711&amp;eventType=1&amp;adIndex=1"}' data-component-type="s-impression-logger">
<div class="rush-component s-expand-height" data-component-type="sp-sponsored-result">
<div class="s-expand-height s-include-content-margin s-border-bottom">
<div class="a-section a-spacing-medium">
<div class="sg-row">
<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"><div class="sg-col-inner">
<div class="a-section a-spacing-micro s-min-height-extra-large">
</div>
</div></div>
</div>
<div class="sg-row">
<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"><div class="sg-col-inner">
<div class="a-section a-spacing-none">
<span class="rush-component" data-component-type="s-product-image">
<a class="a-link-normal" href="/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_beauty_sr_pg1_2?ie=UTF8&amp;adId=A0414635KJLKI9OVJ7G1&amp;url=%2FBraun-Electric-Integrated-Precision-Rechargeable%2Fdp%2FB07F7XYMNN%2Fref%3Dsr_1_2_sspa%3Fkeywords%3Dshaver%2Bfor%2Bmen%26qid%3D1563190090%26s%3Dbeauty%26smid%3DATVPDKIKX0DER%26sr%3D1-2-spons%26psc%3D1&amp;qualifier=1563190090&amp;id=7171083254752902&amp;widgetName=sp_atf">
<div class="a-section aok-relative s-image-square-aspect">
<img alt="Braun Series 9 Men's Electric Foil Shaver with Wet &amp; Dry Integrated Precision Trimmer &amp; Rechargeable and Cordless Razor with Clean&amp;Charge Station, 9296cc" class="s-image" data-image-index="1" data-image-latency="s-product-image" data-image-load="" data-image-source-density="1" src="https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL320_.jpg" srcset="https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL320_.jpg 1x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL480_QL65_.jpg 1.5x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL640_QL65_.jpg 2x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL800_QL65_.jpg 2.5x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL960_QL65_.jpg 3x"/>
</div>
</a>
</span>
</div>
</div></div>
<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"><div class="sg-col-inner">
<div class="a-section a-spacing-none a-spacing-top-small">
<div class="a-row a-spacing-micro"><span class="a-size-base a-color-secondary">Sponsored</span>
<span class="a-declarative" data-a-popover='{"dataStrategy":"preload","name":"sp-info-popover-B07F7XYMNN","position":"triggerVertical"}' data-action="a-popover">
<span class="aok-inline-block s-info-icon"></span>
</span>
<div class="a-popover-preload" id="a-popover-sp-info-popover-B07F7XYMNN">
<span>These are ads for products you'll find on Amazon.com. </span><div class="a-row"><span>Clicking an ad will take you to the product's page.</span>
<a class="a-link-normal" href="https://advertising.amazon.com/products-self-serve?ref_=ext_amzn_wtsp">
<span>Learn more about Sponsored Products.</span>
</a>
</div><div class="a-row a-spacing-top-small"><span></span>
<span class="a-declarative" data-a-modal='{"dataStrategy":"ajax","header":"Share your feedback","url":"/gp/sponsored-products/lazyLoad/handler/sp-feedback-handler.html?pl=lGYCqimspuI8VXecjLGWqJk0mp95lFL311Xf4ldbtnjk4nK0QUzfwAqSEdCqCBIFmfkIRt5i4ILO%0AL0I8KDFWHyp096Sld8%2F8Yici8bhUWoJV1WGxn9Vw907Psjpbk53yxpBuxPEYEeiiZpWJgc5NRdpK%0A96HXgZ46neOYiu28ag4ngild3Pi1Qtn8bcRJpMGFUWob%2FCbbPS1t8WJPCIbM%2FatJPGZjxWdOf1dS%0Am8zquvCigwoMEBGZ31SH2vlM128dYXSovT3tsUG4xeBUC8IGABlDY%2BTxWiYifqhtQS%2BY6AlrUqPF%0A0DkeMkBfmwPimWrSKbwhPY12HAdt26hWwfYKHOJSYk1TSOPdtTMZoq2weXQc%2FxSeT3mSEKQVK8MY%0Ap1SSHP5aKBNsCXVTQDRhm%2BHq8tJbsu60RaVnaCKREQgvEOwb5fiXgJUEnKb004pEb4nO7xylj5Ph%0AoLwZVagU1dnXWjjuawVRhXXPHG5jacuJbtuahR6wptTJSVTa57Gsuf5ZrZ8q35tClQ0qTa%2FE4PqJ%0ASlnKYE%2FQ2RQhE74DBGUrYo2ACenKDiCDnpOeM7amUlAuFcxgbwThYDzl1S4R3RZyUA%3D%3D"}' data-action="a-modal">
<a class="a-link-normal" href="#">

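Use the following CSS selector. [data-asin][data-index] is an attribute selector that matches any element carrying both attributes; the values are then read from the matched element by attribute name.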

from bs4 import BeautifulSoup

data='''<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 s-result-item sg-col-4-of-28 sg-col-4-of-16 AdHolder sg-col sg-col-4-of-20 sg-col-4-of-32" data-asin="B07F7XYMNN" data-index="1"><div class="sg-col-inner">
<div class="rush-component s-expand-height" data-component-props='{"percentageShownToFire":"50","batchable":true,"requiredElementSelector":".s-image","url":"https://www.amazon.com/gp/sponsored-products/logging/log-action.html?qualifier=1563190090&amp;id=7171083254752902&amp;widgetName=sp_atf&amp;adId=200011353751711&amp;eventType=1&amp;adIndex=1"}' data-component-type="s-impression-logger">
<div class="rush-component s-expand-height" data-component-type="sp-sponsored-result">
<div class="s-expand-height s-include-content-margin s-border-bottom">
<div class="a-section a-spacing-medium">
<div class="sg-row">
<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"><div class="sg-col-inner">
<div class="a-section a-spacing-micro s-min-height-extra-large">
</div>
</div></div>
</div>
<div class="sg-row">
<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"><div class="sg-col-inner">
<div class="a-section a-spacing-none">
<span class="rush-component" data-component-type="s-product-image">
<a class="a-link-normal" href="/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_beauty_sr_pg1_2?ie=UTF8&amp;adId=A0414635KJLKI9OVJ7G1&amp;url=%2FBraun-Electric-Integrated-Precision-Rechargeable%2Fdp%2FB07F7XYMNN%2Fref%3Dsr_1_2_sspa%3Fkeywords%3Dshaver%2Bfor%2Bmen%26qid%3D1563190090%26s%3Dbeauty%26smid%3DATVPDKIKX0DER%26sr%3D1-2-spons%26psc%3D1&amp;qualifier=1563190090&amp;id=7171083254752902&amp;widgetName=sp_atf">
<div class="a-section aok-relative s-image-square-aspect">
<img alt="Braun Series 9 Men's Electric Foil Shaver with Wet &amp; Dry Integrated Precision Trimmer &amp; Rechargeable and Cordless Razor with Clean&amp;Charge Station, 9296cc" class="s-image" data-image-index="1" data-image-latency="s-product-image" data-image-load="" data-image-source-density="1" src="https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL320_.jpg" srcset="https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL320_.jpg 1x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL480_QL65_.jpg 1.5x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL640_QL65_.jpg 2x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL800_QL65_.jpg 2.5x, https://m.media-amazon.com/images/I/81Y8IzpMY2L._AC_UL960_QL65_.jpg 3x"/>
</div>
</a>
</span>
</div>
</div></div>
<div class="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"><div class="sg-col-inner">
<div class="a-section a-spacing-none a-spacing-top-small">
<div class="a-row a-spacing-micro"><span class="a-size-base a-color-secondary">Sponsored</span>
<span class="a-declarative" data-a-popover='{"dataStrategy":"preload","name":"sp-info-popover-B07F7XYMNN","position":"triggerVertical"}' data-action="a-popover">
<span class="aok-inline-block s-info-icon"></span>
</span>
<div class="a-popover-preload" id="a-popover-sp-info-popover-B07F7XYMNN">
<span>These are ads for products you'll find on Amazon.com. </span><div class="a-row"><span>Clicking an ad will take you to the product's page.</span>
<a class="a-link-normal" href="https://advertising.amazon.com/products-self-serve?ref_=ext_amzn_wtsp">
<span>Learn more about Sponsored Products.</span>
</a>
</div><div class="a-row a-spacing-top-small"><span></span>
<span class="a-declarative" data-a-modal='{"dataStrategy":"ajax","header":"Share your feedback","url":"/gp/sponsored-products/lazyLoad/handler/sp-feedback-handler.html?pl=lGYCqimspuI8VXecjLGWqJk0mp95lFL311Xf4ldbtnjk4nK0QUzfwAqSEdCqCBIFmfkIRt5i4ILO%0AL0I8KDFWHyp096Sld8%2F8Yici8bhUWoJV1WGxn9Vw907Psjpbk53yxpBuxPEYEeiiZpWJgc5NRdpK%0A96HXgZ46neOYiu28ag4ngild3Pi1Qtn8bcRJpMGFUWob%2FCbbPS1t8WJPCIbM%2FatJPGZjxWdOf1dS%0Am8zquvCigwoMEBGZ31SH2vlM128dYXSovT3tsUG4xeBUC8IGABlDY%2BTxWiYifqhtQS%2BY6AlrUqPF%0A0DkeMkBfmwPimWrSKbwhPY12HAdt26hWwfYKHOJSYk1TSOPdtTMZoq2weXQc%2FxSeT3mSEKQVK8MY%0Ap1SSHP5aKBNsCXVTQDRhm%2BHq8tJbsu60RaVnaCKREQgvEOwb5fiXgJUEnKb004pEb4nO7xylj5Ph%0AoLwZVagU1dnXWjjuawVRhXXPHG5jacuJbtuahR6wptTJSVTa57Gsuf5ZrZ8q35tClQ0qTa%2FE4PqJ%0ASlnKYE%2FQ2RQhE74DBGUrYo2ACenKDiCDnpOeM7amUlAuFcxgbwThYDzl1S4R3RZyUA%3D%3D"}' data-action="a-modal">
<a class="a-link-normal" href="#">'''

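# Parse the HTML snippet with Python's built-in parser
soup = BeautifulSoup(data, 'html.parser')

# Select every element that has both a data-asin and a data-index attribute
items = soup.select('[data-asin][data-index]')

# Attribute values are read from a tag like dictionary entries
for item in items:
    print(item['data-asin'])
    print(item['data-index'])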

Output:

B07F7XYMNN
1
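The question also asks for the values in a DataAsin / DataIndex table. A minimal sketch of that step, assuming a CSV file is acceptable (Excel can open CSV files): the function name write_asin_table and the file name asin_table.csv are illustrative and not part of the original answer, and plain dicts stand in for the matched tags so the sketch runs on its own.

```python
import csv
import os
import tempfile


def write_asin_table(items, path):
    """Write one (DataAsin, DataIndex) row per item to a CSV file."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['DataAsin', 'DataIndex'])
        for item in items:
            # Works for any object that supports item['data-asin'] lookups
            writer.writerow([item['data-asin'], item['data-index']])


# Plain dicts stand in for the tags returned by soup.select();
# both support the same item['data-asin'] style of lookup.
rows = [{'data-asin': 'B07F7XYMNN', 'data-index': '1'}]

# Hypothetical output location: a file in the system temp directory
csv_path = os.path.join(tempfile.gettempdir(), 'asin_table.csv')
write_asin_table(rows, csv_path)
```

Passing the items list from soup.select('[data-asin][data-index]') in place of rows should write the same table. A native .xlsx file would need a third-party library such as openpyxl or pandas; CSV is used here only because it needs nothing beyond the standard library.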

If you only need the first match, you can use select_one:

print(soup.select_one('[data-asin][data-index]')['data-asin'])

print(soup.select_one('[data-asin][data-index]')['data-index'])
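One caveat with select_one: it returns None when nothing matches, so indexing the result directly would raise a TypeError on a page that has no such element. A minimal sketch of a guarded lookup follows; the helper name first_asin_and_index and the short sample strings are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup


def first_asin_and_index(html):
    """Return (data-asin, data-index) of the first matching element, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one('[data-asin][data-index]')
    if tag is None:
        # No element carries both attributes
        return None
    return tag['data-asin'], tag['data-index']


# Shortened stand-in for the HTML in the question
sample = '<div data-asin="B07F7XYMNN" data-index="1"><span>x</span></div>'
print(first_asin_and_index(sample))                       # ('B07F7XYMNN', '1')
print(first_asin_and_index('<div class="empty"></div>'))  # None
```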
