
Trying to scrape product details like brand and flavour using BeautifulSoup

Can anyone help me scrape the Flavour and Brand details as key-value pairs using BeautifulSoup? I am new to this.

The desired output is:

Flavour - Green Apple

Brand - Carabau

The HTML looks like this:

<tr class="a-spacing-small">
<td class="a-span3">
    <span class="a-size-base a-text-bold">Flavour</span>
</td>

<td class="a-span9">
    <span class="a-size-base">Green Apple</span>
</td>
<tr class="a-spacing-small">
<td class="a-span3">
    <span class="a-size-base a-text-bold">Brand</span>
</td>

<td class="a-span9">
    <span class="a-size-base">Carabau</span>
</td>
from bs4 import BeautifulSoup

html = '''
    <tr class="a-spacing-small">
    <td class="a-span3">
        <span class="a-size-base a-text-bold">Flavour</span>
    </td>
    
    <td class="a-span9">
        <span class="a-size-base">Green Apple</span>
    </td>
    <tr class="a-spacing-small">
    <td class="a-span3">
        <span class="a-size-base a-text-bold">Brand</span>
    </td>
    
    <td class="a-span9">
        <span class="a-size-base">Carabau</span>
    </td>
    '''

soup = BeautifulSoup(html, 'html.parser')
# Label cells carry the class "a-span3", value cells carry "a-span9"
first_element = soup.find_all('td', {'class': 'a-span3'})
second_element = soup.find_all('td', {'class': 'a-span9'})

# Pair each label with the value at the same position and print "label - value"
for first_attribute, second_attribute in zip(first_element, second_element):
    print("{} - {}".format(first_attribute.text.strip(), second_attribute.text.strip()))

This can be done with BeautifulSoup, and the code above prints the desired output. If you are reading the HTML from a URL instead, replace the html string with the raw content fetched from that URL.
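For the URL case, a minimal sketch might look like the following. It uses the standard library's urllib.request to download the page and then applies the same a-span3 / a-span9 pairing. The helper names fetch_html and extract_pairs, the User-Agent value, and the example URL are assumptions made for illustration and do not come from the original post; a real product page may also block automated requests or use different markup.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    # Some sites reject the default urllib User-Agent; this value is only an example.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_pairs(html):
    """Return (label, value) tuples from the a-span3 / a-span9 cells."""
    soup = BeautifulSoup(html, "html.parser")
    labels = soup.find_all("td", {"class": "a-span3"})
    values = soup.find_all("td", {"class": "a-span9"})
    # zip() pairs cells by position, so it assumes every label has a matching value.
    return [
        (label.get_text(strip=True), value.get_text(strip=True))
        for label, value in zip(labels, values)
    ]


# Usage (the URL is a placeholder, not a real product page):
# for label, value in extract_pairs(fetch_html("https://example.com/product")):
#     print("{} - {}".format(label, value))
```

Keeping the download and the parsing in separate functions means the parsing step can be checked against a saved HTML string without any network access.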

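You can do it like this.

  • Select the table rows <tr> using .find_all(). This gives you a list of <tr> tags.
  • For each <tr>, get its text and print it in the format you need.

Here is the complete code: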

from bs4 import BeautifulSoup

s = """
<tr class="a-spacing-small">
<td class="a-span3">
    <span class="a-size-base a-text-bold">Flavour</span>
</td>

<td class="a-span9">
    <span class="a-size-base">Green Apple</span>
</td>
<tr class="a-spacing-small">
<td class="a-span3">
    <span class="a-size-base a-text-bold">Brand</span>
</td>

<td class="a-span9">
    <span class="a-size-base">Carabau</span>
</td>
"""
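# The 'lxml' parser requires the third-party lxml package to be installed
soup = BeautifulSoup(s, 'lxml')
for tr in soup.find_all('tr'):
    # stripped_strings yields each text fragment in the row with whitespace removed
    print(' - '.join(tr.stripped_strings))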

Output:

Flavour - Green Apple
Brand - Carabau
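
Since the question asks for key-value pairs, the same row-by-row idea can also fill a dictionary instead of printing. The sketch below is one possible way to do that, not the answerer's code: the function name details_as_dict is hypothetical, and the sample markup is a cleaned-up version of the question's HTML with the closing </tr> tags and a <table> wrapper added.

```python
from bs4 import BeautifulSoup

# Cleaned-up sample: closing </tr> tags and the <table> wrapper are added here.
html = """
<table>
<tr class="a-spacing-small">
  <td class="a-span3"><span class="a-size-base a-text-bold">Flavour</span></td>
  <td class="a-span9"><span class="a-size-base">Green Apple</span></td>
</tr>
<tr class="a-spacing-small">
  <td class="a-span3"><span class="a-size-base a-text-bold">Brand</span></td>
  <td class="a-span9"><span class="a-size-base">Carabau</span></td>
</tr>
</table>
"""


def details_as_dict(html):
    """Map each row's label cell text to its value cell text (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for tr in soup.find_all("tr"):
        label = tr.find("td", class_="a-span3")
        value = tr.find("td", class_="a-span9")
        # Skip rows that do not have both a label cell and a value cell.
        if label is None or value is None:
            continue
        details[label.get_text(strip=True)] = value.get_text(strip=True)
    return details


details = details_as_dict(html)
print(details)  # {'Flavour': 'Green Apple', 'Brand': 'Carabau'}
```

Looking up both cells inside each row, rather than zipping two separate lists, keeps a label tied to its own value even if some other row is missing a cell.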
