
Get structured data from HTML using Python and Beautiful Soup

I am new to Python. I want my code to produce the following result:

Score    Positive       Negative
  5      good           bad
  7      interesting
  3                     horrible

But my code outputs nothing. Where is the problem?

from bs4 import BeautifulSoup
text = """
<body>
    <div class="review">
        <p class="pos">good</p>
        <p class="neg">bad</p>
    </div>
    <div class="review">
        <p class="pos">interesting</p>
    </div>
    <div class="review">
        <p class="neg">horrible</p>
    </div>
</body>"""
soup = BeautifulSoup(text)
for parent in soup.find_all('div', attrs={'class': 'review'}):
    if parent.findNextSiblings('p', attrs={'class': 'pos'}):
        postive.append(parent.get_text())
    else:
        postive.append("")
    if parent.findNextSiblings('p', attrs={'class': 'neg'}):
        negtive.append(parent.get_text())
    else:
        negtive.append("")

The p tags are not siblings of the div tags with class review; they are their children, so search inside each div instead:

positive = []
negative = []
for div in soup.find_all('div', attrs={'class': 'review'}):
    pos = div.find('p', {'class': 'pos'})
    positive.append(pos.get_text() if pos else '')

    neg = div.find('p', {'class': 'neg'})
    negative.append(neg.get_text() if neg else '')

print(positive)
print(negative)

Prints:

['good', 'interesting', '']
['bad', '', 'horrible']
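
To get closer to the table the question asks for, the two lists can be collected row by row and printed in aligned columns. The following is only a sketch: the helper names review_rows and format_table are made up for illustration, the parser is assumed to be the built-in "html.parser", and the Score column is left out because the sample HTML contains no score values. It also checks directly that a sibling search from a review div finds no p tags, while a search inside the div does.

```python
from bs4 import BeautifulSoup

html = """
<body>
    <div class="review">
        <p class="pos">good</p>
        <p class="neg">bad</p>
    </div>
    <div class="review">
        <p class="pos">interesting</p>
    </div>
    <div class="review">
        <p class="neg">horrible</p>
    </div>
</body>
"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")


def review_rows(soup):
    """Hypothetical helper: one (positive, negative) pair per review div.

    A missing p tag is represented by an empty string.
    """
    rows = []
    for div in soup.find_all("div", class_="review"):
        pos = div.find("p", class_="pos")
        neg = div.find("p", class_="neg")
        rows.append((
            pos.get_text(strip=True) if pos else "",
            neg.get_text(strip=True) if neg else "",
        ))
    return rows


def format_table(rows):
    """Hypothetical helper: render the rows as left-aligned text columns."""
    lines = ["{:<14}{}".format("Positive", "Negative")]
    for pos, neg in rows:
        lines.append("{:<14}{}".format(pos, neg).rstrip())
    return "\n".join(lines)


first_div = soup.find("div", class_="review")
# The p tags are nested inside the div, so a sibling search finds none of them...
sibling_hits = first_div.find_next_siblings("p", class_="pos")
# ...while a search within the div finds the child.
child_hit = first_div.find("p", class_="pos")

rows = review_rows(soup)
print(format_table(rows))
```

If the real pages do carry a score somewhere in each review div, it could be read with one more div.find(...) call inside review_rows and added as a third column; where that value lives is not shown in the question, so it is not guessed at here.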


 