Beautiful Soup returns None

Question

I have the following HTML and I am using Beautiful Soup to extract information from it. For example, I want to get "Relationship status: Relationship".

<table class="box-content-list" cellspacing="0">
            <tbody>
             <tr class="first">
              <td>
                   <strong>
                    Relationship status:
                   </strong>
               Relationship
              </td>
             </tr>
             <tr class="alt">
              <td>
               <strong>
                Living:
              </strong>
               With partner
              </td>
             </tr>

I have written the following code:

xs = [x for x in soup.findAll('table', attrs = {'class':'box-content-list'})]
for x in xs:
    #print x
    sx = [s for s in x.findAll('tr',attrs={'class':'first'})]
    for s in sx:
        td_tabs = [td for td in s.findAll('td')]
        for td in td_tabs:
            title = td.findNext('strong')
            #print str(td)
            status = td.findNextSibling()
            print title.string
            print status

But the output I get is "Relationship status:", and print status prints None. What am I doing wrong?

Answer 1

There is a dedicated method, get_text (getText in older BeautifulSoup versions), that returns the combined text content of a tag and everything nested inside it. With your example:

>>> example.td.get_text(' ', strip=True)
'Relationship status: Relationship'

The first parameter is the separator used to join the pieces of text; strip=True strips the whitespace around each piece.
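As a fuller illustration of the call above, here is a minimal sketch that applies get_text to every cell of the table. It assumes Python 3 with the bs4 package and the built-in "html.parser" backend; the HTML is the question's snippet with the closing tags added, and the helper name cell_texts is made up for this example.

```python
from bs4 import BeautifulSoup

# The question's table, with the missing closing tags filled in (assumption).
HTML = """
<table class="box-content-list" cellspacing="0">
  <tbody>
    <tr class="first">
      <td>
        <strong>
          Relationship status:
        </strong>
        Relationship
      </td>
    </tr>
    <tr class="alt">
      <td>
        <strong>
          Living:
        </strong>
        With partner
      </td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")


def cell_texts(soup):
    """Return the text of every <td> in the table, one string per cell.

    Hypothetical helper: get_text(" ", strip=True) strips each text
    fragment, drops the empty ones, and joins the rest with a space.
    """
    table = soup.find("table", class_="box-content-list")
    return [td.get_text(" ", strip=True) for td in table.find_all("td")]


texts = cell_texts(soup)
for text in texts:
    print(text)
```

Because get_text flattens the whole cell, the label and the value come back as one string; if they are needed separately, splitting on the first colon or navigating from the <strong> tag are both options.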

Answer 2

First of all, there is no need for all the list comprehensions; yours do nothing but copy the results, so you can safely do without them.

The <td> in your row has no next sibling (it is the only <td> tag there), so findNextSibling() returns None. What you want instead is the sibling that follows the title (the <strong> tag), which is its .nextSibling attribute:

for table in soup.findAll('table', attrs = {'class':'box-content-list'}):
    for row in table.findAll('tr',attrs={'class':'first'}):
        for col in row.findAll('td'):
            title = col.strong
            status = title.nextSibling
            print title.text.strip(), status.strip()

which prints:

Relationship status: Relationship

for your example.
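The same idea can be sketched for Python 3 and the bs4 package, where the methods are spelled find_all, find_next_sibling and next_sibling. This is an illustrative variant and not the answer's original code: the compact HTML string is an assumption based on the question's snippet, and it walks every row instead of only the one with class "first".

```python
from bs4 import BeautifulSoup

# Compact version of the question's table (closing tags added as an assumption).
HTML = """<table class="box-content-list"><tbody>
<tr class="first"><td><strong> Relationship status: </strong> Relationship </td></tr>
<tr class="alt"><td><strong> Living: </strong> With partner </td></tr>
</tbody></table>"""

soup = BeautifulSoup(HTML, "html.parser")

# Why the question's code printed None: each <td> is the only cell in its
# row, so there is no sibling tag after it.
first_td = soup.find("td")
no_sibling = first_td.find_next_sibling()

pairs = []
for row in soup.find_all("tr"):
    for col in row.find_all("td"):
        title = col.strong
        if title is None:
            continue  # cell without a <strong> label
        # The value is the text node that directly follows </strong>.
        status = title.next_sibling
        if status is None:
            continue  # label with nothing after it
        # str() assumes the sibling is plain text; a tag would yield markup.
        pairs.append((title.get_text(strip=True), str(status).strip()))

print(no_sibling)
for label, value in pairs:
    print(label, value)
```

Keeping the label and the value as a pair makes it easy to build a dictionary from the table later, which the flattened get_text output does not give directly.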

2 answers

Answer 1
3 2013-04-12 10:37:52

Answer 2
1 2013-04-12 10:14:22
