BeautifulSoup: getting a value from a table

Question

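I'm trying to scrape http://www.co.jefferson.co.us/ats/displaygeneral.do?sch=000104 and get the "Owner Name(s)". What I have works, but it is really ugly and surely not the best approach, so I'm looking for a better way. Here is what I have: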

soup = BeautifulSoup(url_opener.open(url))            
x = soup('table', text = re.compile("Owner Name"))
print 'And the owner is', x[0].parent.parent.parent.tr.nextSibling.nextSibling.next.next.next

The relevant HTML is:

<td valign="top">
    <table border="1" cellpadding="1" cellspacing="0" align="right">
    <tbody><tr class="tableheaders">
    <td>Owner Name(s)</td>
    </tr>

    <tr>

    <td>PILCHER DONALD L                         </td>
    </tr>

    </tbody></table>
</td>

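Wow, there are a lot of questions about BeautifulSoup. I looked through them but didn't find an answer that helped me, so hopefully this is not a duplicate question.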

Answer 1

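(Edit: apparently the HTML the OP posted is misleading. There is in fact no tbody tag to look for, even though he made a point of including it in that HTML. So the code below now uses table instead of tbody.)

Since there may be several table rows you want (for example, see the sibling URL of the one you gave, with the last digit 4 changed to 5), I suggest a loop such as the following: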

# locate the table containing a cell with the given text
owner = re.compile('Owner Name')
cell = soup.find(text=owner).parent
while cell.name != 'table': cell = cell.parent
# print all non-empty strings in the table (except for the given text)
for x in cell.findAll(text=lambda x: x.strip() and not owner.match(x)):
  print x

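This is reasonably robust to minor changes in the page structure: having located the cell of interest, it walks up through the cell's parents until it finds the table tag, then iterates over all navigable strings within that table that are not empty or whitespace-only, excluding the owner header itself.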
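For readers on Python 3, the same walk-up-then-scan idea can be sketched with the bs4 package. This is a minimal sketch rather than the answerer's code: it assumes bs4 is installed, uses the built-in "html.parser" backend, and the names SAMPLE_HTML and owner_strings are made up for illustration. The sample markup is the snippet from the question with the tbody tags left out.

```python
import re
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

# Sample markup based on the question's snippet (tbody omitted, per the edit note).
SAMPLE_HTML = """
<td valign="top">
    <table border="1" cellpadding="1" cellspacing="0" align="right">
    <tr class="tableheaders">
    <td>Owner Name(s)</td>
    </tr>
    <tr>
    <td>PILCHER DONALD L                         </td>
    </tr>
    </table>
</td>
"""

def owner_strings(html):
    """Hypothetical helper: non-blank strings in the table that holds the header."""
    header = re.compile("Owner Name")
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find(string=header)
    if label is None:
        return []
    # Walk up the ancestors until the enclosing table is reached.
    table = next((p for p in label.parents if p.name == "table"), None)
    if table is None:
        return []
    # Keep every string that is not blank and is not the header text itself.
    return [s.strip() for s in table.find_all(string=True)
            if s.strip() and not header.search(s)]

print(owner_strings(SAMPLE_HTML))
```

Unlike the loop above, this version returns an empty list instead of raising when the header or the table is missing, which may or may not be what a scraper wants.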

Answer 2

This is the answer from Aaron DeVore on the BeautifulSoup discussion group; it works well for me.

soup = BeautifulSoup(...)
label = soup.find(text="Owner Name(s)")

Tag.string is needed to get to the actual name string:

name = label.findNext('td').string

If you need to do this many times, you can even use a list comprehension:

names = [unicode(label.findNext('td').string)
         for label in soup.findAll(text="Owner Name(s)")]
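A Python 3 rendering of the same approach, as a sketch under assumptions: bs4 is installed, find_next and find_all are used as the bs4 spellings of findNext and findAll, and get_text(strip=True) is swapped in for .string so that the trailing padding is trimmed. The names SAMPLE_HTML and names_after_label are hypothetical.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

# Compact sample modelled on the question's table.
SAMPLE_HTML = (
    "<table><tr class='tableheaders'><td>Owner Name(s)</td></tr>"
    "<tr><td>PILCHER DONALD L        </td></tr></table>"
)

def names_after_label(html, label_text="Owner Name(s)"):
    """Hypothetical helper: text of the first td that follows each label."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # A plain string here matches only text nodes equal to label_text.
    for label in soup.find_all(string=label_text):
        cell = label.find_next("td")  # next td in document order
        if cell is not None:
            names.append(cell.get_text(strip=True))
    return names

print(names_after_label(SAMPLE_HTML))
```

Because the label is matched as an exact string, a header cell with extra whitespace or different wording would not be found; a compiled regular expression could be passed instead if the page is less regular.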

Answer 3

This is a tiny improvement, but I couldn't figure out how to get rid of the three parents.

x[0].parent.parent.parent.findAll('td')[1].string
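One possible way to drop the chain of .parent calls, sketched in Python 3 with bs4 (an assumption, since the line above uses the older BeautifulSoup 3 API): find_parent("table") climbs to the nearest enclosing table no matter how many levels lie in between. SAMPLE_HTML and second_cell_text are illustrative names, and the sample keeps the tbody from the question to show that the extra nesting does not matter.

```python
import re
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

SAMPLE_HTML = """
<table border="1" cellpadding="1" cellspacing="0" align="right">
<tbody><tr class="tableheaders">
<td>Owner Name(s)</td>
</tr>
<tr>
<td>PILCHER DONALD L                         </td>
</tr>
</tbody></table>
"""

def second_cell_text(html):
    """Hypothetical helper: text of the second td in the header's table."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find(string=re.compile("Owner Name"))
    if label is None:
        return None
    # Jump straight to the enclosing table instead of counting parents.
    table = label.find_parent("table")
    if table is None:
        return None
    cells = table.find_all("td")
    return cells[1].get_text(strip=True) if len(cells) > 1 else None

print(second_cell_text(SAMPLE_HTML))
```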

3 solutions

Solution 1
5, accepted, 2009-11-30 00:36:16

Solution 2
3, 2009-11-30 20:23:16

Solution 3
1, 2009-11-30 00:08:25
