The findAll function in BeautifulSoup

Question

I have been trying to parse the text stored between <td> tags, for example:

<tr>
<td>Trading Hours</td>
<td><b>Monday</b> <br />
London - 23:00 Sunday - 23:00 Monday<br />
New York - 18:00 Sunday - 18:00 Monday<br />
Chicago - 17:00 Sunday - 17:00 Monday<br />
<br />
<b>Tuesday-Friday</b> <br />
London - 01:00 - 23:00<br />
New York - 20:00 - 18:00<br />
Chicago - 19:00 - 17:00<br />
</td>
</tr>

In this simple example there are only two <td> tags; suppose a variable tr stores the entire block of HTML. My logic for extracting the text (without any <tr> or <br> tags) is as follows:

for td in tr.findAll('td'):
    row.append((td.find('td', text = True)).strip().strip('\n'))

Problem: My for loop recognizes the first <td> tag, but not the second. How can I improve this?

Answer 1

text=True tells BeautifulSoup to look for elements with text. If you want to get the text itself, you need to use .get_text():

td.get_text(strip=True)
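
For reference, here is a minimal sketch of the whole loop with .get_text() called on each cell directly. It assumes the bs4 package with the built-in "html.parser" backend (the question names neither), and it uses a shortened copy of the sample markup; the names html and soup are illustrative. Note that find() only searches a tag's descendants, so calling td.find('td') on a cell that contains no nested cell returns None.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumed input).
html = """
<tr>
<td>Trading Hours</td>
<td><b>Monday</b> <br />
London - 23:00 Sunday - 23:00 Monday<br />
New York - 18:00 Sunday - 18:00 Monday<br />
</td>
</tr>
"""

# "html.parser" is the standard-library backend; any installed parser could be used.
soup = BeautifulSoup(html, "html.parser")
tr = soup.find("tr")

row = []
for td in tr.find_all("td"):
    # get_text() collects every text node inside the cell, including text
    # nested in <b>; strip=True trims each piece and drops empty ones,
    # and the separator joins the remaining pieces with a single space.
    row.append(td.get_text(separator=" ", strip=True))

for cell in row:
    print(cell)
```

With this approach both cells end up in row, and the <b> and <br /> tags are dropped because only text nodes are collected.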

Answer 1: accepted, 2013-06-16 19:11:43
