Remove newlines in Beautiful Soup
In BeautifulSoup, I have the following:
>>> tr = soup.find_all('tr')[1]
>>> tr
<tr>
<td>Adaptive Systems Seminar (HOC+WPO)</td>
<td>wo</td>
<td>13:00</td>
<td>17:00</td>
<td>4:00</td>
<td>22-29, 32-36</td>
<td>MANDERICK BERNARD</td>
<td> </td>
</tr>
However, I'm only interested in the text, so I do:
>>> tr(text=True)
[u'\n', u'Adaptive Systems Seminar (HOC+WPO)', u'\n', u'wo', u'\n', u'13:00', u'\n', u'17:00', u'\n', u'4:00', u'\n', u'22-29, 32-36', u'\n', u'MANDERICK BERNARD', u'\n', u'\xa0', u'\n']
I'd like to get the list above, but without all the newlines. I've read the documentation but I can't find anything about it.
One option would be to find all the td elements inside and call get_text() with strip=True on each:
In [4]: [td.get_text(strip=True) for td in soup.select("tr > td")]
Out[4]:
[u'Adaptive Systems Seminar (HOC+WPO)',
u'wo',
u'13:00',
u'17:00',
u'4:00',
u'22-29, 32-36',
u'MANDERICK BERNARD',
u'']
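If you already have the list returned by tr(text=True), another option is to filter it with plain Python. The following is a minimal sketch, assuming Python 3; the names strings and texts are only illustrative, and the list is copied from the question rather than produced by BeautifulSoup here:

```python
# The list from the question, as returned by tr(text=True).
strings = ['\n', 'Adaptive Systems Seminar (HOC+WPO)', '\n', 'wo', '\n',
           '13:00', '\n', '17:00', '\n', '4:00', '\n', '22-29, 32-36', '\n',
           'MANDERICK BERNARD', '\n', '\xa0', '\n']

# Keep only the entries that are non-empty after stripping whitespace.
# In Python 3, str.strip() also removes the non-breaking space '\xa0',
# so the empty last cell is dropped along with the newlines.
texts = [s.strip() for s in strings if s.strip()]

print(texts)
```

Unlike the get_text(strip=True) approach, this drops the empty last cell instead of keeping it as an empty string, so use it only if you don't need one entry per column. BeautifulSoup's tr.stripped_strings generator should give a similar result directly.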