Beautiful Soup: Getting text data from HTML

Question

Here is my HTML. I want to use Beautiful Soup to extract the data from it:

<tr class="tr-option">
<td class="td-option"><a href="">A.</a></td>
<td class="td-option">120 m</td>
<td class="td-option"><a href="">B.</a></td>
<td class="td-option">240 m</td>
<td class="td-option"><a href="">C.</a></td>
<td class="td-option" >300 m</td>
<td class="td-option"><a href="">D.</a></td>
<td class="td-option" >None of these</td>
</tr>

Here is my Beautiful Soup code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
for option in soup.find_all('td', attrs={'class': "td-option"}):
    print(option.text)

Output of the code above:

A.
120 m
B.
240 m
C.
300 m
D.
None of these

But I want the following output:

A.120 m
B.240 m
C.300 m
D.None of these

How can I do this?

Answer 1

Since find_all returns a list of the option cells, you can use list comprehensions to get the output you expect:

>>> a_list = [ option.text for option in soup.find_all('td', attrs={'class':"td-option"}) ]
>>> new_list = [ a_list[i] + a_list[i+1] for i in range(0,len(a_list),2) ]
>>> for option in new_list:
...     print(option)
... 
A.120 m
B.240 m
C.300 m
D.None of these

What does it do?

[ a_list[i] + a_list[i+1] for i in range(0,len(a_list),2) ] steps through a_list two items at a time and concatenates each element with the one that follows it.
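To make the index stepping concrete, here is a minimal sketch that runs the same comprehension on a hard-coded list. The list contents are an assumption: they stand in for what find_all would yield for the HTML in the question, so the snippet runs without Beautiful Soup.

```python
# Cell texts in document order (hard-coded stand-in for the find_all result).
a_list = ["A.", "120 m", "B.", "240 m", "C.", "300 m", "D.", "None of these"]

# range(0, len(a_list), 2) yields 0, 2, 4, 6: the index of each label cell.
# a_list[i + 1] is the value cell that immediately follows that label.
new_list = [a_list[i] + a_list[i + 1] for i in range(0, len(a_list), 2)]

for option in new_list:
    print(option)
```

Note that this approach assumes an even number of cells. With an odd count, the last a_list[i + 1] lookup would raise an IndexError.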

Answer 2

soup = BeautifulSoup(html_doc) 
options = soup.find_all('td', attrs={'class': "td-option"}) 
texts = [o.text for o in options] 
lines = [] 
# Add every two-element pair as a concatenated item
for a, b in zip(texts[0::2], texts[1::2]): 
    lines.append(a + b)
for l in lines:
    print(l)

This gives:

A.120 m
B.240 m
C.300 m
D.None of these
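For reference, the slicing approach can be put together as one self-contained Python 3 script. This is only a sketch under a few assumptions not stated in the answers: the bs4 package is installed, the built-in "html.parser" backend is used, and get_text(strip=True) is used instead of .text to drop any stray whitespace around the cell contents.

```python
from bs4 import BeautifulSoup

# The table row from the question, inlined so the script runs on its own.
html_doc = """
<tr class="tr-option">
<td class="td-option"><a href="">A.</a></td>
<td class="td-option">120 m</td>
<td class="td-option"><a href="">B.</a></td>
<td class="td-option">240 m</td>
<td class="td-option"><a href="">C.</a></td>
<td class="td-option" >300 m</td>
<td class="td-option"><a href="">D.</a></td>
<td class="td-option" >None of these</td>
</tr>
"""

# Naming the parser explicitly keeps the result consistent across machines.
soup = BeautifulSoup(html_doc, "html.parser")

# Collect the stripped text of every option cell, in document order.
texts = [td.get_text(strip=True) for td in soup.find_all("td", class_="td-option")]

# texts[0::2] holds the labels, texts[1::2] the values; zip pairs them up.
lines = [label + value for label, value in zip(texts[0::2], texts[1::2])]

for line in lines:
    print(line)
```

Unlike the index-based version, zip stops at the shorter slice, so an unpaired trailing cell would be dropped silently rather than raising an error.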

Answer 1
2, accepted, 2015-06-04 07:14:22

Answer 2
0, 2015-06-04 07:14:49
