Can't get text immediately after </span> tag using BeautifulSoup

Currently, in my code I break down a larger soup to get all the 'td' tags with this code:

floorplans_all = sub_soup.findAll('td', {"data-label":"Rent"})
floorplan_soup = soup(floorplans_all[0].prettify(), "html.parser")
rent_span = floorplan_soup.findAll('span', {"class":"sr-only"})

print(floorplans_all)

and end up with the following:

<td data-label="Rent" data-selenium-id="Rent_6">
    <span class="sr-only">
      Monthly Rent
     </span>
     $2,335 -
     <span class="sr-only">
      to
     </span>
     $5,269
    </td>

Printing rent_span looks like this:

  [<span class="sr-only">
  Monthly Rent
 </span>, <span class="sr-only">
  to
 </span>]

I can't seem to get "$2,335 -" and "$5,269" from above. I have been trying to walk down the HTML tree, but I'm not able to get the text between the tags.

The td element has five children:

  • A text node containing only whitespace
  • A span node containing “Monthly Rent”
  • A text node containing “$2,335 -”
  • A span node containing “to”
  • A text node containing “$5,269”

You can iterate over those children by using the children attribute:

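from bs4 import BeautifulSoup

# text holds the <td> HTML shown above
soup = BeautifulSoup(text, 'html.parser')

# Print every direct child of <td>, text nodes and tags alike
for child in soup.td.children:
    print(repr(child))

Output:
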
'\n'
<span class="sr-only">
      Monthly Rent
     </span>
'\n     $2,335 -\n     '
<span class="sr-only">
      to
     </span>
'\n     $5,269\n    '
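
Building on the loop above, the text nodes can be separated from the span tags by checking each child's type. The following is a minimal sketch, not part of the original answer: the helper name direct_text is hypothetical, and the HTML is the td cell from the question.

```python
from bs4 import BeautifulSoup, NavigableString

# The <td> cell from the question
html = """<td data-label="Rent" data-selenium-id="Rent_6">
    <span class="sr-only">
      Monthly Rent
     </span>
     $2,335 -
     <span class="sr-only">
      to
     </span>
     $5,269
    </td>"""


def direct_text(tag):
    """Return the stripped, non-empty text nodes that are direct children of tag.

    Hypothetical helper for illustration only.
    """
    return [
        child.strip()
        for child in tag.children
        # Keep text nodes only, and skip the ones that are pure whitespace
        if isinstance(child, NavigableString) and child.strip()
    ]


soup = BeautifulSoup(html, "html.parser")
prices = direct_text(soup.td)
print(prices)  # ['$2,335 -', '$5,269']
```

Because only direct children are inspected, the text inside the span tags ("Monthly Rent", "to") is never collected.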

If you want to explicitly look for the text nodes, you could search for the span nodes and get the next sibling each time:

>>> [span.next_sibling.string.strip() for span in soup.td.find_all(class_='sr-only')]
['$2,335 -', '$5,269']
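
One caveat with the one-liner above: next_sibling is None when a span is the last child of the cell, so calling .string on it would raise AttributeError. Below is a more defensive sketch under that assumption. The function name text_after_spans and the trailing "per month" span are hypothetical, added only to exercise the edge case.

```python
from bs4 import BeautifulSoup, NavigableString

# Same structure as the question, plus a hypothetical trailing span
# that has no sibling after it
HTML = """<td data-label="Rent">
  <span class="sr-only">Monthly Rent</span>
  $2,335 -
  <span class="sr-only">to</span>
  $5,269
  <span class="sr-only">per month</span></td>"""


def text_after_spans(cell, css_class="sr-only"):
    """Collect the text node that directly follows each matching span."""
    results = []
    for span in cell.find_all("span", class_=css_class):
        sibling = span.next_sibling
        # sibling may be None (span is the last child) or another Tag;
        # only non-empty text nodes are kept
        if isinstance(sibling, NavigableString) and sibling.strip():
            results.append(sibling.strip())
    return results


soup = BeautifulSoup(HTML, "html.parser")
amounts = text_after_spans(soup.td)
print(amounts)  # ['$2,335 -', '$5,269']
```

The last span contributes nothing because it has no following sibling, and no exception is raised.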
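
Alternatively, locate the Rent cell directly and print the text node that follows each span:

soup = BeautifulSoup(res, 'html.parser')

row = soup.find('td', {'data-label': "Rent"})
# span.text would give only the labels ("Monthly Rent", "to");
# the prices are the text nodes that come right after each span
for span in row.find_all('span'):
    print(span.next_sibling.strip())

The output looks like this:

$2,335 -
$5,269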
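
If the whole price range is wanted as a single string, another option (not taken from the answers above) is to remove the screen-reader spans and read whatever text remains in the cell. This is a rough sketch: the helper name visible_text is hypothetical, and it assumes the labels always carry the sr-only class as in the question's HTML.

```python
from bs4 import BeautifulSoup

# The <td> cell from the question
HTML = """<td data-label="Rent" data-selenium-id="Rent_6">
    <span class="sr-only">
      Monthly Rent
     </span>
     $2,335 -
     <span class="sr-only">
      to
     </span>
     $5,269
    </td>"""


def visible_text(cell, hidden_class="sr-only"):
    """Remove the hidden-label spans, then return the remaining text
    with runs of whitespace collapsed to single spaces.

    Note: this modifies the parsed tree in place.
    """
    for span in cell.find_all("span", class_=hidden_class):
        span.decompose()
    return " ".join(cell.get_text().split())


soup = BeautifulSoup(HTML, "html.parser")
cell = soup.find("td", {"data-label": "Rent"})
rent_range = visible_text(cell)
print(rent_range)  # $2,335 - $5,269
```

Since decompose() alters the tree, parse a fresh copy of the HTML if the labels are still needed afterwards.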
