
XPath: how to get all text in the tag

I have this HTML code:

<div id="m0" style="visibility:visible; display:block;">
 <table class="fl">
  <tr bgcolor="white"><td class="v px3"></td>
   <td class="ch">
     <a title="Id: NetViet" class="A3">NetViet</a></td>
   </tr>

<div id="m1" style="visibility:visible; display:block;">
 <table class="fl">
  <td class="ch">
   <A class="A3" title="Id: Kino Polska Muzyka" HREF="http://www.kinopolskamuzyka.pl/" TARGET="_blank">Kino Polska Muzyka</A>
 </tr>
  <td class="ch">
   <i>HBO3 HD</i></td>
 </tr>
  <td class="ch"> Faktura</td>
 </tr>

My XPath is: tree.xpath('//div[@id="%s"]/table[@class= "fl"]/tr/td[@class="ch"]/a/text()'%div)

but it does not give me all the channels. I want to get all the text in <td class="ch">; the result I want is:

[['NetViet'], ['Kino Polska Muzyka', 'HBO3 HD', 'Faktura']]

Any ideas? Thanks in advance.

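Apart from the messed-up HTML structure, remove the "tr" and "a" steps from the XPath: not every "td" sits inside a "tr", and not every "td" wraps its text in an "a".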
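A minimal sketch of that advice, using only the standard library's xml.etree.ElementTree and its XPath subset. It assumes the real page has balanced tags, so the markup below is a tidied copy of the question's snippet wrapped in a hypothetical <root> element; `channels_by_div` is an illustrative helper name, not something from the question.

```python
import xml.etree.ElementTree as ET

# Tidied copy of the question's markup. Assumption: on the real page every
# tag is closed. The <root> wrapper is added only so the XML has one root.
DOC = """
<root>
 <div id="m0">
  <table class="fl">
   <tr bgcolor="white"><td class="v px3"></td>
    <td class="ch"><a title="Id: NetViet" class="A3">NetViet</a></td>
   </tr>
  </table>
 </div>
 <div id="m1">
  <table class="fl">
   <tr><td class="ch"><a class="A3" title="Id: Kino Polska Muzyka">Kino Polska Muzyka</a></td></tr>
   <tr><td class="ch"><i>HBO3 HD</i></td></tr>
   <tr><td class="ch"> Faktura</td></tr>
  </table>
 </div>
</root>
"""


def channels_by_div(doc, div_ids):
    """Return one list of channel names per requested div id."""
    root = ET.fromstring(doc)
    groups = []
    for div_id in div_ids:
        div = root.find(f".//div[@id='{div_id}']")
        # './/td' matches at any depth below the div, so it does not matter
        # whether a <tr> sits in between.
        cells = div.findall(".//td[@class='ch']") if div is not None else []
        # itertext() collects the text of the cell and all of its children,
        # so <a>, <i>, and bare text are handled the same way.
        groups.append(["".join(td.itertext()).strip() for td in cells])
    return groups


result = channels_by_div(DOC, ["m0", "m1"])
print(result)
```

With lxml, the same idea should carry over by passing a descendant expression such as '//div[@id="%s"]//td[@class="ch"]' to tree.xpath() and joining each cell's text, though on markup as unbalanced as the snippet in the question the parser may nest the second div inside the first, so the grouping is worth checking.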

Why not use CSS selectors to target the td elements with that class? For this type of selection it is likely faster than XPath.

from bs4 import BeautifulSoup as bs

html = '''
<div id="m0" style="visibility:visible; display:block;">
 <table class="fl">
  <tr bgcolor="white"><td class="v px3"></td>
   <td class="ch">
     <a title="Id: NetViet" class="A3">NetViet</a></td>
   </tr>

<div id="m1" style="visibility:visible; display:block;">
 <table class="fl">
  <td class="ch">
   <A class="A3" title="Id: Kino Polska Muzyka" HREF="http://www.kinopolskamuzyka.pl/" TARGET="_blank">Kino Polska Muzyka</A>
 </tr>
  <td class="ch">
   <i>HBO3 HD</i></td>
 </tr>
  <td class="ch"> Faktura</td>
 </tr>
 '''

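# Parse with the lxml backend, which is lenient about the unbalanced tags above.
soup = bs(html, 'lxml')
# 'td.ch' matches every <td class="ch">, whatever element wraps its text.
items = [item.text.strip() for item in soup.select('td.ch')]
print(items)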
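The selector above returns one flat list, while the question asks for one list per div. A possible way to group the results is sketched below. It assumes the real page has balanced tags (the markup is a tidied copy of the question's snippet) and that the wanted containers are the divs whose id starts with "m"; both are assumptions, not facts from the question.

```python
from bs4 import BeautifulSoup

# Tidied copy of the question's markup. Assumption: tags are balanced
# on the real page.
html = """
<div id="m0">
 <table class="fl">
  <tr bgcolor="white"><td class="v px3"></td>
   <td class="ch"><a title="Id: NetViet" class="A3">NetViet</a></td>
  </tr>
 </table>
</div>
<div id="m1">
 <table class="fl">
  <tr><td class="ch"><a class="A3" title="Id: Kino Polska Muzyka">Kino Polska Muzyka</a></td></tr>
  <tr><td class="ch"><i>HBO3 HD</i></td></tr>
  <tr><td class="ch"> Faktura</td></tr>
 </table>
</div>
"""

# html.parser ships with Python, so no extra parser needs to be installed.
soup = BeautifulSoup(html, "html.parser")

# Outer loop: one entry per container div (id starting with "m" is assumed).
# Inner loop: every td.ch inside that div, with surrounding whitespace removed.
grouped = [
    [td.get_text(strip=True) for td in div.select("td.ch")]
    for div in soup.select('div[id^="m"]')
]
print(grouped)
```

Selecting the divs first and then searching inside each one keeps the grouping explicit; on unbalanced markup the parser may nest one div inside another, in which case the groups would overlap.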


 