Lambda in Python and Beautiful Soup is not retrieving text
I am trying to scrape 1520-00-087-7637 from this HTML:
<tr>
<td class="text-center" style="width: 10%">
<img class="img-thumbnail" src="/Files/image/placeholder100.png" style="width: 100px">
</td>
<td class="text-center" nowrap="" style="vertical-align: middle; width: 10%"><a href="/NSN/1520-00-087-7637">1520-00-087-7637</a></td>
<td class="text-center" nowrap="" style="vertical-align: middle; width: 10%"><a href="/PartNumber/UH1H">UH1H</a></td>
<td class="text-center" nowrap="" style="vertical-align: middle; width: 10%"><a href="/CAGE/97499">97499</a></td>
<td class="text-center" style="vertical-align: middle; width: 10%"><a href="/CAGE/97499"><img class="img-thumbnail" src="/Files/cage/90/97499.jpg" title="CAGE 97499" alt="CAGE 97499"></a></td>
<td nowrap="" style="vertical-align: middle">
<h4>  BOSS, MAN</h4>
<p>
<em>    Alternate References: <a href="/NSN/1520-00-087-7637">1520-00-087-7637</a>, <a href="/NSN/1520-00-087-7637">000877637</a></em>
</p>
</td>
So I tried the following to get 1520-00-087-7637, but all I get in the output is None.
page_soup1 = soup(page_html1, "html.parser")
tablecontainer = page_soup1.find_all("tr")
for container in tablecontainer:
    Z = container1.find('a', {'href': lambda x : x.startswith('/NSN/')})
    print(Z)
What am I doing wrong, and how can I fix this?
I tried print(Z.get_text()) and Z.text; neither of them seems to work. How can I get the text value?
Here it is. Let me know if you have any issues with it. It seems that I need to walk you through it.
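from lxml.html import fromstring

# html holds the markup from the question (defined in full further down)
tree = fromstring(html)
# the h4 inside the cell that directly follows a .text-center cell
for item in tree.cssselect(".text-center+td h4"):
    print(item.text_content())
Result: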
BOSS, MAN
And to get the data from a container:
html='''
<tr>
<td class="text-center" style="width: 10%">
<img class="img-thumbnail" src="/Files/image/placeholder100.png" style="width: 100px">
</td>
<td class="text-center" nowrap="" style="vertical-align: middle; width: 10%"><a href="/NSN/1520-00-087-7637">1520-00-087-7637</a></td>
<td class="text-center" nowrap="" style="vertical-align: middle; width: 10%"><a href="/PartNumber/UH1H">UH1H</a></td>
<td class="text-center" nowrap="" style="vertical-align: middle; width: 10%"><a href="/CAGE/97499">97499</a></td>
<td class="text-center" style="vertical-align: middle; width: 10%"><a href="/CAGE/97499"><img class="img-thumbnail" src="/Files/cage/90/97499.jpg" title="CAGE 97499" alt="CAGE 97499"></a></td>
<td nowrap="" style="vertical-align: middle">
<h4>  BOSS, MAN</h4>
<p>
<em>    Alternate References: <a href="/NSN/1520-00-087-7637">1520-00-087-7637</a>, <a href="/NSN/1520-00-087-7637">000877637</a></em>
</p>
</td>
</tr>
'''
from lxml.html import fromstring

tree = fromstring(html)
for item in tree.cssselect("tr"):
    number = item.cssselect(".text-center a[href^='/NSN/']")[0].text
    name = item.cssselect(".text-center+td h4")[0].text_content()
    print(number, name)
Result:
1520-00-087-7637 BOSS, MAN
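For anyone who would rather stay with Beautiful Soup, here is a minimal sketch of the same extraction. It assumes the None in the question comes from searching container1 while the loop variable is container; the rest of the original script is not shown, so that is a guess. The markup below is a trimmed copy of the row from the question, and the names page_soup, link, and results are only illustrative. The lambda also checks for None, since Beautiful Soup passes None to an attribute filter when a tag lacks that attribute.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (only the cells used here).
html = '''
<table>
<tr>
<td class="text-center"><a href="/NSN/1520-00-087-7637">1520-00-087-7637</a></td>
<td class="text-center"><a href="/PartNumber/UH1H">UH1H</a></td>
<td><h4>  BOSS, MAN</h4></td>
</tr>
</table>
'''

page_soup = BeautifulSoup(html, "html.parser")

results = []
for container in page_soup.find_all("tr"):
    # Search the row bound by this loop, not some other variable.
    # The filter receives None for <a> tags without an href, so guard for it.
    link = container.find(
        "a", href=lambda x: x is not None and x.startswith("/NSN/")
    )
    if link is None:
        continue  # rows without an NSN link (for example a header row)
    heading = container.find("h4")
    name = heading.get_text(strip=True) if heading else ""
    results.append((link.get_text(strip=True), name))

print(results)
```

Once find returns a tag instead of None, link.get_text() and link.text both give the number. A CSS selector such as container.select_one('a[href^="/NSN/"]') should match the same link, mirroring the selector used in the lxml version above.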