
XPath to get lists of elements in Python

I am trying to scrape lists of elements from a page that looks like this:

<div class="container">
    <b>1</b>
    <b>2</b>
    <b>3</b>
</div>
<div class="container">
    <b>4</b>
    <b>5</b>
    <b>6</b>
</div>

I would like to get lists or tuples using XPath: [1,2,3], [4,5,6], ...

Using a for loop over the page, I get either only the first element of each list or all the numbers as one flat list.

Could you please help me solve this? Thank you in advance for any help!

For scraping static pages, bs4 (Beautiful Soup) is a good package to work with. Using bs4, you can achieve your goal as easily as this:

from bs4 import BeautifulSoup
source = """<div class="container">
    <b>1</b>
    <b>2</b>
    <b>3</b>
</div>
<div class="container">
    <b>4</b>
    <b>5</b>
    <b>6</b>
</div>"""
soup = BeautifulSoup(source, 'html.parser')  # parse the page source
containers = soup.find_all('div', {'class': 'container'})  # every <div class="container">; the attribute filter is optional
# build one list of ints per container from its <b> children
print([[int(b.text) for b in div.find_all('b')] for div in containers])

Output:

[[1, 2, 3], [4, 5, 6]]

You could use this XPath expression, which matches the two container elements; their whitespace-normalized text gives you two strings:

.//*[@class='container']    ➡ '1 2 3', '4 5 6'

If you would prefer six strings:

.//*[@class='container']/b  ➡ '1','2','3','4','5','6'

To get exactly what you are looking for, though, you would have to use a separate XPath expression for each container:

.//*[@class='container'][1]/b  ➡ '1','2','3'
.//*[@class='container'][2]/b  ➡ '4','5','6'
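
Rather than hard-coding an index per container, the same idea can be looped: select the containers first, then run a relative path on each one. Here is a minimal sketch using only the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The <root> wrapper (added so the snippet parses as well-formed XML) and the helper name numbers_per_container are assumptions for illustration, not part of the original page. With lxml the same two-step pattern should apply, using a relative expression such as ./b on each container.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, wrapped in a single <root> element
# (an assumption) so that ElementTree can parse it as well-formed XML.
source = """<root>
<div class="container">
    <b>1</b>
    <b>2</b>
    <b>3</b>
</div>
<div class="container">
    <b>4</b>
    <b>5</b>
    <b>6</b>
</div>
</root>"""

tree = ET.fromstring(source)


def numbers_per_container(root):
    """Return one list of ints per element with class="container"."""
    groups = []
    # Step 1: select every container element anywhere below the root.
    for container in root.findall(".//*[@class='container']"):
        # Step 2: a path relative to this container, so only its own
        # <b> children are collected.
        groups.append([int(b.text) for b in container.findall("./b")])
    return groups


print(numbers_per_container(tree))  # [[1, 2, 3], [4, 5, 6]]
```

The important part is that the inner path is relative to each container; a path evaluated against the whole document would return all six numbers in one list.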
