XPath to get lists of elements in Python

I am trying to scrape lists of elements from a page that looks like this:

<div class="container">
    <b>1</b>
    <b>2</b>
    <b>3</b>
</div>
<div class="container">
    <b>4</b>
    <b>5</b>
    <b>6</b>
</div>

I would like to get lists or tuples using XPath: [1,2,3],[4,5,6]...

Using a for loop on the page, I get either the first element of each list or all the numbers as one list.

Could you please help me solve this? Thank you in advance for any help!

For scraping static pages, bs4 (BeautifulSoup) is a convenient package to work with, although it uses its own search API rather than XPath. Using bs4, you can achieve your goal as follows:

from bs4 import BeautifulSoup
source = """<div class="container">
    <b>1</b>
    <b>2</b>
    <b>3</b>
</div>
<div class="container">
    <b>4</b>
    <b>5</b>
    <b>6</b>
</div>"""
soup = BeautifulSoup(source, 'html.parser')  # parse the page source
containers = soup.find_all('div', {'class': 'container'})  # every <div class="container">; the attribute filter is optional
# build one list of ints per container from its <b> elements
print([[int(b.text) for b in div.find_all('b')] for div in containers])

Output:

[[1, 2, 3], [4, 5, 6]]

You could use this XPath expression, which will give you two strings (the text content of each container):

.//*[@class='container']    ➡ '1 2 3', '4 5 6'

If you would prefer six strings:

.//*[@class='container']/b  ➡ '1','2','3','4','5','6'

To get exactly what you are looking for, though, you would have to use a separate XPath expression for each container:

.//*[@class='container'][1]/b  ➡ '1','2','3'
.//*[@class='container'][2]/b  ➡ '4','5','6'
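
Rather than writing one indexed expression per container, the two steps can be combined in a loop: select the containers first, then run a relative query (starting with `./`) on each one. The following is only a minimal sketch using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. It assumes the markup is well-formed XML, and the `<root>` wrapper and the function name `numbers_per_container` are hypothetical additions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root> element
# (an assumption) so that it parses as a single well-formed XML document.
source = """<root>
<div class="container">
    <b>1</b>
    <b>2</b>
    <b>3</b>
</div>
<div class="container">
    <b>4</b>
    <b>5</b>
    <b>6</b>
</div>
</root>"""

root = ET.fromstring(source)


def numbers_per_container(tree):
    """Return one list of ints per element with class="container"."""
    groups = []
    # Outer query: select every container element.
    for container in tree.findall(".//*[@class='container']"):
        # Inner query: "./b" is relative to the current container,
        # so only that container's own <b> children are matched.
        groups.append([int(b.text) for b in container.findall("./b")])
    return groups


result = numbers_per_container(root)
print(result)  # [[1, 2, 3], [4, 5, 6]]
```

With lxml the same pattern would presumably use `tree.xpath(...)` for the outer query and `container.xpath("./b/text()")` for the inner one. Note that an inner expression starting with `//` searches the whole document instead of the current container, which may be why a loop returns all the numbers as one list.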


 