Selenium, XPath: how to select a node within a node?

I have a webpage that has a structure like this:

<div class="l_post j_l_post l_post_bright "...>
    ...
    <div class="j_lzl_c_b_a core_reply_content">
       <li class="lzl_single_post j_lzl_s_p first_no_border" ...>
         <div class="lzl_cnt">
         content
         </div>
       </li>
       <li class="lzl_single_post j_lzl_s_p first_no_border" ...>
       ...
       </li>
    </div>

</div>
<div class="l_post j_l_post l_post_bright "...>
...(contains content, same as above)
</div>
...

Currently I can select all the content in one step like this:

for i in driver.find_elements_by_xpath('//*[@class="lzl_cnt"]'):
    print(i.text)

But as you can see, the webpage consists of repetitive blocks that contain the content I need. I want to get that content block by block, along with other information that differs between the blocks ( <div class="l_post j_l_post l_post_bright "...>...</div> ). I also want the content within each <li class="lzl_single_post"...> kept separate, so that it is easier to process later. I tried this:

items = []

# get each block
for sel in driver.find_elements_by_xpath('//div[@class="l_post j_l_post l_post_bright  "]'):
    name = sel.find_element_by_css_selector('.d_name').text
    try: content = sel.find_element_by_css_selector('.j_d_post_content').text
    except: content = ''
    try: 
        reply = []
        # get each post within specific block
        for i in sel.find_elements_by_xpath('//*[@class="lzl_cnt"]'):
            reply.append(i.text)
    except: reply = []
    items.append({'name': name, 'content': content, 'reply': reply})

But the result shows that every time the outer for-loop runs I get all the replies on the webpage, instead of only the replies that belong to that individual block.

Any suggestions?

Just add a leading . (which refers to the context node) to the XPath:

sel.find_elements_by_xpath('.//*[@class="lzl_cnt"]')

Note that //*[@class="lzl_cnt"] matches all nodes in the DOM whose class is "lzl_cnt", no matter which element you call it on, while .//*[@class="lzl_cnt"] matches only those nodes with class "lzl_cnt" that are descendants of sel.
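If you want to see the scoping effect without starting a browser, here is a minimal sketch that uses Python's standard xml.etree.ElementTree as a stand-in for Selenium. The markup is a simplified, made-up version of the page in the question (the names "alice" and "bob" and the reply texts are placeholders), and ElementTree supports only a subset of XPath, so treat this as an illustration of the leading "." and not as Selenium code.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the page: two blocks, each with its own replies.
PAGE = """
<body>
  <div class="l_post">
    <span class="d_name">alice</span>
    <ul>
      <li><div class="lzl_cnt">reply A1</div></li>
      <li><div class="lzl_cnt">reply A2</div></li>
    </ul>
  </div>
  <div class="l_post">
    <span class="d_name">bob</span>
    <ul>
      <li><div class="lzl_cnt">reply B1</div></li>
    </ul>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Searching from the document root finds every reply on the page.
all_replies = [e.text for e in root.findall(".//*[@class='lzl_cnt']")]

# Searching with a leading "." from each block finds only that block's replies.
items = []
for block in root.findall("./div[@class='l_post']"):
    name = block.find(".//*[@class='d_name']").text
    replies = [e.text for e in block.findall(".//*[@class='lzl_cnt']")]
    items.append({"name": name, "reply": replies})

print(all_replies)
print(items)
```

The first search returns all three replies, while the per-block search returns two replies for the first block and one for the second, which is the grouping the question asks for.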
