Pull data from a div class Python Selenium

I'm trying to pull a specific number out of a div in Python Selenium but can't figure out how to do it. I want to get the "post_parent" ID 947630, provided the "post_name" in the same block is a number starting with 09007.

I need to do this for multiple "post_name" values, so I'd feed it something like search_text = "0900766b80090cb6". There will be more of these in the future, so the code has to match on "post_name" first and then pull the corresponding "post_parent", if that makes sense.

Appreciate any advice anyone has to offer.

    <div class="hidden" id="inline_947631">
    <div class="post_title">Interface Converter</div>
    <div class="post_name">0900766b80090cb6</div>
    <div class="post_author">28</div>
    <div class="comment_status">closed</div>
    <div class="ping_status">closed</div>
    <div class="_status">inherit</div>
    <div class="jj">06</div>
    <div class="mm">07</div>
    <div class="aa">2001</div>
    <div class="hh">15</div>
    <div class="mn">44</div>
    <div class="ss">17</div>
    <div class="post_password"></div>
    <div class="post_parent">947630</div>
    <div class="page_template">default</div>
    <div class="tags_input" id="rs-language-code_947631">de</div>
    </div>

Notice that <div class="post_name">0900766b80090cb6</div> and <div class="post_parent">947630</div> are sibling nodes: both are children of the same parent <div>.

You can use the XPath following-sibling axis like this:

Code:

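search_text = "0900766b80090cb6"
# Step from the matching post_name to its post_parent sibling (single slash before the axis).
xpath = f"//div[@class='post_name' and text()='{search_text}']/following-sibling::div[@class='post_parent']"
# The container has class="hidden", so .text may come back empty; textContent is read regardless of visibility.
post_parent_num = driver.find_element(By.XPATH, xpath).get_attribute("textContent")
print(post_parent_num)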

Or, using an explicit wait:

search_text = "0900766b80090cb6"
xpath = f"//div[@class='post_name' and text()='{search_text}']/following-sibling::div[@class='post_parent']"
# Wait for presence in the DOM; a visibility condition would time out if the container really is hidden.
post_parent_num = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, xpath))).get_attribute("textContent")
print(post_parent_num)

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
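Since the question mentions feeding in several post_name values later, here is a minimal sketch of a helper that builds the locator string for each one. The names xpath_literal and post_parent_xpath are made up for illustration and are not part of Selenium; the quoting helper only matters if a value could ever contain quote characters.

```python
def xpath_literal(value: str) -> str:
    """Quote a string for use inside an XPath 1.0 expression."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: stitch the pieces together with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def post_parent_xpath(post_name: str, prefix: bool = False) -> str:
    """Build the sibling XPath for one post_name (exact or starts-with match)."""
    literal = xpath_literal(post_name)
    test = f"starts-with(text(), {literal})" if prefix else f"text()={literal}"
    return (
        f"//div[@class='post_name' and {test}]"
        "/following-sibling::div[@class='post_parent']"
    )


# Hypothetical list of values to look up; extend it as more post_names appear.
for name in ["0900766b80090cb6"]:
    print(post_parent_xpath(name))
```

Each returned string can be passed to driver.find_element(By.XPATH, ...) in place of the inline f-string.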

Update:

NoSuchElementException:

Check in Chrome DevTools whether the XPath matches exactly one node in the HTML DOM.

XPath to check:

//div[@class='post_name' and text()='0900766b80090cb6']/following-sibling::div[@class='post_parent']

Steps to check:

Press F12 in Chrome -> open the Elements tab -> press CTRL + F -> paste the XPath and check whether the desired element is highlighted as 1 of 1 matching nodes.
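The same uniqueness check can also be done from the script. This is a rough sketch; count_matches is a hypothetical helper name, and it passes the plain string "xpath" (the value behind By.XPATH) so the snippet has no imports of its own.

```python
def count_matches(driver, xpath: str) -> int:
    """Return how many nodes the XPath matches in the current browsing context."""
    # find_elements returns an empty list instead of raising when nothing matches.
    return len(driver.find_elements("xpath", xpath))


# Example use with a live driver (not executed here):
# n = count_matches(driver, "//div[@class='post_name' and text()='0900766b80090cb6']")
# 0 means the locator or the context is wrong; more than 1 means it is not unique.
```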

If //div[@class='post_name' and text()='0900766b80090cb6']/following-sibling::div[@class='post_parent'] is unique, then check the following conditions as well.

  1. Check whether the element is inside an iframe/frame/frameset.

    Solution: switch to the iframe/frame/frameset first and then interact with the web element.

  2. Check whether it is inside a shadow root.

    Solution: use driver.execute_script with a 'return document.querySelector(...)' script to get the web element back, then operate on it.

  3. Make sure the element is present in the DOM before interacting with it. Add a hardcoded delay or an explicit wait and try again.

    Solution: time.sleep(5) or

    WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//div[@class='post_name' and text()='0900766b80090cb6']/following-sibling::div[@class='post_parent']"))).get_attribute("textContent")

  4. If you were redirected to a new tab or window and have not switched to it, you will likely get a NoSuchElementException.

    Solution: switch to the relevant window/tab first.

  5. If you have switched into an iframe and the desired element is not in that iframe, switch back to the default content before interacting with it.

    Solution: switch to the default content and then to the iframe that contains the element.
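Points 1 and 5 can be combined into one small search routine. The sketch below is only an illustration: find_in_any_frame is a hypothetical helper, it looks at the top-level document and its direct iframes/frames only (no nested frames, shadow roots, or other windows), and it uses the plain string "xpath", which is the value behind By.XPATH.

```python
def find_in_any_frame(driver, xpath: str):
    """Search the top-level document, then each top-level iframe/frame.

    Returns the first matching element and leaves the driver switched into
    the frame that contains it. Returns None, with the driver back at the
    default content, when nothing matches.
    """
    driver.switch_to.default_content()
    matches = driver.find_elements("xpath", xpath)
    if matches:
        return matches[0]

    # Frame elements belong to the top-level document, so collect them once.
    for frame in driver.find_elements("xpath", "//iframe | //frame"):
        driver.switch_to.frame(frame)
        matches = driver.find_elements("xpath", xpath)
        if matches:
            return matches[0]
        # Go back up before trying the next frame.
        driver.switch_to.default_content()
    return None
```

If it returns an element, read its value while the driver is still inside that frame, then call driver.switch_to.default_content() before doing anything else on the page.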

I don't see any specific relation between the "post_parent" ID 947630 and the "post_name" number starting with 09007. Moreover, the parent <div> has class="hidden".

However, to pull the specific number you can use either of the following locator strategies. Because the container is hidden, .text may return an empty string, so read textContent instead:

  • Using CSS_SELECTOR:

     print(driver.find_element(By.CSS_SELECTOR, "div[id^='inline'] div.post_parent").get_attribute("textContent"))
  • Using XPATH:

     print(driver.find_element(By.XPATH, "//div[starts-with(@id, 'inline_')]//div[@class='post_parent']").get_attribute("textContent"))

Ideally, use WebDriverWait with presence_of_element_located() so the lookup waits until the element exists in the DOM. Either of the following locator strategies works:

  • Using CSS_SELECTOR:

     print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div[id^='inline'] div.post_parent"))).get_attribute("textContent"))
  • Using XPATH:

     print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//div[starts-with(@id, 'inline_')]//div[@class='post_parent']"))).get_attribute("textContent"))
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
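Because the parent <div> has class="hidden", it may help to read the value in a way that works whether or not the element is displayed. A minimal sketch, assuming the class really hides the block via CSS; read_text is a hypothetical helper that accepts any object with a .text attribute and a get_attribute() method, such as a Selenium WebElement.

```python
def read_text(element) -> str:
    """Prefer the rendered text; fall back to textContent for hidden nodes."""
    visible = element.text
    if visible:
        return visible.strip()
    # For elements that are not displayed, .text is typically an empty string,
    # while the textContent property still holds the node's text.
    return (element.get_attribute("textContent") or "").strip()


# Example use with a live driver (not executed here):
# element = driver.find_element(By.CSS_SELECTOR, "div[id^='inline'] div.post_parent")
# print(read_text(element))
```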

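You can create a function that uses the following XPath to get the post_parent text for a given post_name.

def get_post_parent(post_name_prefix):
    # Match a post_name whose text starts with the prefix, then step to its post_parent sibling.
    xpath = "//div[@class='post_name' and starts-with(text(),'{}')]/following-sibling::div[@class='post_parent']".format(post_name_prefix)
    element = driver.find_element(By.XPATH, xpath)
    # textContent is available even when the element is not displayed.
    return element.get_attribute("textContent")

print(get_post_parent('09007'))

This returns the post_parent value when a post_name starts with '09007'.

Because the parent <div> has class="hidden", the element is likely not displayed, so read textContent rather than .text to get the value.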
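A different route, if visibility keeps getting in the way, is to skip element lookups and parse the page HTML with the standard library. This is a swapped-in technique rather than anything from the answers above: a rough sketch in which PostFieldParser and parents_for_prefix are made-up names, and the input is assumed to look like the snippet in the question. With a live browser you would feed it driver.page_source.

```python
from html.parser import HTMLParser


class PostFieldParser(HTMLParser):
    """Collect {post_name: post_parent} from inline_* blocks in an HTML string."""

    def __init__(self):
        super().__init__()
        self.mapping = {}
        self._field = None  # class of the div currently being read
        self._name = None   # last post_name seen in the current block

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if (attrs.get("id") or "").startswith("inline_"):
            self._name = None  # a new block begins
        css_class = attrs.get("class")
        self._field = css_class if css_class in ("post_name", "post_parent") else None

    def handle_endtag(self, tag):
        if tag == "div":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "post_name":
            self._name = text
        elif self._name is not None:  # inside a post_parent div
            self.mapping[self._name] = text


def parents_for_prefix(mapping, prefix):
    """Keep only the entries whose post_name starts with the given prefix."""
    return {name: parent for name, parent in mapping.items() if name.startswith(prefix)}


# Trimmed copy of the markup from the question, used as sample input.
sample = """
<div class="hidden" id="inline_947631">
<div class="post_title">Interface Converter</div>
<div class="post_name">0900766b80090cb6</div>
<div class="post_password"></div>
<div class="post_parent">947630</div>
</div>
"""

parser = PostFieldParser()
parser.feed(sample)
print(parents_for_prefix(parser.mapping, "09007"))
```

This only reads the HTML as text, so it cannot click or interact with anything; it is useful when all you need is the number.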
