
Selenium, how to extract text between two div tags

I am new to using Selenium for web automation, and I am having trouble extracting the text between two div tags.

Here is a snippet of the HTML that I am trying to extract the text from.

 ...
<tr>
    <td width="150">
    <a href="https://rads.stackoverflow.com/amzn/click/com/B0099RGRT8" rel="nofollow noreferrer">
    <img height="90" border="0" width="90" alt="iOttie Easy Flex2 Windshield Dashboard Car Mount H&hellip by iOttie" src="http://ecx.images-amazon.com/images/I/51mf6Ry9J2L._SL500_SS90_.jpg">
    </a>
    <div class="xxsmall" style="margin-top: 5px">
        <a href="https://rads.stackoverflow.com/amzn/click/com/B0099RGRT8" rel="nofollow noreferrer">iOttie Easy Flex2 Windshield Dashboard Car Mount Holder Desk Stand for iPhone 5 4S 4 3GS Samsung Gal&amp;hellip</a>
        by iOttie
    </div>
    </td>
    <td style="padding-left: 10px;">
        <div>
            <div>
                <span style="margin-left:-5px; vertical-align: -1">

                </span>
                <b>
                <a href="http://www.amazon.com/gp/cdp/member-reviews/A2UQ07EFPSX78X/ref=cm_pdp_rev_title_1?ie=UTF8&sort_by=MostRecentReview#R12ATB4KTIWFV8">Bought for my wife, now I want one. Excellent Product.</a>
                </b>
                ,
                <span class="nowrap">November 30, 2012</span>
            </div>
            <div style="margin-top: 5px;">
                I bought this mount for my wife, the feedback from her was is that it was really nice and easy to use even while driving.
                <br>
                <br>
                So I "borrowed" it for a couple days, and now I am going to get one for myself. I am using it with an iPhone, but it would work fine with phones of all sizes, which is nice. If my phone size ever changes the mount will accommodate different sizes phones.
                <br>
                <br>
                The phone is very easy to insert and remove , even while driving.
                <br>
                The mount is easy to position but not loose enough that it doesn't hold the position you want.
                <br>
                <br>
                I was very impressed with the windshield mount, it is not just a typical suction cup mount. (Which always at some point…
                <a href="http://www.amazon.com/gp/cdp/member-reviews/A2UQ07EFPSX78X/ref=cm_pdp_rev_more?ie=UTF8&sort_by=MostRecentReview#R12ATB4KTIWFV8">Read more</a>
            </div>
        </div>
    </td>
</tr>
...

The other div tags actually contain other text as well.

What I want to extract from this is:

            I bought this mount for my wife, the feedback from her was is that it was really nice and easy to use even while driving.

            So I "borrowed" it for a couple days, and now I am going to get one for myself. I am using it with an iPhone, but it would work fine with phones of all sizes, which is nice. If my phone size ever changes the mount will accommodate different sizes phones.

            The phone is very easy to insert and remove , even while driving.

            The mount is easy to position but not loose enough that it doesn't hold the position you want.

            I was very impressed with the windshield mount, it is not just a typical suction cup mount. (Which always at some point…

This is my code:

String review;
try {
    // bucketElement is the WebElement for the <tr> row shown above
    review = bucketElement.findElement(By.xpath("./td/div")).getText();
} catch (NoSuchElementException nsee) {
    review = "NA";
}

This actually extracts the text from all of the innermost div tags, which is not what I want. I can target a specific div with ./td/div/div[3], but I still can't get just the text between its tags.

Any thoughts?

Thanks

You can use regular expressions as a workaround:

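String review;
try {
    // getText() returns rendered text with no tags, so read the raw markup instead.
    review = bucketElement.findElement(By.xpath("./td/div")).getAttribute("innerHTML");
    // Strings are immutable: assign the result of replaceAll back to review.
    review = review.replaceAll("(<.+>)", "");
} catch (NoSuchElementException nsee) {
    review = "NA";
}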

The regex removes the tags together with the text of the inner elements, so only the first-level text is left. For example, given:

some strange<div>other text</div> text

the resulting string will be:

some strange text
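
The example above can be checked with plain Java, without a browser. This is only a minimal sketch: the class name StripTagsDemo and the helper stripTags are made up for illustration, and the input is assumed to be a markup string (getText() itself returns rendered text with no tags in it).

```java
public class StripTagsDemo {
    // Removes everything from the first '<' to the last '>' within a line.
    // The pattern is greedy, and '.' does not match line breaks by default,
    // so each match stays inside a single line.
    static String stripTags(String markup) {
        return markup.replaceAll("(<.+>)", "");
    }

    public static void main(String[] args) {
        String markup = "some strange<div>other text</div> text";
        System.out.println(stripTags(markup)); // prints "some strange text"
    }
}
```

Because the match is greedy, a line with several sibling elements loses the first-level text between them too: "a<b>x</b> keep <i>y</i> z" becomes "a z". That may or may not be acceptable for a given page.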

If you need a more complex regular expression, it helps to try it out in an online regex tester first.

After locating the element with ./td/div/div[3], calling getText() on that WebElement returns the text inside that div element.
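
The text returned this way still includes the text of child elements such as the "Read more" link. One possible way to drop it is to subtract each child's text from the parent's text. The sketch below shows only the string-handling part in plain Java. The class OwnTextDemo, the helper removeChildTexts, and the variable reviewDiv mentioned in the comments are hypothetical names, and the sample strings are shortened stand-ins for what getText() might return.

```java
import java.util.Arrays;
import java.util.List;

public class OwnTextDemo {
    // Removes every occurrence of each child's text from the parent's text,
    // then trims leading and trailing whitespace.
    static String removeChildTexts(String parentText, List<String> childTexts) {
        String result = parentText;
        for (String child : childTexts) {
            if (!child.isEmpty()) { // <br> elements yield empty text; skip them
                result = result.replace(child, "");
            }
        }
        return result.trim();
    }

    public static void main(String[] args) {
        // Assumption: with Selenium, parentText would come from reviewDiv.getText()
        // and childTexts from getText() on each element returned by
        // reviewDiv.findElements(By.xpath("./*")).
        String parentText = "The phone is very easy to insert and remove.\nRead more";
        List<String> childTexts = Arrays.asList("", "Read more");
        System.out.println(removeChildTexts(parentText, childTexts));
    }
}
```

Note that this removes a child's text wherever it occurs, so a review that itself contains the words "Read more" would lose them as well.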
