How to select multiple text nodes as a single string using an XPath expression?

I am pretty new to XPath and am trying to scrape a website using an XPath expression in Scrapy. The structure of the page I am trying to scrape is:

...
<div class="article-body">
<p class="body">Text1</p>
<p class="body">Text2</p>
<p class="body">Text3</p>
...

The XPath I am trying is:

//div[@class="article-body"]/p/text()

But all I get in my database is Text1. Instead, I want the output to be:

Text1.Text2.Text3

I think I should use concat, string-join, or a similar function, but I have been unable to work it out. Since I have to pass this XPath expression as an argument in Scrapy, it needs to be a single expression.

I tried feeding the concat function into my django-scraper as:

concat(//div[@class="article-body"]/p)

But it threw this error at me:

File "C:\Anaconda2\lib\site-packages\scrapy\selector\unified.py", line 100, in xpath
    raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))

I got the same error when I tried the following (there are no other <p> elements on the page):

concat(//p)

or

string-join(//p)

However, when I try string(//p), I get Text1 in my database.

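Have you tried this?

concat(//div[@class="article-body"]/p)

Be aware that XPath 1.0 concat() requires at least two arguments, so the single-argument form above is rejected, which is consistent with the error in the question. In Selenium the call would look like this, although findElement returns a WebElement rather than a String and expects the XPath to resolve to an element:

WebElement values = myTestDriver.findElement(By.xpath("concat(//div[@class='article-body']/p)"));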

OR

You need to do something like this:

    import java.util.ArrayList;
    import java.util.List;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebElement;

    // Collect the text of every <p> inside the article body.
    List<String> name = new ArrayList<>();
    List<WebElement> options = myTestDriver.findElements(By.xpath("//div[@class='article-body']/p"));
    System.out.println(options.size());
    for (WebElement option : options) {
        String name1 = option.getText();
        System.out.println(name1);
        name.add(name1);
    }

Now you can concatenate the collected strings, for example with String.join(".", name).
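Since the question is about Scrapy, the same collect-then-join idea can be sketched in Python. This is only a minimal illustration, not the asker's scraper: it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset) instead of Scrapy, and the SAMPLE markup and the join_paragraphs helper are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup modelled on the structure in the question,
# wrapped in <html>/<body> and closed so it parses as well-formed XML.
SAMPLE = """
<html><body>
<div class="article-body">
<p class="body">Text1</p>
<p class="body">Text2</p>
<p class="body">Text3</p>
</div>
</body></html>
"""


def join_paragraphs(markup, separator="."):
    """Return the text of every <p> in the article body, joined by separator."""
    root = ET.fromstring(markup.strip())
    # ElementTree has no text() step, so select the <p> elements
    # and read their text content in Python instead.
    paragraphs = root.findall(".//div[@class='article-body']/p")
    texts = ["".join(p.itertext()).strip() for p in paragraphs]
    return separator.join(texts)


joined = join_paragraphs(SAMPLE)
print(joined)  # Text1.Text2.Text3
```

In Scrapy itself, the analogous step would be to join the list returned by response.xpath('//div[@class="article-body"]/p/text()').extract() with ".".join(...). This does the joining outside the XPath expression, so it may not fit a setup that accepts only a single XPath string.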
