I am fairly new to XPath and am trying to scrape a website using an XPath expression in Scrapy. The structure of the page I am trying to scrape is:
...
<div class="article-body">
<p class="body">Text1</p>
<p class="body">Text2</p>
<p class="body">Text3</p>
...
The XPath I am trying is:
//div[@class="article-body"]/p/text()
But all I get in my database is Text1. Instead, I want the output to be:
Text1.Text2.Text3
I think I should use concat or string-join or a similar function, but I have not been able to work it out. Since I have to pass this XPath expression as an argument in Scrapy, it needs to be a single expression.
I tried feeding the concat function into my django-scraper as:
concat(//div[@class="article-body"]/p)
But it threw this error:
File "C:\Anaconda2\lib\site-packages\scrapy\selector\unified.py", line 100, in xpath raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))
I got the same error when I tried either of the following (there is no other <p> element on the page):
concat(//p)
or
string-join(//p)
However, when I try string(//p), I get only Text1 in my database.
Have you tried this?
concat(//div[@class="article-body"]/p)
String values = myTestDriver.findElement(By.xpath("concat(//div[@class='article-body']/p)")).getText();
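One caveat about the single-argument concat() form: in XPath 1.0 (which, as far as I know, is what the lxml-based Scrapy selectors implement), concat() needs at least two arguments, string-join() belongs to XPath 2.0, and converting a node-set to a string uses only its first node. That would explain both the error and the lone Text1. Below is a minimal sketch using the JDK's built-in javax.xml.xpath rather than Selenium or Scrapy. The class name ConcatDemo, the helper eval, and the sample markup are made up for illustration, and the markup is assumed to be well-formed XML.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class ConcatDemo {
    // Hypothetical sample page mirroring the structure shown in the question.
    static final String PAGE =
        "<html><body><div class=\"article-body\">"
        + "<p class=\"body\">Text1</p>"
        + "<p class=\"body\">Text2</p>"
        + "<p class=\"body\">Text3</p>"
        + "</div></body></html>";

    // Evaluates an XPath 1.0 expression against PAGE and returns the result as a string.
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(PAGE)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // string() on a node-set takes the first node only.
        System.out.println(eval("string(//div[@class='article-body']/p)")); // Text1

        // concat() with explicit positions stays a single expression,
        // but only works when the number of paragraphs is known in advance.
        System.out.println(eval(
            "concat(//div[@class='article-body']/p[1], '.', "
            + "//div[@class='article-body']/p[2], '.', "
            + "//div[@class='article-body']/p[3])")); // Text1.Text2.Text3
    }
}
```

The positional concat() is still one expression, so it can be passed as a single argument, but it breaks as soon as the paragraph count varies.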
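Or you need to do something like this:
import java.util.ArrayList;
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// myTestDriver is the same WebDriver instance used in the snippet above.
List<String> names = new ArrayList<>();
List<WebElement> options = myTestDriver.findElements(By.xpath("//div[@class='article-body']/p"));
System.out.println(options.size());
for (WebElement option : options) {
    String text = option.getText();
    System.out.println(text);
    names.add(text); // collect the text of each paragraph
}
Now you can concatenate the collected strings, for example with String.join(".", names).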
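The select-then-join idea can also be tried without a browser. Here is a rough sketch that uses only the JDK's javax.xml.xpath to select every matching node and then joins the text in ordinary code. The class name JoinParagraphs, the method joinText, and the sample markup are hypothetical, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class JoinParagraphs {
    // Selects all nodes matching the expression and joins their text with the separator.
    static String joinText(String xml, String expression, String separator) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Ask for the whole node-set instead of a string, so no node is dropped.
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                parts.add(nodes.item(i).getTextContent().trim());
            }
            return String.join(separator, parts);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample markup mirroring the question.
        String xml = "<div class=\"article-body\">"
            + "<p class=\"body\">Text1</p>"
            + "<p class=\"body\">Text2</p>"
            + "<p class=\"body\">Text3</p>"
            + "</div>";
        System.out.println(joinText(xml, "//div[@class='article-body']/p", ".")); // Text1.Text2.Text3
    }
}
```

Unlike a positional concat(), this does not depend on knowing how many paragraphs the page has, at the cost of doing the join outside the XPath expression.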