
How to select multiple text nodes as a single string using an XPath expression?

I am pretty new to XPath and I am trying to scrape a website using an XPath expression in Scrapy. The structure of the page that I am trying to scrape is:

...
<div class="article-body">
<p class="body">Text1</p>
<p class="body">Text2</p>
<p class="body">Text3</p>
...

The XPath that I am trying is:

//div[@class="article-body"]/p/text()

But all I get in my database is Text1. Instead, I want the output to be:

Text1.Text2.Text3

I think I should use concat or string-join or some function like that, but I am unable to work it out. Since I have to pass this XPath expression as an argument in Scrapy, I need it to be a single expression.

I tried feeding the concat function into my django-scraper as:

concat(//div[@class="article-body"]/p)

But it threw this error at me:

File "C:\Anaconda2\lib\site-packages\scrapy\selector\unified.py", line 100, in xpath raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))

I got the same error when I tried (there is no other <p> element on the page):

concat(//p)

or

string-join(//p)

However, when I try string(//p), I get Text1 in my database.
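A note on why these attempts behave this way, as far as XPath 1.0 goes (Scrapy's selectors are built on lxml, which implements XPath 1.0): concat() requires at least two arguments, string-join() only exists from XPath 2.0 onward, and string() applied to a node-set returns the string value of the first node only. If the number of paragraphs is known in advance, a single concat() expression can still be written by indexing each paragraph. The sketch below builds such an expression; the helper name `build_concat_xpath` is hypothetical, and it assumes a single matching div and a separator that contains no single quote.

```python
def build_concat_xpath(base, count, sep="."):
    """Build an XPath 1.0 expression joining the first `count` matches of `base`.

    Hypothetical helper for illustration; `sep` must not contain a single quote.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        # concat() needs at least two arguments, so fall back to string().
        return "string({}[1])".format(base)
    parts = []
    for i in range(1, count + 1):
        if i > 1:
            # Separator as an XPath string literal between paragraphs.
            parts.append("'{}'".format(sep))
        parts.append("{}[{}]".format(base, i))
    return "concat({})".format(", ".join(parts))


expr = build_concat_xpath('//div[@class="article-body"]/p', 3)
print(expr)
```

The resulting string is one expression, so it can be passed wherever a single XPath argument is expected. It does not adapt to a varying number of paragraphs; for that, the joining has to happen outside XPath 1.0.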

Have you tried this?

concat(//div[@class="article-body"]/p)

String values = myTestDriver.findElement(By.xpath("concat(//div[@class='article-body']/p)"));

OR

You need to do something like this:

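    // Requires java.util.ArrayList, java.util.List, org.openqa.selenium.By and org.openqa.selenium.WebElement.
    List<String> name = new ArrayList<>();
    // Single quotes inside the XPath avoid clashing with the Java string delimiters.
    List<WebElement> options = myTestDriver.findElements(By.xpath("//div[@class='article-body']/p"));
    System.out.println(options.size());
    for (WebElement option : options) {
        // Collect the visible text of each paragraph.
        String text = option.getText();
        System.out.println(text);
        name.add(text);
    }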

Now you can concatenate the collected strings, for example with String.join(".", name).
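Since the question is about Scrapy rather than Selenium, the same collect-then-join idea can be sketched in Python. The example below uses only the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, so that it runs without extra packages; the sample markup and the function name `join_paragraphs` are assumptions made for illustration. In a Scrapy callback, the analogous step would presumably be `".".join(response.xpath('//div[@class="article-body"]/p/text()').getall())`, assuming a `response` object is available.

```python
import xml.etree.ElementTree as ET

# Sample markup mirroring the structure in the question
# (the closing </div> is added here so that it parses).
HTML = """
<div class="article-body">
<p class="body">Text1</p>
<p class="body">Text2</p>
<p class="body">Text3</p>
</div>
"""


def join_paragraphs(markup, sep="."):
    """Collect the text of every <p> under div.article-body and join it."""
    # Wrap in a dummy root so the fragment is a well-formed document.
    root = ET.fromstring("<root>{}</root>".format(markup))
    paragraphs = root.findall(".//div[@class='article-body']/p")
    # itertext() also picks up text inside nested inline elements.
    return sep.join("".join(p.itertext()).strip() for p in paragraphs)


result = join_paragraphs(HTML)
print(result)  # Text1.Text2.Text3
```

Joining in Python works for any number of paragraphs, at the cost of no longer being a single XPath expression.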
