How do I use XPath to capture a portion of a p tag

I've seen plenty of similar questions on here but still can't get my particular use case working. XPath newbie here.

I have the following snippet of Ruby code:

        post_html = Nokogiri::HTML(post.raw)
        @restricted_file_types = SiteSetting.file_attachment_whispers_file_extensions.split('|')
        Rails.logger.info "searching for restricted extensions #{@restricted_file_types.inspect}"
        Rails.logger.info "post is: #{post_html}"


        # trying to get links
        tags = post_html.xpath("//p[contains(., '\[.*')]")
        Rails.logger.info "tag from regex: #{tags}"

        tags.each do |attachment| 
            Rails.logger.info "p tag found in parsing"
            Rails.logger.info "#{attachment}"
            does_contain = @restricted_file_types.any? {
                |extension| attachment.include?(extension)
            }

            Rails.logger.info 'checking for restricted'
            if contains_restricted?(attachment)
                Rails.logger.info 'contains restricted'
                links.push(attachment)
                node = post_html.create_element 'p'# create paragraph element
                node.inner_html = SiteSetting.file_attachment_whispers_message
                attachment.replace '[color=red]' + node + '[/color]' # replace found link with paragraph
                post.raw = post_html
                post.save!
                hasUpdated = true
            end
        end

post.raw will contain something with a structure similar to this example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>this is a dog that borks. Image of dog attached.

[3408.zip|attachment](upload://szj2cSHQnbp8zsuBeC7ZsZ9mllg.zip) (19.7 KB)</p></body></html>

My goal is to replace the link to a file with a message. To do that I was trying to capture just that portion of the HTML, change it, and then rewrite it. I'm getting stuck on how to properly use XPath in this situation. post_html.xpath("//p[contains(., '[.*')]") does not work.

I don't fully understand what the first parameter in the XPath expression does, since I got that from another example. Nor do I understand why this comes back with nothing matched. Can someone explain what exactly it is doing and why it is not working?

Thanks :)

You could use something like:

concat(substring-before(//p,"["),"I love dogs")

First, we take the text of the p element and extract everything before the "[" with the substring-before function. Note that XPath 1.0 converts a node-set such as //p to the string value of its first node, so only the first p is used. Then we "paste" the message we want with concat.

Output:

this is a dog that borks. Image of dog attached. I love dogs
