
Regex pattern for finding string between two characters - but first occurrence of the second character

I want a regex that finds the string between two characters, but only from the start delimiter to the first occurrence of the end delimiter.

I want to extract the story from lines of the following format:

<metadata name="user" story="{some_text_here}" \/>

So I want to extract only: {some_text_here}

For that I am using the following regex:

<metadata name="user" story="(.*)" \/>

And the Java code:

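import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws IOException {
        // Inside a Java string literal the quotes must be escaped and the regex backslash doubled.
        String regexString = "<metadata name=\"user\" story=\"(.*)\" \\/>";
        String filePath = "C:\\Desktop\\temp\\test.txt";
        Pattern p = Pattern.compile(regexString);
        Matcher m;
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = br.readLine()) != null) {
                m = p.matcher(line);
                if (m.find()) {
                    // Print the text captured by the first group
                    System.out.println(m.group(1));
                }
            }
        }
    }
}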

This regex mostly works fine, but surprisingly, if the line is:

<metadata name="user" story="My name is Nick" extraStory="something" />

then running the code prints My name is Nick" extraStory="something, whereas I only want to get My name is Nick.

I also want to make sure that there is no other information between story="My name is Nick" and />.

<metadata name="user" story="([^"]*)" \/>

[^"]* matches any run of characters other than ". In this case the string

<metadata name="user" story="My name is Nick" extraStory="something" />

will not be matched.
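As a minimal sketch of the difference, the following compares the greedy pattern from the question with the negated character class on the two sample lines. The class name StoryRegexDemo and the helper extract are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoryRegexDemo {
    // Greedy pattern from the question: .* extends to the last quote that is still followed by " />".
    static final Pattern GREEDY =
            Pattern.compile("<metadata name=\"user\" story=\"(.*)\" \\/>");
    // Negated character class: the captured group can never cross a double quote.
    static final Pattern STRICT =
            Pattern.compile("<metadata name=\"user\" story=\"([^\"]*)\" \\/>");

    // Hypothetical helper: returns the captured story, or null when the line does not match.
    static String extract(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String plain = "<metadata name=\"user\" story=\"My name is Nick\" />";
        String extra = "<metadata name=\"user\" story=\"My name is Nick\" extraStory=\"something\" />";
        System.out.println(extract(GREEDY, extra)); // over-captures into extraStory
        System.out.println(extract(STRICT, plain)); // captures only the story text
        System.out.println(extract(STRICT, extra)); // no match, so null
    }
}
```

With the stricter pattern, a line that carries any extra attribute between the story value and /> is skipped entirely, which is the behaviour asked for in the question.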

The following XPath should solve your problem:

//metadata[@name='user' and @story and count(@*) = 2]/@story

It addresses the story attribute of any metadata node in the document whose name attribute is user and which also has a story attribute but no others (the attribute count is 2).

(Note: //metadata[@name='user' and count(@*)=2]/@story would be enough, since it is impossible to address the story attribute of a metadata node whose second attribute isn't story.)

In Java code, supposing you are handling an instance of org.w3c.dom.Document and already have an instance of XPath available, the code would be the following:

xPath.evaluate("//metadata[@name='user' and @story and count(@*) = 2]/@story", xmlDoc);

You can try the XPath here or the Java code here.
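For completeness, here is a rough self-contained sketch of how the xmlDoc and xPath instances could be obtained with the JDK's built-in XML APIs. It assumes the input is well-formed XML with a single root element; the <root> wrapper, the class name StoryXPathDemo and the helper stories are invented for this example. It also requests a node set rather than a single string, so that every matching attribute is returned.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class StoryXPathDemo {
    static final String QUERY =
            "//metadata[@name='user' and @story and count(@*) = 2]/@story";

    // Hypothetical helper: parses the XML text and returns the value of every matching story attribute.
    static List<String> stories(String xml) throws Exception {
        Document xmlDoc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(QUERY, xmlDoc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue()); // the attribute's value
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample document: one element with exactly two attributes, one with three.
        String xml = "<root>"
                + "<metadata name=\"user\" story=\"My name is Nick\" />"
                + "<metadata name=\"user\" story=\"Other\" extraStory=\"something\" />"
                + "</root>";
        System.out.println(stories(xml));
    }
}
```

Only the first metadata element should be selected here, because the second one has three attributes and fails the count(@*) = 2 test.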

Just use Jsoup. It's the right tool for the problem :).

It's this easy:

String html; // read the HTML file into this string
Document document = Jsoup.parse(html);
String story = document.select("metadata[name=user]").attr("story");
System.out.println(story);

