Extracting information from a SPARQL query using regular expressions

I am having a hard time creating a regular expression that extracts the namespace prefixes from this SPARQL query:

SELECT * 
WHERE {
    ?Vehicle rdf:type umbel-sc:CompactCar ;
             skos:subject <http://dbpedia.org/resource/Category:Vehicles_with_CVT_transmission>;
             dbp-prop:assembly ?Place.
    ?Place geo-ont:parentFeature dbpedia:United_States .
}

I need to get:

"rdf", "umbel-sc", "skos", "dbp-prop", "geo-ont", "dbpedia"

I need an expression like this:

\\s+([^\\:]*):[^\\s]+

But the expression above does not work, because it also consumes spaces before reaching the ":". What am I doing wrong?

I don't know the details of SPARQL syntax, but I would imagine that it is not a regular language, so regular expressions won't be able to do this perfectly. However, you can get pretty close if you search for something that looks like a word with whitespace on its left and a colon on its right.

This method might be good enough for a quick solution, or if your input format is known and sufficiently restricted. For a more general solution, I suggest you look for or create a proper parser for the SPARQL language.

With that said, try this:

using System;
using System.Text.RegularExpressions;

string s = @"SELECT * 
WHERE {
    ?Vehicle rdf:type umbel-sc:CompactCar ;
    skos:subject <http://dbpedia.org/resource/Category:Vehicles_with_CVT_transmission>;
    dbp-prop:assembly ?Place.
    ?Place geo-ont:parentFeature dbpedia:United_States .
}";

// One whitespace character, then a run of word characters or hyphens
// (captured as group 1), then a colon.
foreach (Match match in Regex.Matches(s, @"\s([\w-]+):"))
{
    Console.WriteLine(match.Groups[1].Value);
}

Result:

rdf
umbel-sc
skos
dbp-prop
geo-ont
dbpedia
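
The question asks for the prefixes as a collection of strings rather than console output. A minimal sketch of that, assuming the same pattern as above; the class name PrefixExtractor and the method ExtractPrefixes are hypothetical names introduced here, and the Distinct() call is an addition so that a prefix used several times is reported once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixExtractor
{
    // Hypothetical helper (not from the answer): collects the distinct
    // prefixes found by the whitespace-word-colon pattern.
    public static List<string> ExtractPrefixes(string query)
    {
        return Regex.Matches(query, @"\s([\w-]+):")
                    .Cast<Match>()                   // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Groups[1].Value)  // keep only the captured prefix
                    .Distinct()                      // drop repeated prefixes
                    .ToList();
    }
}
```

Like the loop above, this is only a heuristic: it does not understand SPARQL string literals or comments, so a colon inside either could produce a false match.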

So I need an expression like this:

\\s+([^\\:]*):[^\\s]+

But the expression above does not work, because it also consumes spaces before reaching the ":".

The regular expression will consume those spaces, yes, but the group captured by your parentheses won't contain them. Is that a problem? You can access this group by reading Groups[1].Value on the Match object returned from Regex.Match.
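
To make the difference between the whole match and the captured group concrete, here is a small sketch using the pattern from the question. The class GroupDemo and the method FirstMatch are hypothetical names introduced only for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns the whole first match and its group 1
    // for the pattern from the question.
    public static (string Whole, string Captured) FirstMatch(string input)
    {
        Match m = Regex.Match(input, @"\s+([^:]*):[^\s]+");
        return (m.Value, m.Groups[1].Value);
    }
}
```

For the input "?Vehicle rdf:type" the whole match is " rdf:type" (leading space included) while group 1 is just "rdf". Note, however, that [^:]* is itself allowed to match whitespace, so on a longer input such as "SELECT * WHERE { ?Vehicle rdf:type" group 1 becomes "* WHERE { ?Vehicle rdf". That is probably the behaviour the question describes, and it is why a narrower class such as [\w-]+ works better here.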

If you really need the regex to not match these spaces, you can use a so-called look-behind assertion:

(?<=\s)([^:]*):[^\s]+

As an aside, you don't need to double all your backslashes. Use a verbatim string instead, like this:

Regex.Match(input, @"(?<=\s)([^:]*):[^\s]+")
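
One caveat with the pattern above: [^:]* still accepts whitespace, so the captured group can span several tokens. A possible variant, sketched here under my own assumptions rather than taken from the answer, keeps the look-behind but swaps in the [\w-]+ class and adds a look-ahead for the colon, so that the match value itself is the bare prefix. LookBehindDemo and MatchPrefixes are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookBehindDemo
{
    // Hypothetical helper: each match is a bare prefix, because the
    // surrounding whitespace and colon are only asserted, not consumed.
    public static string[] MatchPrefixes(string query)
    {
        // (?<=\s)  preceded by whitespace (look-behind, zero width)
        // [\w-]+   the prefix itself
        // (?=:)    followed by a colon (look-ahead, zero width)
        return Regex.Matches(query, @"(?<=\s)[\w-]+(?=:)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With this form there is no need to read Groups[1]; m.Value is already the prefix. A prefix at the very start of the input would be missed, since the look-behind requires a preceding whitespace character.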
