Ruby Regex matching string before and after certain characters

I've got a string like this:

<block trace="true" name="AssignResources: Append Resources">

I need to get the word (or the characters up to the next whitespace) after < (in this case block) and the words before = (here trace and name).

I tried several regex patterns, but all my attempts return the word with the delimiter characters included, like ;block.

I'm sure it's not that hard, but I haven't found the solution yet.

Does anybody have a hint?
Thanks.

Btw: I want to replace the pattern matches with gsub.

EDIT:

Solved it with the following regexes:

1) /\s(\w+)="(.*?)"/ matches all attributes and their values, in $1 and $2.

2) /<!--.*-->/ matches comments.

3) /&lt;([\/|!|\?]?)([A-Za-z0-9]+)[^\s|&gt;|\/]*/ matches all tag names, whether they're in a closing tag, a self-closing tag, a <?xml> tag or a DTD tag. $1 holds the optional prefix (/, ! or ?, or nothing) and $2 contains the tag name.
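
As a rough check of patterns 1 and 3 above, here is a small sketch. It assumes the input is the entity-escaped form of the line (with &lt; and &gt;), which is what pattern 3 expects; the constant and method names are made up for illustration.

```ruby
# Assumed sample input: the line from the question in entity-escaped form.
LINE = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'

ATTR_RE = /\s(\w+)="(.*?)"/                                  # pattern 1
TAG_RE  = /&lt;([\/|!|\?]?)([A-Za-z0-9]+)[^\s|&gt;|\/]*/   # pattern 3

# Returns one [name, value] pair per attribute.
def attributes(line)
  line.scan(ATTR_RE)
end

# Returns [prefix, tag_name] for the first tag, or nil if there is none.
def tag_parts(line)
  m = TAG_RE.match(line)
  m && [m[1], m[2]]
end

p attributes(LINE)            # [["trace", "true"], ["name", "AssignResources: Append Resources"]]
p tag_parts(LINE)             # ["", "block"]
p tag_parts('&lt;/block&gt;') # ["/", "block"]
```

String#scan returns only the capture groups when the pattern has groups, so the delimiters (the leading space, = and the quotes) never show up in the result.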

You can try:

&lt;([^ ]*)\s([^=]*)=
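
A minimal sketch of how that pattern could be used from Ruby, assuming the entity-escaped input; the method name tag_and_first_attr is hypothetical.

```ruby
PATTERN = /&lt;([^ ]*)\s([^=]*)=/

# Returns [tag_name, first_attribute_name], or nil when the pattern does not match.
def tag_and_first_attr(line)
  m = PATTERN.match(line)
  m ? m.captures : nil
end

line = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
p tag_and_first_attr(line)  # ["block", "trace"]
```

Note that this pattern only reaches the first attribute name (trace); the second one (name) is not captured by a single match.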

It looks a lot like parsing HTML with regex to me.

Ruby has a very good HTML parser called Nokogiri.

And here is a how-to for that:

require 'nokogiri'

html=Nokogiri::HTML('<block trace="true" name="AssignResources: Append Resources">')

html.xpath("//*").each do |s|
    puts s.node_name #block
    puts s.keys #trace, name
    puts s.values #true, AssignResources: Append Resources
end

'&lt;block trace="true" name="AssignResources: Append Resources"&gt;'[/&lt;(\w+)/, 1]
#=> "block"

If you pass a regex and an index i to String#[], it'll return the value of the ith capturing group.
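
To make the index argument concrete, a small sketch (the helper names are made up, and the attribute pattern is my own assumption, not part of the answer above):

```ruby
# Index 1 returns only the first capture group, so the "&lt;" delimiter is left out.
def tag_name(line)
  line[/&lt;(\w+)/, 1]
end

# Index 0 returns the whole match, delimiter included.
def whole_match(line)
  line[/&lt;(\w+)/, 0]
end

# Assumed helper: collects every word that directly precedes an "=".
def attr_names(line)
  line.scan(/(\w+)=/).flatten
end

line = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
p tag_name(line)     # "block"
p whole_match(line)  # "&lt;block"
p attr_names(line)   # ["trace", "name"]
```

The attr_names helper would also pick up a word followed by = inside an attribute value, so treat it as a sketch for input shaped like the example only.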

Edit:

In 1.9 you can use /(?<=&lt;)\w+/ to require the presence of the &lt; without matching it. In 1.8 there is no way to do that. The best you can do is to put the part you don't want to replace in a capturing group and access that group in the replacement, like this:

"lo&lt;la li".gsub(/(&lt;)(\w+)/, '\1 --\2--')
 #=> "lo&lt; --la-- li"

&lt;block trace="true" name="AssignResources: Append Resources"&gt;

&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;

#result:

$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources
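
A hedged sketch of reading those five groups from Ruby with MatchData#captures; the method name and the returned hash layout are my own choices for illustration.

```ruby
FULL_TAG = /&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;/

# Returns the tag name and both attributes, or nil if the line does not have
# exactly the shape the pattern expects (a tag with two quoted attributes).
def parse_two_attr_tag(line)
  m = FULL_TAG.match(line)
  return nil unless m
  tag, key1, value1, key2, value2 = m.captures
  { tag: tag, attrs: { key1 => value1, key2 => value2 } }
end

line = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
result = parse_two_attr_tag(line)
p result[:tag]    # "block"
p result[:attrs]  # {"trace"=>"true", "name"=>"AssignResources: Append Resources"}
```

The pattern is tied to exactly two attributes, so a tag with one or three attributes returns nil here.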

Update: I don't know Ruby, but based on the description of gsub, I believe that something like the following should do the trick.

str = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
repl = str.gsub(/&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;/, 
    "tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl

Most probably you should go with Nokogiri or something similar. I couldn't fit it in one gsub, but in two:

>> s = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
>> m,r=0,["&lt;blockie ", " tracie=", " namie="]
>> s.gsub(/&lt;.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) {|ma| m+=1; r[m]}
=> "&lt;blockie tracie=\"true\" namie=\"AssignResources: Append Resources\"&gt;"
