Ruby parameterized regular expression

Question

I have a string like "{some|words|are|here}" or "{another|set|of|words}".

So in general the string consists of an opening curly bracket, words delimited by a pipe, and a closing curly bracket.

What is the most efficient way to get a selected word from that string?

I would like to do something like this:

@my_string = "{this|is|a|test|case}"
@my_string.get_column(0) # => "this"
@my_string.get_column(1) # => "is"
@my_string.get_column(4) # => "case"

What should the method get_column contain?

Answer 1

So this is the solution I like right now:

class String
  def get_column(n)
    self =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
  end
end

We use a regular expression to make sure that the string is of the correct format, while simultaneously grabbing the correct column.

Explanation of the regex:

\A is the beginning of the string and \Z is the end, so this regex matches the entire string. (Strictly, \Z also matches just before a trailing newline; \z matches only at the very end.)
Since curly braces have a special meaning, we escape them as \{ and \} to match the curly braces at the beginning and end of the string.
Next, we want to skip the first n columns; we don't care about them.
- A previous column is some number of letters followed by a vertical bar, so we use the standard \w to match a word-like character (this includes digits and underscore, but why not) and * to match any number of them. The vertical bar has a special meaning, so we have to escape it as \|. Since we want to group this, we enclose it all inside non-capturing parens: (?:\w*\|) (the ?: makes it non-capturing).
- Now we have n of the previous columns, so we tell the regex to match the column pattern n times using a counted repetition: just put a number in curly braces after a pattern. We use standard string interpolation, so we just put in {#{n}} to mean "match the previous pattern exactly n times".
The first non-skipped column after that is the one we care about, so we put that in capturing parens: (\w*)
Then we skip the rest of the columns, if any exist: (?:\|\w*)*
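The pieces described above can be laid out one per line using Ruby's extended (/x) regex mode, which ignores whitespace and allows comments inside the pattern. This is only an illustrative sketch: the method name annotated_column_regex is hypothetical and is not part of the answer.

```ruby
# Hypothetical helper (not from the answer): the same pattern as in
# get_column, written in extended mode so each piece can be commented.
def annotated_column_regex(n)
  /
    \A \{               # start of string, then a literal opening brace
    (?: \w* \| ){#{n}}  # skip exactly n columns, each ending in a pipe
    ( \w* )             # capture the column we want
    (?: \| \w* )*       # skip any remaining columns
    \} \Z               # literal closing brace, then end of string
  /x
end

match = annotated_column_regex(2).match("{this|is|a|test|case}")
p match[1]  # prints "a"
```

The interpolation of n happens when the regex literal is built, so by the time the pattern is compiled the repetition count is an ordinary number.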

Capturing the column puts it into $1, so we return that value if the regex matched. If not, we return nil, since this String has no nth column.
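To see the return values concretely, here is a small sketch that uses a standalone method instead of reopening String. The name column_of is an assumption made for this illustration; the regex is the one from the answer.

```ruby
# Hypothetical standalone version of get_column (same regex as the answer).
# =~ returns the match position (truthy, even when it is 0) or nil,
# so the && either yields $1 or short-circuits to nil.
def column_of(str, n)
  str =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
end

s = "{this|is|a|test|case}"
p column_of(s, 0)            # prints "this"
p column_of(s, 4)            # prints "case"
p column_of(s, 5)            # prints nil (there is no sixth column)
p column_of("this|is|a", 0)  # prints nil (missing braces, so no match)
```

Note that nil is returned both when the column index is too large and when the string is not in the expected format; the caller cannot tell the two cases apart from the return value alone.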

In general, if you wanted to have more than just words in your columns (like "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"), then just replace every \w in the regex with [^|{}] so that each column can contain anything except a curly brace or a vertical bar.

Here's my previous solution:
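As a rough sketch of that substitution, with \w swapped for [^|{}] throughout (the method name phrase_column is hypothetical, chosen only for this example):

```ruby
# Hypothetical variant: each column may contain anything except | { }.
# A negated character class also matches newlines.
def phrase_column(str, n)
  str =~ /\A\{(?:[^|{}]*\|){#{n}}([^|{}]*)(?:\|[^|{}]*)*\}\Z/ && $1
end

text = "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"
p phrase_column(text, 0)  # prints "a phrase or two"
p phrase_column(text, 1)  # prints "don't forget about punctuation!"
p phrase_column(text, 3)  # prints nil
```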

class String
  def get_column(n)
    raise "not a column string" unless self =~ /\A\{\w*(?:\|\w*)*\}\Z/
    self[1 .. -2].split('|')[n]
  end
end

We use a similar regex to make sure the String contains a set of columns, and raise an error if it does not. Then we strip the curly braces from the front and back (using self[1 .. -2] to limit to the substring starting at the second character and ending at the next to last), split the columns on the pipe character (using .split('|') to create an array of columns), and then find the nth column (using standard Array lookup with [n]).

I just figured that as long as I was using the regex to verify the string, I might as well use it to capture the column.
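The intermediate values of that approach look like this. The sketch uses a standalone method whose name, column_by_split, is an assumption for illustration; the validation regex and the slicing are taken from the answer.

```ruby
# Hypothetical standalone version of the split-based solution.
def column_by_split(str, n)
  raise "not a column string" unless str =~ /\A\{\w*(?:\|\w*)*\}\Z/
  inner = str[1..-2]          # drop the braces: "this|is|a|test|case"
  columns = inner.split('|')  # ["this", "is", "a", "test", "case"]
  columns[n]                  # nil when n is past the last column
end

p column_by_split("{this|is|a|test|case}", 3)  # prints "test"
p column_by_split("{this|is|a|test|case}", 9)  # prints nil
p column_by_split("{a|b|}", 2)                 # prints nil
```

One difference from the regex-capture version is worth knowing: String#split drops trailing empty fields by default, so for "{a|b|}" the last (empty) column comes back as nil here, whereas the capturing regex returns "". Passing a negative limit, as in split('|', -1), keeps trailing empty fields.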

Answer 1
2 accepted, 2010-05-07 14:37:57
