Ruby parsing and regex
I picked up Ruby recently and have been fiddling around with it. I want to learn how to use regex or other Ruby tricks to check a given line of text for certain words, whitespace characters, valid format, and so on.
Let's say I have an order list whose lines strictly follow this format:
cost: 50 items: book,lamp
One space after each colon, no space after each comma, no trailing whitespace at the end, and so on. How can I check for errors in this format using Ruby? This, for example, should fail my checks:
cost: 60 items:shoes,football
My plan was to split the string on " " and check that the first word is "cost:", the second word is a number, and so on, but I realized that splitting on " " doesn't let me detect extra whitespace, because the split simply swallows it. It doesn't help me check for trailing whitespace either. How do I go about doing this?
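The problem with splitting can be reproduced in a few lines. This is only a sketch: the malformed sample string and the variable names are made up for illustration. String#split with a single-space argument skips leading whitespace and treats any run of whitespace as one separator, so a well-formed line and a malformed one can yield the same array.

```ruby
good = "cost: 50 items: book,lamp"
bad  = "cost:  50 items: book,lamp "   # hypothetical input: double space and trailing space

good_words = good.split(" ")
bad_words  = bad.split(" ")

# Both lines split into the same words, so the extra whitespace is no longer visible.
same = (good_words == bad_words)
```

Because the two arrays are equal, checking the words one by one cannot tell the two lines apart.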
You could use the following regular expression.
r = /
    \A                  # match the beginning of the string
    cost:\s             # match "cost:" followed by a whitespace character
    \d+\s               # match one or more digits followed by a whitespace character
    items:\s            # match "items:" followed by a whitespace character
    [[:alpha:]]+        # match one or more letters
    (?:,[[:alpha:]]+)   # match a comma followed by one or more letters,
                        # in a non-capture group (?: ... )
    *                   # match the non-capture group zero or more times
    \z                  # match the end of the string
    /x                  # free-spacing regex definition mode
"cost: 50 items: book,lamp" =~ r #=> 0 (a match, beginning at index 0)
"cost: 50 items: book,lamp,table" =~ r #=> 0 (a match, beginning at index 0)
"cost: 60 items:shoes,football" =~ r #=> nil (no match)
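If the values are needed as well as the yes/no answer, the same pattern could be given named capture groups. The following is a minimal sketch, not part of the original answer: parse_order_line and the group names cost and items are hypothetical names chosen for illustration.

```ruby
# parse_order_line is a hypothetical helper name used for illustration.
# Returns a hash for a well-formed line, or nil when the format is wrong.
def parse_order_line(line)
  pattern = /\Acost:\s(?<cost>\d+)\sitems:\s(?<items>[[:alpha:]]+(?:,[[:alpha:]]+)*)\z/
  m = pattern.match(line)
  return nil unless m
  { cost: m[:cost].to_i, items: m[:items].split(",") }
end

ok       = parse_order_line("cost: 50 items: book,lamp")      # well-formed line
no_space = parse_order_line("cost: 60 items:shoes,football")  # missing space after "items:"
trailing = parse_order_line("cost: 50 items: book,lamp ")     # trailing space
```

Returning nil for a bad line keeps the check and the extraction in one place, so the caller only has to test the result.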
The regex can of course be written in the normal manner:
r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/
or
r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/
though a whitespace character (\s) cannot be replaced by a literal space in the free-spacing mode definition (/x), because unescaped spaces in the pattern are ignored in that mode.
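As a sketch of that last point, assuming that only a literal space should be accepted (not a tab or other whitespace): in free-spacing mode a space can still be matched by putting it in a character class, [ ]. The variable names below are hypothetical, and Regexp#match? needs Ruby 2.4 or later.

```ruby
strict = /
  \A
  cost:[ ]            # "cost:" followed by exactly one literal space
  \d+[ ]              # one or more digits followed by one literal space
  items:[ ]           # "items:" followed by one literal space
  [[:alpha:]]+        # one or more letters
  (?:,[[:alpha:]]+)*  # zero or more groups of a comma plus letters
  \z
/x

space_ok  = strict.match?("cost: 50 items: book,lamp")
tab_ok    = strict.match?("cost:\t50 items: book,lamp")   # tab instead of a space
loose_tab = /\Acost:\s\d+/.match?("cost:\t50")            # \s also accepts the tab
```

The difference matters when a tab must count as a format error: \s accepts any whitespace character, while [ ] accepts only a space.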