
Ruby parsing and regex

I recently picked up Ruby and have been fiddling around with it. I want to learn how to use regex or other Ruby tricks to check a given line of text for certain words, whitespace characters, valid format, etc.

Let's say I have an order list whose lines must strictly follow this format:

cost: 50 items: book,lamp

Exactly one space after each colon, no space after a comma, no trailing whitespace at the end, and so on. How can I check for errors in this format using Ruby? This line, for example, should fail my checks:

cost:     60 items:shoes,football   

My plan was to split the string on " " and check whether the first word was "cost:", whether the second word was a number, and so on, but I realized that splitting on " " doesn't help me check for extra whitespace, because it just eats it up. It also doesn't help me check for trailing whitespace. How do I go about doing this?
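The split-based plan can still work if the split keeps the extra spaces visible. The sketch below first shows why split(" ") hides the problem: a single-space string argument triggers awk-style splitting, which treats runs of whitespace as one separator and drops leading and trailing whitespace. Splitting on the regex / / with a limit of -1 instead keeps an empty field for every extra space, trailing ones included, so a simple field count catches them. It assumes Ruby 2.4 or later (for String#match?), and valid_order? is a hypothetical helper name, not part of any library.

```ruby
# Example lines; the bad one is built explicitly so its five extra-wide
# gap and three trailing spaces are easy to see.
good = "cost: 50 items: book,lamp"
bad  = "cost:" + " " * 5 + "60 items:shoes,football" + " " * 3

# Awk-style split: whitespace runs collapse, trailing spaces vanish.
p bad.split(" ")
# => ["cost:", "60", "items:shoes,football"]

# Split on exactly one space; -1 keeps trailing empty fields too.
p bad.split(/ /, -1)
# => ["cost:", "", "", "", "", "60", "items:shoes,football", "", "", ""]

# Hypothetical strict checker following the "inspect each word" plan.
def valid_order?(line)
  fields = line.split(/ /, -1)  # one field per single space; empties kept
  return false unless fields.size == 4

  cost_label, cost, items_label, items = fields
  return false unless cost_label == "cost:" && items_label == "items:"
  return false unless cost.match?(/\A\d+\z/)  # digits only
  return false if items.empty?                # "".split(",") would be []

  # Every comma-separated item must be letters only; -1 keeps empty
  # entries, so "book,,lamp" or "book," is rejected.
  items.split(",", -1).all? { |item| item.match?(/\A[[:alpha:]]+\z/) }
end

p valid_order?(good)  # => true
p valid_order?(bad)   # => false
```

This keeps the word-by-word structure of the original idea; the regex below does the same job in a single expression.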

You could use the following regular expression:

r = /
    \A                # match the beginning of the string
    cost:\s           # match "cost:" followed by a whitespace character
    \d+\s             # match one or more digits followed by a whitespace character
    items:\s          # match "items:" followed by a whitespace character
    [[:alpha:]]+      # match one or more letters
    (?:,[[:alpha:]]+) # match a comma followed by one or more letters,
                      # in a non-capture group (?: ... )
    *                 # match the non-capture group zero or more times
    \z                # match the end of the string
    /x                # free-spacing regex definition mode

"cost: 50 items: book,lamp"         =~ r #=> 0   (a match, beginning at index 0)
"cost: 50 items: book,lamp,table"   =~ r #=> 0   (a match, beginning at index 0)
"cost:     60 items:shoes,football" =~ r #=> nil (no match)
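Since the question is also about parsing, the same anchored pattern can be given named groups so that a line is validated and taken apart in one step. This is a minimal sketch: ORDER_RE and parse_order are hypothetical names, and returning a hash with :cost and :items keys is just one possible design choice.

```ruby
# The answer's pattern (normal form) with named groups added.
ORDER_RE = /\Acost:\s(?<cost>\d+)\sitems:\s(?<items>[[:alpha:]]+(?:,[[:alpha:]]+)*)\z/

# Returns a hash with the cost as an Integer and the items as an Array
# of strings for a well-formed line, or nil if the line does not match.
def parse_order(line)
  m = ORDER_RE.match(line)
  return nil unless m

  { cost: m[:cost].to_i, items: m[:items].split(",") }
end

p parse_order("cost: 50 items: book,lamp")
p parse_order("cost:     60 items:shoes,football   ")  # => nil
```

Because \A and \z anchor the whole string, a nil result means the line broke the format somewhere, and a non-nil result is already split into usable values.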

The regex can, of course, also be written in the normal (non-free-spacing) manner:

r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

or

r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/

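though in the free-spacing definition (the x modifier) the whitespace character \s cannot simply be replaced by a space, because unescaped spaces are ignored in that mode (an escaped space, "\ ", or the character class "[ ]" matches a literal space there). Note also that \s matches any whitespace character, including a tab or a newline, so only the last version, with literal spaces, enforces exactly one space.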
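The difference between \s and a literal space can be checked directly. This sketch compares the answer's free-spacing pattern (here called loose) with a variant (hypothetically named strict) that uses the character class [ ] for each separator, against a line that has a tab instead of a space after "cost:".

```ruby
# Free-spacing version of the answer's pattern; \s accepts any whitespace.
loose = /
  \A cost:\s \d+\s items:\s [[:alpha:]]+ (?:,[[:alpha:]]+)* \z
/x

# Free-spacing variant that accepts only a literal space: the space inside
# the character class [ ] is not ignored by the x modifier.
strict = /
  \A cost:[ ] \d+[ ] items:[ ] [[:alpha:]]+ (?:,[[:alpha:]]+)* \z
/x

tabbed = "cost:\t50 items: book,lamp"  # a tab instead of the first space

p loose.match?(tabbed)                        # => true  (\s also matches a tab)
p strict.match?(tabbed)                       # => false
p strict.match?("cost: 50 items: book,lamp")  # => true
```

If tabs or other whitespace should count as format errors, the literal-space form is the one that matches the "exactly one space" rule from the question.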

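Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you need to repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.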
