Ruby parsing and regex

Question

I picked up Ruby recently and have been fiddling around with it. I wanted to learn how to use regex or other Ruby tricks to check for certain words, whitespace characters, valid format, etc. in a given line of text.

Let's say I have an order list that looks strictly like this:

cost: 50 items: book,lamp

One space after each colon, no space after each comma, no trailing whitespace at the end, and so on. How can I check for errors in this format using Ruby? This, for example, should fail my checks:

cost:     60 items:shoes,football

My plan was to split the string on " " and check whether the first word was "cost:", whether the second word was a number, and so on. But I realized that splitting on " " doesn't help me check for extra whitespace, because it just eats it up. It also doesn't help me check for trailing whitespace. How do I go about doing this?
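The behaviour described above can be reproduced in a few lines. This is a minimal sketch rather than part of the original post, and the variable names are illustrative. It also shows that splitting on a regex with a negative limit keeps empty fields, which makes extra or trailing spaces visible.

```ruby
line = "cost:     60 items:shoes,football"

# String#split with a single-space string collapses runs of whitespace,
# so the extra spaces leave no trace in the result.
collapsed = line.split(" ")
p collapsed

# Splitting on the regex / / with a negative limit keeps every empty field,
# including trailing ones, so stray spaces show up as empty strings.
fields = line.split(/ /, -1)
p fields
p fields.any?(&:empty?)

# A trailing space produces an empty last field.
trailing = "cost: 50 items: book,lamp ".split(/ /, -1)
p trailing.last
```

Checking the fields one by one after such a split is possible, but a single anchored regex is usually shorter.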

Answer 1

You could use the following regular expression.

r = /
    \A                # match the beginning of the string
    cost:\s           # match "cost:" followed by one whitespace character
    \d+\s             # match one or more digits followed by one whitespace character
    items:\s          # match "items:" followed by one whitespace character
    [[:alpha:]]+      # match one or more lowercase or uppercase letters
    (?:,[[:alpha:]]+) # match a comma followed by one or more lowercase or
                      # uppercase letters in a non-capture group (?: ... )
    *                 # repeat the non-capture group zero or more times
    \z                # match the end of the string
    /x                # free-spacing regex definition mode

"cost: 50 items: book,lamp"         =~ r #=> 0   (a match, beginning at index 0)
"cost: 50 items: book,lamp,table"   =~ r #=> 0   (a match, beginning at index 0)
"cost:     60 items:shoes,football" =~ r #=> nil (no match)
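A minimal sketch of how such a pattern might be wrapped for reuse. The constant `ORDER_LINE` and the method `valid_order_line?` are names invented for this illustration, and `String#match?` assumes Ruby 2.4 or later.

```ruby
# Hypothetical names for illustration; same pattern as above, on one line.
ORDER_LINE = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

# Returns true when the whole line matches the expected order format.
def valid_order_line?(line)
  line.match?(ORDER_LINE)
end

p valid_order_line?("cost: 50 items: book,lamp")          #=> true
p valid_order_line?("cost:     60 items:shoes,football")  #=> false (extra and missing spaces)
p valid_order_line?("cost: 50 items: book,lamp ")         #=> false (trailing space)
p valid_order_line?("cost: 50 items: book, lamp")         #=> false (space after comma)
p valid_order_line?("cost: 50 items: book,lamp\n")        #=> false (trailing newline)
```

Because the pattern is anchored with `\A` and `\z`, trailing whitespace and trailing newlines both fail the check. Lines read with `gets` would need `chomp` first if the newline should be tolerated.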

The regex can, of course, be written in the normal manner:

r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

or

r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/

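though a whitespace character (\s) cannot be replaced by a plain space in the free-spacing mode definition (/x), because unescaped spaces in the pattern are ignored in that mode. Note that \s also matches tabs and newlines, so only the last form insists on a literal space.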
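To illustrate the point about free-spacing mode, the sketch below compares three shortened variants of the pattern. It relies on Ruby keeping a space that is written inside a character class (`[ ]`) under `/x`; the constant names are invented for this example.

```ruby
# Under /x an unescaped space is ignored, so this pattern expects "cost:50".
IGNORED_SPACE = /\A cost: \d+ \z/x

# A space inside a character class survives /x and matches only a space.
LITERAL_SPACE = /\A cost: [ ] \d+ \z/x

# \s matches any whitespace character, including a tab.
ANY_WHITESPACE = /\A cost: \s \d+ \z/x

p IGNORED_SPACE.match?("cost: 50")     #=> false
p IGNORED_SPACE.match?("cost:50")      #=> true
p LITERAL_SPACE.match?("cost: 50")     #=> true
p LITERAL_SPACE.match?("cost:\t50")    #=> false
p ANY_WHITESPACE.match?("cost:\t50")   #=> true
```

If the format must reject tabs as well as extra spaces, the `[ ]` form may be the safer choice in a free-spacing pattern.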
