
Regex/grep strings containing US currency

I have a list of strings, some of which contain dollar figures. For example:

'$34232 foo    \n  bar'

Is there an R command that returns only the strings that contain dollar amounts?

Thank you!

Use \\$ to escape the $, which otherwise means "end of string":

   grep("\\$[0-9]+",c("123","$567","abc $57","$abc"),value=TRUE)

This selects strings that contain a dollar sign followed by one or more digits (but not, e.g., $abc). grep with value=FALSE (the default) returns the indices of the matching elements instead, and grepl returns a logical vector. One R-specific point is that you need to write \\$, not just \$: the regex needs a backslash before the $, and inside an R string literal that backslash must itself be escaped, so "\$" gives an "unrecognized escape" error.
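A small sketch comparing the three forms on the sample vector from this answer (the names x and pattern are only for illustration):

```r
# Sample strings from the answer above
x <- c("123", "$567", "abc $57", "$abc")

# "\\$" in R source is the regex \$ : a literal dollar sign
pattern <- "\\$[0-9]+"

grep(pattern, x)                # indices of matches: 2 3
grep(pattern, x, value = TRUE)  # matching strings: "$567" "abc $57"
grepl(pattern, x)               # FALSE TRUE TRUE FALSE

# The logical form is handy for subsetting directly
x[grepl(pattern, x)]            # "$567" "abc $57"
```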

@Cerbrus's answer, '\\$[0-9,.]+', will match slightly more broadly (e.g., it will match $456.89 or $367,245,100). It will also match some implausible currency strings, e.g. $45.13.89 or $467.43,2,1 (commas should be allowed only to separate groups of three digits in the dollars segment, and there should be only one decimal point, separating dollars from cents). Both of our answers will (incorrectly?) match $45abc. If you're lucky, your data won't contain any of these tricky cases. Getting this right in general is hard; the answer referred to in the comments (What is "The Best" US Currency RegEx?) tries to do this, and as a result has significantly more complex answers, but it could be useful if you adapt those answers to R by escaping $ appropriately.
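The stricter rules described above can be sketched as a single PCRE pattern. This is a hypothetical pattern written for illustration (it is not taken from either answer or from the linked question), and it assumes that cents, when present, are exactly two digits and that an amount followed directly by letters, as in $45abc, should be rejected. Because it uses a negative lookahead, it needs perl = TRUE:

```r
# Hypothetical stricter pattern (an assumption, not from either answer):
#   - dollars: plain digits, or 1-3 digits followed by ",ddd" groups
#   - optional cents: one "." followed by exactly two digits
#   - the amount must not be followed by a letter/digit/underscore,
#     a comma, or a "." that starts another digit run
strict_usd <- paste0(
  "\\$",
  "(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)",  # dollars segment
  "(?:\\.[0-9]{2})?",                     # optional cents
  "(?![\\w,]|\\.[0-9])"                   # reject trailing junk
)

amounts <- c("$456.89", "$367,245,100", "$34232 foo", "$45.13.89",
             "$467.43,2,1", "$45abc", "$1,23", "costs $5.")

# The lookahead requires PCRE, hence perl = TRUE
grepl(strict_usd, amounts, perl = TRUE)
# TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE

# For contrast, the broad pattern accepts all eight
grepl("\\$[0-9,.]+", amounts)
```

It still ignores cases such as negative amounts or "$.50", so adapt the pieces to whatever your data actually contain.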

Sure there is: 当然有:

   '\\$[0-9,.]+'

   \\$       # Dollar sign
   [0-9,.]+  # One or more digits, dots, or commas
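To see how this pattern behaves on data shaped like the question's example, here is a small sketch; apart from the first element, the strings are made up for illustration. regmatches() together with regexpr() returns the text of the first match in each element that has one, which shows that the character class also absorbs trailing punctuation such as a sentence-ending period:

```r
# First element is the question's example; the rest are made up
strings <- c("$34232 foo    \n  bar", "no money here",
             "total: $1,250.00.", "$abc")

pattern <- "\\$[0-9,.]+"  # Cerbrus's pattern

# Keep only the elements that contain a dollar amount
strings[grepl(pattern, strings)]
# "$34232 foo    \n  bar" "total: $1,250.00."

# Text covered by the first match in each matching element
regmatches(strings, regexpr(pattern, strings))
# "$34232" "$1,250.00."   (note the trailing period)
```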
