Regex/grep strings containing US currency
I have a list of strings, some of which contain dollar figures. For example:
'$34232 foo \n bar'
Is there an R command that returns only the strings that contain dollar amounts?
Thank you!
Use \\$ to protect the $, which otherwise means "end of string":
grep("\\$[0-9]+",c("123","$567","abc $57","$abc"),value=TRUE)
This selects strings that contain a dollar sign followed by one or more digits (but not, e.g., $abc). grep with value=FALSE returns the indices of the matching elements; grepl returns a logical vector.
One R-specific point is that you need to write \\$, not just \$. R string literals require the additional backslash for protection, and \$ on its own will give you an "unrecognized escape" error.
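As a quick check of the three return forms described above, here is a minimal sketch using the same example vector. The variable names are only illustrative.

```r
x <- c("123", "$567", "abc $57", "$abc")

# R string literal "\\$" holds two characters, \ and $;
# the regex engine therefore sees \$[0-9]+
pattern <- "\\$[0-9]+"

matched_values  <- grep(pattern, x, value = TRUE)  # the matching strings
matched_indices <- grep(pattern, x)                # value = FALSE is the default
is_match        <- grepl(pattern, x)               # one TRUE/FALSE per element

print(matched_values)   # "$567" "abc $57"
print(matched_indices)  # 2 3
print(is_match)         # FALSE TRUE TRUE FALSE
print(nchar("\\$"))     # 2
```

grepl is usually the handiest form for subsetting, e.g. x[is_match].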
@Cerbrus's answer, '\\$[0-9,.]+', matches slightly more broadly (e.g. it will match $456.89 or $367,245,100). It will also match some implausible currency strings, e.g. $45.13.89 or $467.43,2,1 (commas should be allowed only for groupings of 3 digits in the dollars segment, and there should be only one decimal point separating dollars and cents). Both of our answers will (incorrectly?) match $45abc. If you're lucky, your data don't contain any of these tricky possibilities.
Getting this right in general is hard; the answer referred to in the comments (What is "The Best" US Currency RegEx?) tries to do this, and as a result has significantly more complex answers, but it could be useful if you adapt those answers to R by protecting $ appropriately.
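To illustrate the trade-off, the following sketch compares the broad pattern with one possible stricter pattern. The stricter pattern is an assumption written for this example, not taken from the linked question. It uses a lookahead, so it needs perl = TRUE, and it will still miss or mis-handle formats it was not written for.

```r
x <- c("$456.89", "$367,245,100", "$45.13.89",
       "$467.43,2,1", "$45abc", "abc $57")

broad <- "\\$[0-9,.]+"

# Hypothetical stricter pattern (an assumption, not from the linked question)
strict <- paste0(
  "\\$",                                   # literal dollar sign
  "(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)",   # comma-grouped thousands, or plain digits
  "(?:\\.[0-9]{2})?",                      # optional cents: a point and two digits
  "(?![0-9A-Za-z]|[.,][0-9])"              # not followed by more number-like text or letters
)

broad_hits  <- grepl(broad, x)
strict_hits <- grepl(strict, x, perl = TRUE)

print(data.frame(x, broad_hits, strict_hits))
# broad_hits is TRUE for every element;
# strict_hits is TRUE TRUE FALSE FALSE FALSE TRUE
```

Whether $45abc should count as a dollar amount depends on the data, so the final lookahead is a design choice rather than a requirement.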
Sure there is:
'\\$[0-9,.]+'
\\$ // Dollar sign
[0-9,.]+ // One or more digits, dots, or commas.
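To see what this pattern actually captures, a small sketch with regexpr and regmatches pulls out the matched portion of each string. The sample strings are made up for illustration.

```r
x <- c("total: $456.89", "$367,245,100 in revenue",
       "$45.13.89", "no money here")

pattern <- "\\$[0-9,.]+"

# regexpr finds the first match in each element; regmatches extracts it
# and silently drops elements with no match
amounts <- regmatches(x, regexpr(pattern, x))

print(amounts)  # "$456.89" "$367,245,100" "$45.13.89"
```

The third result shows that the character class accepts any mix of digits, dots, and commas, including strings that are not valid amounts.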