R: extract the first pattern match from the end of a string

Question

I want to extract sizes from strings such as:

a <- c("xxxxxxx 2.5 oz (23488)",
        "xxxxx /1.36oz",
        "xxxxx/7 days /20 ml")

Result I want: 2.5 oz, /1.36oz, /20 ml

Because the strings vary, I want to search for the pattern backward. That is, I want to extract the first occurrence of \\/*(\\d+\\.*\\d*)\\s*[[:alpha:]]+ counting from the end of the string. This would keep R from taking 23488 from the first string and /7 days from the third string.

Does anyone know how I can achieve this? Thanks!

Answer 1

You may use

> a <- c("xxxxxxx 2.5 oz (23488)",
+         "xxxxx /1.36oz",
+         "xxxxx/7 days /20 ml")
> regmatches(a, regexpr("/?\\d+(?:\\.\\d+)?\\s*\\p{L}+(?!.*\\d(?:\\.\\d+)?\\s*\\p{L}+)", a, perl=TRUE))
[1] "2.5 oz"  "/1.36oz" "/20 ml"

See the regex demo.

Details

/? - an optional /
\\d+ - 1+ digits
(?:\\.\\d+)? - an optional sequence of a . and 1+ digits
\\s* - 0+ whitespace characters
\\p{L}+ - 1+ letters
(?!.*\\d(?:\\.\\d+)?\\s*\\p{L}+) - a negative lookahead: the match fails if the text to the right contains
- .* - any 0+ chars other than line breaks, as many as possible
- \\d - a digit
- (?:\\.\\d+)? - an optional sequence of a . and 1+ digits
- \\s* - 0+ whitespace characters
- \\p{L}+ - 1+ letters
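
The lookahead is what makes the regex pick the last size in each string. As a rough alternative sketch (not part of the answer above), the same result can probably be reached without a lookahead by collecting every match with gregexpr and keeping the last one. The names size_pattern, all_matches and last_size are made up for this illustration.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# Same core pattern as above, minus the negative lookahead
size_pattern <- "/?\\d+(?:\\.\\d+)?\\s*\\p{L}+"

# gregexpr finds all matches per string; regmatches returns a list of vectors
all_matches <- regmatches(a, gregexpr(size_pattern, a, perl = TRUE))

# Keep the last match of each string, or NA when a string has no match
last_size <- vapply(all_matches,
                    function(m) if (length(m)) m[length(m)] else NA_character_,
                    character(1))
last_size
# [1] "2.5 oz"  "/1.36oz" "/20 ml"
```

Unlike regmatches with regexpr, which silently drops strings that have no match, this version returns NA for them, so the result stays aligned with the input vector.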

Answer 2

If you know the names of the units (oz, ml, etc.), you could try something like this:

((\\d*|\\d*\\.\\d{0,2})\\s?(ml|oz|etc))

See working example.
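
To apply this idea in R, something along these lines might work. It is only a sketch: known_units is a hypothetical list you would fill in yourself, and the number part is tightened a little compared with the regex above (optional leading /, at least one digit, and a word boundary after the unit), so it is not the exact same expression.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# Hypothetical list of units; extend it to whatever appears in your data
known_units <- c("oz", "ml")

# Build: optional "/", a number, optional whitespace, then one of the known units
unit_pattern <- paste0("/?\\d+(?:\\.\\d+)?\\s?(?:",
                       paste(known_units, collapse = "|"),
                       ")\\b")

unit_matches <- regmatches(a, gregexpr(unit_pattern, a, perl = TRUE))

# Take the last unit-bearing match per string, NA if there is none
sizes <- vapply(unit_matches,
                function(m) if (length(m)) m[length(m)] else NA_character_,
                character(1))
sizes
# [1] "2.5 oz"  "/1.36oz" "/20 ml"
```

Because "days" is not in known_units, "/7 days" is never matched, so no lookahead is needed here.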
