
Combination Regmatches and Replacement for specific Character

I tried to replace values that match a specific pattern, namely digits followed by "BT", but my code failed. This is my code:

df <- data.frame(
  exposure = c("123BT", "113BB", "116BB", "117BT")
)

df %>%
  mutate(
    exposure2 = case_when(exposure == regmatches("d+\\BT") ~ paste0("-", exposure),
                     TRUE ~ exposure)
  )

The error is:

Error: Problem with `mutate()` column `exposure2`.
i `exposure2 = case_when(...)`.
x argument "m" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.

My target is:

df <- data.frame(
  exposure = c("123BT", "113BB", "116BB", "117BT"),
  exposure2 = c(-123, 113, 116, -117)
)

I recommend using the stringr package; you can extract the numbers with the regex (\\d)+ :

library(stringr)
library(dplyr)

df %>%
  mutate(
    exposure2 = case_when(str_detect(exposure,"BT") ~ paste0("-", str_extract(exposure, "(\\d)+")),
                          TRUE ~ str_extract(exposure, "(\\d)+"))
  )

Output:

  exposure exposure2
1    123BT      -123
2    113BB       113
3    116BB       116
4    117BT      -117
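Note that str_extract returns character strings, so exposure2 above is a character column, whereas the target in the question is numeric. A minimal sketch of one way to get a numeric column, assuming every value contains at least one run of digits (the helper column value and the use of if_else are illustrative choices, not part of the answer above):

```r
library(stringr)
library(dplyr)

df <- data.frame(
  exposure = c("123BT", "113BB", "116BB", "117BT")
)

res <- df %>%
  mutate(
    # extract the first run of digits and convert it to a number
    value = as.numeric(str_extract(exposure, "\\d+")),
    # negate the number when the string contains "BT"
    exposure2 = if_else(str_detect(exposure, "BT"), -value, value)
  ) %>%
  select(-value)  # drop the hypothetical helper column

res$exposure2
```

Doing the sign change on numbers rather than pasting "-" onto a string avoids a later conversion step.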

If you still prefer to use regmatches, you can get the same result with:

df %>%
  mutate(
    exposure2 = case_when(exposure %in% regmatches(exposure, regexpr("\\d+BT", exposure)) ~ paste0("-", regmatches(exposure, regexpr("\\d+", exposure))),
                          TRUE ~ regmatches(exposure, regexpr("\\d+", exposure)))
  )

First, a concise solution that you can easily use inside dplyr::mutate. Using gsub, we remove the non-digit characters and coerce the result with as.integer. We then multiply the result by 1 or -1, depending on whether the string contains "BT"; for this we use grepl (which gives a logical) and add 1L (which coerces it to integer) to get the index 1 or 2.

c(1, -1)[grepl('BT', df$exposure) + 1L]*as.integer(gsub('\\D', '', df$exposure))
# [1] -123  113  116 -117
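For illustration, the same expression could be placed inside mutate roughly as follows. This is a sketch that assumes the df from the question and that every value contains digits:

```r
library(dplyr)

df <- data.frame(
  exposure = c("123BT", "113BB", "116BB", "117BT")
)

res <- df %>%
  mutate(
    # grepl() gives FALSE/TRUE; adding 1L turns that into index 1 or 2,
    # which picks the sign 1 or -1 for each row
    exposure2 = c(1, -1)[grepl("BT", exposure) + 1L] *
      as.integer(gsub("\\D", "", exposure))
  )

res$exposure2
```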

The above is the recommended solution. The solution you envision is considerably more complex, because it processes the information less efficiently. To demonstrate, I implement its logic in a small function (see note 1).

f <- \(x) {
  rm <- regmatches(x, regexpr("\\d+BT", x))
  o <- gsub('\\D', '', x)
  o <- ifelse(x %in% rm, paste0('-', o), o)
  as.integer(o)
}

f(df$exposure)
# [1] -123  113  116 -117

Note 1: regmatches needs match data as its second argument, e.g. from regexpr. The regex should actually look something like "\\d+BT".
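To make the note concrete: the call in the question passes only a pattern to regmatches, which is why R reports that argument "m" is missing. A small sketch of how the two functions fit together (the variable names x, m, hits and is_bt are only illustrative):

```r
x <- c("123BT", "113BB", "116BB", "117BT")

# regexpr() returns the position and length of the first match per element
# (-1 where there is no match); this is the match data regmatches() needs
m <- regexpr("\\d+BT", x)

# regmatches() returns only the matched substrings, so the result can be
# shorter than the input
hits <- regmatches(x, m)
hits

# grepl() gives one TRUE/FALSE per element, which is easier to use as a condition
is_bt <- grepl("\\d+BT", x)
is_bt
```

Because regmatches drops elements without a match, comparing with %in% (as in f above) or using grepl directly is safer than comparing with ==.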


Data:

df <- structure(list(exposure = c("123BT", "113BB", "116BB", "117BT"
)), class = "data.frame", row.names = c(NA, -4L))
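
Another option uses readr::parse_number:

library(readr)

# (-1)^TRUE is -1 and (-1)^FALSE is 1, so rows containing "BT" are negated
(-1)^grepl('BT', df$exposure) * parse_number(df$exposure)
# [1] -123  113  116 -117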


 