How to remove square brackets and the text within them from strings in R
I have run into a problem in R while processing the values of a column (test_column) in a data frame (test_dataframe), shown below:
Original strings in the column:
test_column
6.77[9]
5.92[10]
2.98[103]
I need to remove the square brackets and any characters inside them, so the target values are:
test_column
6.77
5.92
2.98
I tried the gsub function in R but could not get it to work. Could someone help me figure this out?
I would use:
input <- c("6.77[9]", "5.92[10]", "2.98[103]")
gsub("\\[.*?\\]", "", input)
[1] "6.77" "5.92" "2.98"
The regex pattern \\[.*?\\] matches an opening square bracket, then as few characters as possible, up to the next closing square bracket. Using gsub tells R to replace every such match in each string, here with an empty string.
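As a small sketch of why the lazy quantifier matters, the example below uses a made-up string with two bracketed groups (the variable x and its contents are hypothetical, not from the question). It compares the lazy pattern with a greedy one and with a negated character class, which is a common alternative in base R.

```r
# Hypothetical input with two bracketed groups in one string
x <- "6.77[9] and 5.92[10]"

# Lazy: each [...] group is matched and removed separately
lazy <- gsub("\\[.*?\\]", "", x)

# Greedy: a single match runs from the first "[" to the last "]"
greedy <- gsub("\\[.*\\]", "", x)

# Negated character class: "[" followed by any non-"]" characters, then "]"
negated <- gsub("\\[[^]]*\\]", "", x)

print(lazy)
print(greedy)
print(negated)
```

For the question's data, where each string holds a single bracketed group at the end, all three patterns should give the same result.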
You can use sub to remove everything from the opening square bracket to the end of the string.
df$test_column <- sub("\\[.*", "", df$test_column)
df
# test_column
#1 6.77
#2 5.92
#3 2.98
You might want to wrap the output of sub in as.numeric, since the result is still a character vector.
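A minimal sketch of that conversion, assuming the column holds strings like the ones in the question (the data frame is rebuilt here so the snippet runs on its own):

```r
# Sample data matching the question's column
df <- data.frame(test_column = c("6.77[9]", "5.92[10]", "2.98[103]"),
                 stringsAsFactors = FALSE)

# Strip everything from the first "[" onward, then convert to numeric
df$test_column <- as.numeric(sub("\\[.*", "", df$test_column))

# Inspect the resulting column type
str(df)
```

If any value cannot be parsed as a number after stripping, as.numeric returns NA for it with a warning, so it is worth checking for NAs afterwards.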
If each string always starts with a numeric value, as in the example, you can also use parse_number from readr:
readr::parse_number(df$test_column)
#[1] 6.77 5.92 2.98
Data:
df <- structure(list(test_column = c("6.77[9]", "5.92[10]", "2.98[103]"
)), row.names = c(NA, -3L), class = "data.frame")
We can use str_remove from stringr:
library(stringr)
library(dplyr)
df %>%
  mutate(test_column = str_remove(test_column, "\\[.*"))
# test_column
#1 6.77
#2 5.92
#3 2.98