Extract character values from strings of different lengths in a vector in R
I have a vector that looks like the following:
**name**
a1
a2
a3
b1_z
b2_3z
b32z
I would like the output to include only the letters in each of these strings, not any numbers or symbols. Like this:
**name**
a
a
a
bz
bz
bz
I have tried the following code, which uses stri_extract_all_regex from the stringi package:
library(stringi)
df$name <- stri_extract_all_regex(df$name, "[a-z]+")
I get this result:
**name**
a
a
a
c("b", "z")
c("b", "z")
c("b", "z")
How do I combine the values that are two separate strings into a single string? In particular, how do I do this when some of the values in the vector already contain only one string? I am also open to other solutions for extracting characters from strings that get around this issue.
Please try gsub like below:
df$name <- gsub("[^[:alpha:]]", "", df$name)
where non-alphabetic characters are replaced by "".
We will get
> df
name
1 a
2 a
3 a
4 bz
5 bz
6 bz
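As a small illustration of what the [^[:alpha:]] pattern keeps, the sketch below compares it with a lowercase-only range. The input vector x is made up for this example and is not part of the question's data; perl = TRUE is used in the second call only so that the a-z range is interpreted by character code rather than by locale.

```r
# Hypothetical input with mixed case, digits and symbols (not from the question)
x <- c("A1b", "b2_3Z", "c-4d")

# [:alpha:] covers upper- and lowercase letters, so both survive
keep_alpha <- gsub("[^[:alpha:]]", "", x)

# A lowercase-only range drops uppercase letters as well
keep_lower <- gsub("[^a-z]", "", x, perl = TRUE)

print(keep_alpha)  # "Ab" "bZ" "cd"
print(keep_lower)  # "b"  "b"  "cd"
```

For the all-lowercase data in the question both patterns give the same result, so the choice only matters if uppercase letters may appear.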
Data
> dput(df)
structure(list(name = c("a1", "a2", "a3", "b1_z", "b2_3z", "b32z"
)), class = "data.frame", row.names = c(NA, -6L))
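The question also asks how to combine the pieces when the extraction step returns a list. A minimal sketch of one way to do that is to paste each list element together with collapse = "". It uses base R's regmatches and gregexpr as a stand-in for stri_extract_all_regex (an assumption made so the example runs without extra packages); both return a list with one character vector per input string.

```r
x <- c("a1", "a2", "a3", "b1_z", "b2_3z", "b32z")

# One character vector of letter runs per input string,
# e.g. "b1_z" gives c("b", "z") and "a1" gives "a"
pieces <- regmatches(x, gregexpr("[a-z]+", x))

# Collapse each element into a single string; elements that already
# hold one piece come back unchanged
joined <- vapply(pieces, paste, character(1), collapse = "")

print(joined)  # "a" "a" "a" "bz" "bz" "bz"
```

The same vapply call should work on the list produced in the question, since it only relies on each element being a character vector.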
You can do it like this using the gsub function:
vals = c('a1', 'a2', 'b1_z', 'b2_3z')
df = data.frame(vals)
df$name = gsub("[^[:alpha:]]", "", df$vals)
print(df)
Output will look like this:
   vals name
1    a1    a
2    a2    a
3  b1_z   bz
4 b2_3z   bz
An option with str_remove_all from stringr:
library(stringr)
str_remove_all(df$name, "[0-9_]+")
#[1] "a" "a" "a" "bz" "bz" "bz"
Data
df <- structure(list(name = c("a1", "a2", "a3", "b1_z", "b2_3z", "b32z"
)), class = "data.frame", row.names = c(NA, -6L))
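Note that "[0-9_]+" removes only digits and underscores, so any other symbol stays in the result. The sketch below shows the difference against a pattern that removes everything except letters. It uses base gsub with an empty replacement as a stand-in for str_remove_all (an assumption to keep the example dependency-free), and the input vector x is hypothetical.

```r
# Hypothetical input containing symbols other than digits and underscores
x <- c("b1_z", "c-4d", "e 5.f")

# Removing a fixed set of characters leaves "-", " " and "." behind
drop_listed <- gsub("[0-9_]+", "", x)

# Removing everything that is not a letter handles any symbol
keep_letters <- gsub("[^[:alpha:]]+", "", x)

print(drop_listed)   # "bz" "c-d" "e .f"
print(keep_letters)  # "bz" "cd"  "ef"
```

For the data in the question the two patterns agree, because digits and underscores are the only non-letter characters present.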