
Extract character values from strings of different lengths in a vector in R

I have a vector that looks like the following:

**name**
a1
a2
a3
b1_z
b2_3z
b32z

I would like the output to include only the letters in each of these strings, not any numbers or symbols, like this:

**name**
a
a
a
bz
bz
bz

I have tried using the following code:

library(stringi)
df$name <- stri_extract_all_regex(df$name, "[a-z]+")

I get this result:

**name**
a
a
a
c("b", "z")
c("b", "z")
c("b", "z") 

How do I combine values that are split into two separate strings into a single string? In particular, how do I do this when some elements of the vector already contain only one string? I am also open to other ways of extracting the letters that avoid this issue.
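To address the question as asked: stri_extract_all_regex() returns a list with one character vector per input string, so each vector can be collapsed into a single string with paste(collapse = ""). The sketch below uses base R's regmatches() and gregexpr(), which also return such a list, so it runs without stringi; the same vapply() call should work on the stringi result. One caveat, assuming stringi's default omit_no_match = FALSE: a string with no letters yields NA there, and paste() would turn that into the text "NA".

```r
# Same input as the question's data frame
df <- data.frame(name = c("a1", "a2", "a3", "b1_z", "b2_3z", "b32z"),
                 stringsAsFactors = FALSE)

# Base-R stand-in for stri_extract_all_regex(): a list holding one
# character vector of letter runs per element, e.g. c("b", "z") for "b1_z"
pieces <- regmatches(df$name, gregexpr("[a-z]+", df$name))

# Collapse each vector into one string; single-element vectors such as
# "a" pass through unchanged
df$name <- vapply(pieces, paste, character(1), collapse = "")
df$name
#> [1] "a"  "a"  "a"  "bz" "bz" "bz"
```

vapply() is used instead of sapply() so the result is guaranteed to be a plain character vector with exactly one string per element.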

Try gsub, as below:

df$name <- gsub("[^[:alpha:]]","",df$name)

Here every non-alphabetic character is replaced with "" (the empty string).

This gives:

> df
  name
1    a
2    a
3    a
4   bz
5   bz
6   bz

Data:

> dput(df)
structure(list(name = c("a1", "a2", "a3", "b1_z", "b2_3z", "b32z"
)), class = "data.frame", row.names = c(NA, -6L))

You can do this with the gsub function:

vals = c('a1', 'a2', 'b1_z', 'b2_3z')
df = data.frame(vals)

df$name = gsub("[^[:alpha:]]", "", df$vals)
print(df["name"])  # show only the new column; print(df) would also show vals

The output looks like this:

  name
1    a
2    a
3   bz
4   bz

An option with str_remove_all:

library(stringr)
str_remove_all(df$name, "[0-9_]+")
#[1] "a"  "a"  "a"  "bz" "bz" "bz"

Data:

df <- structure(list(name = c("a1", "a2", "a3", "b1_z", "b2_3z", "b32z"
)), class = "data.frame", row.names = c(NA, -6L))
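The answers use two different patterns, and they only agree on the question's data. "[0-9_]+" removes digits and underscores only, while "[^[:alpha:]]" removes every character that is not a letter. The sketch below runs both patterns through base gsub() on a hypothetical input (not from the question) that contains a hyphen and a period; str_remove_all(x, pattern) behaves like str_replace_all(x, pattern, ""), so it should match the first result.

```r
# Hypothetical input: "c-4.y" contains symbols other than "_"
x <- c("b1_z", "c-4.y")

# Pattern from the str_remove_all() answer: only digits and underscores
# are removed, so "-" and "." survive
digits_underscores <- gsub("[0-9_]+", "", x)

# Pattern from the gsub() answers: every non-letter character is removed
non_letters <- gsub("[^[:alpha:]]", "", x)

digits_underscores
#> [1] "bz"   "c-.y"
non_letters
#> [1] "bz" "cy"
```

If the real data may contain other symbols, the "[^[:alpha:]]" pattern matches the stated goal of keeping only letters more closely.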

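Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.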

 