R gsub numbers and space from variables
With gsub I am able to remove the # from these person variables, but the way I am trying to remove the random number is not correct. I would also like to remove the space after the person's name, while keeping the space in the middle of the name.
c('mike smith #99','John johnson #2','jeff johnson #50') -> person
c(1:99) -> numbers
person <- gsub("#", "", person, fixed=TRUE)
# MY ISSUE
person <- gsub(numbers, "", person, fixed=TRUE)
df <- data.frame(PERSON = person)
Current results:
PERSON
mike smith 99
John johnson 2
jeff johnson 50
Expected results:
PERSON
mike smith
John johnson
jeff johnson
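A possible explanation for the current results, offered as a sketch rather than a definitive diagnosis: gsub expects a single pattern string, so when it is handed the whole numbers vector it uses only the first element ("1") and warns about the rest. None of the names contain a literal 1, so nothing is removed. The names person_nohash and attempt below are made up for illustration.

```r
person <- c('mike smith #99', 'John johnson #2', 'jeff johnson #50')
numbers <- 1:99

person_nohash <- gsub("#", "", person, fixed = TRUE)

# Only numbers[1] is used as the pattern; R warns about the extra elements,
# so the warning is suppressed here to keep the output readable.
attempt <- suppressWarnings(gsub(numbers, "", person_nohash, fixed = TRUE))

# TRUE means the second gsub call changed nothing
identical(attempt, person_nohash)
```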
Here's another pattern as an alternative:
> gsub("(.*)\\s+#.*", "\\1", person)
[1] "mike smith" "John johnson" "jeff johnson"
In the above regex, (.*) captures a group of any characters that come before one or more whitespace characters (\\s+), followed by a # symbol and then anything else. The \\1 in the replacement tells gsub to replace the whole match with that captured group, which leaves only the name.
An easier way to get your desired output is:
> gsub("\\s+#.*$", "", person)
[1] "mike smith" "John johnson" "jeff johnson"
The above regex \\s+#.*$ says that everything consisting of whitespace (\\s+), a # symbol, and everything else up to the end of the string (.*$) should be removed.
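To tie this back to the data frame in the question, here is a minimal sketch that wraps the simpler pattern in a function. The name clean_person is hypothetical and not part of base R; sub is used instead of gsub on the assumption that each string holds at most one "#number" suffix.

```r
# clean_person is a hypothetical helper name used only for this example.
clean_person <- function(x) {
  # Drop the whitespace before '#', the '#' itself, and everything after it.
  sub("\\s+#.*$", "", x)
}

person <- c('mike smith #99', 'John johnson #2', 'jeff johnson #50')
df <- data.frame(PERSON = clean_person(person))
df
```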
Using str_extract_all from the stringr package (the character class includes uppercase letters so that the J in John is kept):
> library(stringr)
> str_extract_all(person, "[A-Za-z]+", simplify = TRUE)
[,1] [,2]
[1,] "mike" "smith"
[2,] "John" "johnson"
[3,] "jeff" "johnson"
Also you can use:
library(stringi)
stri_extract_all(person, regex="[A-Za-z]+", simplify=TRUE)
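The extraction approach returns the words as separate pieces, so they still need to be joined back into one name per person. A rough base R equivalent, swapping in regmatches and gregexpr for the stringr/stringi calls (so this is not the method shown above, only a sketch of the same idea), could look like this. The variable names words and full_names are made up.

```r
person <- c('mike smith #99', 'John johnson #2', 'jeff johnson #50')

# Extract every run of letters from each string; the result is a list
# with one character vector of words per person.
words <- regmatches(person, gregexpr("[A-Za-z]+", person))

# Join the words of each person back together with a single space.
full_names <- vapply(words, paste, character(1), collapse = " ")
full_names
```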
c('mike smith #99','John johnson #2','jeff johnson #50') -> person
sub("\\s+#.*", "", person)
[1] "mike smith" "John johnson" "jeff johnson"
We can create the pattern with paste
pat <- paste0("\\s*#(", paste(numbers, collapse = "|"), ")")
gsub(pat, "", person)
#[1] "mike smith" "John johnson" "jeff johnson"
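One thing that may be worth guarding against: the alternation built from numbers lists 9 before 99, and regex engines do not all choose between overlapping alternatives in the same way. A cautious variant, sketched here with the made-up name pat_anchored, adds "$" so the number has to run to the end of the string. This assumes the number is always the last thing in each value.

```r
person <- c('mike smith #99', 'John johnson #2', 'jeff johnson #50')
numbers <- 1:99

# Same pattern as above, with "$" appended so a partial number cannot match.
pat_anchored <- paste0("\\s*#(", paste(numbers, collapse = "|"), ")$")

default_engine <- gsub(pat_anchored, "", person)
perl_engine <- gsub(pat_anchored, "", person, perl = TRUE)

# Both engines should agree once the pattern is anchored
identical(default_engine, perl_engine)
```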
Note that the above solution was based on creating the pattern from 'numbers'. If the goal is only to remove the # together with the digits that follow it:
sub("\\s*#\\d+$", "", person)
#[1] "mike smith" "John johnson" "jeff johnson"
Or another option is
unlist(strsplit(person, "\\s*#\\d+"))
NOTE: All the above are base R methods
library(tidyverse)
# tibble() replaces the deprecated data_frame()
tibble(person) %>%
  separate(person, into = c("person", "notneeded"), sep = "\\s+#") %>%
  select(person)
This could alternatively be done with read.table.
read.table(text = person, sep = "#", strip.white = TRUE,
as.is = TRUE, col.names = "PERSON")
giving:
PERSON
1 mike smith
2 John johnson
3 jeff johnson
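As far as I can tell, the call above yields a single column because read.table treats # as a comment character by default (comment.char = "#"), so everything from the # onward is discarded before the fields are split. A sketch that switches this off shows the number arriving as a second column instead. The NUMBER column name and the variable both are assumptions for illustration.

```r
person <- c('mike smith #99', 'John johnson #2', 'jeff johnson #50')

# With comment.char = "" the '#' acts only as the field separator,
# so each line yields two fields: the name and the number.
both <- read.table(text = person, sep = "#", comment.char = "",
                   strip.white = TRUE, as.is = TRUE,
                   col.names = c("PERSON", "NUMBER"))
both$PERSON
```

This variant may be handy if the numbers are needed later; otherwise the shorter call above is enough.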
An alternative is to delete any sequence of characters other than lowercase letters at the end of the string.
gsub("[^a-z]+$", "", person)
[1] "mike smith" "John johnson" "jeff johnson"
If you want to allow for words that are all upper case or that end with an uppercase character:
gsub("[^a-zA-Z]+$", "", person)
Some names might end with a period (.):
gsub("[^a-zA-Z.]+$", "", person)
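To see how the three character classes differ, here is a small comparison on invented input. The vector x and the result names are made up for this sketch; the second and third values are not from the question.

```r
# Hypothetical test data: an all-uppercase name and a name ending in a period.
x <- c("mike smith #99", "JOHN JOHNSON #2", "jeff johnson jr. #50")

lower_only <- gsub("[^a-z]+$", "", x)        # strips anything that is not a-z
any_case   <- gsub("[^a-zA-Z]+$", "", x)     # keeps uppercase endings
with_dot   <- gsub("[^a-zA-Z.]+$", "", x)    # also keeps a trailing period

data.frame(lower_only, any_case, with_dot)
```

Note that the lowercase-only pattern empties the all-uppercase name entirely, since every character in it counts as a trailing non-lowercase character.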