R gsub: numbers and spaces in a variable

Question

With gsub I am able to remove the # from these person variables, but my attempt to remove the number that follows it does not work. I would also like to remove the space after each person's name while keeping the space in the middle of the name.

c('mike smith #99','John johnson #2','jeff johnson #50') -> person

c(1:99) -> numbers

person <- gsub("#", "", person, fixed=TRUE)

# MY ISSUE
person <- gsub(numbers, "", person, fixed=TRUE)

df <- data.frame(PERSON = person)

Current Results:

PERSON
mike smith 99
John johnson 2
jeff johnson 50

Expected Results:

PERSON
mike smith
John johnson
jeff johnson

Answer 1

Here's another pattern as an alternative:

> gsub("(.*)\\s+#.*", "\\1", person)
[1] "mike smith"   "John johnson" "jeff johnson"

In the regex above, (.*) captures any characters that come before one or more spaces ( \\s+ ), which are followed by a # symbol and then by anything else. The replacement \\1 tells gsub to replace the whole match with the captured group (.*).
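To make the capture-group idea concrete, here is a minimal sketch that assumes the same `person` vector as in the question. The `^` and `$` anchors, the `\\S` inside the group, and the extra test string "ann lee" are illustrative additions, not part of the original answer.

```r
person <- c("mike smith #99", "John johnson #2", "jeff johnson #50")

# Group 1 captures the name up to its last non-space character;
# "\\s*#.*$" then consumes any spaces, the "#", and the rest of the string.
cleaned <- sub("^(.*\\S)\\s*#.*$", "\\1", person)
print(cleaned)

# A string without "#" does not match, so sub() returns it unchanged.
untouched <- sub("^(.*\\S)\\s*#.*$", "\\1", "ann lee")
print(untouched)
```

Because the group ends on a non-space character, the trailing space never ends up inside the captured name.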

An easier way to get your desired output is:

> gsub("\\s+#.*$", "", person)
[1] "mike smith"   "John johnson" "jeff johnson"

The regex \\s+#.*$ says that everything made up of whitespace ( \\s+ ), a # symbol, and everything else up to the end of the string ( .*$ ) should be removed.
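As a sketch of how this pattern could be applied to the data frame from the question, the snippet below wraps it in a small helper. The function name `strip_suffix` is hypothetical, and it uses `\\s*` instead of `\\s+` on the assumption that a name might be followed by # with no space in between.

```r
# Hypothetical helper: drop an optional run of spaces, the "#",
# and everything after it from each element of x.
strip_suffix <- function(x) {
  sub("\\s*#.*$", "", x)
}

person <- c("mike smith #99", "John johnson #2", "jeff johnson #50")

# Build the data frame from the cleaned names.
df <- data.frame(PERSON = strip_suffix(person), stringsAsFactors = FALSE)
print(df)
```

Since each string contains only one # suffix, sub() (first match only) is enough here; gsub() would give the same result.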

Using str_extract_all from the stringr package, which returns each word of the name in its own column:

> library(stringr)
> str_extract_all(person, "[[:alpha:]]+", simplify = TRUE)
     [,1]   [,2]     
[1,] "mike" "smith"  
[2,] "John" "johnson"
[3,] "jeff" "johnson"

You can also use stringi:

library(stringi)
stri_extract_all(person, regex = "[[:alpha:]]+", simplify = TRUE)
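The extraction approach yields separate words rather than one cleaned string per person. If no extra package is wanted, a base R sketch along the same lines could extract the runs of letters with regmatches() and gregexpr() and then join them back together. The variable names `words` and `full_names` are illustrative.

```r
person <- c("mike smith #99", "John johnson #2", "jeff johnson #50")

# Extract every run of letters from each string
# (a list with one character vector per person).
words <- regmatches(person, gregexpr("[[:alpha:]]+", person))

# Glue the words of each name back together with a single space.
full_names <- vapply(words, paste, character(1), collapse = " ")
print(full_names)
```

Note that this keeps letters only, so it would also drop digits or punctuation that are a legitimate part of a name.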

Answer 2

c('mike smith #99','John johnson #2','jeff johnson #50') -> person
sub("\\s+#.*", "", person)
[1] "mike smith"   "John johnson" "jeff johnson"

Answer 3

We can create the pattern with paste:

pat <- paste0("\\s*#(", paste(numbers, collapse = "|"), ")")
gsub(pat, "", person)
#[1] "mike smith"   "John johnson" "jeff johnson"
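For context, gsub() accepts a single pattern: when a vector of length greater than one is passed, only its first element is used (with a warning), which is why passing `numbers` directly does not work. The sketch below collapses the vector into one alternation, as above, and additionally anchors it with `$`. That anchor is an added assumption (the number is the last thing in the string); it makes the full number match regardless of the order of the alternatives.

```r
person  <- c("mike smith #99", "John johnson #2", "jeff johnson #50")
numbers <- 1:99

# Collapse the numbers into a single alternation: "1|2|...|99".
# The trailing "$" (an assumption, see above) forces the whole number
# to be consumed, so "#50" cannot be matched as "#5" with a stray "0".
pat <- paste0("\\s*#(", paste(numbers, collapse = "|"), ")$")

cleaned <- gsub(pat, "", person)
print(cleaned)
```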

Note that the solution above builds the pattern from 'numbers'. If the goal is only to remove the # together with the digits that follow it:

sub("\\s*#\\d+$", "", person)
#[1] "mike smith"   "John johnson" "jeff johnson"

Or another option is:

unlist(strsplit(person, "\\s*#\\d+"))

NOTE: All of the above are base R methods. A tidyverse alternative:

library(tidyverse)
tibble(person) %>% 
      separate(person, into = c("person", "notneeded"), sep = "\\s+#") %>% 
      select(person)

Answer 4

This could alternatively be done with read.table.

read.table(text = person, sep = "#", strip.white = TRUE, 
  as.is = TRUE, col.names = "PERSON")

giving:

        PERSON
1   mike smith
2 John johnson
3 jeff johnson
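If the number should be kept as its own column rather than discarded, a variation on the same idea might look like the sketch below. It relies on the fact that read.table() treats # as a comment character by default; setting `comment.char = ""` turns that off so that `sep = "#"` actually splits the line into two fields. The column name `NUMBER` is illustrative.

```r
person <- c("mike smith #99", "John johnson #2", "jeff johnson #50")

# comment.char = "" stops read.table() from treating "#" as a comment marker,
# so sep = "#" splits each line into a name field and a number field.
df <- read.table(text = person, sep = "#", comment.char = "",
                 strip.white = TRUE, as.is = TRUE,
                 col.names = c("PERSON", "NUMBER"))
print(df)
```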

Answer 5

An alternative that deletes any sequence of characters other than lowercase letters at the end of the string:

gsub("[^a-z]+$", "", person)
[1] "mike smith"   "John johnson" "jeff johnson"

If you want to allow for words that are all uppercase or that end with an uppercase character:

gsub("[^a-zA-Z]+$", "", person)

Some names might end with a period ( . ):

gsub("[^a-zA-Z.]+$", "", person)
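To see how the last two character classes differ, here is a small sketch on made-up input. The names "JOHN DOE" and "sam jr." are invented for illustration and do not come from the question.

```r
x <- c("mike smith #99", "JOHN DOE #2", "sam jr. #7")

# Letters only: a trailing period is stripped along with " #<number>".
letters_only <- gsub("[^a-zA-Z]+$", "", x)
print(letters_only)

# Letters or a period: a trailing "." survives.
keep_period <- gsub("[^a-zA-Z.]+$", "", x)
print(keep_period)
```

The choice between the two depends on whether trailing punctuation such as the period in "jr." is considered part of the name.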

5 answers

Answer 1
1 2018-10-01 16:12:42

Answer 2
1 Accepted 2018-10-07 15:42:29

Answer 3
0 2018-10-01 16:09:28

Answer 4
0 2018-10-01 16:14:04

Answer 5
0 2018-10-01 16:32:31
