Remove characters after whitespace in a string - RStudio data cleaning

Question

I am attempting to clean some data in RStudio.

Here's an example of my data:

LSOA name:
York 009A
Wychavon 014A
Bath and North East Somerset 001A
Aylesbury Vale 008C
Central Bedfordshire 030C

I want to remove the code from the end of each name, so that the resulting data looks like this:

LSOA name:
York
Wychavon
Bath and North East Somerset
Aylesbury Vale 
Central Bedfordshire

I am quite new to regex, so I'm finding this quite difficult. As far as I can tell, because there is a variable number of words before the code, simply removing everything after a whitespace is not possible.

Any help would be hugely appreciated!
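The variable number of words is not actually an obstacle: a pattern anchored to the end of the string with $ only ever touches the last whitespace-separated token, however many words come before it. A minimal base-R sketch, assuming every code is a single token with no internal spaces (the vector name lsoa is just for illustration):

```r
# Illustrative vector mirroring the sample data above
lsoa <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
          "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# "\\s+\\S+$": one or more spaces, then the final run of non-space
# characters, anchored to the end of the string. Only the last token
# can satisfy the anchor, so earlier words are never removed.
cleaned <- sub("\\s+\\S+$", "", lsoa)
print(cleaned)
```

Design note: this removes any final word, so a name that happens to lack a code would lose its last word. The stricter pattern in Answer 1, which requires digits followed by a capital letter, avoids that.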

Answer 1

We can use sub to match one or more spaces ( \\s+ ) followed by one or more digits ( \\d+ ) and an upper-case letter ( [A-Z] ) at the end ( $ ) of the string, and replace the match with a blank ( "" ).

df1$name <- sub("\\s+\\d+[A-Z]$", "", df1$name)

Output:

df1
#                          name
#1                         York
#2                     Wychavon
#3 Bath and North East Somerset
#4               Aylesbury Vale
#5         Central Bedfordshire

Data:

df1 <- structure(list(name = c("York 009A", "Wychavon 014A",
  "Bath and North East Somerset 001A",
  "Aylesbury Vale 008C", "Central Bedfordshire 030C")),
  class = "data.frame", row.names = c(NA, -5L))
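Because the pattern requires digits followed by a single capital letter at the very end, sub() leaves any value without such a code untouched. A small check along these lines, using a hypothetical vector whose last element has no code, can confirm which rows would change before overwriting the column:

```r
# Answer 1's pattern: spaces, digits, one upper-case letter, end of string
code_pattern <- "\\s+\\d+[A-Z]$"

# Hypothetical input: the last element has no trailing code
x <- c("York 009A", "Aylesbury Vale 008C", "Central Bedfordshire")

# Flag which elements actually end in a code
has_code <- grepl(code_pattern, x)
print(has_code)

# sub() returns non-matching elements unchanged
cleaned <- sub(code_pattern, "", x)
print(cleaned)
```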

Answer 2

You can also use a lookahead, (?=\\s\\d+), together with a capture group that is referenced in the replacement as \\1 :

sub("(.*)(?=\\s\\d+).*", "\\1", df1$name, perl = TRUE)
[1] "York"                         "Wychavon"                     "Bath and North East Somerset" "Aylesbury Vale"              
[5] "Central Bedfordshire"
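Lookahead assertions are a PCRE feature, which is why the call above sets perl = TRUE. If you prefer R's default regex engine, a capture group on its own can do the same job. The sketch below is an alternative pattern, not the one from this answer, and assumes every value ends in a code of the form digits plus one capital letter:

```r
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# Capture everything before the trailing " <digits><letter>" and keep only
# that group. No lookaround is used, so perl = TRUE is not required.
cleaned <- sub("^(.*)\\s+\\d+[A-Z]$", "\\1", x)
print(cleaned)
```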

Another option is str_extract with the negated character class \\D , which matches any character that is not a digit. Because the extracted run ends with the space before the code, trimws removes that trailing whitespace.

library(stringr)
trimws(str_extract(df1$name, "\\D+"))
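If you would rather avoid the stringr dependency, base R's regexpr() and regmatches() can extract the same first run of non-digit characters. A sketch, assuming every value contains at least one non-digit character (otherwise regmatches() drops that element and returns a shorter vector):

```r
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# regexpr() locates the first run of non-digits in each string;
# regmatches() pulls out those substrings (e.g. "York ")
first_non_digits <- regmatches(x, regexpr("\\D+", x))

# Drop the trailing space left before the code
cleaned <- trimws(first_non_digits)
print(cleaned)
```

Note that \\D+ stops at the first digit, so a place name that itself contained a digit would be cut short there; the end-anchored pattern in Answer 1 does not have that limitation.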

2 answers

Answer 1: score 2, accepted, 2021-04-17 20:16:12

Answer 2: score 0, 2021-04-17 21:14:14