Remove characters after a space in a string - R Studio data cleaning

Question

I am trying to clean some data in R Studio.

Here is a sample of my data.

LSOA name:
York 009A
Wychavon 014A
Bath and North East Somerset 001A
Aylesbury Vale 008C
Central Bedfordshire 030C

I would like to remove the code from the end of each entry, so that the resulting data looks like this:

LSOA name:
York
Wychavon
Bath and North East Somerset
Aylesbury Vale
Central Bedfordshire

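I am new to regular expressions, so I am finding this difficult. As far as I can tell, because a variable number of words precedes the code, it is not possible to simply remove everything after the first space.

Any help would be much appreciated!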

Answer 1

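We can use sub to match one or more whitespace characters ( \\s+ ) followed by one or more digits ( \\d+ ) and an uppercase letter ( [A-Z] ) at the end of the string ( $ ), and replace the match with an empty string ( "" )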

df1$name <- sub("\\s+\\d+[A-Z]$", "", df1$name)

-output

df1
#                          name
#1                         York
#2                     Wychavon
#3 Bath and North East Somerset
#4               Aylesbury Vale
#5         Central Bedfordshire

data

df1 <- structure(list(name = c("York 009A", "Wychavon 014A", 
"Bath and North East Somerset 001A", 
"Aylesbury Vale 008C", "Central Bedfordshire 030C")), class = "data.frame",
row.names = c(NA, 
-5L))
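
As a quick sanity check, the same pattern can be tried on a plain character vector. This is only a minimal sketch: the vector name lsoa and the extra entry "Leeds" (which has no trailing code) are made up for illustration and are not part of the original data.

```r
# Hypothetical input: the last element has no trailing code
lsoa <- c("York 009A", "Bath and North East Somerset 001A", "Leeds")

# \\s+    one or more whitespace characters
# \\d+    one or more digits
# [A-Z]   a single uppercase letter
# $       end of the string
cleaned <- sub("\\s+\\d+[A-Z]$", "", lsoa)
print(cleaned)
```

Because the pattern is anchored with $, only the final code is removed, and a string that does not match the pattern is returned unchanged.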

Answer 2

You can also use a lookahead (?=\\s\\d+) and the backreference \\1 :

sub("(.*)(?=\\s\\d+).*", "\\1", df1$name, perl = TRUE)
[1] "York"                         "Wychavon"                     "Bath and North East Somerset" "Aylesbury Vale"              
[5] "Central Bedfordshire"
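
To see how the greedy (.*) interacts with the lookahead, here is a small sketch. The value "Area 51 North 002B" is a hypothetical name that itself contains a number; it does not come from the question's data.

```r
# Hypothetical value whose name itself contains a number
x <- c("York 009A", "Area 51 North 002B")

# (.*) is greedy, so it backtracks only as far as the LAST position
# that is followed by whitespace and digits; the trailing .* then
# consumes the code, and \\1 keeps the captured name
res <- sub("(.*)(?=\\s\\d+).*", "\\1", x, perl = TRUE)
print(res)
```

Under these assumptions the second element should come back as "Area 51 North", since only the last space-plus-digits run is treated as the start of the code.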

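Another option is str_extract with the negated character class \\D , which matches any character that is not a digit ( trimws removes the surrounding whitespace).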

library(stringr)
trimws(str_extract(df1$name, "\\D+"))
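
If you would rather not load stringr, a roughly equivalent base R sketch uses regexpr and regmatches. The vector x below simply repeats the sample values from the question; it is an assumption that your column holds plain strings like these.

```r
# Same sample values as in the question
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# regexpr() locates the first run of non-digit characters in each string,
# regmatches() extracts it, and trimws() drops the trailing space
res <- trimws(regmatches(x, regexpr("\\D+", x)))
print(res)
```

Note that \\D+ stops at the first digit, so this approach (like the str_extract one) would cut short a name that contains a number. Also, regmatches drops elements with no match at all, whereas str_extract returns NA for them.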

Answer 1: score 2, accepted, 2021-04-17 20:16:12

Answer 2: score 0, 2021-04-17 21:14:14
