Regex to remove digits except in words that start with #

Question

I have some strings that can contain letters, numbers and the '#' symbol.

I would like to remove the digits, except in words that start with '#'.

Here is an example:

"table9 dolv5e #10n #dec10 #nov8e 23 hello"

And the expected output is:

"table dolve #10n #dec10 #nov8e  hello"

How can I do this with regex, stringr or gsub?

Answer 1

How about capturing what you want to keep and replacing the unwanted part with an empty string (it is not captured, so \\1 is empty there)?

gsub("(#\\S+)|\\d+","\\1",x)

See the demo at regex101 or the R demo at tio.run (I have no experience with R).

My answer assumes that there is always whitespace between tokens, as in #foo bar #baz2. If you have something like #foo1,bar2:#baz3 4, use \\w (word character) instead of \\S (non-whitespace).
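The same capture-and-replace idea can be tried in JavaScript (the language Answer 5 uses). This is a minimal sketch, not part of the original answer: String.prototype.replace, like R's gsub here, inserts an empty string for "$1" when group 1 did not take part in the match. The helper name removeDigits and its second parameter are hypothetical.

```javascript
// Group 1 captures a "#..." token so "$1" writes it back unchanged.
// Any other digit run matches the second alternative; group 1 is then
// undefined and "$1" inserts an empty string, deleting the digits.
const bySpace = /(#\S+)|\d+/g; // assumes tokens are separated by whitespace
const byWord = /(#\w+)|\d+/g; // ends the "#" token at punctuation instead

function removeDigits(s, re = bySpace) {
  // replace() resets lastIndex for global regexes, so reusing them is safe
  return s.replace(re, "$1");
}

console.log(removeDigits("table9 dolv5e #10n #dec10 #nov8e 23 hello"));
// table dolve #10n #dec10 #nov8e  hello
console.log(removeDigits("#foo1,bar2:#baz3 4")); // "#foo1,bar2:#baz3 "
console.log(removeDigits("#foo1,bar2:#baz3 4", byWord)); // "#foo1,bar:#baz3 "
```

With \S+, everything from "#foo1" up to the next space counts as one protected token, so the digit in bar2 survives; \w+ ends the token at the comma.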

Answer 2

You could split the string on spaces, remove the digits from tokens that don't start with '#', and paste the tokens back together:

x <- "table9 dolv5e #10n #dec10 #nov8e 23 hello"
y <- unlist(strsplit(x, ' '))
paste(ifelse(startsWith(y, '#'), y, gsub('\\d+', '', y)), collapse = ' ')
# output 
[1] "table dolve #10n #dec10 #nov8e  hello"
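A rough JavaScript equivalent of this split/transform/join approach, for comparison (the function name is hypothetical). The g flag makes replace strip every run of digits in a token, the way gsub does, whereas R's sub would only remove the first run.

```javascript
// Split on single spaces, strip every digit run from tokens that do not
// start with "#", and join the tokens back with single spaces.
function stripDigitsOutsideHashtags(s) {
  return s
    .split(" ")
    .map((token) => (token.startsWith("#") ? token : token.replace(/\d+/g, "")))
    .join(" ");
}

console.log(stripDigitsOutsideHashtags("table9 dolv5e #10n #dec10 #nov8e 23 hello"));
// table dolve #10n #dec10 #nov8e  hello
console.log(stripDigitsOutsideHashtags("a1b2 #c3")); // "ab #c3"
```

The "23" token becomes an empty string, which is why the joined result has two spaces before "hello", matching the expected output in the question.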

Answer 3

You can use gsub to remove digits, for example:

gsub("[0-9]","","table9")
[1] "table"

And we can split your string using strsplit:

STRING = "table9 dolv5e #10n #dec10 #nov8e 23 hello"
strsplit(STRING," ")
[[1]]
[1] "table9" "dolv5e" "#10n"   "#dec10" "#nov8e" "23"     "hello"

We just need to apply gsub to the elements of STRING that do not contain "#":

STRING = unlist(strsplit(STRING," "))
# TRUE for tokens that do not contain "#"
no_hash = !grepl("#",STRING)
# strip digits only from those tokens
STRING[no_hash] = gsub("[0-9]","",STRING[no_hash])
paste(STRING,collapse=" ")
[1] "table dolve #10n #dec10 #nov8e  hello"
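Note that this answer keeps any token that contains "#" anywhere, whereas Answer 2 keeps only tokens that start with "#". The two rules differ only for a token with an embedded "#". A small JavaScript sketch with a made-up input (stripByRule, containsHash and startsWithHash are hypothetical names):

```javascript
// Two ways to decide which tokens keep their digits:
// "contains #" (like grepl("#", ...) here) versus
// "starts with #" (like startsWith(..., "#") in Answer 2).
const containsHash = (t) => t.includes("#");
const startsWithHash = (t) => t.startsWith("#");

function stripByRule(s, keep) {
  return s
    .split(" ")
    .map((t) => (keep(t) ? t : t.replace(/[0-9]/g, "")))
    .join(" ");
}

const mixed = "abc1#2 #x3 y4"; // made-up input with an embedded "#"
console.log(stripByRule(mixed, containsHash)); // "abc1#2 #x3 y"
console.log(stripByRule(mixed, startsWithHash)); // "abc# #x3 y"
```

For the example string in the question every "#" is at the start of a token, so both rules give the same output there.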

Answer 4

Base R solution:

unlisted_strings <- unlist(strsplit(X, "\\s+"))

Y <- paste0(na.omit(ifelse(grepl("[#]", unlisted_strings),
                           unlisted_strings,
                           gsub("\\d+", "", unlisted_strings))), collapse = " ")

Y

Data:

X <- as.character("table9 dolv5e #10n #dec10 #nov8e 23 hello")
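Splitting on "\\s+" rather than on a single space changes how whitespace is handled. A hypothetical JavaScript sketch with a made-up input, mirroring this answer's "keep tokens that contain #" rule (stripBySeparator is an illustrative name):

```javascript
// Keep tokens that contain "#", strip digits from the rest, rejoin with " ".
function stripBySeparator(s, separator) {
  return s
    .split(separator)
    .map((t) => (t.includes("#") ? t : t.replace(/\d+/g, "")))
    .join(" ");
}

const spaced = "a1  b2\t#c3"; // two spaces, then a tab before "#c3"

// Single-space split: the double space yields an empty token (so it is
// preserved), but "b2\t#c3" stays one token and keeps its "2".
console.log(JSON.stringify(stripBySeparator(spaced, " "))); // "a  b2\t#c3"

// Whitespace-run split: runs collapse to one space, the tab becomes a space.
console.log(JSON.stringify(stripBySeparator(spaced, /\s+/))); // "a b #c3"
```

On the question's input, which has only single spaces, both separators give the same result.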

Answer 5

const INPUT = "table9 dolv5e #10n #dec10 #nov8e 23 hello";
const OUTPUT = INPUT.match(/[^#\d]+(#\w+|[A-Za-z]+\w*)/gi).join('');

You can remove the i flag, since it only makes the match case-insensitive and [A-Za-z] already covers both cases.

Use this pattern: [^#\d]+(#\w+|[A-Za-z]+\w*)

[^#\d]+ = one or more characters that are neither # nor digits
#\w+ = # followed by word characters (letters or digits)
[A-Za-z]+\w* = one or more letters followed by letters and/or digits

You can replace [A-Za-z]+\w* with \D+\S* to accept any token that starts with a non-digit, not only one that starts with a letter and continues with letters and/or digits. I did not use \w+\w*, because \w already includes digits (it is equivalent to [A-Za-z0-9_]).
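To see what each piece of the pattern matches on its own, here is a small JavaScript sketch that tests anchored versions against tokens from the example (the ^...$ anchors, the token list and the matching helper are additions for illustration). The last line shows why \w+\w* was avoided: it also accepts a token made only of digits.

```javascript
// Each piece of the pattern, anchored with ^...$ so it must match a whole token.
const tokens = ["table", "#10n", "dolv5e", "23"];
const matching = (re) => tokens.filter((t) => re.test(t));

console.log(matching(/^[^#\d]+$/)); // no "#" and no digits: ["table"]
console.log(matching(/^#\w+$/)); // "#" then word chars: ["#10n"]
console.log(matching(/^[A-Za-z]+\w*$/)); // letter first: ["table", "dolv5e"]
console.log(matching(/^\D+\S*$/)); // any non-digit first: ["table", "#10n", "dolv5e"]
console.log(matching(/^\w+\w*$/)); // \w includes digits: ["table", "dolv5e", "23"]
```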

I tried the code in JavaScript and it works. If you want to match tokens that are not only followed by letters, you can use code

5 answers

Answer 1: 6 votes, accepted, 2019-12-07 11:06:31
Answer 2: 5 votes, 2019-12-07 10:43:36
Answer 3: 1 vote, 2019-12-07 10:42:59
Answer 4: 0 votes, 2019-12-07 10:41:13
Answer 5: 0 votes, 2019-12-07 11:43:49