
Regex: commas not between two numbers

I am looking for a regex to use with gsub that removes all the unwanted commas:

Data:

,,,,,,,12345
12345,1345,1354
123,,,,,,
12345,
,12354

Desired result:

12345
12345,1345,1354
123
12345
12354

This is the progress I have made so far:

(,(?!\\d+))
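
To see where this attempt falls short, here is a minimal sketch that runs it on the sample data. The variable names x and attempt are illustrative, and perl = TRUE is assumed because R's default regex engine does not support lookaheads.

```r
# Sample data from the question
x <- c(",,,,,,,12345", "12345,1345,1354", "123,,,,,,", "12345,", ",12354")

# Remove every comma that is NOT followed by a digit.
# perl = TRUE switches to PCRE, which is needed for the (?!...) lookahead.
attempt <- gsub(",(?!\\d+)", "", x, perl = TRUE)
print(attempt)
# A comma that directly precedes a digit survives, so a leading comma
# is left behind in the first and last elements.
```

The lookahead only checks what follows a comma, so it cannot tell a leading comma from a separator between two numbers.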

You seem to want to remove all leading and trailing commas.

You can do that with

gsub("^,+|,+$", "", x)


The regex contains two alternatives: ^,+ matches one or more commas at the start of the string, and ,+$ matches one or more commas at the end. gsub replaces these matches with empty strings.

R demo:

x <- c(",,,,,,,12345","12345,1345,1354","123,,,,,,","12345,",",12354")
gsub("^,+|,+$", "", x)
## [1] "12345"           "12345,1345,1354" "123"             "12345"          
## [5] "12354"     
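
If you would rather follow the title literally and drop every comma that is not sitting between two digits, a lookaround-based variant is one possible sketch. This is an alternative to the pattern above, not part of the original answer; the name cleaned is illustrative, and perl = TRUE is required for lookarounds.

```r
# Sample data from the question
x <- c(",,,,,,,12345", "12345,1345,1354", "123,,,,,,", "12345,", ",12354")

# Match a comma that is not preceded by a digit, OR not followed by a digit.
# Only commas with a digit on both sides are kept.
cleaned <- gsub("(?<!\\d),|,(?!\\d)", "", x, perl = TRUE)
print(cleaned)
```

On the sample data this gives the same result as trimming leading and trailing commas. The two approaches differ on input such as "1,,2": the lookaround version removes both commas, while the anchored version leaves the string unchanged.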

You can also use str_extract from stringr. The pattern matches from the first digit to the last digit. Thanks to greedy matching, you don't have to specify how many times a digit occurs; the longest match is chosen automatically:

library(dplyr)
library(stringr)

df %>%
  mutate(V1 = str_extract(V1, "\\d.+\\d"))

Or, if you prefer base R:

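# regexpr() finds one match per element, so the result is a character vector
# (gregexpr() would make regmatches() return a list instead)
df$V1 = regmatches(df$V1, regexpr("\\d.+\\d", df$V1))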

Result:

               V1
1           12345
2 12345,1345,1354
3             123
4           12345
5           12354
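
One caveat: the pattern \\d.+\\d needs at least three characters (a digit, one or more characters, and another digit), so a one- or two-digit value would not match. The sketch below uses made-up inputs (",,42," and ",7" are not from the question) and a hypothetical relaxed pattern, \\d(.*\\d)?, to illustrate the difference in base R.

```r
# Hypothetical inputs containing short numbers
x <- c(",,42,", ",7", "12345,1345,1354")

# Original pattern: only the long entry matches; regmatches() drops the rest
strict <- regmatches(x, regexpr("\\d.+\\d", x))
print(strict)

# Relaxed pattern: a digit, optionally followed by anything that ends in a digit
relaxed <- regmatches(x, regexpr("\\d(.*\\d)?", x, perl = TRUE))
print(relaxed)
```

Because regmatches() silently drops elements without a match, it is worth checking that the result has the same length as the input before assigning it back to a data frame column.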

Data:

df = read.table(text = ",,,,,,,12345
                12345,1345,1354
                123,,,,,,
                12345,
                ,12354")
