
R Regex / gsub: How to collapse spaces in a string

I have a vector of sentences that were scanned from handwritten documents. The scanning introduced some spacing problems like this:

 The d og is br own.

I was curious whether there is a generic way to take any pattern of the form '_x_' (space, character, space) and collapse the second space, like this:

The d og is br own.  --> The dog is br own.

I'm only worried about a single character between the spaces ('_x_', NOT '_xx_').

Any suggestions?

Maybe

> x<-"The d og is br own."
> gsub(" (.) "," \\1",x)
[1] "The dog is br own."

or

gsub(" ([[:alnum:]]) "," \\1",x)

(.) matches any single character, while ([[:alnum:]]) matches a single alphanumeric character only, so the second pattern leaves isolated punctuation alone.
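To make the difference between the two patterns concrete, here is a small sketch that applies both to a vector, since gsub is vectorized over its input. The second sentence ("cats & dogs") and the variable names are made up for illustration and are not from the question.

```r
# Hypothetical sample vector: the first element is the question's example,
# the second is an invented sentence containing isolated punctuation.
sentences <- c("The d og is br own.", "cats & dogs")

# Pattern 1: any single character between two spaces.
any_char <- gsub(" (.) ", " \\1", sentences)

# Pattern 2: only a single alphanumeric character between two spaces.
alnum_only <- gsub(" ([[:alnum:]]) ", " \\1", sentences)

print(any_char)    # "The dog is br own." "cats &dogs"
print(alnum_only)  # "The dog is br own." "cats & dogs"
```

Both patterns repair the scanned sentence, but only the first one also glues the ampersand onto the following word. Note that either pattern would also merge legitimate one-letter words such as "a" into the next word, so it may be worth checking the results on real data.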
