Remove (non-breaking) space character in string

Question

This question makes it look easy to remove space characters from a string in R. However, when I load the following table, I am not able to remove the space between two numbers (e.g. 11 846.4):

require(XML)
require(RCurl)
require(data.table)

link2fetch = 'https://www.destatis.de/DE/Themen/Branchen-Unternehmen/Landwirtschaft-Forstwirtschaft-Fischerei/Feldfruechte-Gruenland/Tabellen/ackerland-hauptnutzungsarten-kulturarten.html'

theurl = getURL(link2fetch, .opts = list(ssl.verifypeer = FALSE) ) # important!
area_cult10 = readHTMLTable(theurl, stringsAsFactors = FALSE)
area_cult10 = rbindlist(area_cult10)
    
test = sub(',', '.', area_cult10$V5) # change , to . 
test = gsub('(.+)\\s([A-Z]{1})*', '\\1', test) # remove LETTERS
gsub('\\s', '', test[1]) # remove white space?

Why can't I remove the space in test[1]? Could this be something other than a space character? Maybe the answer is really easy and I'm overlooking something. Thanks for any advice.

Answer 1

You may shorten the creation of test to just 2 steps that use only 1 PCRE regex (note the perl=TRUE parameter):

test = sub(",", ".", gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", area_cult10$V5, perl=TRUE), fixed=TRUE)

Result:

 [1] "11846.4" "6529.2"  "3282.7"  "616.0"   "1621.8"  "125.7"   "14.2"   
 [8] "401.6"   "455.5"   "11.7"    "160.4"   "79.1"    "37.6"    "29.6"   
[15] ""        "13.9"    "554.1"   "236.7"   "312.8"   "4.6"     "136.9"  
[22] "1374.4"  "1332.3"  "1281.8"  "3.7"     "5.0"     "18.4"    "23.4"   
[29] "42.0"    "2746.2"  "106.6"   "2100.4"  "267.8"   "258.4"   "13.1"   
[36] "23.5"    "11.6"    "310.2"

The gsub regex is worth special attention:

(*UCP) - the PCRE verb that makes the pattern Unicode-aware, so that \\s also matches Unicode whitespace such as the non-breaking space (U+00A0)
[\\s\\p{L}]+ - matches 1+ whitespace or letter characters
| - or (an alternation operator)
\\W+$ - 1+ non-word chars at the end of the string.
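
To see the pieces of the pattern at work without fetching the table, here is a minimal sketch on made-up values. The vector x is hypothetical and only mimics the kinds of cells in the column (a no-break space inside a number, a trailing letter, a trailing symbol, a lone dash); it is not the real data.

```r
# Hypothetical sample values, not the real table: the first one contains U+00A0.
x <- c("11\u00a0846,4", "616,0 e", "125,7 /", "-")

# Inspect the code points of the first value:
# 160 is U+00A0 (no-break space), a plain space would be 32.
codes <- utf8ToInt(x[1])
print(codes)

# Unicode-aware removal of whitespace and letters, plus trailing non-word characters.
cleaned <- gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", x, perl = TRUE)
print(cleaned)  # "11846,4" "616,0" "125,7" ""

# Alternative when only the no-break space is the problem:
# delete it as a literal string, no regex involved.
no_nbsp <- gsub("\u00a0", "", x[1], fixed = TRUE)
print(no_nbsp)  # "11846,4"
```

The code point check is a quick way to answer "is this really a space?" for any suspicious character. The lone dash becomes an empty string because \\W+$ consumes it, which would explain an empty entry in a result vector.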

Then, sub(",", ".", x, fixed=TRUE) will replace the first , with a . as literal strings. fixed=TRUE saves performance since no regex has to be compiled.
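
As a small illustration of that last step, the sketch below contrasts sub with gsub and shows what fixed = TRUE changes. The input strings are made up for the example.

```r
x <- "1,234,5"  # hypothetical value with two commas

first_only <- sub(",", ".", x, fixed = TRUE)   # replaces the first comma only
every_one  <- gsub(",", ".", x, fixed = TRUE)  # replaces every comma
print(first_only)  # "1.234,5"
print(every_one)   # "1.234.5"

# Without fixed = TRUE the pattern "." is a regex that matches any character,
# so the first character is replaced rather than the literal dot.
as_regex <- sub(".", ",", "11846.4")
print(as_regex)    # ",1846.4"

# After the decimal comma is swapped, the string converts to a number.
value <- as.numeric(sub(",", ".", "11846,4", fixed = TRUE))
print(value)       # 11846.4
```

Since each cleaned value holds at most one decimal comma, sub is enough here; gsub would only matter if a value could contain several.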

Answer 1: 6 · accepted · 2017-05-02 09:56:12
