Remove (non-breaking) space character in string
This question seems to make it easy to remove space characters in a string in R. However, when I load the following table, I'm not able to remove the space between two numbers (e.g. 11 846.4):
require(XML)
require(RCurl)
require(data.table)
link2fetch = 'https://www.destatis.de/DE/Themen/Branchen-Unternehmen/Landwirtschaft-Forstwirtschaft-Fischerei/Feldfruechte-Gruenland/Tabellen/ackerland-hauptnutzungsarten-kulturarten.html'
theurl = getURL(link2fetch, .opts = list(ssl.verifypeer = FALSE) ) # important!
area_cult10 = readHTMLTable(theurl, stringsAsFactors = FALSE)
area_cult10 = rbindlist(area_cult10)
test = sub(',', '.', area_cult10$V5) # change , to .
test = gsub('(.+)\\s([A-Z]{1})*', '\\1', test) # remove LETTERS
gsub('\\s', '', test[1]) # remove white space?
Why can't I remove the space in test[1]? Could this be something other than a space character? Maybe the answer is really easy and I'm overlooking something. Thanks for any advice.
You can shorten the creation of test to just 2 steps, using a single PCRE regex (note the perl=TRUE parameter):
test = sub(",", ".", gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", area_cult10$V5, perl=TRUE), fixed=TRUE)
Result:
[1] "11846.4" "6529.2" "3282.7" "616.0" "1621.8" "125.7" "14.2"
[8] "401.6" "455.5" "11.7" "160.4" "79.1" "37.6" "29.6"
[15] "" "13.9" "554.1" "236.7" "312.8" "4.6" "136.9"
[22] "1374.4" "1332.3" "1281.8" "3.7" "5.0" "18.4" "23.4"
[29] "42.0" "2746.2" "106.6" "2100.4" "267.8" "258.4" "13.1"
[36] "23.5" "11.6" "310.2"
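To try the same two steps without fetching the page, here is a minimal sketch on made-up input. The vector x is hypothetical: it only imitates what the table cells presumably look like (a non-breaking space, U+00A0, as thousands separator, a decimal comma, and an optional trailing footnote letter).

```r
# Hypothetical sample values imitating the table cells (an assumption, not
# the real data): U+00A0 between digit groups, decimal comma, footnote letter.
x <- c("11\u00a0846,4", "6\u00a0529,2 A", "616,0")

# Step 1: remove whitespace and letters (Unicode-aware thanks to (*UCP)),
#         plus any non-word characters left at the end of the string.
# Step 2: replace the decimal comma with a dot as a literal string.
cleaned <- sub(",", ".",
               gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", x, perl = TRUE),
               fixed = TRUE)

print(cleaned)              # character vector without spaces or letters
print(as.numeric(cleaned))  # now safe to convert to numeric
```

With this input, cleaned should hold "11846.4", "6529.2" and "616.0", which as.numeric() can then convert.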
The gsub regex is worth special attention:
(*UCP) - the PCRE verb that makes the pattern Unicode aware, so that \\s also matches Unicode whitespace such as the non-breaking space (U+00A0)
[\\s\\p{L}]+ - matches 1+ whitespace or letter characters
| - or (an alternation operator)
\\W+$ - matches 1+ non-word characters at the end of the string.
Then, sub(",", ".", x, fixed=TRUE) replaces the first , with a . as literal strings. fixed=TRUE saves some work since no regex has to be compiled.
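As for the question of whether this is something other than a space character, one way to check is to look at the code points. The sketch below uses a hypothetical stand-in s for test[1], assuming the separator really is a non-breaking space as the title suggests. Whether a plain \\s covers that character can depend on the regex engine and platform, so the example names the character literally instead.

```r
# Hypothetical stand-in for test[1]; the U+00A0 separator is an assumption.
s <- "11\u00a0846.4"

# Inspect the code points: 160 (hex A0) is the non-breaking space,
# whereas an ordinary space would show up as 32.
codes <- utf8ToInt(s)
print(codes)

# Remove the character by naming it literally; fixed = TRUE means the
# pattern is treated as a plain string, not a regex.
no_nbsp <- gsub("\u00a0", "", s, fixed = TRUE)
print(no_nbsp)
```

If the code point printed for the gap is 160 rather than 32, the literal replacement is a simple alternative to the (*UCP) pattern when only that one character needs to go.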