Using regex to insert a space between collapsed words

Question

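I'm working on a choropleth in R and need to be able to match state names with match.map(). The dataset I'm using sticks multi-word names together, like NorthDakota and DistrictOfColumbia.

How can I use regular expressions to insert a space between lower-to-upper letter sequences? I've managed to add a space, but I haven't been able to preserve the letters that mark where the space belongs.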

places = c("NorthDakota", "DistrictOfColumbia")
gsub("[[:lower:]][[:upper:]]", " ", places)
[1] "Nort akota"       "Distric  olumbia"

Answer 1

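Use parentheses to capture the matched expressions, then use backreferences \1, \2, ... (written \\1, \\2, ... in an R string) to retrieve them: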

places = c("NorthDakota", "DistrictOfColumbia")
gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", places)
## [1] "North Dakota"         "District Of Columbia"
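
To see why the original attempt lost letters, it can help to look at what the pattern actually matches. The sketch below uses base R's gregexpr() and regmatches(), which are not part of the answer above, to list the matched substrings. Each match is the full two-letter pair, so replacing it with a single space discards both letters unless they are captured and put back.

```r
places <- c("NorthDakota", "DistrictOfColumbia")

# Each match spans the lowercase letter AND the uppercase letter after it
pairs <- regmatches(places, gregexpr("[[:lower:]][[:upper:]]", places))
print(pairs)

# Capturing both letters and re-emitting them keeps the text intact
fixed <- gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", places)
print(fixed)
```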

Answer 2

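You want to use capturing groups to capture the matched context, so that you can refer back to each matched group in the replacement call. To access a group, precede the group number with two backslashes (\\).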

> places = c('NorthDakota', 'DistrictOfColumbia')
> gsub('([[:lower:]])([[:upper:]])', '\\1 \\2', places)
# [1] "North Dakota"         "District Of Columbia"
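
The doubled backslash is only R's string-literal escaping; the regex engine itself receives \1 and \2. One quick way to check this, offered as a small illustration rather than part of the original answer, is to print the replacement string with cat() and count its characters. The variable names here are made up for the example.

```r
replacement <- "\\1 \\2"

# cat() prints the string as the regex engine receives it: \1 \2
cat(replacement, "\n")

# Five characters: backslash, 1, space, backslash, 2
n <- nchar(replacement)
print(n)
```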

Alternatively, turn on PCRE with perl=TRUE and use a lookaround assertion.

> places = c('NorthDakota', 'DistrictOfColumbia')
> gsub('[a-z]\\K(?=[A-Z])', ' ', places, perl=TRUE)
# [1] "North Dakota"         "District Of Columbia"

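Explanation:

The \K escape sequence resets the starting point of the reported match, so any previously consumed characters are no longer included in it. Basically, it throws away everything matched up to that point.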

[a-z]       # any character of: 'a' to 'z'
\K          # '\K' (resets the starting point of the reported match)
(?=         # look ahead to see if there is:
  [A-Z]     #   any character of: 'A' to 'Z'
)           # end of look-ahead
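
For comparison, the same zero-width idea can be written with a lookbehind instead of \K. This is a sketch of an alternative, not something the answer above proposes: (?<=[a-z]) asserts a lowercase letter just before the position and (?=[A-Z]) asserts an uppercase letter just after it, so the match itself is empty and only a space is inserted.

```r
places <- c("NorthDakota", "DistrictOfColumbia")

# Zero-width match between a lowercase and an uppercase ASCII letter;
# lookarounds need the PCRE engine, hence perl = TRUE
spaced <- gsub("(?<=[a-z])(?=[A-Z])", " ", places, perl = TRUE)
print(spaced)
## [1] "North Dakota"         "District Of Columbia"
```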

2 answers

Answer 1: 11, accepted, 2014-07-14 15:44:12

Answer 2: 11, 2014-07-14 15:44:22
