Replacing string patterns in sqldf

Question

I have a data frame that looks like this:

    Col1    Col2   Col3
ten: end       5     10
five: nb       7     11
    12:4      12     10
   13:56      15     16

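Using the sqldf package in R, I want to do the following:

Replace Col1 values matching the pattern character: space with ' - ' (a dash with a space on each side).

Replace Col1 values matching the pattern number:number with '-' (a dash with no surrounding spaces).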

Expected output:

     Col1    Col2   Col3
ten - end       5     10
five - nb       7     11
     12-4      12     10
    13-56      15     16

Here is the sample syntax using sqldf:

df <- sqldf("SELECT *, replace([Col1], [character: space], ' - ') [New Col generated] from df")

df <- sqldf("SELECT *, replace([Col1], [number:number], '-') [New Col generated_num] from df")

I tried referring to this documentation, but still had no luck: https://www.rexegg.com/regex-quickstart.html

Answer 1

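1) Assuming only the forms shown in the question are allowed, replace the colon with a minus sign, then replace a minus sign followed by a space with space, minus sign, space.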

library(sqldf)
sqldf("select *, replace(replace([Col1], ':', '-'), '- ', ' - ') as New from df")

giving:

      Col1 Col2 Col3       New
1 ten: end    5   10 ten - end
2 five: nb    7   11 five - nb
3     12:4   12   10      12-4
4    13:56   15   16     13-56
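To make the two-step replacement in 1) easier to follow, here is a minimal sketch that exposes the intermediate result. The column names step1 and step2 are illustrative only and are not part of the original answer; the data frame is rebuilt inline so the block runs on its own.

```r
library(sqldf)

# Same Col1 values as in the question
df <- data.frame(Col1 = c("ten: end", "five: nb", "12:4", "13:56"),
                 stringsAsFactors = FALSE)

# step1: every colon becomes a minus sign
# step2: a minus sign followed by a space gets a space in front of it as well
res <- sqldf("select Col1,
  replace(Col1, ':', '-') as step1,
  replace(replace(Col1, ':', '-'), '- ', ' - ') as step2
  from df")
print(res)
```

The number:number rows are already final after step1 because they never contain the sequence minus sign, space.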

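2) If we can assume that the only forms are number:number or character: character, and that the second form contains no digits, we can branch on whether Col1 contains any digit. strFilter keeps only those characters of its first argument that also appear in its second, so an empty result means Col1 has no digits.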

sqldf("select *, 
  case when strFilter(Col1, '0123456789') = '' 
         then replace(Col1, ':', ' -')
       else replace(Col1, ':', '-')
       end as New
  from df")

giving:

      Col1 Col2 Col3       New
1 ten: end    5   10 ten - end
2 five: nb    7   11 five - nb
3     12:4   12   10      12-4
4    13:56   15   16     13-56
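The test in 2) depends on how strFilter behaves, so a small check on two literal strings may help. This is a sketch that assumes the SQLite extension functions (which provide strFilter) are loaded by sqldf, as the answer above relies on; the column names no_digits and only_digits are illustrative.

```r
library(sqldf)

# strFilter(s1, s2) returns s1 with every character that is not in s2 removed
chk <- sqldf("select
  strFilter('ten: end', '0123456789') as no_digits,
  strFilter('13:56', '0123456789') as only_digits")
print(chk)
```

An empty string in no_digits is what sends the character: character rows down the first branch of the case expression.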

3) This first checks for digits:digits and then for character: character, where the characters may only be digits or lowercase letters.

dig <- "0123456789"
diglet <- "0123456789abcdefghijklmnopqrstuvwxyz"

fn$sqldf("select *,
  case when trim(Col1, '$dig') = ':'
         then replace(Col1, ':', '-')
  when trim(Col1, '$diglet') = ': '
          then replace(Col1, ': ', ' - ')
  else Col1 end as New
  from df")

giving:

      Col1 Col2 Col3       New
1 ten: end    5   10 ten - end
2 five: nb    7   11 five - nb
3     12:4   12   10      12-4
4    13:56   15   16     13-56
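The checks in 3) use the two-argument form of SQLite's trim, which strips the listed characters from both ends of a string. The following sketch applies it to two literal values to show what is left over; the column names are illustrative and not from the original answer.

```r
library(sqldf)

dig <- "0123456789"
diglet <- "0123456789abcdefghijklmnopqrstuvwxyz"

# fn$ substitutes $dig and $diglet into the SQL string before it is run
chk <- fn$sqldf("select
  trim('13:56', '$dig') as num_rest,
  trim('ten: end', '$diglet') as word_rest,
  trim('ten: end', '$dig') as word_vs_dig")
print(chk)
```

Only a pure digits:digits value reduces to ':' under dig, and only a value made of digits or lowercase letters around ': ' reduces to ': ' under diglet; anything else falls through to the else branch and is returned unchanged.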

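4) This extracts x:y and checks whether x and y are digits; if so, it performs the appropriate replacement. If that does not match, it extracts x:yz where y is a space, and if x and z are digits or lowercase letters it performs the appropriate replacement; otherwise it returns Col1. dig and diglet are from above.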

fn$sqldf("select *, 
  case when trim(substr(Col1, instr(Col1, ':')-1, 3), '$dig') = ':'
         then replace(Col1, ':', '-')
       when trim(substr(Col1, instr(Col1, ':') -1, 4), '$diglet') = ': '
         then replace(Col1, ': ', ' - ')
       else Col1 end as New
  from df")
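To see what 4) actually compares, this sketch lists the colon position and the 3- and 4-character windows that start one character before it. The column names pos, win3 and win4 are illustrative; the data frame is rebuilt inline so the block runs on its own.

```r
library(sqldf)

df <- data.frame(Col1 = c("ten: end", "five: nb", "12:4", "13:56"),
                 stringsAsFactors = FALSE)

# pos:  1-based position of the colon in Col1
# win3: the x:y window tested against the digits
# win4: the x:yz window tested against digits and lowercase letters
win <- sqldf("select Col1,
  instr(Col1, ':') as pos,
  substr(Col1, instr(Col1, ':') - 1, 3) as win3,
  substr(Col1, instr(Col1, ':') - 1, 4) as win4
  from df")
print(win)
```

Because only the characters immediately around the colon are inspected, this variant does not require the rest of Col1 to consist of digits or lowercase letters, unlike 3).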

Note

The input in reproducible form is:

Lines <- "Col1,Col2,Col3
ten: end,5,10
five: nb,7,11
12:4,12,10
13:56,15,16"
df <- read.csv(text = Lines, as.is = TRUE, strip.white = TRUE)

Solution 1
2 Accepted 2019-04-02 20:48:52
