Formatting a string search-engine style in R / Shiny

Question

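I'm working on a seemingly simple problem, although it looks like it needs a nasty regular expression.

I am designing a Shiny app that lets users search a database for strings and counts the number of string matches.

From the stringr package, my final call is: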

str_count(text, pattern=regex(user_input))

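My goal is to convert the user input into a proper regular expression, while still letting users type their query in standard search-term format.

So the following user input: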

artist picasso "picasso painting" france

should produce the following regular expression:

artist|picasso|picasso painting|france

Because of the quotes, the solution knows to treat "picasso painting" as a single term.

Any help is appreciated!

Answer 1

Here is a base R solution:

# Escape characters that the regex engine would otherwise treat as special
regex.escape <- function(string) {
  gsub("([][{}()+*^${|\\\\?])", "\\\\\\1", string)
}

# Order the items of a character vector by length, longest first
sort.by.length.desc <- function(v) v[order(-nchar(v))]

s <- "artist picasso \"picasso (painting)\" france zoo"
keys <- c(t(read.table(text=s, header=FALSE)))          # Read in the values
keys <- sort.by.length.desc(keys)                       # Sort the values
pattern = paste(regex.escape(keys), collapse="|")       # Create the pattern
## Test
## cat(pattern, sep="\n")                               # This shows the regex pattern
txt <- "The artist was born in france and named picasso picasso (painting)"
length(unlist(gregexpr(pattern, txt)))                  # Count the number of occurrences
## [1] 4

See the R demo. There are 4 matches, so the output is 4.
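One caveat about the counting line, offered as a sketch and not as part of the original answer: `gregexpr` returns -1 when the pattern does not match at all, so `length(unlist(gregexpr(...)))` reports 1 where 0 is expected. Counting the extracted matches with `regmatches` avoids that. The helper name `count_matches` is made up for illustration, and the pattern is hard-coded to the one built above.

```r
# Hypothetical helper (not from the original answer): count regex matches,
# returning 0 when the pattern does not match at all.
count_matches <- function(pattern, txt) {
  m <- gregexpr(pattern, txt)
  # regmatches() yields character(0) for a text with no match,
  # so lengths() gives 0 for it
  lengths(regmatches(txt, m))
}

pattern <- "picasso \\(painting\\)|picasso|artist|france|zoo"
hit  <- "The artist was born in france and named picasso picasso (painting)"
miss <- "nothing relevant here"

count_matches(pattern, hit)               # 4
count_matches(pattern, miss)              # 0
length(unlist(gregexpr(pattern, miss)))   # 1, because gregexpr() returned -1
```

Because `gregexpr` is vectorised over `txt`, the same helper returns one count per element when given a whole column of text.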

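Details:

The regex.escape function escapes the most important characters that the regex engine could interpret as special
sort.by.length.desc orders the items of a character vector by length, in descending order
c(t(read.table(text=s, header=FALSE))) reads the user input and stores it as a character vector in keys
pattern = paste(regex.escape(keys), collapse="|") creates the pattern with the alternation operator (it looks like picasso \(painting\)|picasso|artist|france|zoo; cat(pattern, sep="\n") shows the resulting pattern as a literal string)
The length(unlist(gregexpr(pattern, txt))) line counts the match occurrences with the base R gregexpr function.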
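Why sort the keys by length? Here is a small sketch, assuming an engine that tries alternatives from left to right. It uses PCRE via `perl = TRUE`; stringr's ICU engine is also a backtracking engine, so the same reasoning should apply to `str_count`. If `picasso` is listed before `picasso painting`, the shorter key wins and the phrase is never matched as a whole. The variable names are illustrative only.

```r
txt <- "a picasso painting"

# Shorter alternative listed first: the engine stops at "picasso"
unsorted <- "picasso|picasso painting"
first_unsorted <- regmatches(txt, regexpr(unsorted, txt, perl = TRUE))

# Longest alternative listed first: the whole phrase is matched
sorted <- "picasso painting|picasso"
first_sorted <- regmatches(txt, regexpr(sorted, txt, perl = TRUE))

first_unsorted   # "picasso"
first_sorted     # "picasso painting"
```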

Answer 2

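Split it up by doing a global match using "[^"]*"|\S+.
Blindly strip the leading/trailing double quotes with ^"|"$.
Push the matches into an array.
Sort the array with the longest on top (descending, is it?).
Replace the metacharacters ([\[$^()*+|{}-\\]) of each element with \\$1.
Finally, join the elements with the alternation operator |.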
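The steps above can be sketched in base R roughly as follows. This is one possible reading of the recipe, not the answerer's own code: the function name `build_pattern` is hypothetical, the escaping character class is my own choice (it also covers `.` and `?`), and empty tokens such as `""` are dropped as an extra assumption.

```r
# Hypothetical helper: turn a search-box string into an alternation pattern.
build_pattern <- function(user_input) {
  # 1. Global match: a quoted phrase, or a run of non-whitespace characters
  tokens <- regmatches(user_input,
                       gregexpr('"[^"]*"|\\S+', user_input, perl = TRUE))[[1]]
  # 2. Strip a leading/trailing double quote
  tokens <- gsub('^"|"$', "", tokens)
  # Assumption: ignore tokens that end up empty (e.g. the input "")
  tokens <- tokens[nzchar(tokens)]
  # 3. Longest first, so phrases are tried before their own prefixes
  tokens <- tokens[order(-nchar(tokens))]
  # 4. Escape regex metacharacters by prefixing them with a backslash
  tokens <- gsub("([][{}()+*^$|\\\\?.-])", "\\\\\\1", tokens)
  # 5. Join with the alternation operator
  paste(tokens, collapse = "|")
}

build_pattern('artist picasso "picasso painting" france')
# "picasso painting|picasso|artist|france"
```

The result can then be passed on as the pattern, for example `str_count(text, pattern = regex(build_pattern(user_input)))` in the asker's setup.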

Answer 1
2, accepted, 2017-07-21 20:28:50

Answer 2
0
