R - Error while using regular expression and ifelse condition to separate text from a string

Question

What I want to do is split the text in a string at the point where there is a ":".

Suppose my text contains:

 text$Text[[3]] = "There is a horror movie running in the iNox theater. : Can we go?"

I want to create a data frame like this:

  Col1                                                    Col2
  There is a horror movie running in the iNox theater.    Can we go?

I am trying to use the following:

 df = data.frame(Text = strsplit(text$Text[[3]], 
                 ifelse(":", ":", text$Text[[3]]))[[1]], stringsAsFactors = F)

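dat3$Text[[3]] because the text is in row no. 3 of text$Text.

But the ifelse() logic above does not work. Here I am trying to use an ifelse condition so that if the text contains a ":", the ":" is used as the separator; otherwise the complete text is used. This means that if there is no ":", the result should look like this: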

 text$Text[[3]] = "Hi Mom, You there. Can I go to Jimmy's house?"

 Col1                                                 Col2
 Hi Mom, You there. Can I go to Jimmy's house?         NA

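How do I do this correctly?

Please note there is a catch:

What if there are two ":" in the text?
I only want to consider a ":" in the first two lines and not in the rest of the text.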

Answer 1

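I find the following too complicated, and someone who knows regular expressions better than I do will surely come up with a better solution.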

test <- c(
"There is a horror movie running in the iNox theater. : Can we go?",
"Hi Mom, You there. Can I go to Jimmy's house?",
"Hi : How are you : Lets go")

fun <- function(x, pattern = ":"){
    re <- regexpr(pattern, x)
    res <- sapply(seq_along(re), function(i){
        if(re[i] > 0){
            Col1 <- trimws(substring(x[i], 1, re[i] - 1))
            Col2 <- trimws(substring(x[i], re[i] + 1))
        } else {
            Col1 <- x[i]
            Col2 <- NA
        }
        c(Col1 = Col1, Col2 = Col2)
    })
    as.data.frame(t(res))
}

fun(test)
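
For what it's worth, the ifelse() in the question most likely fails because its first argument has to be a logical condition: ifelse(":", ":", ...) coerces the string ":" to a logical, which gives NA. Below is a minimal vectorized sketch of the same idea as fun() above, with grepl() supplying the condition. The helper name split_first_colon is made up for illustration and is not from the question or the answer.

```r
test <- c(
  "There is a horror movie running in the iNox theater. : Can we go?",
  "Hi Mom, You there. Can I go to Jimmy's house?",
  "Hi : How are you : Lets go")

# split_first_colon is a hypothetical helper name used only for this sketch
split_first_colon <- function(x) {
  has_colon <- grepl(":", x, fixed = TRUE)   # logical condition, one value per string
  pos <- regexpr(":", x, fixed = TRUE)       # position of the first colon, -1 if none
  data.frame(
    # text before the first colon, or the whole string when there is no colon
    Col1 = ifelse(has_colon, trimws(substr(x, 1, pos - 1)), x),
    # text after the first colon, or NA when there is no colon
    Col2 = ifelse(has_colon, trimws(substring(x, pos + 1)), NA_character_),
    stringsAsFactors = FALSE
  )
}

res <- split_first_colon(test)
res
```

Only the first colon is used as the split point, so the third string keeps its second ":" inside Col2.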

Answer 2

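You don't actually need an if statement. Regular expressions are designed to handle cases like this.

For the first case, where the data contains only one symbol (a colon ":" in this example), we can use the following code: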

x <- "There is a horror movie running in the iNox theater. : Can we go?"

data.frame(Col1=gsub("(.*)\\s[:]\\s+(.*)","\\1",x), 
           Col2=gsub("(.*)\\s[:]\\s+(.*)","\\2",x))

Output:

                                                  Col1            Col2
1 There is a horror movie running in the iNox theater.      Can we go?

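Now suppose your string contains more than one symbol, and you want to keep the information before the first symbol in the first column and the information after the first symbol in the second column. To do this, try the "?" regular expression symbol, as follows: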

x <- "There is a horror movie running in the iNox theater. : Can we go? : Please?"

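# The lazy quantifier ".*?" stops at the first colon; perl = TRUE selects PCRE
pattern <- "^(.*?)\\s*:\\s*(.*)$"
data.frame(Col1=sub(pattern,"\\1",x,perl=TRUE), 
           Col2=sub(pattern,"\\2",x,perl=TRUE))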

Output:

                                                  Col1                      Col2
1 There is a horror movie running in the iNox theater.      Can we go? : Please?

For more information on using regular expression symbols in R, this is a useful reference.
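
One case the replacement approach does not cover on its own is the no-colon example from the question: when the pattern does not match, sub() and gsub() return the input unchanged, so Col2 would repeat the whole text instead of being NA. A rough sketch of one way to handle that, assuming NA is wanted whenever no ":" is present (the variable names has_colon and out are only illustrative):

```r
x <- c(
  "There is a horror movie running in the iNox theater. : Can we go? : Please?",
  "Hi Mom, You there. Can I go to Jimmy's house?")

has_colon <- grepl(":", x, fixed = TRUE)   # TRUE where a colon exists

out <- data.frame(
  # drop everything from the first colon (and the spaces before it) onward
  Col1 = sub("\\s*:.*$", "", x),
  # drop everything up to and including the first colon, or use NA if none
  Col2 = ifelse(has_colon, sub("^[^:]*:\\s*", "", x), NA_character_),
  stringsAsFactors = FALSE)
out
```

Here "[^:]*" cannot run past a colon, so the split always happens at the first ":" without needing a lazy quantifier.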

Answer 3

test <- "There is a horror movie running in the iNox theater. : Can we go?"
parts <- trimws(strsplit(test, ":", fixed = TRUE)[[1]])  # split once, strip surrounding spaces
df = data.frame(Col1 = parts[1],
                Col2 = parts[2],
                stringsAsFactors = FALSE)
df
#                                                   Col1        Col2
#1 There is a horror movie running in the iNox theater.   Can we go?

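Note that the unusual first line of the strsplit() output consists of [[1]]. Similar to the way R displays vectors, [[1]] means that R is displaying the first element of a list.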
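
Keep in mind that strsplit() splits at every ":", so a string with two colons comes back as three pieces. If only the first colon should count, one base R option is regmatches() with invert = TRUE on a regexpr() match, which returns the text on either side of the first match only. The sketch below is an illustration under that assumption, not part of the original answer:

```r
test <- c(
  "There is a horror movie running in the iNox theater. : Can we go?",
  "Hi Mom, You there. Can I go to Jimmy's house?",
  "Hi : How are you : Lets go")

# regexpr() finds only the first colon; invert = TRUE returns the pieces around it
pieces <- regmatches(test, regexpr(":", test, fixed = TRUE), invert = TRUE)

# strings without a colon yield a single piece, so p[2] is NA for them
df <- data.frame(
  Col1 = trimws(vapply(pieces, function(p) p[1], character(1), USE.NAMES = FALSE)),
  Col2 = trimws(vapply(pieces, function(p) p[2], character(1), USE.NAMES = FALSE)),
  stringsAsFactors = FALSE)
df
```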

Answer 4

You can use the stringr package:

library(stringr) 
str_split_fixed("HI : How are you : Lets go", ":", 3)

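In the str_split_fixed function above, "HI : How are you : Lets go" is the sentence or string you want to work with, ":" is the delimiter in the string, and 3 is the number of columns you want to split the string into.

In your case the last value should be 2, because you want to split into two columns.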


Answer 1
4 2017-09-23 21:38:07

Answer 2
4 2017-09-23 21:46:15

Answer 3
3 2017-09-23 20:58:29

Answer 4
3 2017-09-23 21:04:58
