Splitting the words in an R data frame column

Question

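I have a data frame with a column whose words are separated by single spaces. I want to split it into three kinds of pieces, as shown below. The data frame looks like this.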

Text
one of the
i want to

I want to split it as follows.

Text         split1     split2    split3
one of the    one       one of     of the

I was able to get the first one, but I can't figure out the other two.

My code for split1:

new_data$split1<-sub(" .*","",new_data$Text)

I worked out split2:

df$split2 <- gsub(" [^ ]*$", "", df$Text)

Answer 1

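There may be more elegant solutions. Here are two options:

Using ngrams (from the NLP package, which is attached when tm is loaded):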

library(dplyr); library(tm)
df %>% mutate(splits = strsplit(Text, "\\s+")) %>% 
       mutate(split1 = lapply(splits, `[`, 1)) %>% 
       mutate(split2 = lapply(splits, function(words) ngrams(words, 2)[[1]]), 
              split3 = lapply(splits, function(words) ngrams(words, 2)[[2]])) %>% 
       select(-splits)

        Text split1  split2   split3
1 one of the    one one, of  of, the
2  i want to      i i, want want, to

Extracting the bigrams manually:

df %>% mutate(splits = strsplit(Text, "\\s+")) %>% 
       mutate(split1 = lapply(splits, `[`, 1)) %>% 
       mutate(split2 = lapply(splits, `[`, 1:2), 
              split3 = lapply(splits, `[`, 2:3)) %>% 
       select(-splits)

        Text split1  split2   split3
1 one of the    one one, of  of, the
2  i want to      i i, want want, to
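
Both versions above produce list columns: each cell holds a character vector, which is why it prints as one, of rather than one of. If plain strings like those in the question are wanted, the pieces can be pasted back together. A minimal base R sketch, assuming every row has exactly three words and that Text is a character column (df is rebuilt here only so the block runs on its own):

```r
# Sample data from the question (assumed character, not factor)
df <- data.frame(Text = c("one of the", "i want to"), stringsAsFactors = FALSE)

# Split each row into its words
words <- strsplit(df$Text, "\\s+")

# First word, then the two bigrams collapsed back into single strings
df$split1 <- vapply(words, function(w) w[1], character(1))
df$split2 <- vapply(words, function(w) paste(w[1:2], collapse = " "), character(1))
df$split3 <- vapply(words, function(w) paste(w[2:3], collapse = " "), character(1))

df
```

vapply with character(1) is used instead of lapply so each new column is an ordinary character vector.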

Update:

With regular expressions, we can use backreferences in gsub.

Split2：

gsub("((.*)\\s+(.*))\\s+(.*)", "\\1", df$Text)
[1] "one of" "i want"

Split3：

gsub("(.*)\\s+((.*)\\s+(.*))", "\\2", df$Text)
[1] "of the"  "want to"
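
The two patterns above rely on greedy .* matching and are written for three-word rows. An alternative sketch with the same result for this data simply drops the last word or the first word with an anchored sub call. It assumes words are separated by whitespace with no leading or trailing spaces; x is a stand-in for df$Text:

```r
x <- c("one of the", "i want to")

# Everything except the last word
split2 <- sub("\\s+\\S+$", "", x)

# Everything except the first word
split3 <- sub("^\\S+\\s+", "", x)

split2
split3
```

For rows with more than three words these would no longer be two-word pieces, so this only matches the question's layout when each row has exactly three words.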

Answer 2

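We can try gsub. Capture each run of one or more non-whitespace characters (\\S+) as a group (three words in this case). Then, in the replacement, rearrange the backreferences and insert a delimiter (,), which read.table uses to split the result into separate columns.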

 df1[paste0("split", 1:3)] <- read.table(text=gsub("(\\S+)\\s+(\\S+)\\s+(\\S+)", 
                  "\\1,\\1 \\2,\\2 \\3", df1$Text), sep=",")
df1
#        Text split1 split2  split3
#1 one of the    one one of  of the
#2  i want to      i i want want to

Data

df1 <- structure(list(Text = c("one of the", "i want to")), 
.Names = "Text", class = "data.frame", row.names = c(NA, -2L))
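
To make the two steps easier to follow, here is a sketch that keeps the intermediate comma-separated strings in their own variable before handing them to read.table. The names csv_like and parts are made up for illustration, and stringsAsFactors = FALSE is added so the columns come back as character regardless of R version:

```r
df1 <- data.frame(Text = c("one of the", "i want to"), stringsAsFactors = FALSE)

# Step 1: rewrite each row as "word1,word1 word2,word2 word3"
csv_like <- gsub("(\\S+)\\s+(\\S+)\\s+(\\S+)", "\\1,\\1 \\2,\\2 \\3", df1$Text)
csv_like

# Step 2: parse the comma-separated text into three columns
parts <- read.table(text = csv_like, sep = ",", stringsAsFactors = FALSE)
df1[paste0("split", 1:3)] <- parts
df1
```

This approach would break if the text itself contained commas, in which case a different delimiter would have to be chosen.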

Answer 3

Here is a hackish solution.

Assumption: you don't care about the number of spaces between two words.

> library(stringr)
> x<-c('one of the','i want to')
> strsplit(gsub('(\\S+)\\s+(\\S+)\\s+(.*)', '\\1  \\1 \\2   \\2 \\3', x), '\\s\\s+')
#[[1]]
#[1] "one"    "one of" "of the"

#[[2]]
#[1] "i"       "i want"  "want to"
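
The list returned above can be bound into columns next to the original text. A minimal sketch using only base R (gsub and strsplit are base functions, so stringr is not required for it); the names pieces, m and result are made up for illustration, and every row is assumed to have exactly three words:

```r
x <- c("one of the", "i want to")

# Same trick as above: mark the split points with runs of 2+ spaces
marked <- gsub("(\\S+)\\s+(\\S+)\\s+(.*)", "\\1  \\1 \\2   \\2 \\3", x)
pieces <- strsplit(marked, "\\s\\s+")

# Stack the per-row vectors into a matrix, one row per input string
m <- do.call(rbind, pieces)
colnames(m) <- paste0("split", 1:3)

result <- data.frame(Text = x, m, stringsAsFactors = FALSE)
result
```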

3 answers

Answer 1
0 2016-06-03 13:46:02

Answer 2
0 accepted 2016-06-03 15:21:42

Answer 3
0 2016-06-03 15:23:01
