Using tidyr::separate with sep = "" to split a column into multiple columns

Question

df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE)

df
 category sequence
1        X     AAT.G
2        Y     CCG-T

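I want to split the sequence column into 5 columns, one per character. I tried to do this with tidyr::separate, but internally it uses stringi::stri_split_regex, which does not accept an empty string as the separator (even though the sep argument is supposed to take a regular expression).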

library(tidyr)
separate(df, sequence, into = paste0("V", 1:5), sep="")

Error: Values not split into 5 pieces at 1, 2
In addition: Warning messages:
1: In stringi::stri_split_regex(value, sep, n_max) :
  empty search patterns are not supported
2: In stringi::stri_split_regex(value, sep, n_max) :
  empty search patterns are not supported

The expected output looks like this:

  category V1 V2 V3 V4 V5
1        X  A  A  T  .  G
2        Y  C  C  G  -  T

Answer 1

You can do this with extract from tidyr, using one capture group per character:

library(tidyr)
extract(df, sequence, into=paste0('V', 1:5), '(.)(.)(.)(.)(.)')
#  category V1 V2 V3 V4 V5
#1        X  A  A  T  .  G
#2        Y  C  C  G  -  T
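
If the number of characters is not hard-coded, the same pattern can be built programmatically. This is only a sketch: it assumes every string has exactly n characters, and the names n, pattern and res are made up for illustration.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

n <- 5                        # assumed fixed width of every string
pattern <- strrep("(.)", n)   # one capture group per character

# tidyr::extract is spelled out to avoid clashing with magrittr::extract
res <- tidyr::extract(df, sequence,
                      into = paste0("V", seq_len(n)),
                      regex = pattern)
res
```

strrep is base R (3.3.0 or later), so no extra package is needed to build the pattern.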

Or insert a delimiter between every pair of characters with gsub, then use that delimiter as the sep argument of separate:

library(dplyr)
library(tidyr)
df %>% 
   mutate(sequence=gsub('(?<=.)(?=.)', ',', sequence, perl=TRUE)) %>% 
   separate(sequence, into=paste0('V', 1:5), sep=",")
#  category V1 V2 V3 V4 V5
#1        X  A  A  T  .  G
#2        Y  C  C  G  -  T

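Or you can use cSplit from splitstackshape:

library(splitstackshape)
library(data.table)  # provides setnames(); recent splitstackshape versions may not attach it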
setnames(cSplit(df, 'sequence', '', stripWhite=FALSE),
             2:6, paste0('V', 1:5))[]
#   category V1 V2 V3 V4 V5
#1:        X  A  A  T  .  G
#2:        Y  C  C  G  -  T

Answer 2

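sep can be an integer vector, in which case it is interpreted as the character positions to split at. sep = 1:4 is sufficient, but 1:5 also works and looks nicer.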

df %>% separate(sequence, into = paste0("V", 1:5), sep = 1:5)

This gives:

  category V1 V2 V3 V4 V5
1        X  A  A  T  .  G
2        Y  C  C  G  -  T
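
For completeness, here is a sketch of the sep = 1:4 form mentioned above, which passes one split position fewer than there are columns. The name res_short is made up for illustration.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# Four split positions (after characters 1, 2, 3 and 4) produce five pieces
res_short <- separate(df, sequence, into = paste0("V", 1:5), sep = 1:4)
res_short
```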
