Separate columns by string if the values differ

Question

I have a data frame, and I want to split its columns if they contain different strings or words.

I am trying different methods in R, but none of them is working.

My data frame looks like this:

df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), y = c("TAP1", "TAP2", "TAP2", "TAP3" ))

For example, I am trying this with the first column:

df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"))
df %>% separate(x, c("TAP1", "TAP2"), extra = "drop", fill = "right")

but it is not working.

I get the following output:

TAP1 TAP2
1 <NA> <NA>
2 TAP1 <NA>
3 TAP1 <NA>
4 TAP2 <NA>

My expected output is:

 TAP1 TAP2
1 <NA> <NA>
2 TAP1 <NA>
3 TAP1 <NA>
4 <NA> TAP2

I would like to do the same for all columns in the complete data frame, where I have different combinations of words like TAP1, TAP2, TAP3, etc.

In this example, the final table, taking into account columns x and y, would be:

 df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), y = c("TAP1", "TAP2",   "TAP2", "TAP3" ))

  TAP1 TAP2 TAP1.1 TAP2.2 TAP3.3
1 <NA> <NA> TAP1   <NA>   <NA>
2 TAP1 <NA> <NA>   TAP2   <NA>
3 TAP1 <NA> <NA>   TAP2   <NA>
4 <NA> TAP2 <NA>   <NA>   TAP3

Answer 1

We can do this with spread:

library(tidyverse)
df %>% 
   mutate(n = row_number()) %>% 
   group_by(x) %>% 
   mutate(rn = row_number(), y = x) %>%
   spread(y, x) %>% 
   select(TAP1, TAP2)
# A tibble: 4 x 2
#  TAP1  TAP2 
#  <fct> <fct>
#1 <NA>  <NA> 
#2 TAP1  <NA> 
#3 TAP1  <NA> 
#4 <NA>  TAP2

With multiple columns, we can gather and spread:

rownames_to_column(df, 'rn') %>%
   gather(key, val, -rn) %>%
   mutate(val1 = val) %>% 
   unite(val, val,key) %>% 
   group_by(val) %>%    # not really needed for this example
   mutate(ind = row_number()) %>% # not needed here either
   spread(val, val1) %>%
   select(starts_with("TAP"))
# A tibble: 4 x 5
# TAP1_x TAP1_y TAP2_x TAP2_y TAP3_y
#  <chr>  <chr>  <chr>  <chr>  <chr> 
#1 <NA>   TAP1   <NA>   <NA>   <NA>  
#2 TAP1   <NA>   <NA>   TAP2   <NA>  
#3 TAP1   <NA>   <NA>   TAP2   <NA>  
#4 <NA>   <NA>   TAP2   <NA>   TAP3
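In current tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same reshaping with the newer verbs, assuming tidyr 1.1.0 or later (for names_sort) and the example df from the question. The names rn, key, val, col, and wide are illustrative only and do not come from the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"),
                 y = c("TAP1", "TAP2", "TAP2", "TAP3"))

wide <- df %>%
  mutate(rn = row_number()) %>%                      # remember the original row
  pivot_longer(-rn, names_to = "key", values_to = "val") %>%
  filter(!is.na(val)) %>%                            # NA cells should not create a column
  mutate(col = paste(val, key, sep = "_")) %>%       # e.g. "TAP1_x"
  pivot_wider(id_cols = rn, names_from = col,
              values_from = val, names_sort = TRUE) %>%
  arrange(rn) %>%
  select(-rn)

wide
```

One caveat of this sketch: a row whose cells are all NA disappears at the filter step, so it would have to be added back (for example with tidyr::complete) if every original row must be kept.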

Answer 2

Here's a base R solution. It goes through all the factor levels in your column (i.e., TAP1, TAP2) and checks where each one is present. Where a level is present, it returns the name of the level; where it is absent, it returns NA. Then I repackage the resulting list into a data frame and rename the columns.

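# Original data frame
# (stringsAsFactors = TRUE is needed on R >= 4.0.0 so that levels() works)
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), stringsAsFactors = TRUE)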

# Repackage
df2 <- data.frame(lapply(levels(df$x), function(x)ifelse(df$x == x, x, NA)))

# Fix names
names(df2) <- levels(df$x)

# Check results
df2
#>   TAP1 TAP2
#> 1 <NA> <NA>
#> 2 TAP1 <NA>
#> 3 TAP1 <NA>
#> 4 <NA> TAP2

Created on 2019-05-29 by the reprex package (v0.3.0)

In light of your update:

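# Original data frame
# (stringsAsFactors = TRUE is needed on R >= 4.0.0 so that levels() works)
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), 
                 y = c("TAP1", "TAP2",   "TAP2", "TAP3" ),
                 stringsAsFactors = TRUE)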

# Define splitter function
splitter <- function(foo){
  tmp <- data.frame(lapply(levels(foo), function(x)ifelse(foo == x, x, NA)))
  names(tmp) <- levels(foo)
  tmp
}

# Run over data frame and bind together
do.call(cbind, lapply(df, splitter))
#>   x.TAP1 x.TAP2 y.TAP1 y.TAP2 y.TAP3
#> 1   <NA>   <NA>   TAP1   <NA>   <NA>
#> 2   TAP1   <NA>   <NA>   TAP2   <NA>
#> 3   TAP1   <NA>   <NA>   TAP2   <NA>
#> 4   <NA>   TAP2   <NA>   <NA>   TAP3

Created on 2019-05-29 by the reprex package (v0.3.0)

Same rationale as before, but I define a function that is applied to each column, and the results are bound together using do.call and cbind.
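Because splitter relies on levels(), it only works on factor columns. Below is a hedged variant of the same idea that also accepts plain character columns by taking the distinct non-missing values instead of the factor levels. The function name split_values and the object res are hypothetical and not part of the original answer.

```r
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"),
                 y = c("TAP1", "TAP2", "TAP2", "TAP3"))

# Split one column into one column per distinct non-missing value
split_values <- function(col) {
  col <- as.character(col)                     # works for factor and character input
  vals <- sort(unique(col[!is.na(col)]))       # distinct values, NA excluded
  out <- lapply(vals, function(v) ifelse(col == v, v, NA_character_))
  names(out) <- vals
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Apply to every column and bind the pieces side by side;
# cbind prefixes each piece with the source column name (x., y.)
res <- do.call(cbind, lapply(df, split_values))
res
```

Sorting the distinct values keeps the column order stable regardless of the order in which values first appear in the data.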

Answer 3

A solution using the tidyverse and the dummies package. df3 is the final output.

library(tidyverse)
library(dummies)

df2 <- dummy.data.frame(df) %>% select(-ends_with("NA"))

cols <- str_remove(names(df2), regex("^x|^y"))

df3 <- modify2(df2, cols, ~ifelse(.x == 0, NA, .y))

df3
#   xTAP1 xTAP2 yTAP1 yTAP2 yTAP3
# 1  <NA>  <NA>  TAP1  <NA>  <NA>
# 2  TAP1  <NA>  <NA>  TAP2  <NA>
# 3  TAP1  <NA>  <NA>  TAP2  <NA>
# 4  <NA>  TAP2  <NA>  <NA>  TAP3

3 answers

Answer 1: 3 votes, accepted, 2019-05-29 15:58:27

Answer 2: 1 vote, 2019-05-29 16:11:35

Answer 3: 1 vote, 2019-05-29 16:30:40
