Create new columns indicating the position of column names in another string vector (using dplyr, purrr and stringr)

Question

Given this example data:

require(stringr)
require(tidyverse)

labels <- c("foo", "bar", "baz")
n_rows <- 4

df <- 1:n_rows %>%
  map(~ data.frame(
      block_order=paste(sample(labels, size=length(labels), replace=FALSE),
                        collapse="|"))) %>%
  bind_rows()

df
  block_order
1 foo|bar|baz
2 baz|bar|foo
3 foo|baz|bar
4 foo|bar|baz

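I want to generate one column for each string in labels, holding the position of that string within the |-separated sequence in each row.

Desired output: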

  block_order foo bar baz
1 foo|bar|baz   1   2   3
2 baz|bar|foo   3   2   1
3 foo|baz|bar   1   3   2
4 foo|bar|baz   1   2   3

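I've been trying different variations in a dplyr / purrr setup, such as the example below, where I map over each value of labels and then try to use str_split and match to get its position in block_order: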

labels %>%
  map(~ df %>%
        transmute(!!.x := match(!!.x, str_split(block_order, 
                                                "\\|", 
                                                simplify=TRUE)))) %>%
  bind_cols(df, .)

But this produces unexpected output:

  block_order foo bar baz
1 foo|bar|baz   1   5   2
2 baz|bar|foo   1   5   2
3 foo|baz|bar   1   5   2
4 foo|bar|baz   1   5   2

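I'm not sure what these numbers represent, or why they are all the same.

If anyone can help me figure out (a) how to achieve the desired output within a dplyr / purrr framework, and (b) why the approach shown here does not produce the desired output, I'd be much obliged.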

Answer 1

We can split 'block_order' by |, loop over the resulting list of vectors with lapply, get the indices with match, rbind the vectors, and assign the result to create the new columns:

labels <- c("foo", "bar", "baz")
# For each row, match(labels, parts) gives the position of every label in that row
df[labels] <- do.call(rbind, lapply(strsplit(df$block_order, "|", fixed = TRUE),
                                    function(parts) match(labels, parts)))
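Since the argument order of match determines which permutation comes back, it may be worth checking it on a row where the two directions differ. A minimal sketch, using a hypothetical row "baz|foo|bar" that is not in the question's data (the names `parts`, `label_positions` and `label_indices` are illustrative):

```r
labels <- c("foo", "bar", "baz")

# Hypothetical row, chosen because the two directions give different answers
parts <- strsplit("baz|foo|bar", "|", fixed = TRUE)[[1]]

# Position of each label within the row (what the question asks for)
label_positions <- match(labels, parts)
label_positions
#> [1] 2 3 1

# Index of each row element within labels (the inverse permutation)
label_indices <- match(parts, labels)
label_indices
#> [1] 3 1 2
```

For the four rows shown in the question the two calls happen to agree, so that data alone cannot tell them apart.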

Or a similar idea with the tidyverse:

library(tidyverse)
str_split(df$block_order, "[|]") %>%
       # position of each label within the row, named by label
       map(~ set_names(match(labels, .x), labels)) %>%
       do.call(rbind, .) %>%
       as_tibble() %>%
       bind_cols(df, .)
#   block_order foo bar baz
#1 foo|bar|baz   1   2   3
#2 baz|bar|foo   3   2   1
#3 foo|baz|bar   1   3   2
#4 foo|bar|baz   1   2   3

Another option is separate_rows, which reshapes the data to 'long' format, followed by spread to go back to 'wide':

df %>%
    mutate(rn = row_number()) %>%
    separate_rows(block_order) %>%
    group_by(rn) %>%
    # ind is the position within the row; the factor fixes the column order
    mutate(ind = row_number(), block_order = factor(block_order, levels = labels)) %>%
    ungroup %>%
    spread(block_order, ind) %>%
    select(-rn) %>%
    bind_cols(df, .)
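
spread has since been superseded in tidyr by pivot_wider. As a rough sketch of the same long-then-wide idea with the newer function (the two-row `df`, the helper columns `rn`, `label` and `pos`, and the result name `wide` are illustrative choices, not part of the answer):

```r
library(dplyr)
library(tidyr)

labels <- c("foo", "bar", "baz")
df <- data.frame(block_order = c("foo|bar|baz", "baz|foo|bar"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  # keep the original string and work on a copy of it
  mutate(rn = row_number(), label = block_order) %>%
  # one row per label, in the order they appear in the string
  separate_rows(label, sep = "\\|") %>%
  group_by(rn) %>%
  mutate(pos = row_number()) %>%
  ungroup() %>%
  # back to one row per original string, one column per label
  pivot_wider(names_from = label, values_from = pos) %>%
  select(block_order, all_of(labels))

wide
```

Keeping block_order as an id column means no bind_cols step is needed at the end.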

Answer 2

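Unless you need it for other reasons, you don't need to split the string at all, as long as you are looking for the position of the first match of each value of labels, which regexpr will give you. Mapping it over labels gives a list with one element per string in labels (so it's a quick iteration), over which you can then pmap rank to get the indices. Use the *_dfr version to simplify the results to a data frame, and bind that to the original data: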

library(tidyverse)
set.seed(47)

labels <- c("foo", "bar", "baz")
df <- tibble(block_order = replicate(10, paste(sample(labels), collapse = "|")))

labels %>% 
    map(~regexpr(.x, df$block_order)) %>% 
    pmap_dfr(~set_names(as.list(rank(c(...))), labels)) %>% 
    bind_cols(df, .)
#> # A tibble: 10 x 4
#>    block_order   foo   bar   baz
#>    <chr>       <dbl> <dbl> <dbl>
#>  1 baz|foo|bar    2.    3.    1.
#>  2 baz|bar|foo    3.    2.    1.
#>  3 bar|foo|baz    2.    1.    3.
#>  4 baz|foo|bar    2.    3.    1.
#>  5 foo|bar|baz    1.    2.    3.
#>  6 baz|foo|bar    2.    3.    1.
#>  7 foo|baz|bar    1.    3.    2.
#>  8 bar|baz|foo    3.    1.    2.
#>  9 baz|foo|bar    2.    3.    1.
#> 10 foo|bar|baz    1.    2.    3.

If you prefer stringr / stringi over base regex, you can do the same thing by changing the regexpr call to str_locate(df$block_order, .x)[, "start"] or to stringi::stri_locate_first_fixed.
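
Spelled out, the stringr variant might look roughly like this. It is a sketch under a few assumptions: the two-row `df` is made up for illustration, `result` is just a name for the output, and the pattern is wrapped in `fixed()` so the label is matched literally rather than as a regex.

```r
library(dplyr)
library(purrr)
library(stringr)

labels <- c("foo", "bar", "baz")
df <- tibble(block_order = c("baz|foo|bar", "foo|bar|baz"))

result <- labels %>%
  # start offset of the first occurrence of each label, one vector per label
  map(~ str_locate(df$block_order, fixed(.x))[, "start"]) %>%
  # within each row, rank the three offsets to turn them into positions 1..3
  pmap_dfr(~ set_names(as.list(rank(c(...))), labels)) %>%
  bind_cols(df, .)

result
```

Like the regexpr version, this relies on first-match offsets, so it assumes that no label is a substring of another label.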

Answer 3

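I think this might work:

library(tidyr)
library(purrr)
# Takes the split labels of one row and returns each label's position in that row
position_counter <- function(...) {
  row <- list(...)
  # which(row == .) is the position of the current label; name the result by label
  row %>% map(~which(row == .)) %>% setNames(row)
}

df %>%
  # `labels` only supplies three placeholder column names for the split pieces
  separate(block_order, labels) %>% 
  # the result has one column per label and drops block_order
  pmap_df(position_counter)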
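
On part (b) of the question, which the answers do not address directly: inside transmute, str_split receives the whole block_order column at once, so with simplify = TRUE it returns a single character matrix rather than one vector per row. match then searches that matrix as one flat vector, and the single index it finds is recycled down the column. The sketch below reproduces this with the question's four rows hard-coded (the names `block_order`, `parts` and `flat_index` are illustrative):

```r
library(stringr)

# The four example rows from the question, hard-coded for reproducibility
block_order <- c("foo|bar|baz", "baz|bar|foo", "foo|baz|bar", "foo|bar|baz")

# On the whole column, simplify = TRUE returns one 4 x 3 character matrix
parts <- str_split(block_order, "\\|", simplify = TRUE)
dim(parts)
#> [1] 4 3

# match() ignores the dimensions and searches in column-major order:
# "foo" "baz" "foo" "foo" "bar" "bar" "baz" "bar" "baz" "foo" "bar" "baz"
as.vector(parts)

# One index per label into that flat vector, identical for every row
flat_index <- match(c("foo", "bar", "baz"), parts)
flat_index
#> [1] 1 5 2
```

This matches the 1, 5, 2 in the unexpected output: "foo" first appears at element 1, "bar" at element 5 and "baz" at element 2 of the flattened matrix.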

Answer 1: score 5, posted 2018-04-23 02:14:35
Answer 2: score 4, accepted, posted 2018-04-23 02:43:31
Answer 3: score 1, posted 2018-04-23 02:17:14
