
Creating a third column based on matching strings from two other columns

I am trying to calculate the number of correct answers on a test and store it in a new column. Recall.CRESP is a column specifying the correct answers, which are selected through grid coordinates. Recall.RESP shows the participant's response.

These columns look something like this:

|Recall.CRESP                     |Recall.RESP                      |
|---------------------------------|---------------------------------|           
|grid35grid51grid12grid43grid54   |grid35grid51grid12grid43grid54   |                
|grid11gird42gird22grid51grid32   |grid11gird15gird55grid42grid32   |

So, for example, in row 1 of this table the participant got 5/5 correct, as the grid coordinates in Recall.CRESP match those in Recall.RESP. In row 2, however, the participant only got 2/5 correct, as only the first and the last grid coordinates are identical. The order of the coordinates must match for a response to be correct.

My new column should show 5 and 2 for the two rows respectively. I am unsure how to split apart the grid coordinates and how to tell R that the order must match for a response to be correct.

A nice way to handle this is with list columns, in which you can store a whole set of responses or values in a way that is easy to iterate over. In tidyverse grammar:

library(tidyverse)

responses <- tibble(Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"), 
                    Recall.RESP = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32"))

scored <- responses %>% 
    # extract the digits of each coordinate, so the "grid"/"gird" spelling is ignored
    mutate(across(everything(), ~str_extract_all(.x, "[0-9]+"))) %>%
    # compare position by position, so the order has to match
    mutate(correct = map2(Recall.CRESP, Recall.RESP, `==`), 
           score = map_int(correct, sum))

scored
#> # A tibble: 2 x 4
#>   Recall.CRESP Recall.RESP correct   score
#>   <list>       <list>      <list>    <int>
#> 1 <chr [5]>    <chr [5]>   <lgl [5]>     5
#> 2 <chr [5]>    <chr [5]>   <lgl [5]>     2

Pull out the individual columns if you'd like a closer look at the data.
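If you want to see which positions were missed, one option is to unnest the list columns into long format. This is only a sketch: the `trial` and `position` columns and the `long` name are made up for illustration, and it assumes every row has the same number of coordinates in both columns.

```r
library(dplyr)
library(tidyr)
library(stringr)

responses <- tibble(
  Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
  Recall.RESP  = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32")
)

long <- responses %>%
  # keep track of which original row each coordinate came from
  mutate(trial = row_number()) %>%
  # one character vector of coordinate digits per cell
  mutate(across(c(Recall.CRESP, Recall.RESP), ~ str_extract_all(.x, "[0-9]+"))) %>%
  # one row per coordinate; both columns are unnested in parallel
  unnest(c(Recall.CRESP, Recall.RESP)) %>%
  group_by(trial) %>%
  mutate(position = row_number(),
         correct  = Recall.CRESP == Recall.RESP) %>%
  ungroup()

# per-trial score, summing the logical `correct` column
scores <- long %>% count(trial, wt = correct, name = "score")
```

Filtering `long` on `!correct` then shows the missed positions; for the example data those are positions 2 to 4 of the second row.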

You can do this without the tidyverse, using a simple mapply and a custom split_grid function (I assume only the numbers are relevant):

# the tibble is only for display; the scoring below is base R
df <- tibble::tibble(Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
                     Recall.RESP = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32"))

split_grid <- function(x) {
    unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))
}

compare <- function(x, y) {
    sum(split_grid(x) == split_grid(y))
}

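# USE.NAMES = FALSE keeps the result from being named after the input strings
df$Res <- mapply(compare, df$Recall.CRESP, df$Recall.RESP, USE.NAMES = FALSE)

df
# A tibble: 2 x 3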
  Recall.CRESP                   Recall.RESP                      Res
  <chr>                          <chr>                          <int>
1 grid35grid51grid12grid43grid54 grid35grid51grid12grid43grid54     5
2 grid11gird42gird22grid51grid32 grid11gird15gird55grid42grid32     2
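One caveat with comparing via `==`: if the two vectors differ in length, R recycles the shorter one, which either warns or silently miscounts. The question does not say whether that can happen, so the following is just a sketch under the assumption that a participant might give fewer (or more) coordinates than expected and that unmatched positions count as incorrect. `score_ordered` is a hypothetical helper name, not something from the answers above.

```r
# Count position-wise matches between two coordinate strings,
# comparing only the positions present in both.
score_ordered <- function(correct, response) {
  key <- regmatches(correct,  gregexpr("[0-9]+", correct))[[1]]
  got <- regmatches(response, gregexpr("[0-9]+", response))[[1]]
  n <- min(length(key), length(got))   # positions available in both vectors
  sum(key[seq_len(n)] == got[seq_len(n)])
}

cresp <- c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32")
resp  <- c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32")

res <- mapply(score_ordered, cresp, resp, USE.NAMES = FALSE)
```

For the example data this gives the same scores as before (5 and 2); a response with only the first two coordinates correct, such as "grid35grid51", would score 2 instead of triggering recycling.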
