Creating a third column based on matching strings from two other columns
I am trying to calculate the score (the number of correct answers) on a test and store it in a new column.
Recall.CRESP is a column specifying the correct answers on the test, which are selected through grid coordinates. Recall.RESP shows the participant's response.
These columns look something like this:
|Recall.CRESP |Recall.RESP |
|---------------------------------|---------------------------------|
|grid35grid51grid12grid43grid54 |grid35grid51grid12grid43grid54 |
|grid11gird42gird22grid51grid32 |grid11gird15gird55grid42grid32 |
For example, in row 1 of this table the participant got 5/5 correct, because every grid coordinate in Recall.CRESP matches the one in Recall.RESP. In row 2, however, the participant got only 2/5 correct, because only the first and the last grid coordinates are identical. The order of the coordinates must match for an answer to count as correct.
My new column should show 5 and 2 for these two rows, respectively. I am unsure how to split apart the grid coordinates, and also how to tell R that the order must match for an answer to be correct.
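To illustrate why the scoring rule calls for a position-by-position comparison, here is a minimal base-R sketch using the row-2 coordinates typed in by hand (the vector names correct_answer and response are hypothetical). Element-wise `==` respects order, while `%in%` only checks membership and would over-count.

```r
# Row-2 coordinates from the table, entered by hand for illustration only
correct_answer <- c("11", "42", "22", "51", "32")
response       <- c("11", "15", "55", "42", "32")

# Position-by-position comparison: order matters
by_position <- sum(correct_answer == response)
by_position
#> [1] 2

# Membership check: ignores order, so "42" counts even though it is in the wrong slot
by_membership <- sum(response %in% correct_answer)
by_membership
#> [1] 3
```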
A nice way to handle this is with list columns, which let you store a whole set of responses or values in a single cell in a form that is easy to iterate over. In tidyverse grammar:
library(tidyverse)

responses <- tibble(Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
                    Recall.RESP = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32"))

scored <- responses %>%
    # pull the numeric coordinate out of every "grid"/"gird" token, giving list columns
    mutate(across(everything(), ~ str_extract_all(.x, "[0-9]+"))) %>%
    # compare the two coordinate vectors position by position, then count the matches
    mutate(correct = map2(Recall.CRESP, Recall.RESP, `==`),
           score = map_int(correct, sum))

scored
#> # A tibble: 2 x 4
#> Recall.CRESP Recall.RESP correct score
#> <list> <list> <list> <int>
#> 1 <chr [5]> <chr [5]> <lgl [5]> 5
#> 2 <chr [5]> <chr [5]> <lgl [5]> 2
Pull out the individual columns if you'd like a closer look at the data.
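One way to get that closer look is to unnest the list columns into one row per coordinate, so each comparison is visible. This is a sketch under my own assumptions: the use of tidyr::unnest() and the helper columns row, position and correct are not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(stringr)

responses <- tibble(
  Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
  Recall.RESP  = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32")
)

long <- responses %>%
  # remember which original row each coordinate came from, then split into coordinates
  mutate(row = row_number(),
         across(c(Recall.CRESP, Recall.RESP), ~ str_extract_all(.x, "[0-9]+"))) %>%
  # one output row per coordinate; both list columns have 5 elements per row
  unnest(c(Recall.CRESP, Recall.RESP)) %>%
  group_by(row) %>%
  # position within the original row, and whether that coordinate was answered correctly
  mutate(position = row_number(),
         correct = Recall.CRESP == Recall.RESP) %>%
  ungroup()

long %>% filter(row == 2)
```

Summarising `long` back to one row per response, for example with `group_by(row)` followed by `summarise(score = sum(correct))`, gives the same scores of 5 and 2.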
You can do this without the tidyverse, using a simple mapply and a custom split_grid function (I assume only the numbers are relevant):
df <- data.frame(Recall.CRESP = c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32"),
                 Recall.RESP = c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42grid32"),
                 stringsAsFactors = FALSE)

# extract every run of digits, e.g. "grid35grid51" -> c("35", "51")
split_grid <- function(x) {
    unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))
}

# count the positions where both strings hold the same coordinate
compare <- function(x, y) {
    sum(split_grid(x) == split_grid(y))
}

df$Res <- mapply(compare, df$Recall.CRESP, df$Recall.RESP, USE.NAMES = FALSE)
df
#>                     Recall.CRESP                    Recall.RESP Res
#> 1 grid35grid51grid12grid43grid54 grid35grid51grid12grid43grid54   5
#> 2 grid11gird42gird22grid51grid32 grid11gird15gird55grid42grid32   2
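One caveat with compare as written: if a participant enters fewer coordinates than the key, `==` recycles the shorter vector (with a warning) and can score the wrong pairs. A possible variant, sketched here under the assumption that missing trailing coordinates should simply count as wrong, compares only the positions both strings have. compare_safe is a hypothetical name, and split_grid is redefined so the block runs on its own.

```r
# extract every run of digits, e.g. "grid35grid51" -> c("35", "51")
split_grid <- function(x) {
  unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))
}

# Hypothetical variant: compare only the positions present in both strings,
# so a short response is never recycled against the key
compare_safe <- function(key, resp) {
  k <- split_grid(key)
  r <- split_grid(resp)
  n <- min(length(k), length(r))
  sum(k[seq_len(n)] == r[seq_len(n)])
}

key  <- c("grid35grid51grid12grid43grid54", "grid11gird42gird22grid51grid32")
resp <- c("grid35grid51grid12grid43grid54", "grid11gird15gird55grid42")  # 2nd response is one short
scores <- mapply(compare_safe, key, resp, USE.NAMES = FALSE)
scores
#> [1] 5 1
```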