New column that counts how many times the value in a given row of one column appears in another column

Question

I have tried searching for an answer to this question, but it continues to elude me. I am working with crime data where each row refers to a specific crime incident. There is a variable for suspect ID and a variable for victim ID. These ID numbers are consistent across the two columns: if one row contains the ID 424 in the victim ID column and a separate row contains the ID 424 in the suspect ID column, I know that the same person was listed as a victim in the first crime and as a suspect in the second crime.

I want to create two new variables: one which counts the number of times the victim (in a particular crime incident) has been recorded as a suspect (in the dataset as a whole), and one which counts the number of times the suspect (in a particular crime incident) has been recorded as a victim (in the dataset as a whole).

Here's a simplified version of my data:

  s.uid v.uid
1     1     9
2     2     8
3     3     2
4     4     2
5     5     2
6    NA     7
7     5     6
8     9     5

And here is what I want to create:

  s.uid v.uid s.in.v v.in.s
1     1     9      0      1
2     2     8      3      0
3     3     2      0      1
4     4     2      0      1
5     5     2      1      1
6    NA     7     NA      0
7     5     6      1      0
8     9     5      1      2

Note that, where there is an NA, I would like the NA to be preserved. I'm currently trying to work in tidyverse and piping where possible, so I would prefer answers in that kind of format, but I'm open to any solution!

Answer 1

Using dplyr:

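library(dplyr)

# `dat` is the data frame from the question, with columns s.uid and v.uid
dat %>% 
    group_by(s.uid) %>% 
    # count the rows whose victim ID equals this group's suspect ID
    mutate(s.in.v = sum(dat$v.uid %in% s.uid)) %>% 
    group_by(v.uid) %>% 
    # count the rows whose suspect ID equals this group's victim ID
    mutate(v.in.s = sum(dat$s.uid %in% v.uid))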

# A tibble: 8 × 4
# Groups:   v.uid [6]
  s.uid v.uid s.in.v v.in.s
  <int> <int>  <int>  <int>
1     1     9      0      1
2     2     8      3      0
3     3     2      0      1
4     4     2      0      1
5     5     2      1      1
6    NA     7      0      0
7     5     6      1      0
8     9     5      1      2
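
The output above shows 0 in row 6, whereas the question asks for the NA to be preserved. The following is a minimal sketch of one way to keep it, assuming the data are stored in a tibble called `dat` (rebuilt here so the block runs on its own). The name `res` and the use of `vapply()` with `if_else()` are choices made for this illustration, not part of the original answer.

```r
library(dplyr)

# Replica of the question's data; `dat` is an assumed name.
dat <- tibble(
  s.uid = c(1:5, NA, 5L, 9L),
  v.uid = c(9L, 8L, 2L, 2L, 2L, 7L, 6L, 5L)
)

# Count, for each ID in `ids`, how many entries of `pool` equal it.
count_in <- function(ids, pool) {
  vapply(ids, function(id) sum(pool == id, na.rm = TRUE), integer(1))
}

res <- dat %>%
  mutate(
    # keep NA wherever the ID itself is missing
    s.in.v = if_else(is.na(s.uid), NA_integer_, count_in(s.uid, v.uid)),
    v.in.s = if_else(is.na(v.uid), NA_integer_, count_in(v.uid, s.uid))
  )

print(res)
```

Because no grouping is used, the result is an ungrouped tibble, so there is no need to call `ungroup()` afterwards.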

Answer 2

First, a reprex of your data:

library(tidyverse)

# Replica of your data:
s.uid <- c(1:5, NA, 5, 9)
v.uid <- c(9, 8, 2, 2, 2, 7, 6, 5)

DF <- tibble(s.uid, v.uid)

Custom function to use:

# function to check how many times "a" (a length-1 atomic vector) occurs in "b":
f <- function(a, b) {
  a <- as.character(a)
  
  # make a lookup table (a.k.a. a dictionary) of the values in b:
  b_freq <- table(b, useNA = "always")
  
  # if a is in b, return its frequency
  # (an NA input indexes the table with NA and so comes back as NA):
  if (a %in% names(b_freq)) {
    return(b_freq[a])
  }
  
  # else (i.e. a is not in b) return 0:
  return(0)
}

# vectorise that, enabling intake of any length of "a":
ff <- function(a, b) {
  purrr::map_dbl(.x = a, .f = f, b = b)
}

Finally:

DF |> 
  mutate(
    s_in_v = ff(s.uid, v.uid), 
    v_in_s = ff(v.uid, s.uid)
  )

Results in:

#> # A tibble: 8 × 4
#>   s.uid v.uid s_in_v v_in_s
#>   <dbl> <dbl>  <dbl>  <dbl>
#> 1     1     9      0      1
#> 2     2     8      3      0
#> 3     3     2      0      1
#> 4     4     2      0      1
#> 5     5     2      1      1
#> 6    NA     7     NA      0
#> 7     5     6      1      0
#> 8     9     5      1      2
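
As an alternative to the row-by-row lookup, the same counts can be built once per column and then joined back. This is only a sketch of a different technique (frequency tables via `count()` plus `left_join()`), not the method used in the answer above. The names `v_counts`, `s_counts` and `res` are made up for the illustration.

```r
library(dplyr)

# Same replica of the data as above.
DF <- tibble(
  s.uid = c(1:5, NA, 5, 9),
  v.uid = c(9, 8, 2, 2, 2, 7, 6, 5)
)

# How often each ID occurs as a victim and as a suspect (missing IDs dropped).
v_counts <- DF %>% filter(!is.na(v.uid)) %>% count(v.uid, name = "s_in_v")
s_counts <- DF %>% filter(!is.na(s.uid)) %>% count(s.uid, name = "v_in_s")

res <- DF %>%
  left_join(v_counts, by = c("s.uid" = "v.uid")) %>%
  left_join(s_counts, by = c("v.uid" = "s.uid")) %>%
  mutate(
    # unmatched IDs get 0; missing IDs stay NA
    s_in_v = if_else(is.na(s.uid), NA_integer_, coalesce(s_in_v, 0L)),
    v_in_s = if_else(is.na(v.uid), NA_integer_, coalesce(v_in_s, 0L))
  )

print(res)
```

Each ID is counted once per column instead of once per row, which may matter on larger data sets. The counts come back as integers rather than doubles.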

2 answers

Answer 1
2 2022-01-14 11:53:19

Answer 2
1 2022-01-14 13:15:52
