
New column which counts the number of times a value in a specific row of one column appears in another column

I have tried searching for an answer to this question, but it continues to elude me. I am working with crime data where each row refers to a specific crime incident. There is a variable for suspect ID and a variable for victim ID. These ID numbers are consistent across the two columns (in other words, if a row contains the ID 424 in the victim ID column, and a separate row contains the ID 424 in the suspect column, I know that the same person was listed as a victim in the first crime and as a suspect in the second crime).

I want to create two new variables: one which counts the number of times the victim (in a particular crime incident) has been recorded as a suspect (in the dataset as a whole), and one which counts the number of times the suspect (in a particular crime incident) has been recorded as a victim (in the dataset as a whole).

Here's a simplified version of my data:

  s.uid v.uid
1     1     9
2     2     8
3     3     2
4     4     2
5     5     2
6    NA     7
7     5     6
8     9     5

And here is what I want to create:

  s.uid v.uid s.in.v v.in.s
1     1     9      0      1
2     2     8      3      0
3     3     2      0      1
4     4     2      0      1
5     5     2      1      1
6    NA     7     NA      0
7     5     6      1      0
8     9     5      1      2

Note that, where there is an NA, I would like the NA to be preserved. I'm currently trying to work in tidyverse and piping where possible, so I would prefer answers in that kind of format, but I'm open to any solution!

Using dplyr:

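library(dplyr)

# For each suspect ID, count its occurrences among all victim IDs,
# then do the reverse for each victim ID.
dat %>% 
    group_by(s.uid) %>% 
    mutate(s.in.v = sum(dat$v.uid %in% s.uid)) %>% 
    group_by(v.uid) %>% 
    mutate(v.in.s = sum(dat$s.uid %in% v.uid))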
# A tibble: 8 × 4
# Groups:   v.uid [6]
  s.uid v.uid s.in.v v.in.s
  <int> <int>  <int>  <int>
1     1     9      0      1
2     2     8      3      0
3     3     2      0      1
4     4     2      0      1
5     5     2      1      1
6    NA     7      0      0
7     5     6      1      0
8     9     5      1      2
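
Note that the output above reports 0 rather than NA for `s.in.v` in row 6, while the question asks for the NA to be preserved. One possible way to keep it is sketched below. The helper name `count_in` is hypothetical, and `dat` is an assumed reconstruction of the question's data with integer columns; this is an illustration, not the answerer's method.

```r
library(dplyr)

# Assumed reconstruction of the question's data (integer IDs, one NA suspect)
dat <- tibble(
  s.uid = c(1L, 2L, 3L, 4L, 5L, NA, 5L, 9L),
  v.uid = c(9L, 8L, 2L, 2L, 2L, 7L, 6L, 5L)
)

# Hypothetical helper: for each element of x, count how often it occurs in y.
# An NA in x yields NA in the result instead of 0.
count_in <- function(x, y) {
  counts <- vapply(x, function(id) sum(y == id, na.rm = TRUE), integer(1))
  counts[is.na(x)] <- NA_integer_
  counts
}

res <- dat %>%
  mutate(
    s.in.v = count_in(s.uid, v.uid),
    v.in.s = count_in(v.uid, s.uid)
  )

print(res)
```

Because no grouping is involved, the result is an ungrouped tibble, and row 6 keeps NA in `s.in.v`.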

First, a reprex of your data:

library(tidyverse)

# Replica of your data:
s.uid <- c(1:5, NA, 5, 9)
v.uid <- c(9, 8, 2, 2, 2, 7, 6, 5)

DF <- tibble(s.uid, v.uid)

Custom function to use:

# function to check how many times "a" (a length 1 atomic vector) occurs in "b":
f <- function(a, b) {
  a <- as.character(a)
  
  # make a lookup table a.k.a dictionary of values in b:
  b_freq <- table(b, useNA = "always")
  
  # if a is in b, return its frequency:
  if (a %in% names(b_freq)) {
    return(b_freq[a])
  }
  
  # else (ie. a is not in b) return 0:
  return(0)
}

# vectorise that, enabling intake of any length of "a":
ff <- function(a, b) {
  purrr::map_dbl(.x = a, .f = f, b = b)
}

Finally:

DF |> 
  mutate(
    s_in_v = ff(s.uid, v.uid), 
    v_in_s = ff(v.uid, s.uid)
  )

Results in:

#> # A tibble: 8 × 4
#>   s.uid v.uid s_in_v v_in_s
#>   <dbl> <dbl>  <dbl>  <dbl>
#> 1     1     9      0      1
#> 2     2     8      3      0
#> 3     3     2      0      1
#> 4     4     2      0      1
#> 5     5     2      1      1
#> 6    NA     7     NA      0
#> 7     5     6      1      0
#> 8     9     5      1      2
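
For larger datasets, the same result could probably be reached without a per-row custom function by counting each ID once and joining the counts back. The following is a minimal sketch under that assumption. The intermediate names `v_counts`, `s_counts` and `id` are made up for illustration, and the explicit `if_else()` step is there so that an NA ID stays NA rather than picking up a count.

```r
library(dplyr)

# Same reprex data as above
DF <- tibble(
  s.uid = c(1:5, NA, 5, 9),
  v.uid = c(9, 8, 2, 2, 2, 7, 6, 5)
)

# How often each ID appears as a victim / as a suspect
v_counts <- count(DF, id = v.uid, name = "s_in_v")
s_counts <- count(DF, id = s.uid, name = "v_in_s")

res <- DF |>
  left_join(v_counts, by = c("s.uid" = "id")) |>
  left_join(s_counts, by = c("v.uid" = "id")) |>
  mutate(
    # IDs with no match in the other column get 0; NA IDs stay NA
    s_in_v = if_else(is.na(s.uid), NA_integer_, coalesce(s_in_v, 0L)),
    v_in_s = if_else(is.na(v.uid), NA_integer_, coalesce(v_in_s, 0L))
  )

print(res)
```

Each lookup table has one row per distinct ID, so the joins do not duplicate rows, and `left_join()` keeps the original row order of `DF`.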
