Use cases with higher value on one variable for each case of another variable in R

Question

I am doing a meta-analysis in R. For each study (variable studyID) I have multiple effect sizes. For some studies, the same effect size is reported several times, once for each level of acquaintance between the subjects (variable Familiarity).

head(dat)
   studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex  N published
1       1        3.0        5.0           1  0.0462          4        0  44         1
2       1        5.0        2.5           1  0.1335          4        0  44         1
3       1        2.5        3.0           1 -0.1239          4        0  44         1
4       1        2.5        3.5           1  0.2062          4        0  44         1
5       1        2.5        3.0           1 -0.0370          4        0  44         1
6       1        3.0        5.0           1 -0.3850          4        0  44         1

Those are the first rows of the data set. In total there are over 50 studies. Most studies look like study 1, with the same value of Familiarity for all effect sizes. Some studies, however, contain effect sizes at several levels of familiarity, for example study 36, shown below.

head(dat)
      studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex  N published
142      36        1.0        4.5           0  0.1233       5.00        0  311         1
143      36        3.5        3.0           0  0.0428       5.00        0  311         1
144      36        1.0        4.5           0  0.0986       5.00        0  311         1
145      36        1.0        4.5           1 -0.0520       5.00        0  311         1
146      36        1.5        2.5           1 -0.0258       5.00        0  311         1
147      36        3.5        3.0           1  0.1104       5.00        0  311         1
148      36        1.0        4.5           1  0.0282       5.00        0  311         1
149      36        1.0        4.5           2 -0.1724       5.00        0  311         1
150      36        3.5        3.0           2  0.2646       5.00        0  311         1
151      36        1.0        4.5           2 -0.1426       5.00        0  311         1
152      37        3.0        4.0           1  0.0118       5.35        0  123         0
153      37        1.0        4.5           1 -0.3205       5.35        0  123         0
154      37        2.5        3.0           1 -0.2356       5.35        0  123         0
155      37        3.0        2.0           1  0.1372       5.35        0  123         0
156      37        2.5        2.5           1 -0.1401       5.35        0  123         0
157      37        3.0        3.5           1 -0.3334       5.35        0  123         0
158      37        2.5        2.5           1  0.0317       5.35        0  123         0
159      37        1.0        3.0           1 -0.3025       5.35        0  123         0
160      37        1.0        3.5           1 -0.3248       5.35        0  123         0

For the studies that include multiple levels of familiarity, I now want to keep only the rows with a single level of familiarity. I need two separate versions of the data: one keeping the lowest level and one keeping the highest. I think this should be possible with the dplyr package, but I have no working code so far.

In a second step, I would like to give those rows a unique studyID for each level of familiarity (so study 36 would become three "different" studies).

Thank you in advance!

Answer 1

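If you want to use dplyr, you could create an alternate ID (case_num) for each combination of studyID and Familiarity. Older dplyr versions offered group_indices() for this; since dplyr 1.0.0 the equivalent inside mutate() is cur_group_id():

library(dplyr)  # cur_group_id() requires dplyr >= 1.0.0
df <- df %>%
  group_by(studyID, Familiarity) %>%
  mutate(case_num = cur_group_id()) %>% ungroup()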
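
To make the effect of a per-group ID concrete, here is a minimal sketch on toy data. The data frame `toy` and its values are made up for illustration and are not the asker's data set; the sketch assumes dplyr 1.0.0 or later, where `cur_group_id()` is available.

```r
library(dplyr)

# Toy data (hypothetical values, not the asker's data set)
toy <- data.frame(
  studyID     = c(1, 1, 36, 36, 36, 37),
  Familiarity = c(1, 1, 0, 1, 2, 1)
)

toy_ids <- toy %>%
  group_by(studyID, Familiarity) %>%
  mutate(case_num = cur_group_id()) %>%  # one integer per studyID/Familiarity pair
  ungroup()

toy_ids$case_num
```

Note that this numbers every studyID/Familiarity combination consecutively, so studies with a single familiarity level also receive a new ID that no longer matches their original studyID.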

Answer 2

You could do:

library(dplyr)

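df %>%
  group_by(studyID) %>%
  # flag studies that contain more than one familiarity level
  mutate(nDist = n_distinct(Familiarity) > 1) %>%
  ungroup() %>%
  mutate(
    # flagged studies get "<studyID>_<Familiarity>"; the rest keep their ID (as character)
    studyID = case_when(
      nDist ~ paste(studyID, Familiarity, sep = "_"),
      TRUE ~ as.character(studyID)
    ),
    nDist = NULL  # drop the helper column
  )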

Output:

# A tibble: 19 x 9
   studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex     N published
   <chr>        <dbl>      <dbl>       <int>   <dbl>      <dbl>    <int> <int>     <int>
 1 36_0           1          4.5           0  0.123        5           0   311         1
 2 36_0           3.5        3             0  0.0428       5           0   311         1
 3 36_0           1          4.5           0  0.0986       5           0   311         1
 4 36_1           1          4.5           1 -0.052        5           0   311         1
 5 36_1           1.5        2.5           1 -0.0258       5           0   311         1
 6 36_1           3.5        3             1  0.110        5           0   311         1
 7 36_1           1          4.5           1  0.0282       5           0   311         1
 8 36_2           1          4.5           2 -0.172        5           0   311         1
 9 36_2           3.5        3             2  0.265        5           0   311         1
10 36_2           1          4.5           2 -0.143        5           0   311         1
11 37             3          4             1  0.0118       5.35        0   123         0
12 37             1          4.5           1 -0.320        5.35        0   123         0
13 37             2.5        3             1 -0.236        5.35        0   123         0
14 37             3          2             1  0.137        5.35        0   123         0
15 37             2.5        2.5           1 -0.140        5.35        0   123         0
16 37             3          3.5           1 -0.333        5.35        0   123         0
17 37             2.5        2.5           1  0.0317       5.35        0   123         0
18 37             1          3             1 -0.302        5.35        0   123         0
19 37             1          3.5           1 -0.325        5.35        0   123         0
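
The first step in the question (keeping only the lowest or only the highest familiarity level within each study) could be handled with a grouped filter. The following is a minimal sketch under stated assumptions, not a tested solution for the full data set: the data frame `toy` and its values are hypothetical, and `dat_high` and `dat_low` are names chosen here for the two versions.

```r
library(dplyr)

# Toy data (hypothetical values): study 36 has three familiarity levels,
# studies 1 and 37 have one level each
toy <- data.frame(
  studyID     = c(1, 1, 36, 36, 36, 36, 37),
  Familiarity = c(1, 1, 0, 1, 2, 2, 1),
  p_t_cov     = c(0.05, 0.13, 0.12, -0.05, -0.17, 0.26, 0.01)
)

# Version 1: keep only the highest familiarity level within each study
dat_high <- toy %>%
  group_by(studyID) %>%
  filter(Familiarity == max(Familiarity)) %>%
  ungroup()

# Version 2: keep only the lowest familiarity level within each study
dat_low <- toy %>%
  group_by(studyID) %>%
  filter(Familiarity == min(Familiarity)) %>%
  ungroup()
```

Studies with a single familiarity level pass through both filters unchanged, so no special case is needed for them. If Familiarity contains missing values, `max()` and `min()` return NA for that study and the filter drops all of its rows, so missing values would need to be handled first.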

2 answers

solution1
1 2020-03-04 20:49:34

solution2
0 ACCEPTED 2020-03-04 20:36:44
