Use cases with higher value on one variable for each case of another variable in R

I am doing a meta-analysis in R. For each study (variable studyID) I have multiple effect sizes. For some studies I have the same effect size multiple times, depending on the level of acquaintance (variable Familiarity) between the subjects.

head(dat)
   studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex  N published
1       1        3.0        5.0           1  0.0462          4        0  44         1
2       1        5.0        2.5           1  0.1335          4        0  44         1
3       1        2.5        3.0           1 -0.1239          4        0  44         1
4       1        2.5        3.5           1  0.2062          4        0  44         1
5       1        2.5        3.0           1 -0.0370          4        0  44         1
6       1        3.0        5.0           1 -0.3850          4        0  44         1

Those are the first rows of the data set. In total there are over 50 studies. Most studies look like study 1, with the same value of Familiarity for all effect sizes. Some studies, however, report effect sizes at multiple levels of familiarity, for example study 36, shown below.

dat[142:160, ]
      studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex  N published
142      36        1.0        4.5           0  0.1233       5.00        0  311         1
143      36        3.5        3.0           0  0.0428       5.00        0  311         1
144      36        1.0        4.5           0  0.0986       5.00        0  311         1
145      36        1.0        4.5           1 -0.0520       5.00        0  311         1
146      36        1.5        2.5           1 -0.0258       5.00        0  311         1
147      36        3.5        3.0           1  0.1104       5.00        0  311         1
148      36        1.0        4.5           1  0.0282       5.00        0  311         1
149      36        1.0        4.5           2 -0.1724       5.00        0  311         1
150      36        3.5        3.0           2  0.2646       5.00        0  311         1
151      36        1.0        4.5           2 -0.1426       5.00        0  311         1
152      37        3.0        4.0           1  0.0118       5.35        0  123         0
153      37        1.0        4.5           1 -0.3205       5.35        0  123         0
154      37        2.5        3.0           1 -0.2356       5.35        0  123         0
155      37        3.0        2.0           1  0.1372       5.35        0  123         0
156      37        2.5        2.5           1 -0.1401       5.35        0  123         0
157      37        3.0        3.5           1 -0.3334       5.35        0  123         0
158      37        2.5        2.5           1  0.0317       5.35        0  123         0
159      37        1.0        3.0           1 -0.3025       5.35        0  123         0
160      37        1.0        3.5           1 -0.3248       5.35        0  123         0

Now, for those studies that include multiple levels of familiarity, I want to keep only the rows with one level of familiarity (in two separate versions: one with the lower, one with the higher familiarity). I think this should be possible with the dplyr package, but I have no real code so far.

In a second step, I would like to give those rows a unique studyID for each level of familiarity (so that study 36 becomes three "different" studies).

Thank you in advance!

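If you want to use dplyr, you could create an alternate ID or case number for each combination of studyID and Familiarity. With dplyr 1.0.0 or later (loaded via library(dplyr)), this is done with cur_group_id() inside a grouped mutate(); older versions used group_indices() for the same purpose:

df <- df %>%
  group_by(studyID, Familiarity) %>%
  mutate(case_num = cur_group_id()) %>%
  ungroup()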
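As an illustration of what this kind of index looks like, here is a minimal sketch on a few rows taken from the question's data, reduced to three columns. It assumes dplyr 1.0.0 or later, where cur_group_id() inside a grouped mutate() numbers each studyID/Familiarity combination; the names `toy` and `toy_ids` are made up for the example.

```r
library(dplyr)

# A few rows from the question (studies 1, 36 and 37), three columns only
toy <- data.frame(
  studyID     = c(1, 36, 36, 36, 36, 37),
  Familiarity = c(1, 0, 1, 1, 2, 1),
  p_t_cov     = c(0.0462, 0.0986, -0.0520, 0.0282, -0.1724, 0.0118)
)

# One running number per (studyID, Familiarity) combination
toy_ids <- toy %>%
  group_by(studyID, Familiarity) %>%
  mutate(case_num = cur_group_id()) %>%
  ungroup()

toy_ids$case_num
# [1] 1 2 3 3 4 5
```

Study 36 ends up with three different case numbers (2, 3 and 4), one per familiarity level, while studies 1 and 37 each keep a single number.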

You could do:

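library(dplyr)

df %>%
  group_by(studyID) %>%
  # flag studies that contain more than one level of Familiarity
  mutate(nDist = n_distinct(Familiarity) > 1) %>%
  ungroup() %>%
  mutate(
    # flagged studies become "<studyID>_<Familiarity>"; all others keep their ID (as character)
    studyID = case_when(nDist ~ paste(studyID, Familiarity, sep = "_"), TRUE ~ as.character(studyID)),
    nDist = NULL  # drop the helper column
  )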

Output:

# A tibble: 19 x 9
   studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex     N published
   <chr>        <dbl>      <dbl>       <int>   <dbl>      <dbl>    <int> <int>     <int>
 1 36_0           1          4.5           0  0.123        5           0   311         1
 2 36_0           3.5        3             0  0.0428       5           0   311         1
 3 36_0           1          4.5           0  0.0986       5           0   311         1
 4 36_1           1          4.5           1 -0.052        5           0   311         1
 5 36_1           1.5        2.5           1 -0.0258       5           0   311         1
 6 36_1           3.5        3             1  0.110        5           0   311         1
 7 36_1           1          4.5           1  0.0282       5           0   311         1
 8 36_2           1          4.5           2 -0.172        5           0   311         1
 9 36_2           3.5        3             2  0.265        5           0   311         1
10 36_2           1          4.5           2 -0.143        5           0   311         1
11 37             3          4             1  0.0118       5.35        0   123         0
12 37             1          4.5           1 -0.320        5.35        0   123         0
13 37             2.5        3             1 -0.236        5.35        0   123         0
14 37             3          2             1  0.137        5.35        0   123         0
15 37             2.5        2.5           1 -0.140        5.35        0   123         0
16 37             3          3.5           1 -0.333        5.35        0   123         0
17 37             2.5        2.5           1  0.0317       5.35        0   123         0
18 37             1          3             1 -0.302        5.35        0   123         0
19 37             1          3.5           1 -0.325        5.35        0   123         0
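
For the first step of the question (keeping only one familiarity level per study, once with the lower and once with the higher level), one possible approach is a grouped filter(). This is a minimal sketch, not a tested solution for the full data set: it uses a few rows from the question reduced to three columns, and the names `dat_low` and `dat_high` are made up for the example.

```r
library(dplyr)

# A few rows from the question (studies 1, 36 and 37), three columns only
dat <- data.frame(
  studyID     = c(1, 1, 36, 36, 36, 36, 37),
  Familiarity = c(1, 1, 0, 1, 1, 2, 1),
  p_t_cov     = c(0.0462, 0.1335, 0.0986, -0.0520, 0.0282, -0.1724, 0.0118)
)

# Version 1: within each study, keep only the rows at the lowest familiarity level
dat_low <- dat %>%
  group_by(studyID) %>%
  filter(Familiarity == min(Familiarity)) %>%
  ungroup()

# Version 2: within each study, keep only the rows at the highest familiarity level
dat_high <- dat %>%
  group_by(studyID) %>%
  filter(Familiarity == max(Familiarity)) %>%
  ungroup()
```

Studies with a single familiarity level pass through both versions unchanged, because their minimum and maximum coincide. If Familiarity contains missing values, min() and max() would need na.rm = TRUE.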
