Sum a variable in a grouped dataframe only once for each unique combination of two other variables with dplyr

I have a long table with repeating combinations of area and cluster.

counts <-  tibble::tribble(
         ~age,         ~area,          ~cluster, ~norm.to.area,
      "gw_25",   "cingulate",       "cluster_1",          0.03,
      "gw_20",   "cingulate",       "cluster_1",          0.03,
      "gw_18", "hippocampus",       "cluster_1",          0.02,
      "gw_25",      "insula",       "cluster_1",          0.01,
      "gw_20",       "motor",       "cluster_1",          0.01,
      "gw_22",       "motor",       "cluster_1",          0.01,
      "gw_25",       "motor",       "cluster_1",          0.01,
      "gw_14",       "motor",       "cluster_1",          0.01,
      "gw_18",       "motor",       "cluster_1",          0.01,
      "gw_19",       "motor",       "cluster_1",          0.01,
      "gw_17",       "motor",       "cluster_1",          0.01,
      "gw_20",   "occipital",       "cluster_1",          0.01,
      "gw_17",   "occipital",       "cluster_1",          0.01,
      "gw_18",   "occipital",       "cluster_1",          0.01,
      "gw_19",   "occipital",       "cluster_1",          0.01,
      "gw_22",   "occipital",       "cluster_1",          0.01,
      "gw_14",   "occipital",       "cluster_1",          0.01,
      "gw_22",    "parietal",       "cluster_1",             0,
      "gw_25",    "parietal",       "cluster_1",             0,
      "gw_17",    "parietal",       "cluster_1",             0,
      "gw_19",    "parietal",       "cluster_1",             0,
      "gw_20",    "parietal",       "cluster_1",             0,
      "gw_20",         "PFC",       "cluster_1",          0.01,
      "gw_22",         "PFC",       "cluster_1",          0.01,
      "gw_25",         "PFC",       "cluster_1",          0.01
      )

I want to create a new variable, sum.norm.to.area, which is the sum of norm.to.area for each cluster, using the value of norm.to.area only ONCE for each combination of area / cluster.

I've tried to group_by cluster, but this sums the values as many times as a given combination appears.

counts %>% group_by(cluster) %>% mutate(sum.norm.to.area = sum(norm.to.area))

Thanks for your advice.

UPDATE 1:

Tried using summarize as suggested below, but the same thing occurs (except, of course, without adding it as a new column):

counts %>% group_by(cluster, area) %>% summarize(sum(norm.to.area))

    tibble::tribble(
               ~cluster,           ~area, ~sum.norm.to.area,
            "cluster_1",           "PFC",               0.06,
            "cluster_1", "somatosensory",               0.05,
            "cluster_1",         "motor",               0.07,
            "cluster_1",      "parietal",                  0,
            "cluster_1",      "temporal",               0.03,
            "cluster_1",     "occipital",               0.06,
            "cluster_1",   "hippocampus",               0.02,
            "cluster_1",        "insula",               0.01,
            "cluster_1",     "cingulate",               0.06,
        "cluster_10-34",           "PFC",               0.42,
        "cluster_10-34", "somatosensory",               0.35,
        "cluster_10-34",         "motor",               0.48,
        "cluster_10-34",      "parietal",               0.36,
        "cluster_10-34",      "temporal",               0.28,
        "cluster_10-34",     "occipital",                0.4,
        "cluster_10-34",   "hippocampus",               0.12,
        "cluster_10-34",        "insula",                  0,
        "cluster_10-34",     "cingulate",                  0,
           "cluster_11",           "PFC",               0.18,
           "cluster_11", "somatosensory",               0.15,
           "cluster_11",         "motor",               0.14,
           "cluster_11",      "parietal",               0.12,
           "cluster_11",      "temporal",               0.04,
           "cluster_11",     "occipital",               0.18,
           "cluster_11",   "hippocampus",               0.02
      )

UPDATE 2

This is the output that I want, but the way I'm arriving at it is too convoluted. I'd like to find an easier way using mutate and not having to use join.

tmp <- counts %>%
  distinct(area, cluster, .keep_all = TRUE) %>%
  add_count(cluster, wt = norm.to.area, name = "sum.norm.to.area")

counts %>% left_join(tmp, by = c("cluster", "area"))
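One thing to watch with the join above: because tmp keeps every column (.keep_all = TRUE), joining on cluster and area leaves the remaining shared columns suffixed (age.x, age.y, and so on). A minimal sketch of a trimmed variant follows, assuming norm.to.area is constant within each area/cluster pair. The toy tibble and the name joined are hypothetical and only there for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for `counts`.
toy <- tibble::tribble(
  ~age,    ~area,       ~cluster,    ~norm.to.area,
  "gw_20", "cingulate", "cluster_1", 0.03,
  "gw_25", "cingulate", "cluster_1", 0.03,
  "gw_18", "motor",     "cluster_1", 0.01,
  "gw_19", "motor",     "cluster_1", 0.01
)

# Keep one row per area/cluster pair, then collapse to one total per cluster.
tmp <- toy %>%
  distinct(area, cluster, .keep_all = TRUE) %>%
  count(cluster, wt = norm.to.area, name = "sum.norm.to.area")

# tmp now holds only `cluster` and the total, so the join adds a single column.
joined <- toy %>% left_join(tmp, by = "cluster")
```

Here count() is used instead of add_count() so that tmp has one row per cluster, which keeps the join free of duplicated columns.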

Desired output: sum.norm.to.area is the result of adding norm.to.area (only once) for all unique combinations of area and cluster:

     tibble::tribble(
         ~age,           ~area,          ~cluster, ~norm.to.area, ~sum.norm.to.area,
      "gw_25",     "cingulate",       "cluster_1",          0.03,              0.11,
      "gw_20",     "cingulate",       "cluster_1",          0.03,              0.11,
      "gw_18",   "hippocampus",       "cluster_1",          0.02,              0.11,
      "gw_25",        "insula",       "cluster_1",          0.01,              0.11,
      "gw_20",         "motor",       "cluster_1",          0.01,              0.11,
      "gw_22",         "motor",       "cluster_1",          0.01,              0.11,
      "gw_25",         "motor",       "cluster_1",          0.01,              0.11,
      "gw_14",         "motor",       "cluster_1",          0.01,              0.11,
      "gw_18",         "motor",       "cluster_1",          0.01,              0.11,
      "gw_19",         "motor",       "cluster_1",          0.01,              0.11,
      "gw_17",         "motor",       "cluster_1",          0.01,              0.11,
      "gw_20",     "occipital",       "cluster_1",          0.01,              0.11,
      "gw_17",     "occipital",       "cluster_1",          0.01,              0.11,
      "gw_18",     "occipital",       "cluster_1",          0.01,              0.11,
      "gw_19",     "occipital",       "cluster_1",          0.01,              0.11,
      "gw_22",     "occipital",       "cluster_1",          0.01,              0.11,
      "gw_14",     "occipital",       "cluster_1",          0.01,              0.11,
      "gw_22",      "parietal",       "cluster_1",             0,              0.11,
      "gw_25",      "parietal",       "cluster_1",             0,              0.11,
      "gw_17",      "parietal",       "cluster_1",             0,              0.11,
      "gw_19",      "parietal",       "cluster_1",             0,              0.11,
      "gw_20",      "parietal",       "cluster_1",             0,              0.11,
      "gw_20",           "PFC",       "cluster_1",          0.01,              0.11,
      "gw_22",           "PFC",       "cluster_1",          0.01,              0.11,
      "gw_25",           "PFC",       "cluster_1",          0.01,              0.11,
      "gw_18",           "PFC",       "cluster_1",          0.01,              0.11,
      "gw_19",           "PFC",       "cluster_1",          0.01,              0.11,
      "gw_17",           "PFC",       "cluster_1",          0.01,              0.11,
      "gw_22", "somatosensory",       "cluster_1",          0.01,              0.11,
      "gw_20", "somatosensory",       "cluster_1",          0.01,              0.11,
      "gw_25", "somatosensory",       "cluster_1",          0.01,              0.11,
      "gw_18", "somatosensory",       "cluster_1",          0.01,              0.11,
      "gw_19", "somatosensory",       "cluster_1",          0.01,              0.11,
      "gw_25",      "temporal",       "cluster_1",          0.01,              0.11,
      "gw_19",      "temporal",       "cluster_1",          0.01,              0.11,
      "gw_20",      "temporal",       "cluster_1",          0.01,              0.11
      )

Using dplyr, we can group_by cluster and sum only the first norm.to.area value for each unique area.

library(dplyr)

counts %>%
   group_by(cluster) %>%
   mutate(sum.norm = sum(norm.to.area[!duplicated(area)]))


#   age   area        cluster   norm.to.area sum.norm
#   <chr> <chr>       <chr>            <dbl>    <dbl>
# 1 gw_25 cingulate   cluster_1         0.03     0.09
# 2 gw_20 cingulate   cluster_1         0.03     0.09
# 3 gw_18 hippocampus cluster_1         0.02     0.09
# 4 gw_25 insula      cluster_1         0.01     0.09
# 5 gw_20 motor       cluster_1         0.01     0.09
# 6 gw_22 motor       cluster_1         0.01     0.09
# 7 gw_25 motor       cluster_1         0.01     0.09
# 8 gw_14 motor       cluster_1         0.01     0.09
# 9 gw_18 motor       cluster_1         0.01     0.09
#10 gw_19 motor       cluster_1         0.01     0.09
# … with 15 more rows
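Note that !duplicated(area) keeps the first norm.to.area seen for each area within a cluster, so the result relies on that value being the same on every row of an area/cluster pair. The sketch below checks that assumption on a small hypothetical tibble (toy, n_values and result are illustrative names, not part of the question) before computing the column.

```r
library(dplyr)

# Hypothetical toy data: two areas in one cluster, values repeated across ages.
toy <- tibble::tribble(
  ~age,    ~area,       ~cluster,    ~norm.to.area,
  "gw_20", "cingulate", "cluster_1", 0.03,
  "gw_25", "cingulate", "cluster_1", 0.03,
  "gw_18", "motor",     "cluster_1", 0.01,
  "gw_19", "motor",     "cluster_1", 0.01
)

# Check the assumption: each area/cluster pair should carry exactly one value.
n_values <- toy %>%
  group_by(cluster, area) %>%
  summarise(n_values = n_distinct(norm.to.area), .groups = "drop")

# Sum one value per area within each cluster and keep it as a new column.
result <- toy %>%
  group_by(cluster) %>%
  mutate(sum.norm = sum(norm.to.area[!duplicated(area)])) %>%
  ungroup()
```

If any n_values entry is greater than 1, the total depends on row order, and it may be worth deciding explicitly which value to keep.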

I don't think you are looking for mutate().

counts %>% group_by(cluster, area) %>% summarize(sum.norm.to.area = sum(norm.to.area))
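Grouping by both cluster and area returns one row per pair with the values summed across ages, which matches the behaviour reported in UPDATE 1. If a single total per cluster is wanted in summary form, one possible sketch is to reduce the table to distinct pairs first and then summarise by cluster alone. The toy tibble and the name totals are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with two clusters and repeated area/cluster pairs.
toy <- tibble::tribble(
  ~age,    ~area,       ~cluster,    ~norm.to.area,
  "gw_20", "cingulate", "cluster_1", 0.03,
  "gw_25", "cingulate", "cluster_1", 0.03,
  "gw_18", "motor",     "cluster_1", 0.01,
  "gw_19", "motor",     "cluster_1", 0.01,
  "gw_18", "motor",     "cluster_2", 0.02,
  "gw_19", "motor",     "cluster_2", 0.02,
  "gw_20", "PFC",       "cluster_2", 0.05
)

# Drop repeated rows per pair, then add up one value per area in each cluster.
totals <- toy %>%
  distinct(cluster, area, norm.to.area) %>%
  group_by(cluster) %>%
  summarise(sum.norm.to.area = sum(norm.to.area), .groups = "drop")
```

This returns one row per cluster rather than a new column on the original table. If a pair carried more than one distinct norm.to.area value, distinct() would keep each of them and all would be summed.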
