User-defined function to flag duplicates

Question

I'm trying to create a function that identifies duplicate samples (rows) based on an ID number and adds a new column recording the test order of the duplicated samples (if any).

The duplicate samples share the same ID but have a secondary ID that is sequential. Below is an example of what I mean.

Example data:

df <- data.frame(ID1=c(2528,2528,2528,2530,2533,2533),
                 ID2=c("G_54", "G_55", "G_53", "G_99", "G_252", "G_253"),
                 RESULT=c(.235, .237, .236, .325, .445, .446))
df
#    ID1   ID2 RESULT
# 1 2528  G_54  0.235
# 2 2528  G_55  0.237
# 3 2528  G_53  0.236
# 4 2530  G_99  0.325
# 5 2533 G_252  0.445
# 6 2533 G_253  0.446

I would like the result to look like this:

#expected output
#  ID1  ID2    RESULT   RUN
# 2528  G_54    0.235   RUN2
# 2528  G_55    0.237   RUN3
# 2528  G_53    0.236   RUN1
# 2530  G_99    0.325   SINGLE
# 2533  G_252   0.445   RUN1
# 2533  G_253   0.446   RUN2

Answer 1

Using dplyr:

library(dplyr)

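# Sort so that row_number() follows ID2 order within each ID1 group
# (ID2 is compared as text here).
df %>% 
  group_by(ID1) %>% 
  arrange(ID1, ID2) %>% 
  mutate(RUN = row_number(),   # position within the ID1 group
         N = n(),              # number of rows sharing this ID1
         RUN = ifelse(N == 1, "SINGLE", paste0("RUN", RUN))) %>% 
  ungroup() %>% 
  select(-N)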

#result
#    ID1   ID2 RESULT    RUN
# 1 2528  G_53  0.236   RUN1
# 2 2528  G_54  0.235   RUN2
# 3 2528  G_55  0.237   RUN3
# 4 2530  G_99  0.325 SINGLE
# 5 2533 G_252  0.445   RUN1
# 6 2533 G_253  0.446   RUN2
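
Since the question asks for a function, here is a minimal base R sketch of the same idea wrapped in one. The dplyr pipeline above compares ID2 as text (so "G_10" would sort before "G_9") and reorders the rows; this sketch instead ranks by the digits in ID2 and leaves the rows where they are. The name `flag_runs` and its arguments `id_col` and `seq_col` are hypothetical, and it assumes the only digits in the secondary ID are the sequence number.

```r
# Hypothetical helper (not from the original answer): labels each row with
# its test order within an ID group, or "SINGLE" if the ID occurs once.
flag_runs <- function(data, id_col = "ID1", seq_col = "ID2") {
  # Assumption: stripping non-digits from the secondary ID ("G_54" -> 54)
  # yields the sequence number that defines the test order.
  seq_num <- as.numeric(gsub("[^0-9]", "", data[[seq_col]]))

  # Rank of each sequence number within its ID group (1 = tested first).
  run_order <- ave(seq_num, data[[id_col]],
                   FUN = function(v) rank(v, ties.method = "first"))

  # Number of rows sharing each ID.
  group_size <- ave(seq_num, data[[id_col]], FUN = length)

  data$RUN <- ifelse(group_size == 1, "SINGLE", paste0("RUN", run_order))
  data
}

df <- data.frame(ID1 = c(2528, 2528, 2528, 2530, 2533, 2533),
                 ID2 = c("G_54", "G_55", "G_53", "G_99", "G_252", "G_253"),
                 RESULT = c(.235, .237, .236, .325, .445, .446))

flag_runs(df)
```

Because the original row order is kept, the RUN column for the example data lines up with the expected output in the question (RUN2, RUN3, RUN1, SINGLE, RUN1, RUN2).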

Answer 1: accepted (2), 2015-10-20 19:49:11