Identify groups of repeated rows and preserve group order

Question

I am trying to organize a spreadsheet of patient data. Unfortunately, the rows of data are repeated at random, giving me repeated "chunks". I need to remove the repeated chunks while preserving the original order.

Here is a sample:

+---------+-----+----------+
| patient | age | children |
+---------+-----+----------+
| x       | 30  | g        |
| x       | 30  | b        |
| x       | 30  | g        |
| x       | 30  | b        |
| x       | 30  | g        |
| x       | 30  | b        |
| y       | 25  | g        |
| y       | 25  | b        |
| y       | 25  | b        |
| y       | 25  | g        |
| y       | 25  | b        |
| y       | 25  | b        |
+---------+-----+----------+

As you can see, the patient "x" chunk (with 2 children) is repeated three times, and the patient "y" chunk (with 3 children) is repeated twice. The number of repeated chunks is random.

Here is my goal. It is important that the order of the children is preserved:

+---------+-----+----------+
| patient | age | children |
+---------+-----+----------+
| x       | 30  | g        |
| x       | 30  | b        |
| y       | 25  | g        |
| y       | 25  | b        |
| y       | 25  | b        |
+---------+-----+----------+

I tried this first in Excel:

Step 1: gave every row a unique identifier, to preserve the order of the children.
Step 2: tried to remove duplicates, but this was a problem for patient "y", who has two children of the same sex: the final table removed one of them.

I usually do my analysis in R, so a dplyr solution would be great if anyone could make a suggestion.

Beyond the following, I'm lost. Is there a way to recognize unique groups?

dat %>% group_by(patient)

Answer 1

The distinct() function in dplyr might be your best bet, e.g.:

dat %>% distinct()

You can find more information on identifying and removing duplicate data in R by reading this blog post.
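
One caveat: on the sample data, distinct() also collapses patient "y"'s two identical "b" rows, which is the same problem the question describes with Excel's duplicate removal. A minimal sketch of one possible alternative is below. It assumes that each patient's rows are contiguous, that each chunk repeats a whole number of times, and that the children column alone is enough to recognize a chunk. chunk_length is a hypothetical helper written for this illustration, not a dplyr function.

```r
library(dplyr)

# Sample data from the question
dat <- tibble(
  patient  = c(rep("x", 6), rep("y", 6)),
  age      = c(rep(30, 6), rep(25, 6)),
  children = c("g", "b", "g", "b", "g", "b",
               "g", "b", "b", "g", "b", "b")
)

# Hypothetical helper: length of the shortest leading block that,
# repeated a whole number of times, reproduces the entire vector.
chunk_length <- function(v) {
  n <- length(v)
  for (p in seq_len(n)) {
    if (n %% p == 0 && identical(v, rep(v[seq_len(p)], times = n / p))) {
      return(p)
    }
  }
  n  # only reached for an empty vector
}

# Keep only the first chunk within each patient; filter() on a grouped
# data frame leaves the original row order untouched.
result <- dat %>%
  group_by(patient) %>%
  filter(row_number() <= chunk_length(children)) %>%
  ungroup()

print(result)
```

If a chunk does not repeat cleanly (for example, a partial chunk at the end), the helper falls back to the full group length and nothing is removed for that patient, so such cases would need separate handling.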

Solution 1
1 2019-04-04 14:14:35
