Identify groups of repeated rows and preserve group order
I am trying to organize a spreadsheet of patient data. Unfortunately, the rows of data are repeated at random, giving me repeated "chunks." I need to remove the repeated chunks while preserving the original order.
Here is a sample:
+---------+-----+----------+
| patient | age | children |
+---------+-----+----------+
| x       | 30  | g        |
| x       | 30  | b        |
| x       | 30  | g        |
| x       | 30  | b        |
| x       | 30  | g        |
| x       | 30  | b        |
| y       | 25  | g        |
| y       | 25  | b        |
| y       | 25  | b        |
| y       | 25  | g        |
| y       | 25  | b        |
| y       | 25  | b        |
+---------+-----+----------+
As you can see, the patient "x" chunk (with 2 children) is repeated three times, and the patient "y" chunk (with 3 children) is repeated twice. The number of repeated chunks is random.
Here is my goal (it is important that the order of the children is preserved):
+---------+-----+----------+
| patient | age | children |
+---------+-----+----------+
| x       | 30  | g        |
| x       | 30  | b        |
| y       | 25  | g        |
| y       | 25  | b        |
| y       | 25  | b        |
+---------+-----+----------+
I first tried this in Excel. Step 1: I gave every row a unique identifier to preserve the order of the children. Step 2: I tried to remove duplicates, but this was a problem for patient "y", who has 2 girls: the final table removed one of them...
I usually do my analysis in R, so a dplyr solution would be great here if anyone could make a suggestion.
Beyond the following, I'm lost. Is there a way to recognize unique groups?
library(dplyr)
dat %>% group_by(patient)
The distinct() function in dplyr might be your best bet, e.g.:
dat %>% distinct()
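One caveat worth checking on the sample data: distinct() keeps only the first copy of each fully identical row, so rows that legitimately repeat inside a single chunk are collapsed as well. The sketch below rebuilds the sample table (using the name dat from the question) and shows that patient "y" ends up with two rows instead of the three in the goal table, which is the same problem the Excel approach ran into.

```r
library(dplyr)

# Sample data from the question, in its original row order
dat <- data.frame(
  patient  = rep(c("x", "y"), times = c(6, 6)),
  age      = rep(c(30, 25), times = c(6, 6)),
  children = c("g", "b", "g", "b", "g", "b",
               "g", "b", "b", "g", "b", "b"),
  stringsAsFactors = FALSE
)

# distinct() drops every later copy of an identical row,
# including patient y's second "b" inside its chunk
deduped <- dat %>% distinct()
deduped
```

So distinct() alone works for patient "x" but not for any patient whose chunk contains two identical rows.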
You can find more information on identifying and removing duplicate data in R by reading this blog post.
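An alternative that answers the "recognize unique groups" part directly is to detect, for each patient, the shortest chunk whose exact repetition reproduces that patient's rows, and then keep only the first copy of it. This is a minimal sketch under stated assumptions: each patient's rows consist of a whole number of exact, complete copies of one chunk, and only the children column varies within a patient. The helper chunk_length() is hypothetical and not part of dplyr. It is also ambiguous when a chunk is itself periodic: a patient whose real children are g, g would be reduced to a single g.

```r
library(dplyr)

dat <- data.frame(
  patient  = rep(c("x", "y"), times = c(6, 6)),
  age      = rep(c(30, 25), times = c(6, 6)),
  children = c("g", "b", "g", "b", "g", "b",
               "g", "b", "b", "g", "b", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: length of the shortest prefix of x that, repeated a
# whole number of times, reproduces x exactly (falls back to length(x))
chunk_length <- function(x) {
  n <- length(x)
  for (p in seq_len(n)) {
    if (n %% p == 0 && identical(x, rep(x[seq_len(p)], n %/% p))) {
      return(p)
    }
  }
  n
}

# Keep only the first chunk for each patient; filter() keeps the original
# row order, so the order of the children is preserved
result <- dat %>%
  group_by(patient) %>%
  filter(row_number() <= chunk_length(children)) %>%
  ungroup()
result
```

If other columns can also vary within a patient, one option is to build a single key per row first (for example with paste()) and pass that key to the helper instead of children.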