How to combine similar elements in a data frame in R

I have a data frame consisting of:

Lancaster001A    76
Lancaster001B    35
Lancaster002A    46
Lancaster002D     9
....             ...

I'd like to consolidate the data frame into this:

Lancaster001    111
Lancaster002     55

That is, I want to remove the finer sub-categories and sum their values. I couldn't find a way to do this with merge. Is there a general function that groups rows by similarity?

Here is a base R solution using a regex to remove all characters after the first run of three digits:

DF <- read.table(text = "Lancaster001A    76
                 Lancaster001B    35
                 Lancaster002A    46
                 Lancaster002D     9")

setNames(aggregate(V2 ~ gsub("(?<=\\d{3}).*", "", V1, perl = TRUE), 
                   DF, FUN = sum), 
         c("V1", "V2"))
#            V1  V2
#1 Lancaster001 111
#2 Lancaster002  55

If aggregate is too slow on a large dataset, the same grouping is straightforward to do with data.table.
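
A minimal sketch of that data.table route, assuming the data.table package is installed and reusing the V1/V2 column names and the regex from the base R answer above. The names DT and res are only illustrative.

```r
library(data.table)

# Same sample data as above, built directly as a data.table
# (DT and res are illustrative names, not from the original answer)
DT <- data.table(
  V1 = c("Lancaster001A", "Lancaster001B", "Lancaster002A", "Lancaster002D"),
  V2 = c(76, 35, 46, 9)
)

# Group by the ID with everything after the first three digits removed,
# then sum V2 within each group
res <- DT[, .(V2 = sum(V2)),
          by = .(V1 = sub("(?<=\\d{3}).*", "", V1, perl = TRUE))]
res
```

Naming the grouping expression inside by = .(V1 = ...) keeps the column name V1 in the result instead of an auto-generated one.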

Adjust the regex as needed if the structure of your data is different.
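
As one example of such an adjustment, here is a sketch that assumes the sub-category is always a run of trailing letters, while the numeric part may have any number of digits. The sample IDs below (including "York12C") are made up for illustration and are not from the question.

```r
# Hypothetical IDs: the digit count varies, the suffix is one or more letters
ids <- c("Lancaster001A", "Lancaster001B", "Lancaster002AB", "York12C")

# Strip the trailing letters instead of relying on exactly three digits
base_ids <- sub("[A-Za-z]+$", "", ids)
base_ids
```

The result can be passed to aggregate as the grouping variable in the same way as the gsub call above. Note that this pattern would strip an ID made up entirely of letters down to an empty string, so it only fits data where every ID contains digits before the suffix.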

Let's assume these names for your columns, and let's assume the 'smaller categorising' means one letter at the end.

id               value
Lancaster001A    76
Lancaster001B    35
Lancaster002A    46
Lancaster002D     9
....             ...

I use dplyr for everything. Install dplyr, make sure your column names are correct, and then try:

library(dplyr)
mydata %>%
  mutate(id = as.character(id),                    # nchar() fails on factors
         id = substr(id, 1, nchar(id) - 1)) %>%    # removes last character
  group_by(id) %>%
  summarize(sum = sum(value))

Edit: An even simpler data.table solution from @Arun's helpful tip:

library(data.table)
# dt is assumed to be a data.table, e.g. dt <- as.data.table(mydata)
dt[, list(sum = sum(value)),
   by = list(id = substr(as.character(id), 1, nchar(as.character(id)) - 1))]
#              id sum
# 1: Lancaster001 111
# 2: Lancaster002  55
