plyr in R: group using a wildcard, or remove the numbers and keep the characters in the grouping column

Question

I am using plyr to calculate means and standard deviations in R. However, my grouping variable contains a combination of letters and numbers, so I need to either use some kind of wildcard in my grouping variable, or create a new grouping variable by removing the numbers from the original one. For example, with the following dataframe:

test5 <- structure(list(A = structure(1:6, .Label = c("JCT1", "JCT2", 
"JCT3", "LFR1", "LFR2", "LFR3"), class = "factor"), B = c(4L, 
5L, 3L, 7L, 3L, 6L), C = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("JCT", 
"LFR"), class = "factor")), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, 
-6L))

     A B   C
1 JCT1 4 JCT
2 JCT2 5 JCT
3 JCT3 3 JCT
4 LFR1 7 LFR
5 LFR2 3 LFR
6 LFR3 6 LFR

I can use the following code to calculate means and standard deviations:

library(plyr)
ddply(test5,~A,summarise,mean=mean(B),sd=sd(B))

which gives a result like

     A mean sd
1 JCT1    4 NA
2 JCT2    5 NA
3 JCT3    3 NA
4 LFR1    7 NA
5 LFR2    3 NA
6 LFR3    6 NA

However, I really need the groups to be JCT and LFR, so I need to either 1) use a wildcard in the code (so that the groups are based on JCT and LFR, with the number being the wildcard), or 2) create a new column, like C in my original dataframe, that has the numbers removed from column A. For example, if I could create this new column C, then I could use the code

ddply(test5,~C,summarise,mean=mean(B),sd=sd(B))

to produce my desired result of

      C     mean          sd
1   JCT 4.000000    1.000000
2   LFR 5.333333    2.081666

Does anyone know of an easy way to do this? I thought I could use ifelse statements to create a new column C, but that would require a lot of code, as I have many different values in my real dataframe. I am hoping there is a quicker way.

Thanks!

Answer 1

Is something like this what you are looking for?

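library(plyr)
# Strip the digits from A so that "JCT1", "JCT2", ... collapse to "JCT" (this overwrites A)
test5$A <- gsub('[0-9]+', '', test5$A)

# Group by the cleaned column and summarise B
ddply(test5, .(A), summarise, mean = mean(B, na.rm = TRUE), sd = sd(B, na.rm = TRUE))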

    A     mean       sd
1 JCT 4.000000 1.000000
2 LFR 5.333333 2.081666
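
Answer 1 overwrites column A. If the original IDs need to stay available, the same gsub call can write to a separate column instead, which is closer to option 2 in the question. The following is a minimal sketch under that assumption; the dataframe is rebuilt here only so the snippet runs on its own, and the name res is illustrative.

```r
library(plyr)

# Rebuild the example data from the question (columns A and B only)
test5 <- data.frame(
  A = c("JCT1", "JCT2", "JCT3", "LFR1", "LFR2", "LFR3"),
  B = c(4L, 5L, 3L, 7L, 3L, 6L),
  stringsAsFactors = TRUE
)

# Derive the grouping column C by deleting every run of digits from A;
# as.character() makes the factor-to-string conversion explicit
test5$C <- gsub("[0-9]+", "", as.character(test5$A))

# Group by the new column; A is left untouched
res <- ddply(test5, .(C), summarise, mean = mean(B), sd = sd(B))
print(res)
```

With this data the result should match the desired output shown in the question, with one row each for JCT and LFR.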

Answer 2

You could use regmatches and regexpr to extract the letters, and then summarise based on that:

> ddply(test5,.(letter=regmatches(A,regexpr("[A-Za-z]*",A))),
    summarise,mean=mean(B),sd=sd(B))
  letter     mean       sd
1    JCT 4.000000 1.000000
2    LFR 5.333333 2.081666
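
The two answers agree on the example data but are not interchangeable in general: regexpr with "[A-Za-z]*" keeps only the leading run of letters, while gsub with "[0-9]+" deletes digits wherever they occur. A small base R sketch of the difference follows; the vector ids is hypothetical, and "JCT10B" is invented purely to show a case where the results diverge.

```r
# Hypothetical IDs; "JCT10B" does not appear in the question's data
ids <- c("JCT1", "LFR23", "JCT10B")

# Answer 2's approach: keep only the leading run of letters
leading <- regmatches(ids, regexpr("[A-Za-z]*", ids))

# Answer 1's approach: delete every digit, wherever it appears
no_digits <- gsub("[0-9]+", "", ids)

print(data.frame(ids, leading, no_digits))
```

Which behaviour is wanted depends on how the real IDs are structured, so it may be worth checking the distinct values of the derived column before summarising.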

Answer 1: score 2, accepted, 2014-03-12 22:22:40

Answer 2: score 0, 2014-03-12 22:17:53
