
How to rename values by frequency in R

I am making several graphs based on the clustering data from DAPC. I need the colors to be the same across all the graphs, and I'd like to use specific colors for the largest groups. The important thing for this question is that I get a data set from DAPC like this:

my_df <- data.frame(
  ID = c(1:10),
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)

> my_df

ID  Group
1   a           
2   b           
3   b           
4   c           
5   a           
6   b           
7   a           
8   b           
9   b           
10  c

I know how to find the group with the most members:

freqs <- table(my_df$Group)
freqs <- freqs[order(freqs, decreasing = TRUE)]

> freqs
b a c 
5 3 2 

Is there a way to change the values based on their frequency? Each time I rerun DAPC, it changes the groups, so I'd like to write code that does this automatically instead of having to redo it manually. Here's how I'd like the data frame to be changed:

> my_df                          > my_new_df
ID  Group                        ID  Group
1   a                             1  '2nd'
2   b                             2  '1st'          
3   b                             3  '1st'          
4   c                             4  '3rd'          
5   a                             5  '2nd'          
6   b                             6  '1st'          
7   a                             7  '2nd'          
8   b                             8  '1st'          
9   b                             9  '1st'          
10  c                             10 '3rd'          

You may use ave and create a factor out of it with the corresponding labels= argument. To avoid hard-coding, define the labels in a vector lb beforehand.

lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

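# ave() replaces each value with the size of its group; factor() then labels
# the distinct sizes from smallest to largest, hence the rev() on the labels
with(my_df, factor(as.numeric(ave(as.character(Group), as.character(Group), FUN=table)),
       labels=rev(lb[1:length(unique(table(Group)))])))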
#  [1] 2nd 1st 1st 3rd 2nd 1st 2nd 1st 1st 3rd
# Levels: 3rd 2nd 1st
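
One thing to watch with the ave approach is that groups of equal size share a count and therefore share a label. As a minimal sketch, assuming tied groups should still get distinct labels, each value can instead be matched against the group names sorted by frequency. The names by_freq and freq_rank are illustrative and not part of the answer above.

```r
my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# Group names ordered from most to least frequent
by_freq <- names(sort(table(my_df$Group), decreasing = TRUE))

# The position of each row's group in that ordering is its frequency rank
freq_rank <- match(my_df$Group, by_freq)

# Relabel, keeping the levels in rank order ("1st", "2nd", ...)
my_new_df <- my_df
my_new_df$Group <- factor(lb[freq_rank], levels = lb[seq_along(by_freq)])
my_new_df
```

Because the levels are in rank order, a color vector of the same length can be assigned to them positionally and reused across graphs. How tied groups are ordered depends on sort(), so fix that order explicitly if it matters.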

To convert more columns like this, use sapply, where selected.columns holds the names of the columns to convert.

sapply(my_df[selected.columns], function(x) {
  factor(as.numeric(ave(as.character(x), as.character(x), FUN=table)),
         labels=rev(lb[1:length(unique(table(x)))]))
})
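
Since sapply simplifies its result to a matrix, the factor levels are typically lost along the way. A hedged variant, assuming the goal is to write the converted columns back into the data frame as factors, uses lapply instead. The Group2 column, the helper name rank_by_freq, and the contents of selected.columns are made up for illustration, and length stands in for table as the counting function.

```r
# Hypothetical data: Group2 is added only to have a second column to convert
my_df <- data.frame(
  ID = 1:10,
  Group  = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c"),
  Group2 = c("x", "x", "y", "x", "z", "x", "y", "x", "z", "z")
)
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))
selected.columns <- c("Group", "Group2")  # assumed column names

# Same idea as the answer: count per group, then label the distinct counts
rank_by_freq <- function(x) {
  x <- as.character(x)
  counts <- as.numeric(ave(x, x, FUN = length))
  factor(counts, labels = rev(lb[seq_along(unique(counts))]))
}

# lapply returns a list, so each column stays a factor when assigned back
my_new_df <- my_df
my_new_df[selected.columns] <- lapply(my_df[selected.columns], rank_by_freq)
str(my_new_df)
```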

Do you mean something like this?

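library(dplyr)

# Attach each group's size as N, sort the rows by it, then drop the helper column
my_df %>% left_join(my_df %>% group_by(Group) %>% summarise(N=n()), by="Group") %>%
  arrange(desc(N)) %>% select(-N)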

   ID Group
1   2     b
2   3     b
3   6     b
4   8     b
5   9     b
6   1     a
7   5     a
8   7     a
9   4     c
10 10     c

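Update

This version relabels the groups instead of reordering the rows. The distinct groups, in order of first appearance, are paired by position with the group names sorted by decreasing frequency. I hope this helps.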

my_df %>% left_join(my_df %>% group_by(Group) %>% summarise(N=n()) %>% arrange(desc(N)) %>%
                      bind_cols(my_df %>% select(Group) %>% distinct() %>% rename(key=Group)) %>%
                      rename(NewGroup=Group,Group=key)) %>%
  select(-c(Group,N)) %>% rename(Group=NewGroup)

   ID Group
1   1     b
2   2     a
3   3     a
4   4     c
5   5     b
6   6     a
7   7     b
8   8     a
9   9     a
10 10     c
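
For the ordinal labels requested in the question, a minimal dplyr sketch could build a lookup table with one row per group and join it back. It assumes dplyr is installed; lb is the label vector from the other answer, and lookup and Label are illustrative names.

```r
library(dplyr)

my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# One row per group, most frequent first, with an ordinal label attached
lookup <- my_df %>%
  count(Group, sort = TRUE) %>%
  mutate(Label = lb[row_number()])

# Join the labels back onto the rows and overwrite the original group names
my_new_df <- my_df %>%
  left_join(lookup, by = "Group") %>%
  mutate(Group = Label) %>%
  select(ID, Group)
my_new_df
```

The row order of my_df is preserved, and rerunning the code after the groups change rebuilds the lookup automatically.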

