How to show the most repeated values or names in a data frame
I have an easy question related to the dplyr library in R.
My actual data frame looks like this:
Players <- data.frame(Group = c("A", "A", "A", "A", "B", "B", "B", "C","C","C"), Players= c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike","Carl", "Max", "Max","Max"))
Group Players
A Jhon
A Jhon
A Jhon
A Charles
B Mike
B Mike
B Carl
C Max
C Max
C Max
I would like to get another data frame with the most repeated player in each group and how many times that player is listed. So I would like to get this data frame:
Group Players TimesListed
A Jhon 3
B Mike 2
C Max 3
I have tried this:
Station <- Players %>% group_by(Group,Players) %>%
summarise(TimesListed=length(Players)) %>%
summarise(TimesListed=max(TimesListed))
But I get a data frame without the names of the players, like this:
Group TimesListed
1 A 3
2 B 2
3 C 3
Any idea? Thank you!
This should get you what you want:
library(dplyr)
Players %>%
group_by(Group) %>%
count(Players) %>%
top_n(1, n)
# A tibble: 3 x 3
# Groups: Group [3]
Group Players n
<fctr> <fctr> <int>
1 A Jhon 3
2 B Mike 2
3 C Max 3
You could do the following to convert the factors to characters:
Players[] <- lapply(Players, as.character)
And if you need to rename the variable n to TimesListed, add the following to the end of the chain:
rename(TimesListed = n)
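As a side note, top_n() has been superseded by slice_max() since dplyr 1.0.0. A minimal sketch of the same idea with the newer verbs, assuming dplyr 1.0.0 or later is installed; the `result` name is only illustrative, and the `name` argument of count() sets the column name directly so no rename() is needed:

```r
library(dplyr)

# Sample data from the question
Players <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Players = c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike", "Carl",
              "Max", "Max", "Max"),
  stringsAsFactors = FALSE
)

result <- Players %>%
  count(Group, Players, name = "TimesListed") %>%  # one row per Group/Player pair
  group_by(Group) %>%
  slice_max(TimesListed, n = 1) %>%                # keep the top count per group (ties kept by default)
  ungroup()

print(result)
```

With ties, slice_max() keeps all tied rows by default; pass with_ties = FALSE to keep only one row per group.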
You can use the aggregate function in base R (note that this returns only the highest count per group, not the player's name):
aggregate(. ~ Group, Players, function(x) max(table(x)))
Group Players
1 A 3
2 B 2
3 C 3
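The aggregate() output above carries the count in the Players column but not the player's name. A minimal base R sketch that keeps both, assuming the sample data from the question; the `counts`, `is_max` and `result` names are only illustrative:

```r
# Sample data from the question
Players <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Players = c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike", "Carl",
              "Max", "Max", "Max"),
  stringsAsFactors = FALSE
)

# Count every Group/Player combination (combinations that never occur get 0)
counts <- as.data.frame(
  table(Group = Players$Group, Players = Players$Players),
  responseName = "TimesListed",
  stringsAsFactors = FALSE
)

# Keep the rows whose count equals the maximum within their group
is_max <- counts$TimesListed == ave(counts$TimesListed, counts$Group, FUN = max)
result <- counts[is_max, ]
result <- result[order(result$Group), ]
rownames(result) <- NULL

print(result)
```

Because the comparison uses ==, tied players within a group would all be kept.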
For completeness, here is a solution using data.table.
library(data.table)
setDT(Players)
Players[, .(TimesListed = .N), by = .(Group, Players)][
, .SD[which.max(TimesListed)], by = Group]
# Group Players TimesListed
# 1: A Jhon 3
# 2: B Mike 2
# 3: C Max 3
The above solution returns only the first row with the maximum TimesListed in each group. If we want to return all rows equal to the maximum, we can do the following. In this case, the two solutions lead to the same result.
Players[, .(TimesListed = .N), by = .(Group, Players)][
, .SD[TimesListed == max(TimesListed)], by = Group]
# Group Players TimesListed
# 1: A Jhon 3
# 2: B Mike 2
# 3: C Max 3
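The two variants only differ when a group has a tie, which the sample data does not. A small base R sketch of the difference between which.max() and == max(), using made-up counts for a hypothetical group in which Mike and Carl are both listed twice:

```r
# Hypothetical tied counts (not from the question's data)
tied <- c(Mike = 2L, Carl = 2L)

# which.max() returns the position of the first maximum only
first_only <- names(tied)[which.max(tied)]

# Comparing against max() keeps every element that reaches the maximum
all_ties <- names(tied)[tied == max(tied)]

print(first_only)  # "Mike"
print(all_ties)    # "Mike" "Carl"
```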