Re-arrange rows of dataframe based on number of occurrences of value in given column
First, an example:
a <- cbind(1:10, c("a","b","a","b","b","d","a","b", "d", "c"))
a
      [,1] [,2]
 [1,] "1"  "a"
 [2,] "2"  "b"
 [3,] "3"  "a"
 [4,] "4"  "b"
 [5,] "5"  "b"
 [6,] "6"  "d"
 [7,] "7"  "a"
 [8,] "8"  "b"
 [9,] "9"  "d"
[10,] "10" "c"
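Here is what I need: I would like to rearrange the rows of this table so that the rows whose second-column value occurs most frequently come first. That is, the result I want looks like this: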
      [,1] [,2]
 [1,] "2"  "b"
 [2,] "4"  "b"
 [3,] "5"  "b"
 [4,] "8"  "b"
 [5,] "1"  "a"
 [6,] "3"  "a"
 [7,] "7"  "a"
 [8,] "6"  "d"
 [9,] "9"  "d"
[10,] "10" "c"
I am currently using a very ugly for loop that essentially walks through a sorted count(a, 2) data frame and reassembles a new data frame from it. Any ideas on how to do this more neatly?
You can use ave() and order().
Use ave() to compute the length of each "group", then order by that result. If you care about ties, rank() may also be useful.
> a[order(ave(a[, 2], a[, 2], FUN = length), decreasing = TRUE), ]
      [,1] [,2]
 [1,] "2"  "b"
 [2,] "4"  "b"
 [3,] "5"  "b"
 [4,] "8"  "b"
 [5,] "1"  "a"
 [6,] "3"  "a"
 [7,] "7"  "a"
 [8,] "6"  "d"
 [9,] "9"  "d"
[10,] "10" "c"
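One caveat about the call above: a is a character matrix, so ave() returns the group sizes as character strings and order() compares them as text. That works here because every count is a single digit, but a count of 10 or more would sort incorrectly. A minimal sketch that computes the counts over row indices, so they stay integer, might look like this. The second sort key, which breaks ties between equally frequent values alphabetically, is an illustrative assumption and is not exercised by this particular data; the names n and sorted are likewise chosen just for this example.

```r
# Same example matrix as in the question
a <- cbind(1:10, c("a","b","a","b","b","d","a","b","d","c"))

# Group sizes as integers: ave() runs over the row indices rather than
# over the character column, so the counts are not coerced to strings
n <- ave(seq_len(nrow(a)), a[, 2], FUN = length)

# Sort by descending count; groups of equal size are ordered
# alphabetically by their value (an illustrative tie-breaking choice)
sorted <- a[order(-n, a[, 2]), ]
sorted
```

Rows that tie on both keys keep their original relative order, so the result matches the desired output shown in the question.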
The title says data.frame. Using data.table and dplyr:
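a1 <- as.data.frame(a)
library(data.table)
# as.data.table() copies, so a1 stays a plain data.frame without the helper
# column N (setDT(a1) would convert a1 by reference and leave N in it)
ans <- as.data.table(a1)[, N := .N, by = V2][order(-N)][, N := NULL]
#     V1 V2
#  1:  2  b
#  2:  4  b
#  3:  5  b
#  4:  8  b
#  5:  1  a
#  6:  3  a
#  7:  7  a
#  8:  6  d
#  9:  9  d
# 10: 10  c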
Or
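library(dplyr)
a1 %>%
  group_by(V2) %>%
  mutate(L = n()) %>%      # L = number of rows in each V2 group
  ungroup() %>%            # drop the grouping so the result is a plain table
  arrange(desc(L)) %>%
  select(-L)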
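For completeness, the same reordering can also be done on a data.frame without any extra packages. The following is a minimal sketch using only base R; the stringsAsFactors = FALSE argument and the variable names counts and res are choices made for this illustration, not part of the answers above.

```r
# Build the data.frame from the question's matrix; columns are named V1 and V2
a  <- cbind(1:10, c("a","b","a","b","b","d","a","b","d","c"))
a1 <- as.data.frame(a, stringsAsFactors = FALSE)

# table() tabulates how often each value of V2 occurs
counts <- table(a1$V2)

# Look up each row's count by its V2 value, then order rows by descending count
res <- a1[order(-as.integer(counts[a1$V2])), ]
res
```

Because order() leaves tied rows in their original relative order, rows within each group stay in the sequence they had in the input.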