R: order of data frame and selection
I would appreciate it if someone could give me some direction on how to solve a complex ordering of a data frame and the selection of the top 2 elements in each subcategory.
Code:
index<-1:14
metric<-c(0.037777,0.041143,0.041043,0.042056,0.043701,0.042169,0.042134,
0.046565,0.044638,0.036653,0.046221,0.04033,0.045385,0.043873)
cat_1<-c("California Munis","California Munis","California Munis","California Munis",
"California Munis","California Munis","California Munis","Corporate Bonds",
"Corporate Bonds","Corporate Bonds","Government Bonds","Government Bonds",
"High Yield Bonds","High Yield Bonds")
cat_2<-c("California Munis","Corporate Bonds","Corporate Bonds","Government Bonds",
"High Yield Bonds","High Yield Bonds","High Yield Bonds","High Yield Bonds",
"High Yield Bonds","High Yield Bonds","California Munis","California Munis",
"Corporate Bonds","Corporate Bonds")
data<-data.frame(cbind(index,metric,cat_1,cat_2))
which produces the following data frame:
Ind  Metric    Cat_1             Cat_2
1    0.037777  California Munis  California Munis
2    0.041143  California Munis  Corporate Bonds
3    0.041043  California Munis  Corporate Bonds
4    0.042056  California Munis  Government Bonds
5    0.043701  California Munis  High Yield Bonds
6    0.042169  California Munis  High Yield Bonds
7    0.042134  California Munis  High Yield Bonds
8    0.046565  Corporate Bonds   High Yield Bonds
9    0.044638  Corporate Bonds   High Yield Bonds
10   0.036653  Corporate Bonds   High Yield Bonds
11   0.046221  Government Bonds  California Munis
12   0.04033   Government Bonds  California Munis
13   0.045385  High Yield Bonds  Corporate Bonds
14   0.043873  High Yield Bonds  Corporate Bonds
Given the data frame above, I would like to order it by Cat_1, Cat_2 and Metric. I have tried this:
data[order(data[,3],data[,4],data[,2]),]
However, the order of Cat_1 and Cat_2 should not matter: two rows whose category pairs contain the same entries belong to the same subcategory, whichever column each entry is in. As an example, "California Munis" & "Corporate Bonds" = "Corporate Bonds" & "California Munis". Within each subcategory the rows should be ordered by descending Metric. The outcome I am looking for should look like the following:
Ind  Metric    Cat_1             Cat_2             Selection
1    0.037777  California Munis  California Munis  1
2    0.041143  California Munis  Corporate Bonds   1
3    0.041043  California Munis  Corporate Bonds   2
11   0.046221  Government Bonds  California Munis  1
4    0.042056  California Munis  Government Bonds  2
12   0.04033   Government Bonds  California Munis
5    0.043701  California Munis  High Yield Bonds  1
6    0.042169  California Munis  High Yield Bonds  2
7    0.042134  California Munis  High Yield Bonds
8    0.046565  Corporate Bonds   High Yield Bonds  1
13   0.045385  High Yield Bonds  Corporate Bonds   2
9    0.044638  Corporate Bonds   High Yield Bonds
14   0.043873  High Yield Bonds  Corporate Bonds
10   0.036653  Corporate Bonds   High Yield Bonds
The last column marks the top 2 rows (highest Metric) in each subcategory, which are the rows I need to extract.
Any ideas or code would be highly appreciated.
Thanks
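Please abandon the use of data.frame(cbind(...)). It will only cause you grief: cbind() first builds a matrix, and a matrix holds a single type, so every column (metric included) is coerced to character, or to factor in R versions before 4.0.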
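A small illustration of that coercion, using a shortened, hypothetical subset of the question's vectors: the cbind() route loses the numeric type of metric, while passing the vectors straight to data.frame() keeps integer and numeric columns as they are.

```r
# a few values in the shape of the question's data (shortened for illustration)
index  <- 1:3
metric <- c(0.037777, 0.041143, 0.041043)
cat_1  <- c("California Munis", "California Munis", "California Munis")
cat_2  <- c("California Munis", "Corporate Bonds", "Corporate Bonds")

# cbind() builds a character matrix first, so every column loses its type
bad  <- data.frame(cbind(index, metric, cat_1, cat_2))
# data.frame() on the vectors keeps the integer and numeric columns intact
good <- data.frame(index, metric, cat_1, cat_2)

print(sapply(bad, class))
print(sapply(good, class))
```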
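# build data without cbind(): metric stays numeric, the categories become factors
data <- data.frame(index, metric, cat_1, cat_2, stringsAsFactors = TRUE)
newdat <- data[ with( data,
          order( pmax( as.numeric(cat_1), as.numeric(cat_2) ),
                 pmin( as.numeric(cat_1), as.numeric(cat_2) ) ,
                 - metric) ) , ]
# number the rows within each (order-free) category pair; seq_along() gives 1..n
# per group, whereas seq() would return 1:x for a single-element group
newdat$selection <- ave(newdat$index,
          first=pmax( as.numeric(newdat$cat_1),
                      as.numeric(newdat$cat_2) ),
          second= pmin( as.numeric(newdat$cat_1),
                        as.numeric(newdat$cat_2) ) ,
          FUN=seq_along)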
#-----------------------------------------
> newdat
index metric cat_1 cat_2 selection
1 1 0.037777 California Munis California Munis 1
2 2 0.041143 California Munis Corporate Bonds 1
3 3 0.041043 California Munis Corporate Bonds 2
11 11 0.046221 Government Bonds California Munis 1
4 4 0.042056 California Munis Government Bonds 2
12 12 0.040330 Government Bonds California Munis 3
5 5 0.043701 California Munis High Yield Bonds 1
6 6 0.042169 California Munis High Yield Bonds 2
7 7 0.042134 California Munis High Yield Bonds 3
8 8 0.046565 Corporate Bonds High Yield Bonds 1
13 13 0.045385 High Yield Bonds Corporate Bonds 2
9 9 0.044638 Corporate Bonds High Yield Bonds 3
14 14 0.043873 High Yield Bonds Corporate Bonds 4
10 10 0.036653 Corporate Bonds High Yield Bonds 5
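Since selection numbers the rows of each subcategory from 1, extracting the top 2 is a matter of filtering on it. The self-contained sketch below rebuilds the question's data with data.frame() rather than cbind(), gives both category columns the same levels via factor(..., levels = lv), and uses FUN = seq_along; the helper names pair_hi(), pair_lo() and top2 are illustrative, not part of the answer above.

```r
# the question's data, built without cbind()
index  <- 1:14
metric <- c(0.037777, 0.041143, 0.041043, 0.042056, 0.043701, 0.042169, 0.042134,
            0.046565, 0.044638, 0.036653, 0.046221, 0.04033, 0.045385, 0.043873)
cat_1 <- c(rep("California Munis", 7), rep("Corporate Bonds", 3),
           rep("Government Bonds", 2), rep("High Yield Bonds", 2))
cat_2 <- c("California Munis", "Corporate Bonds", "Corporate Bonds", "Government Bonds",
           rep("High Yield Bonds", 6), rep("California Munis", 2),
           rep("Corporate Bonds", 2))

# both category columns share one level set, so their integer codes are comparable
lv <- sort(union(cat_1, cat_2))
data <- data.frame(index, metric,
                   cat_1 = factor(cat_1, levels = lv),
                   cat_2 = factor(cat_2, levels = lv))

# order-free pair codes: the larger and the smaller level code of the two categories
pair_hi <- function(d) pmax(as.integer(d$cat_1), as.integer(d$cat_2))
pair_lo <- function(d) pmin(as.integer(d$cat_1), as.integer(d$cat_2))

newdat <- data[order(pair_hi(data), pair_lo(data), -data$metric), ]
newdat$selection <- ave(newdat$index, pair_hi(newdat), pair_lo(newdat),
                        FUN = seq_along)

# keep only the top 2 rows of each subcategory
top2 <- newdat[newdat$selection <= 2, ]
print(top2)
```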
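The requirement for success here is that the two cat variables are factors with the same levels in the same order, because as.numeric() on a factor returns its internal level codes. If they are not, make them the same by re-creating both factors over the union of their levels:
lv <- union(levels(data$cat_1), levels(data$cat_2))
data$cat_1 <- factor(data$cat_1, levels = lv)
data$cat_2 <- factor(data$cat_2, levels = lv)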
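Why factor(x, levels = lv) is the safer way to align levels than assigning levels(x) <- lv: the assignment renames the existing levels by position, so when the level sets differ the data silently change. A small demonstration with two hypothetical factors a and b:

```r
# two factors whose level sets differ (hypothetical values)
a <- factor(c("Corporate Bonds", "High Yield Bonds"))
b <- factor(c("California Munis", "High Yield Bonds"))
lv <- union(levels(a), levels(b))  # "Corporate Bonds" "High Yield Bonds" "California Munis"

# levels<- renames the existing levels by position, so the values change
b_relabelled <- b
levels(b_relabelled) <- lv
print(as.character(b_relabelled))

# factor(x, levels = ...) maps the existing values onto the new level set
b_remapped <- factor(as.character(b), levels = lv)
print(as.character(b_remapped))
```

The first print shows "Corporate Bonds" "High Yield Bonds" (the data changed), the second "California Munis" "High Yield Bonds" (the data kept, only the level set widened).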
Expanding on my comment:
# introduce a combined category that is the same for "A & B" and "B & A"
# (sort the two category names, not the individual words)
c1 <- as.character(data$cat_1)
c2 <- as.character(data$cat_2)
data$cat_3 <- paste(pmin(c1, c2), pmax(c1, c2), sep = " & ")
# order as desired (as.character() first, in case metric came back as a factor)
data1 <- data[order(data$cat_3, -as.numeric(as.character(data$metric))), ]
# label and select top 2 in each cat
data1$rankByCat <- ave(seq_len(nrow(data1)), data1$cat_3, FUN = seq_along)
data1[data1$rankByCat < 3, !names(data1) %in% c("cat_3")]
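Once data1 is sorted by the combined category and descending metric, an alternative to building a rank column is to split by the combined category and keep the first two rows of each piece with head(). A sketch on a small hand-built subset shaped like data1 (the metric values are taken from the question; note that split() returns the groups in sorted order of cat_3):

```r
# an already-sorted subset shaped like data1 (values from the question, key built by hand)
data1 <- data.frame(
  cat_3  = rep(c("California Munis & High Yield Bonds",
                 "Corporate Bonds & High Yield Bonds"), each = 3),
  metric = c(0.043701, 0.042169, 0.042134, 0.046565, 0.045385, 0.044638),
  stringsAsFactors = FALSE
)

# one data frame per combined category, first 2 rows of each, stacked back together
top2 <- do.call(rbind, lapply(split(data1, data1$cat_3), head, 2))
print(top2)
```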
@andrei
I have got the sorting part working with the following code:
# concatenate the 2 strings
cat_3<-paste(data[,3],data[,4],sep=" ")
#break the string to 2 (creates a list)
temp_split<-strsplit(cat_3," ")
#sort by row
sort_split<-sapply(temp_split,sort)
#bind split
out<-cbind(data,t(sort_split))
Is that the best way to write it?
How would I proceed from here to select the top 2 of each category?
Thanks for the help!
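A sketch of the same idea, assuming the data are built without cbind(): sorting the two category names within each row (rather than every word, which is what strsplit(cat_3, " ") produces) gives an order-free key, and ave(..., FUN = seq_along) then numbers the rows within each key so the top 2 can be filtered. The names pair, first, second, out and top2 are illustrative.

```r
# the question's data, built without cbind() so metric stays numeric
index  <- 1:14
metric <- c(0.037777, 0.041143, 0.041043, 0.042056, 0.043701, 0.042169, 0.042134,
            0.046565, 0.044638, 0.036653, 0.046221, 0.04033, 0.045385, 0.043873)
cat_1 <- c(rep("California Munis", 7), rep("Corporate Bonds", 3),
           rep("Government Bonds", 2), rep("High Yield Bonds", 2))
cat_2 <- c("California Munis", "Corporate Bonds", "Corporate Bonds", "Government Bonds",
           rep("High Yield Bonds", 6), rep("California Munis", 2),
           rep("Corporate Bonds", 2))
data <- data.frame(index, metric, cat_1, cat_2, stringsAsFactors = FALSE)

# sort the two category names within each row, so "A" & "B" and "B" & "A" match
pair <- unname(t(apply(data[, c("cat_1", "cat_2")], 1, sort)))
data$first  <- pair[, 1]
data$second <- pair[, 2]

# order by the pair, then by descending metric
out <- data[order(data$first, data$second, -data$metric), ]

# number the rows within each pair (1, 2, ...) and keep the top 2
out$selection <- ave(out$index, out$first, out$second, FUN = seq_along)
top2 <- out[out$selection <= 2, ]
print(top2)
```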