
How to order data frame by numeric vector and character vector

I would like to order a data frame by a numeric vector and a character vector so that I can remove duplicates in the Code column, retaining the record with the highest value in the Value column. However, if the Category column is "YS" or "YS1", I want to retain that record even if its Value isn't the highest for that Code.
Here's a sample data set:

Code <- c(2,2,3,5,3,7,8)
Value <- c(17,18,35,25,67,34,2)
Category <- c("YS", "DW", "YS1", "OS", "OS", "OS1", "GD")
Dataset <- data.frame(Code, Value, Category)

  Code Value Category
1    2    17       YS
2    2    18       DW
3    3    35      YS1
4    5    25       OS
5    3    67       OS
6    7    34      OS1
7    8     2       GD

When I order the data by Code (ascending) and Value (descending) and remove the duplicate records by Code, my "YS" record for Code = 2 is not retained because it has a lower Value.

order_data <- Dataset[order(Dataset$Code, -Dataset$Value),]
dataset_nodup <- order_data[!duplicated(order_data$Code),]

  Code Value Category
2    2    18       DW
5    3    67       OS
4    5    25       OS
6    7    34      OS1
7    8     2       GD

I'd like to first order by my Category column and then my Value column so that my "YS" and "YS1" records are listed first. I have tried the following but it is not working.

order_data <- Dataset[order(Dataset$Code, -Dataset$Category, -Dataset$Value),]

I would like my output to look like:

  Code Value Category
1    2    17       YS
2    3    35      YS1
3    5    25       OS
4    7    34      OS1
5    8     2       GD

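We can use match to bring the rows whose Category is "YS" or "YS1" to the front and then remove duplicates. match returns 1 for "YS", 2 for "YS1" and NA for every other Category, and order places NA last by default, so the "YS"/"YS1" rows sort first. Character columns cannot be negated with a unary minus, which is why -Dataset$Category does not work. !duplicated then keeps the first row it meets for each Code: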

order_data <- Dataset[with(Dataset, order(match(Category, c("YS", "YS1")),
                                    Code, -Value)),]
dataset_nodup <- order_data[!duplicated(order_data$Code),]

dataset_nodup
#  Code Value Category
#1    2    17       YS
#3    3    35      YS1
#4    5    25       OS
#6    7    34      OS1
#7    8     2       GD
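
To see why this works, it may help to look at the sort key on its own. The following is a small sketch using the sample data from the question; the names priority and row_order are only illustrative and are not part of the answer above.

```r
# Sample data from the question
Code <- c(2, 2, 3, 5, 3, 7, 8)
Value <- c(17, 18, 35, 25, 67, 34, 2)
Category <- c("YS", "DW", "YS1", "OS", "OS", "OS1", "GD")
Dataset <- data.frame(Code, Value, Category)

# match() gives the position of each Category in the priority vector,
# or NA when the Category is not listed there
priority <- match(Dataset$Category, c("YS", "YS1"))
priority
# [1]  1 NA  2 NA NA NA NA

# order() sorts NA last by default (na.last = TRUE), so the "YS" and
# "YS1" rows come first; Code and -Value only break ties among the rest
row_order <- order(priority, Dataset$Code, -Dataset$Value)
row_order
# [1] 1 3 2 5 4 6 7
```

Note that the priority key is the first sort key, so the result is ordered by priority first and by Code second. It comes out in Code order here only because the "YS" and "YS1" rows happen to have the lowest Code values.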

Or using dplyr, where arrange also sorts NA last:

library(dplyr)

Dataset %>%
  arrange(match(Category, c("YS", "YS1")), Code, desc(Value)) %>%
  filter(!duplicated(Code))
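
If the result should stay sorted by Code regardless of which codes carry a "YS" or "YS1" record, one possible variant (a sketch, not the approach given above) is to keep Code as the first sort key and use a logical flag as the second. It assumes dplyr is installed; result is just an illustrative name, and distinct with .keep_all = TRUE is used here in place of filter(!duplicated(Code)) to keep the first row per Code.

```r
library(dplyr)

# Sample data from the question
Code <- c(2, 2, 3, 5, 3, 7, 8)
Value <- c(17, 18, 35, 25, 67, 34, 2)
Category <- c("YS", "DW", "YS1", "OS", "OS", "OS1", "GD")
Dataset <- data.frame(Code, Value, Category)

result <- Dataset %>%
  # Code stays the primary key; within each Code, FALSE sorts before TRUE,
  # so "YS"/"YS1" rows come first, followed by the highest Value
  arrange(Code, !(Category %in% c("YS", "YS1")), desc(Value)) %>%
  # keep the first row for each Code, along with all other columns
  distinct(Code, .keep_all = TRUE)

result
```

On the sample data this should keep the same five records as the match version (Values 17, 35, 25, 34 and 2 for Codes 2, 3, 5, 7 and 8).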
