
How to sort within a row of a data frame with categorical variables?

I have this code:

test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative","ClaimType3" = "Class", "ClaimType4" = "Class", "Time1" = c(2,5), "Time2" = c(8,4), "Time3" = c(1,3), "Time4" = c(10,9))
claim1     claim2     claim3 claim4 time1 time2 time3 time4
Derivative Derivative Class  Class  2     8     1     10
Derivative Derivative Class  Class  5     4     3     9

I'm looking to sort each row by time to get the following output:

claim1 claim2     claim3     claim4 time1 time2 time3 time4
Class  Derivative Derivative Class  1     2     8     10
Class  Derivative Derivative Class  3     4     5     9

I'm trying to sort within a row, but I'm not sure how to keep each claim linked to its time. I'm guessing a dictionary wouldn't work here since it's an array.

This is definitely much easier with long data, so, at least with dplyr and tidyr, one has to pivot_longer, sort, and then pivot_wider back:

library(dplyr)
library(tidyr)

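test %>% 
  # Long form: one row per (original row, slot), with ClaimType and Time columns
  pivot_longer(cols = everything(), names_to = c(".value", "col"), names_pattern = "(ClaimType|Time)(.*)") %>% 
  # Each original row starts at slot "1", so a running count of those marks the row
  mutate(group = cumsum(col == "1")) %>% 
  # Sort by Time within each original row
  arrange(group, Time) %>% 
  # Renumber the slots 1..n within each original row
  mutate(col = sequence(rle(group)$lengths)) %>% 
  pivot_wider(id_cols = group, names_from = col, values_from = c("ClaimType", "Time"), names_sep = "") %>% 
  select(-group)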

  ClaimType1 ClaimType2 ClaimType3 ClaimType4 Time1 Time2 Time3 Time4
  <chr>      <chr>      <chr>      <chr>      <dbl> <dbl> <dbl> <dbl>
1 Class      Derivative Derivative Class          1     2     8    10
2 Class      Derivative Derivative Class          3     4     5     9

Since you're looking to sever the column-based relationships, I'd recommend a split-apply-combine type of workflow. The idea is to chop up the data frame into smaller parts, operate on each one in the way you want, and then glue them back together.

Using base R and some extremely inelegant code to showcase the idea:

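helper_function <- function(x){
  # x is one row, coerced to character by apply(); positions 5:8 hold the times
  time_rank <- order(as.numeric(x[5:8]))
  # Apply the same permutation to the claims (1:4) and to the times (5:8)
  c(x[time_rank], x[time_rank + 4])
}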

as.data.frame(t(apply(test, 1, helper_function)))

##      V1         V2         V3    V4 V5 V6 V7 V8
## 1 Class Derivative Derivative Class  1  2  8 10
## 2 Class Derivative Derivative Class  3  4  5  9

The key idea is to use order() to write down the way that you want each row permuted; then, you can apply that permutation to multiple parts of each row.
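As a minimal illustration of that idea on its own (the vectors `times` and `claims` are made up for this sketch and simply mirror the first row of `test`):

```r
# Hypothetical vectors mirroring the first row of `test`
times  <- c(2, 8, 1, 10)
claims <- c("Derivative", "Derivative", "Class", "Class")

# order() returns the positions that would sort `times` in ascending order
perm <- order(times)

# Indexing both vectors with the same permutation keeps each claim
# paired with its time
sorted_times  <- times[perm]
sorted_claims <- claims[perm]
```

Here `perm` is computed once from the times and then reused, which is what keeps the two vectors aligned.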

Now, we should clean this up, since we've destroyed the column names and types:

test_output <- as.data.frame(t(apply(test, 1, helper_function)))
colnames(test_output) <- c("claim1", "claim2", "claim3", "claim4",
                           "time1", "time2", "time3", "time4")
# apply() turned everything into character; convert the time columns back
test_output[5:8] <- lapply(test_output[5:8], as.numeric)

test_output

##   claim1     claim2     claim3 claim4 time1 time2 time3 time4
## 1  Class Derivative Derivative  Class     1     2     8    10
## 2  Class Derivative Derivative  Class     3     4     5     9

str(test_output)

I'll mention that it's not great practice to refer to static column numbers (e.g. 5:8) as I did several times, but hopefully this communicates one possible approach.
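One way to avoid the hard-coded positions might look like the sketch below. It assumes the claim columns always start with `ClaimType`, the time columns always start with `Time`, and the two sets appear in matching order; the names `claim_cols`, `time_cols` and `result` are made up for this example.

```r
test <- data.frame(ClaimType1 = "Derivative", ClaimType2 = "Derivative",
                   ClaimType3 = "Class", ClaimType4 = "Class",
                   Time1 = c(2, 5), Time2 = c(8, 4),
                   Time3 = c(1, 3), Time4 = c(10, 9),
                   stringsAsFactors = FALSE)

# Assumption: columns can be identified by their name prefix
claim_cols <- grep("^ClaimType", names(test), value = TRUE)
time_cols  <- grep("^Time", names(test), value = TRUE)

# Keep claims and times in separate matrices so the times stay numeric
claims <- as.matrix(test[claim_cols])
times  <- as.matrix(test[time_cols])

result <- test
for (i in seq_len(nrow(test))) {
  perm <- order(times[i, ])               # permutation that sorts row i by time
  result[i, claim_cols] <- claims[i, perm]
  result[i, time_cols]  <- times[i, perm]
}
```

Because the times never pass through a character matrix, there is no need to restore column names or types afterwards.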


 