How to sort within a row of a data frame with categorical variables?

Question

I have this code:

test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
                   "ClaimType3" = "Class", "ClaimType4" = "Class",
                   "Time1" = c(2, 5), "Time2" = c(8, 4),
                   "Time3" = c(1, 3), "Time4" = c(10, 9))

claim1	claim2	claim3	claim4	time1	time2	time3	time4
Derivative	Derivative	Class	Class	2	8	1	10
Derivative	Derivative	Class	Class	5	4	3	9

I'm looking to sort each row by time to get the following output:

claim1	claim2	claim3	claim4	time1	time2	time3	time4
Class	Derivative	Derivative	Class	1	2	8	10
Class	Derivative	Derivative	Class	3	4	5	9

I'm trying to sort within each row by time, but I'm not sure how to keep each claim linked to its time. I'm guessing a dictionary wouldn't work here since it's an array.

Answer 1

This is definitely much easier with long data, so, at least with dplyr and tidyr, one has to pivot_longer() and then pivot_wider() back:

library(dplyr)
library(tidyr)

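test %>%
  # one row per (original row, position), with columns col, ClaimType and Time
  pivot_longer(cols = everything(), names_to = c(".value", "col"), names_pattern = "(ClaimType|Time)(.*)") %>%
  # number the original rows: a new row starts wherever col is "1"
  mutate(group = cumsum(col == "1")) %>%
  # sort by time within each original row
  arrange(group, Time) %>%
  # renumber the positions 1..4 within each row after sorting
  mutate(col = sequence(rle(group)$lengths)) %>%
  pivot_wider(id_cols = group, names_from = col, values_from = c("ClaimType", "Time"), names_sep = "") %>%
  select(-group)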

  ClaimType1 ClaimType2 ClaimType3 ClaimType4 Time1 Time2 Time3 Time4
  <chr>      <chr>      <chr>      <chr>      <dbl> <dbl> <dbl> <dbl>
1 Class      Derivative Derivative Class          1     2     8    10
2 Class      Derivative Derivative Class          3     4     5     9
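The same long-then-wide idea can also be sketched in base R, which may help if tidyr isn't available. This is a minimal sketch, not part of the original answer: it assumes R >= 4.0 (so the claim columns stay character), exactly four ClaimType/Time pairs per row, and the names `claim_cols`, `time_cols`, `long` and `wide` are purely illustrative.

```r
# Example data from the question
test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
                   "ClaimType3" = "Class", "ClaimType4" = "Class",
                   "Time1" = c(2, 5), "Time2" = c(8, 4),
                   "Time3" = c(1, 3), "Time4" = c(10, 9))

# Illustrative column-name vectors (assumes four pairs)
claim_cols <- paste0("ClaimType", 1:4)
time_cols  <- paste0("Time", 1:4)

# "Pivot longer": one row per (original row, position)
long <- data.frame(
  row   = rep(seq_len(nrow(test)), times = length(time_cols)),
  claim = unlist(test[claim_cols], use.names = FALSE),
  time  = unlist(test[time_cols], use.names = FALSE)
)

# Sort by time within each original row
long <- long[order(long$row, long$time), ]

# "Pivot wider": each original row's entries are now contiguous, so refill by row
wide <- data.frame(
  matrix(long$claim, nrow = nrow(test), byrow = TRUE, dimnames = list(NULL, claim_cols)),
  matrix(long$time,  nrow = nrow(test), byrow = TRUE, dimnames = list(NULL, time_cols))
)
wide
```

Because `order(long$row, long$time)` keeps each original row's four entries together, refilling a matrix with `byrow = TRUE` plays the role of `pivot_wider()`.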

Answer 2

Since you're looking to sever the column-based relationships, I'd recommend a split-apply-combine type of workflow. The idea is to chop up the data frame into smaller parts, operate on each one in the way you want, and then glue them back together.

Using base R and some extremely inelegant code to showcase the idea:

helper_function <- function(x){
  time_rank <- order(as.numeric(x[5:8]))
  c(x[time_rank], x[time_rank + 4])
}

as.data.frame(t(apply(test, 1, helper_function)))

##      V1         V2         V3    V4 V5 V6 V7 V8
## 1 Class Derivative Derivative Class  1  2  8 10
## 2 Class Derivative Derivative Class  3  4  5  9

The key idea is to use order() to write down the way that you want each row permuted; then, you can apply that permutation to multiple parts of each row.
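To see the permutation idea on a single row, here is a small sketch. The vectors are hand-copied from the first row of `test`, and the names `claims`, `times` and `perm` are illustrative.

```r
# One row of the question's data, split into its two parallel parts
claims <- c("Derivative", "Derivative", "Class", "Class")
times  <- c(2, 8, 1, 10)

# order() returns the indices that would sort `times`, not the sorted values
perm <- order(times)
perm          # 3 1 2 4

# Applying the same permutation to both vectors keeps each claim paired with its time
claims[perm]  # "Class" "Derivative" "Derivative" "Class"
times[perm]   # 1 2 8 10
```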

Now, we should clean this up, since we've destroyed the column names and types:

test_output <- as.data.frame(t(apply(test, 1, helper_function)))
colnames(test_output) <- c("claim1", "claim2", "claim3", "claim4",
                           "time1", "time2", "time3", "time4")
# apply() turned everything into character; convert the time columns back to numbers
test_output[5:8] <- lapply(test_output[5:8], as.numeric)

test_output

##   claim1     claim2     claim3 claim4 time1 time2 time3 time4
## 1  Class Derivative Derivative  Class     1     2     8    10
## 2  Class Derivative Derivative  Class     3     4     5     9

str(test_output)

I'll mention that it's not great practice to refer to static column numbers (e.g. 5:8) as I did several times, but hopefully this communicates one possible approach.
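One way to avoid both the hard-coded positions and the character coercion caused by `apply()` is to pick the columns by name and apply the per-row permutations with matrix indexing. This is a sketch under assumptions not in the original answer: columns are named `ClaimTypeN`/`TimeN` as in the question's code, each row has the same number of claim and time columns, and R >= 4.0 is in use; `claim_cols`, `time_cols`, `perm`, `idx` and `result` are illustrative names.

```r
# Example data from the question
test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
                   "ClaimType3" = "Class", "ClaimType4" = "Class",
                   "Time1" = c(2, 5), "Time2" = c(8, 4),
                   "Time3" = c(1, 3), "Time4" = c(10, 9))

# Select columns by name pattern instead of hard-coded positions
claim_cols <- grep("^ClaimType", names(test), value = TRUE)
time_cols  <- grep("^Time", names(test), value = TRUE)

times  <- as.matrix(test[time_cols])    # numeric matrix, one row per record
claims <- as.matrix(test[claim_cols])   # character matrix, same shape

# Row i of perm is the permutation that sorts row i of times
perm <- t(apply(times, 1, order))

# Pair each cell's row index with its permuted column index
idx <- cbind(as.vector(row(perm)), as.vector(perm))

# Refill column by column; the time columns stay numeric throughout
result <- test
result[time_cols]  <- matrix(times[idx],  nrow = nrow(test))
result[claim_cols] <- matrix(claims[idx], nrow = nrow(test))
result
```

`row(perm)` supplies the row index for every cell of `perm`, so `times[idx]` pulls `times[i, perm[i, k]]` for each cell in column-major order, and `matrix(..., nrow = nrow(test))` restores the original layout.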


2 answers

solution1
1 2022-01-20 16:03:35

solution2
1 2022-01-20 16:11:05
