简体   繁体   中英

How to Reshape DF with categorical variables from long to wide in R?

I am new to reshaping data frames. I have a df that I would like to make wider so I can use it in analysis such as cluster and NMDS. I have found several questions and (answers) related to how to reshape data containing primarily quantitative data (using aggregation functions), but in my case, my variables are all categorical.

As my df has a thousand rows and dozens of columns, I've created a toy df to use as an example. It looks like this:

df <- data.frame(
  id=c("a","c", "a","b","d","c","e","d","c","a","a","e","a","b","d"), 
  color=c("red", "blue", "gray", "yellow", "green","green","blue","purple"            ,"black","green","yellow","blue","red","yellow","gray"),
  fruit=c("apple", "orange", "avocado", "strawberry", "banana", "apple",               "orange", "avocado", "strawberry", "banana","banana", "strawberry",           "watermelon", "lemon", "lemon" ),
  country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy",           "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain",              "France", "France"),
  animal=c("alligator", "camel", "alligator", "bat", "dolphin", "camel",                "elephant", "dolphin", "camel", "alligator", "alligator",                    "elephant", "alligator", "bat", "dolphin")) 

I would like the column "id" to be the first in my reshaped data frame, "animal" to be the second, and then the levels of "color", "fruit", and "country". The point here is that I wanted them separated.

The code below shows some attempts I've made:

df <- dplyr::mutate_if(df,is.character,as.factor) 
attach(df)

dcast(df, id ~ color,value.var = "id") #The output is exactly what I wanted! 

dcast(df, id + animal ~ color,value.var = "id") #Exactly what I wanted!

dcast(df, id + animal ~ fruit,value.var = "id") #Exactly what I wanted!

dcast(df, id ~ country, value.var = "id") #Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?

dcast(df, id ~ color + country, value.var = "id") #Not the output what I wanted.

dcast(df, id + animal~ color + country, value.var = "id") #Not the output I wanted.

dcast(df, id + animal~ color + country + fruit, value.var = "id") #Not the output I wanted.

My expected reshaped df should look like this:

Expected reshape data frame

To achieve that, I've tried all commands below, but none of those worked well:

dcast(df, id + animal ~ color + country + fruit, fun.aggregate=length)

dcast(df, id + animal ~ c(color, country, fruit), fun.aggregate=length)

dcast(df, id + animal ~ c("color", "country", "fruit"), fun.aggregate=length)

dcast(df, id + animal ~ color:fruit, fun.aggregate=length)

I have also tried to do that using tidyr::pivot_wider, without success.

Is there a way to achieve my goal using reshape2::dcast or tidyr::pivot_wider, or any other function in R? I would appreciate it if you guys could help me. Thanks in advance.

First, you have to pivot_longer to get the column names you desire into a column. Then I arranged it by the future column names, so the words would be grouped, like your image, then I used pivot_wider . It drops the animal column, so I put it back, then arranged by id, so they would be in the same observation order as your image.

pivot_longer(df, cols = color:country, names_to = "variable", 
             values_to = "value") %>%                       # column names to rows
  arrange(variable, value) %>%                              # organize future column names
  pivot_wider(!variable, names_from = value, values_from = animal, 
              values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[,c(1,5)])) %>%                      # add animals back
  select(id, animal, everything()) %>%                      # rearrange columns
  arrange(id)                                               # reorder observations

在此处输入图像描述

Update based on your comment - ordered by color, fruit, then country

Added mutate and modified the first arrange and pivot_wider :

pivot_longer(df,cols = color:country, names_to = "variable", 
             values_to = "value") %>%                # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1,   # create organizer variable
                           ifelse(variable == "fruit", 2, 3))) %>% 
  arrange(ordering, value) %>%                       # organize future column order
  pivot_wider(!c(variable,ordering),                 # make it wide
              names_from = value, 
              values_from = animal, 
              values_fn = list(animal = length), 
              values_fill = 0) %>%
  left_join(distinct(df[,c(1,5)])) %>%               # add the animals back
  select(id, animal, everything()) %>%               # move animals to 2nd position
  arrange(id)                                        # reorder observations 

Check it out: 在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM