
How to Reshape DF with categorical variables from long to wide in R?

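I am new to reshaping data frames. I have a df that I would like to make wider so that I can use it in analyses such as clustering and NMDS. I found several questions (and answers) about reshaping data that is mostly quantitative (using aggregation functions), but in my case all of the variables are categorical.

Since my df has a thousand rows and dozens of columns, I created a toy df as an example. It looks like this: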

df <- data.frame(
  id = c("a", "c", "a", "b", "d", "c", "e", "d", "c", "a", "a", "e", "a", "b", "d"),
  color = c("red", "blue", "gray", "yellow", "green", "green", "blue", "purple",
            "black", "green", "yellow", "blue", "red", "yellow", "gray"),
  fruit = c("apple", "orange", "avocado", "strawberry", "banana", "apple",
            "orange", "avocado", "strawberry", "banana", "banana", "strawberry",
            "watermelon", "lemon", "lemon"),
  country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy",
              "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain",
              "France", "France"),
  animal = c("alligator", "camel", "alligator", "bat", "dolphin", "camel",
             "elephant", "dolphin", "camel", "alligator", "alligator",
             "elephant", "alligator", "bat", "dolphin"))

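I want the column "id" to come first in the reshaped data frame and "animal" second, followed by the levels of "color", "fruit" and "country". The point is that I want each level in its own column.

The code below shows some of the attempts I made: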

library(reshape2)

df <- dplyr::mutate_if(df, is.character, as.factor)

dcast(df, id ~ color, value.var = "id") # The output is exactly what I wanted!

dcast(df, id + animal ~ color, value.var = "id") # Exactly what I wanted!

dcast(df, id + animal ~ fruit, value.var = "id") # Exactly what I wanted!

dcast(df, id ~ country, value.var = "id") # Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?

dcast(df, id ~ color + country, value.var = "id") # Not the output I wanted.

dcast(df, id + animal ~ color + country, value.var = "id") # Not the output I wanted.

dcast(df, id + animal ~ color + country + fruit, value.var = "id") # Not the output I wanted.

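My expected reshaped df should look like this:

[Image: expected reshaped data frame]

To achieve this, I tried all of the commands below, but none of them worked well: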

dcast(df, id + animal ~ color + country + fruit, fun.aggregate=length)

dcast(df, id + animal ~ c(color, country, fruit), fun.aggregate=length)

dcast(df, id + animal ~ c("color", "country", "fruit"), fun.aggregate=length)

dcast(df, id + animal ~ color:fruit, fun.aggregate=length)

I also tried to do this with tidyr::pivot_wider, without success.

Is there a way to achieve my goal with reshape2::dcast, tidyr::pivot_wider, or any other function in R? I would appreciate it if you could help me. Thanks in advance.

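First, you have to use pivot_longer to move the desired column names into a single column. I then arrange by those future column names, so that the words are grouped as in your image, and use pivot_wider. That drops the animal column, so I join it back, and finally arrange by id so that the observations appear in the same order as in your image.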

library(dplyr)
library(tidyr)

pivot_longer(df, cols = color:country, names_to = "variable",
             values_to = "value") %>%                       # column names to rows
  arrange(variable, value) %>%                              # organize future column names
  pivot_wider(id_cols = id, names_from = value, values_from = animal,
              values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[, c("id", "animal")]), by = "id") %>% # add animals back
  select(id, animal, everything()) %>%                      # rearrange columns
  arrange(id)                                               # reorder observations
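Since the question also asks about reshape2::dcast, here is a minimal sketch of the same long-then-wide idea with reshape2. It assumes reshape2 is installed and that the toy columns are kept as character; the names `long`, `wide` and `by_country` are only illustrative. It also hints at why `id ~ country` behaved differently in the question: as far as I can tell, dcast only falls back to `length` (with an "Aggregation function missing" message) when some cell holds more than one row. Every id/country pair in the toy data occurs once, so without `fun.aggregate` the cells are filled with the `value.var` itself.

```r
library(reshape2)

# Toy data from the question, kept as character columns
df <- data.frame(
  id = c("a", "c", "a", "b", "d", "c", "e", "d", "c", "a", "a", "e", "a", "b", "d"),
  color = c("red", "blue", "gray", "yellow", "green", "green", "blue", "purple",
            "black", "green", "yellow", "blue", "red", "yellow", "gray"),
  fruit = c("apple", "orange", "avocado", "strawberry", "banana", "apple",
            "orange", "avocado", "strawberry", "banana", "banana", "strawberry",
            "watermelon", "lemon", "lemon"),
  country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy",
              "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain",
              "France", "France"),
  animal = c("alligator", "camel", "alligator", "bat", "dolphin", "camel",
             "elephant", "dolphin", "camel", "alligator", "alligator",
             "elephant", "alligator", "bat", "dolphin"),
  stringsAsFactors = FALSE)

# Long format: one row per (id, animal, variable, value)
long <- melt(df, id.vars = c("id", "animal"))

# Wide format: one count column per level of color, fruit and country
wide <- dcast(long, id + animal ~ value, value.var = "value",
              fun.aggregate = length)

# No id/country pair repeats, so nothing is aggregated here and the
# cells contain the id itself (or NA) instead of counts
by_country <- dcast(df, id ~ country, value.var = "id")
```

With this sketch the level columns come out in sorted order rather than grouped by color, fruit and country; using `id + animal ~ variable + value` as the formula should group them, at the cost of prefixed names such as `color_red`.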


Update based on your comment: order by color, fruit, and country

I added a mutate and modified the first arrange and the pivot_wider.

library(dplyr)
library(tidyr)

pivot_longer(df, cols = color:country, names_to = "variable",
             values_to = "value") %>%                # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1,   # create organizer variable
                           ifelse(variable == "fruit", 2, 3))) %>%
  arrange(ordering, value) %>%                       # organize future column order
  pivot_wider(id_cols = id,                          # make it wide, one row per id
              names_from = value,
              values_from = animal,
              values_fn = list(animal = length),
              values_fill = 0) %>%
  left_join(distinct(df[, c("id", "animal")]), by = "id") %>% # add the animals back
  select(id, animal, everything()) %>%               # move animals to 2nd position
  arrange(id)                                        # reorder observations
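A possible variation on the ordering step, offered as a sketch rather than a replacement: `match()` against a vector of variable names can stand in for the nested ifelse, and keeping both id and animal as `id_cols` while counting the `variable` column should avoid the join afterwards. The name `wanted_order` is hypothetical, and passing a bare function to `values_fn` assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Toy data from the question, kept as character columns
df <- data.frame(
  id = c("a", "c", "a", "b", "d", "c", "e", "d", "c", "a", "a", "e", "a", "b", "d"),
  color = c("red", "blue", "gray", "yellow", "green", "green", "blue", "purple",
            "black", "green", "yellow", "blue", "red", "yellow", "gray"),
  fruit = c("apple", "orange", "avocado", "strawberry", "banana", "apple",
            "orange", "avocado", "strawberry", "banana", "banana", "strawberry",
            "watermelon", "lemon", "lemon"),
  country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy",
              "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain",
              "France", "France"),
  animal = c("alligator", "camel", "alligator", "bat", "dolphin", "camel",
             "elephant", "dolphin", "camel", "alligator", "alligator",
             "elephant", "alligator", "bat", "dolphin"),
  stringsAsFactors = FALSE)

# Hypothetical helper: the order in which the variable groups should appear
wanted_order <- c("color", "fruit", "country")

wide <- df %>%
  pivot_longer(cols = color:country, names_to = "variable",
               values_to = "value") %>%
  # rows sorted by group first, then by level; pivot_wider creates the
  # new columns in order of first appearance
  arrange(match(variable, wanted_order), value) %>%
  pivot_wider(id_cols = c(id, animal),   # animal stays, so no join is needed
              names_from = value,
              values_from = variable,    # any remaining column works for counting
              values_fn = length,
              values_fill = 0) %>%
  arrange(id)
```

Changing the vector in `wanted_order` is then enough to regroup the columns, for example putting country first.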
