
Reorganizing a data frame drastically in R using tidyr

I have a data frame of vegetation data. Each species has its own column, and each row holds the relative abundances for one plot in one year. Site, plot code and year are also stored as columns. The data looks like this:

Site Code Year speca specb specc 
A     A1  2001   0     1     10   
A     A2  2001   5     5     15
B     B1  2001   0     5     20
B     B1  2004   15    75    0
C     C1  2006   50    0     15

I want the table to look like this:

species A1_2001 A2_2001 B1_2001 B1_2004 C1_2006
speca   0       5       0       15      50
specb   1       5       5       75      0
specc   10      15      20      0       15

I tried using the tidyr::pivot_longer function, but on its own it does not give the result I want.

tidyr::pivot_longer(df, 4:length(df), names_to = "species", values_to = "abundance")

Is there a way to achieve this in a code-friendly way, preferably using tidyr (tidyverse)?

Reshape the data to 'long' format with pivot_longer, combine Code and Year into a single column with unite, and then spread it back to 'wide' format with pivot_wider:

library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = starts_with('spec'), names_to = 'species') %>% 
  unite(CodeYear, Code, Year) %>%
  select(-Site) %>%
  pivot_wider(names_from = CodeYear, values_from = value)
# A tibble: 3 x 6
#  species A1_2001 A2_2001 B1_2001 B1_2004 C1_2006
#  <chr>     <int>   <int>   <int>   <int>   <int>
#1 speca         0       5       0      15      50
#2 specb         1       5       5      75       0
#3 specc        10      15      20       0      15
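As a possible variation on the pipeline above, pivot_wider can build the combined column names itself when names_from is given several columns, joining them with names_sep (which defaults to "_"). The sketch below assumes the same df as in this answer; the names wide and abundance are illustrative only.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer
df <- data.frame(
  Site  = c("A", "A", "B", "B", "C"),
  Code  = c("A1", "A2", "B1", "B1", "C1"),
  Year  = c(2001L, 2001L, 2001L, 2004L, 2006L),
  speca = c(0L, 5L, 0L, 15L, 50L),
  specb = c(1L, 5L, 5L, 75L, 0L),
  specc = c(10L, 15L, 20L, 0L, 15L)
)

wide <- df %>%
  select(-Site) %>%                              # Site is not part of the target layout
  pivot_longer(cols = starts_with("spec"),
               names_to = "species",
               values_to = "abundance") %>%      # one row per plot, year and species
  pivot_wider(names_from = c(Code, Year),        # Code and Year are pasted with "_"
              values_from = abundance)

wide
```

This skips the unite step; whether that reads better than an explicit unite is a matter of taste.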

data

df <- structure(list(Site = c("A", "A", "B", "B", "C"), Code = c("A1", 
"A2", "B1", "B1", "C1"), Year = c(2001L, 2001L, 2001L, 2004L, 
2006L), speca = c(0L, 5L, 0L, 15L, 50L), specb = c(1L, 5L, 5L, 
75L, 0L), specc = c(10L, 15L, 20L, 0L, 15L)), class = "data.frame", 
row.names = c(NA, 
-5L))

With data.table, melt() to long format and then dcast() back to wide:

library(data.table)

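# The plot codes (A1, A2, ...) correspond to the Code column of the question;
# the Site column is not needed for the target layout and is left out.
DT <- data.table(Code = c('A1','A2','B1','B1','C1'),
                 Year = c(2001, 2001, 2001, 2004, 2006),
                 speca = c(0,5,0,15,50),
                 specb = c(1,5,5,75,0),
                 specc = c(10,15,20,0,15))

# Long format: one row per Code, Year and species
DT <- melt(DT, id.vars = c('Code', 'Year'),
           measure.vars = c('speca', 'specb', 'specc') , variable.name = 'species')

# Wide format: one column per Code_Year combination
DT <- dcast(DT, species ~ Code + Year, value.var = c('value'))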

> DT

   species A1_2001 A2_2001 B1_2001 B1_2004 C1_2006
1:   speca       0       5       0      15      50
2:   specb       1       5       5      75       0
3:   specc      10      15      20       0      15

You mainly need a pivot_wider() to follow your pivot_longer():

library(tidyverse)
df <- tribble(~Site, ~Code, ~Year, ~speca, ~specb, ~specc,
              "A", "A1", 2001, 0, 1, 10, 
              "A", "A2", 2001, 5, 5, 15,
              "B", "B1", 2001, 0, 5, 20,
              "B", "B1", 2004, 15, 75, 0,
              "C", "C1", 2006, 50, 0, 15)

df %>% 
  mutate(Code = paste(Code, Year, sep = "_")) %>% 
  select(-Site, -Year) %>% 
  pivot_longer(starts_with("spec"), names_to = "species", values_to = "abundance") %>% 
  pivot_wider(names_from = Code, values_from = abundance)

The result is:

# A tibble: 3 x 6
  species A1_2001 A2_2001 B1_2001 B1_2004 C1_2006
  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 speca         0       5       0      15      50
2 specb         1       5       5      75       0
3 specc        10      15      20       0      15
