Create new column based on factor and numeric data
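I have a data frame, and I want to use the data in an adjacent column to turn the values of one column into new columns. Each value in df$species should become a new column, and each new column should hold the corresponding data from df$fish_num. However, I am really confused about how to go about this and don't know where to start!
Here is my current df: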
    site treatment section species fish_num
1 Site 1   Control       A    parr        7
2 Site 1   Control       A  salmon        6
3 Site 1   Control       B   trout        4
4 Site 1   Control       B  salmon       12
5 Site 1 Treatment       A    parr        8
6 Site 1 Treatment       A  salmon        5
7 Site 1 Treatment       B   trout       15
8 Site 1 Treatment       B  salmon        9
df <- structure(list(site = c("Site 1", "Site 1", "Site 1", "Site 1",
"Site 1", "Site 1", "Site 1", "Site 1"), treatment = c("Control",
"Control", "Control", "Control", "Treatment", "Treatment", "Treatment",
"Treatment"), section = c("A", "A", "B", "B", "A", "A", "B",
"B"), species = c("parr", "salmon", "trout", "salmon", "parr",
"salmon", "trout", "salmon"), fish_num = c(7L, 6L, 4L, 12L, 8L,
5L, 15L, 9L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))
I would like to be able to produce the following:
    site treatment section fish_num parr salmon trout
1 Site 1   Control       A        7    7      0     0
2 Site 1   Control       A        6    0      6     0
3 Site 1   Control       B        4    0      0     4
4 Site 1   Control       B       12    0     12     0
5 Site 1 Treatment       A        8    8      0     0
6 Site 1 Treatment       A        5    0      5     0
7 Site 1 Treatment       B       15    0      0    15
8 Site 1 Treatment       B        9    0      9     0
I'm not sure of the best way to go about it!
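If you want to stay within the tidyverse, one way is to use pivot_wider() from tidyr. I have also added calls to mutate_at() and mutate() to replace the missing values with zeros and to compute the fish_num column.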
library(tidyverse)
df %>%
  pivot_wider(names_from = species,
              values_from = fish_num) %>%
  mutate_at(c("parr", "salmon", "trout"), ~replace(., is.na(.), 0)) %>%
  mutate(fish_num = parr + salmon + trout)
# A tibble: 4 x 7
  site   treatment section  parr salmon trout fish_num
  <chr>  <chr>     <chr>   <dbl>  <dbl> <dbl>    <dbl>
1 Site 1 Control   A           7      6     0       13
2 Site 1 Control   B           0     12     4       16
3 Site 1 Treatment A           8      5     0       13
4 Site 1 Treatment B           0      9    15       24
Using pivot_wider from the tidyverse:
df %>%
  mutate(id = 1:nrow(.)) %>%
  pivot_wider(names_from = species, values_from = fish_num, id_cols = everything())
#>   site   treatment section    id  parr salmon trout
#>   <fct>  <fct>     <fct>   <int> <int>  <int> <int>
#> 1 Site 1 Control   A           1     7     NA    NA
#> 2 Site 1 Control   A           2    NA      6    NA
#> 3 Site 1 Control   B           3    NA     NA     4
#> 4 Site 1 Control   B           4    NA     12    NA
#> 5 Site 1 Treatment A           5     8     NA    NA
#> 6 Site 1 Treatment A           6    NA      5    NA
#> 7 Site 1 Treatment B           7    NA     NA    15
#> 8 Site 1 Treatment B           8    NA      9    NA
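The output above has NA where the question asks for 0, and it no longer contains fish_num. As a hedged sketch, assuming a tidyr version whose pivot_wider() supports the values_fill argument, copying fish_num into a helper column before pivoting keeps the original column and fills the gaps with zeros. The names id, value and wide are illustrative helpers, not part of the original data.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  site      = rep("Site 1", 8),
  treatment = rep(c("Control", "Treatment"), each = 4),
  section   = rep(c("A", "A", "B", "B"), times = 2),
  species   = c("parr", "salmon", "trout", "salmon",
                "parr", "salmon", "trout", "salmon"),
  fish_num  = c(7L, 6L, 4L, 12L, 8L, 5L, 15L, 9L),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # id keeps every row distinct; value is a copy of fish_num to spread out
  mutate(id = row_number(), value = fish_num) %>%
  # cells with no matching species are filled with 0 instead of NA
  pivot_wider(names_from = species, values_from = value, values_fill = 0L) %>%
  select(-id)

wide
```

Because fish_num is not used in names_from or values_from, it is treated as an identifier column and survives the pivot.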
You can use the trick that TRUE and FALSE in R behave numerically as one and zero:
df$salmon <- df$fish_num * (df$species == "salmon")
df$trout <- df$fish_num * (df$species == "trout")
To do this for every species that occurs, put it in a loop over levels(df$species).
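A minimal sketch of that loop follows. One assumption to flag: in the df given in the question, species is a character column, so levels(df$species) would return NULL. The sketch therefore iterates over the unique values instead, which should work for character and factor columns alike.

```r
# Same data as in the question (only the two columns the loop needs)
df <- data.frame(
  species  = c("parr", "salmon", "trout", "salmon",
               "parr", "salmon", "trout", "salmon"),
  fish_num = c(7L, 6L, 4L, 12L, 8L, 5L, 15L, 9L),
  stringsAsFactors = FALSE
)

# One new column per species: the comparison yields TRUE/FALSE,
# which multiplies as 1/0, so non-matching rows become 0
for (sp in as.character(unique(df$species))) {
  df[[sp]] <- df$fish_num * (df$species == sp)
}

df
```

Using df[[sp]] rather than df$sp matters here, because the column name is held in a variable.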
A simple base R option is to use xtabs:
cbind(df, unclass(t(xtabs(fish_num ~ species + q, cbind(df, q = 1:nrow(df))))))
which gives
    site treatment section species fish_num parr salmon trout
1 Site 1   Control       A    parr        7    7      0     0
2 Site 1   Control       A  salmon        6    0      6     0
3 Site 1   Control       B   trout        4    0      0     4
4 Site 1   Control       B  salmon       12    0     12     0
5 Site 1 Treatment       A    parr        8    8      0     0
6 Site 1 Treatment       A  salmon        5    0      5     0
7 Site 1 Treatment       B   trout       15    0      0    15
8 Site 1 Treatment       B  salmon        9    0      9     0
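The one-liner is dense, so here is the same computation unpacked into steps, as a sketch for readability. The intermediate names df_q, tab, wide and result are illustrative only; q is the helper row index used in the original call.

```r
# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  site      = rep("Site 1", 8),
  treatment = rep(c("Control", "Treatment"), each = 4),
  section   = rep(c("A", "A", "B", "B"), times = 2),
  species   = c("parr", "salmon", "trout", "salmon",
                "parr", "salmon", "trout", "salmon"),
  fish_num  = c(7L, 6L, 4L, 12L, 8L, 5L, 15L, 9L),
  stringsAsFactors = FALSE
)

# 1. Add a row index q so that every row forms its own group
df_q <- cbind(df, q = seq_len(nrow(df)))

# 2. Sum fish_num for each species/row pair: a species-by-row table
tab <- xtabs(fish_num ~ species + q, data = df_q)

# 3. Transpose to row-by-species and drop the table class,
#    so that cbind() sees a plain matrix
wide <- unclass(t(tab))

# 4. Bind the new species columns onto the original data frame
result <- cbind(df, wide)
result
```

Since each row has its own q, every sum contains at most one fish_num value, and combinations that never occur come out as 0.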
Data
> dput(df)
structure(list(site = c("Site 1", "Site 1", "Site 1", "Site 1",
"Site 1", "Site 1", "Site 1", "Site 1"), treatment = c("Control",
"Control", "Control", "Control", "Treatment", "Treatment", "Treatment",
"Treatment"), section = c("A", "A", "B", "B", "A", "A", "B",
"B"), species = c("parr", "salmon", "trout", "salmon", "parr",
"salmon", "trout", "salmon"), fish_num = c(7L, 6L, 4L, 12L, 8L,
5L, 15L, 9L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))