In a large set of species count data, I unfortunately recorded two similar species as the same taxon and counted them together (I counted Sp2 instead of Sp2a and Sp2b). I revisited all samples and measured the proportion of that joint count that belongs to each species (e.g., for sample "north", Sp2 was counted 40 times, and I determined that 20% of that count should be Sp2a and 80% should be Sp2b).
Does anybody know how I might apply the proportion data in the chart dataframe
samples <- c("north", "west", "south")
sp2a_props <- c(.2, .3, .4)
sp2b_props <- c(.8, .7, .6)
chart <- data.frame(samples, sp2a_props, sp2b_props, stringsAsFactors = FALSE)
chart
to the relevant rows in the raw dataframe
samples <- c("north","north", "west","west","south", "south")
species <- c("Sp1", "Sp2", "Sp1", "Sp4", "Sp2", "Sp3")
counts <- c(20, 40, 30, 50, 30, 30)
raw <- data.frame(samples, species, counts, stringsAsFactors = FALSE)
raw
to get the desired new dataframe?
samples <- c("north","north","north", "west","west","south", "south", "south")
species <- c("Sp1", "Sp2a", "Sp2b", "Sp1", "Sp4", "Sp2a", "Sp2b", "Sp3")
counts <- c(20, 8,32, 30, 50, 12, 18, 30)
desired_result <- data.frame(samples, species, counts, stringsAsFactors = FALSE)
desired_result
While the dummy data only splits Sp2 into 2 parts, I will also likely have to split certain lumped taxa into 3 parts.
Using dplyr and tidyr, you just need to do a little bit of manipulation and joining to get what you want.
First, reshape chart from wide to long format and strip '_props' from the species names in preparation for a downstream join.
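For reference, on tidyr 1.0.0 or later the same reshape can be written with pivot_longer(), which supersedes gather(); its names_pattern argument strips '_props' as part of the pivot, so no separate gsub() is needed. This is a minimal sketch that rebuilds chart from the question so it runs on its own; chart_long is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# chart exactly as defined in the question
chart <- data.frame(
  samples    = c("north", "west", "south"),
  sp2a_props = c(.2, .3, .4),
  sp2b_props = c(.8, .7, .6),
  stringsAsFactors = FALSE
)

# One row per sample/species; the regex group keeps only the part of each
# column name before "_props" (e.g. "sp2a_props" -> "sp2a")
chart_long <- chart %>%
  pivot_longer(-samples,
               names_to = "species",
               values_to = "proportion",
               names_pattern = "(.*)_props")

chart_long
```

The result has the same three columns (samples, species, proportion) that the join step expects.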
Second, manipulate the raw data frame to mark the a/b splits (use dplyr::case_when instead of ifelse to handle multiple splits). Separate those into rows, unite them with species to get sp2a/sp2b, join to chart to get the proportion, multiply counts by the proportion where one exists, and remove the proportion column.
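library(dplyr)
library(tidyr)
# Reshape chart to long format (one row per sample/species proportion)
# and strip "_props" so the names match the split species, e.g. "sp2a"
chart <- chart %>%
  gather(species, proportion, -samples) %>%
  mutate(species = gsub("_props", "", species))
# Lower-case the species, expand each "sp2" row into "sp2a"/"sp2b" rows,
# attach the matching proportion and rescale the counts
raw %>%
  mutate(species = tolower(species)) %>%
  mutate(split = ifelse(species == "sp2", "a,b", "")) %>%
  separate_rows(split, sep = ",") %>%
  unite(species, species, split, sep = "") %>%
  left_join(chart, by = c("samples", "species")) %>%
  mutate(counts = ifelse(!is.na(proportion), counts * proportion, counts)) %>%
  select(-proportion)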
Results in:
samples species counts
1 north sp1 20
2 north sp2a 8
3 north sp2b 32
4 west sp1 30
5 west sp4 50
6 south sp2a 12
7 south sp2b 18
8 south sp3 30
(If you want the species names back in title case, I would use tools::toTitleCase.)
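The code above hard-codes the Sp2 split with ifelse. Below is a minimal sketch of the case_when variant for more than one lumped taxon, including a three-way split. It assumes the proportions are kept in long format (one row per sample and split species, as chart is after the reshape) and uses hypothetical toy data: the "east" sample, its lumped Sp5 count and the sp5a/sp5b/sp5c proportions are invented for illustration and are not from the question.

```r
library(dplyr)
library(tidyr)

# The question's raw data plus a HYPOTHETICAL sample "east" with a lumped "Sp5"
raw <- data.frame(
  samples = c("north", "north", "west", "west", "south", "south", "east"),
  species = c("Sp1", "Sp2", "Sp1", "Sp4", "Sp2", "Sp3", "Sp5"),
  counts  = c(20, 40, 30, 50, 30, 30, 100),
  stringsAsFactors = FALSE
)

# Long-format proportions; the sp5a/sp5b/sp5c values are HYPOTHETICAL
chart <- data.frame(
  samples    = c("north", "north", "west", "west", "south", "south",
                 "east", "east", "east"),
  species    = c("sp2a", "sp2b", "sp2a", "sp2b", "sp2a", "sp2b",
                 "sp5a", "sp5b", "sp5c"),
  proportion = c(.2, .8, .3, .7, .4, .6, .5, .3, .2),
  stringsAsFactors = FALSE
)

result <- raw %>%
  mutate(species = tolower(species)) %>%
  # Map each lumped taxon to its comma-separated suffixes; taxa that were
  # never lumped get "" so their names pass through unite() unchanged
  mutate(split = case_when(
    species == "sp2" ~ "a,b",
    species == "sp5" ~ "a,b,c",
    TRUE ~ ""
  )) %>%
  separate_rows(split, sep = ",") %>%
  unite(species, species, split, sep = "") %>%
  left_join(chart, by = c("samples", "species")) %>%
  # Rows without a proportion (never lumped) keep their original count
  mutate(counts = ifelse(!is.na(proportion), counts * proportion, counts)) %>%
  select(-proportion)

result
```

Because each sample's proportions sum to 1, the split counts should add back up to the lumped ones, so comparing sum(result$counts) with sum(raw$counts) is a quick sanity check that no row was dropped or double-counted by the join.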