
How to split row counts using a proportion dataframe

In a large set of species count data, I unfortunately recorded two similar species as one and counted them together (I counted Sp2 instead of Sp2a and Sp2b). I revisited all samples and measured the proportion of that joint count that belongs to each species (e.g. for sample "north", Sp2 was counted 40 times, and I determined that 20% of that count should be Sp2a and 80% should be Sp2b).

Does anybody know how I might apply the proportion data in the chart dataframe

samples <- c("north", "west", "south")
sp2a_props <- c(.2, .3, .4)
sp2b_props <- c(.8, .7, .6)
chart <- data.frame(samples, sp2a_props, sp2b_props, stringsAsFactors = FALSE)
chart

to the relevant rows in the raw dataframe

samples <- c("north","north", "west","west","south", "south")
species <- c("Sp1", "Sp2", "Sp1", "Sp4", "Sp2", "Sp3")
counts <- c(20, 40, 30, 50, 30, 30)
raw <- data.frame(samples, species, counts, stringsAsFactors = FALSE)
raw

to get the desired new dataframe

samples <- c("north","north","north", "west","west","south", "south", "south")
species <- c("Sp1", "Sp2a", "Sp2b", "Sp1", "Sp4", "Sp2a", "Sp2b", "Sp3")
counts <- c(20, 8,32, 30, 50, 12, 18, 30)
desired_result <- data.frame(samples, species, counts)
desired_result

While the dummy data only splits Sp2 into 2 parts, I will also likely have to split certain lumped taxa into 3 parts.

Using dplyr and tidyr, a little reshaping and one join will get you what you want.

First, reshape chart from wide to long and remove '_props' from the species names in preparation for the downstream join.

Second, add a column to the raw data frame listing the a/b suffixes for each lumped species (an ifelse is enough for one lumped taxon; use dplyr::case_when if several taxa need different splits). Separate those suffixes into rows, unite them with species to get sp2a/sp2b, join the result to chart to pick up the proportion, multiply counts by the proportion where one is present, and drop the proportion column.

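library(dplyr)
library(tidyr)

# Reshape chart to long format: one row per sample and split species.
# pivot_longer() is the current replacement for the superseded gather().
chart <- chart %>%
  pivot_longer(-samples, names_to = "species", values_to = "proportion") %>%
  mutate(species = gsub("_props", "", species))  # "sp2a_props" -> "sp2a"

raw %>%
  mutate(species = tolower(species)) %>%                    # match chart's lowercase names
  mutate(split = ifelse(species == "sp2", "a,b", "")) %>%   # suffixes for the lumped taxon
  separate_rows(split, sep = ",") %>%                       # one row per suffix
  unite(species, species, split, sep = "") %>%              # "sp2" + "a" -> "sp2a"
  left_join(chart, by = c("samples", "species")) %>%        # NA proportion for unsplit taxa
  mutate(counts = ifelse(!is.na(proportion), counts * proportion, counts)) %>%
  select(-proportion)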

Results in:

  samples species counts
1   north     sp1     20
2   north    sp2a      8
3   north    sp2b     32
4    west     sp1     30
5    west     sp4     50
6   south    sp2a     12
7   south    sp2b     18
8   south     sp3     30
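
A quick sanity check worth running after a split like this is that the total count per sample has not changed. The sketch below uses base R only and simply retypes the question's raw data and the result shown above; the names result, raw_totals and new_totals are made up for the illustration.

```r
# Raw counts, as given in the question.
raw <- data.frame(
  samples = c("north", "north", "west", "west", "south", "south"),
  species = c("Sp1", "Sp2", "Sp1", "Sp4", "Sp2", "Sp3"),
  counts  = c(20, 40, 30, 50, 30, 30),
  stringsAsFactors = FALSE
)

# The split result, retyped from the output above.
result <- data.frame(
  samples = c("north", "north", "north", "west", "west", "south", "south", "south"),
  species = c("sp1", "sp2a", "sp2b", "sp1", "sp4", "sp2a", "sp2b", "sp3"),
  counts  = c(20, 8, 32, 30, 50, 12, 18, 30),
  stringsAsFactors = FALSE
)

# Total count per sample before and after the split.
raw_totals <- aggregate(counts ~ samples, data = raw, FUN = sum)
new_totals <- aggregate(counts ~ samples, data = result, FUN = sum)

# TRUE if splitting Sp2 left every sample's total unchanged.
totals_match <- isTRUE(all.equal(raw_totals, new_totals))
totals_match
```

If this comes back FALSE, the usual suspects are proportions that do not sum to 1 for some sample, or a lumped row that found no match in chart.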

(The code above lowercases the species names so they match chart; if you want them back in title case, I would use tools::toTitleCase.)
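
For the three-way splits mentioned in the question, one alternative (a sketch, not the approach used above) is to keep the proportions in long format with an explicit column naming the lumped taxon, so a single join handles any number of parts. Everything here is hypothetical illustration data: raw3, props3, the column names lumped and new_species, and the taxon Sp5 are not from the question.

```r
library(dplyr)

# Hypothetical raw counts: Sp5 is a lumped taxon to be split three ways.
raw3 <- data.frame(
  samples = c("north", "north", "west"),
  species = c("Sp1", "Sp5", "Sp5"),
  counts  = c(20, 50, 10),
  stringsAsFactors = FALSE
)

# Hypothetical long-format proportions: one row per sample and new species.
props3 <- data.frame(
  samples     = rep(c("north", "west"), each = 3),
  lumped      = "Sp5",
  new_species = rep(c("Sp5a", "Sp5b", "Sp5c"), times = 2),
  proportion  = c(0.2, 0.3, 0.5, 0.5, 0.25, 0.25),
  stringsAsFactors = FALSE
)

split3 <- raw3 %>%
  # Each lumped row matches one props3 row per part; other rows get NA.
  left_join(props3, by = c("samples", "species" = "lumped")) %>%
  mutate(
    species = coalesce(new_species, species),  # keep the old name if not split
    counts  = counts * coalesce(proportion, 1) # unsplit taxa keep their full count
  ) %>%
  select(samples, species, counts)

split3
```

Because the number of parts is just the number of matching rows in props3, the same code covers two-way and three-way splits without a hard-coded "a,b" string.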
