Adding data to a dataframe based on groups

I'm working with bioinformatics data, with a gene in each row and statistics/metadata in the columns. Some genes come from the same organism, which is indicated by the "ID" column, and I grouped the data by this variable.

data <- data %>%
  group_by(ID)

I want to add data from another file based on the ID (the grouping factor), so that rows with ID = a get data from a file named a.gff, and so on. The data I would like to add are gene locations. There is one gff file per ID (one for ID = a, one for ID = b, one for ID = c, etc.), each named after its ID (e.g. "a.gff").

What the data looks like:

Gene ID
CelA a
CelB a
Atl b
prT a
HUl c

Is there a way to write a function that opens the file for each ID group, does an operation, and moves on to the next ID?

I'm quite new to R, any help is much appreciated!

I think the easiest way to do this is to read all the .gff files first. I'm not familiar with the format, so my example uses the .csv extension. The following code reads all the files in the "dir" directory into a list column, then unnests it so the result is a regular tibble with one ID column identifying the source file.

After that you can just left_join() the two tibbles on ID and then group by ID.

library(tidyverse)

binded <- tibble(
    file = list.files("dir"), # can remove before the join
    location = list.files("dir", full.names = TRUE), # can remove before the join
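    ID = str_remove(file, "\\.csv$"), # file name without its extension, e.g. "a.csv" -> "a"
    df = map(location, read_csv) # one data frame per file, stored as a list column
) %>%
    unnest(df) # expand the list column into regular rows and columns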

data %>% 
    left_join(binded, by = "ID") # add further shared columns to `by` if the files have them
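
Since real .gff files are tab-separated rather than comma-separated, here is a rough sketch of the same idea for GFF input. It assumes the files follow the usual nine-column GFF layout with no header row, that lines starting with "#" are comments, and that there is no embedded FASTA section at the end. The helper names read_gff and read_gff_dir are made up for illustration, and "dir" is still a placeholder for wherever the files live.

```r
library(tidyverse)

# The nine standard GFF columns; GFF files carry no header row.
gff_cols <- c("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")

# Hypothetical helper: read one GFF file, skipping "#" comment lines.
# start/end are read as integers, everything else as character.
read_gff <- function(path) {
  read_tsv(path, col_names = gff_cols, comment = "#",
           col_types = "ccciicccc")
}

# Hypothetical helper: read every <ID>.gff in a folder into one tibble,
# taking the ID from the file name (e.g. "a.gff" -> "a").
read_gff_dir <- function(gff_dir) {
  paths <- list.files(gff_dir, pattern = "\\.gff$", full.names = TRUE)
  tibble(
    ID  = str_remove(basename(paths), "\\.gff$"),
    gff = map(paths, read_gff)
  ) %>%
    unnest(gff)
}
```

With these defined, data %>% left_join(read_gff_dir("dir"), by = "ID") would attach the GFF rows to each gene. Note that joining on ID alone pairs every gene with every feature row of its organism; narrowing that down to one location per gene would need the gene name pulled out of the attributes column, and how to do that depends on how your files encode it.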
