简体   繁体   中英

In R, how to delete a series of cells based on the value from another cell?

I have a 300x300 df. Each row represents the data of a different patient, named with an id. The first 2 columns are the patients id´s. Column 'id_all' contains more participants than needed, and than 'id'. My goal is to keep only the patient information that corresponds with the first column, 'id'.

The first 9x9 looks like this:

id id_all MMSE BL MMSE 12 MMSE 24 MMSE 36 MMSE 48 MMSE 60 MMSE 72
aaa002 aaa000 22 18 NA NA NA NA NA
aaa003 aaa002 29 28 28 29 30 29 NA
aaa005 aaa003 30 29 30 30 30 NA NA
aaa024 aaa005 29 28 25 NA NA 25 NA
aaa026 aaa024 30 29 29 29 NA NA NA
aaa048 aaa026 28 30 28 27 30 30 NA
aaa095 aaa038 29 29 29 26 NA NA NA
aaa222 aaa048 30 29 29 28 28 29 NA

So, based on the first column, I would like to somehow iterate through the second column, and delete all the information of the participants who do not have a match with the first column, 'id'. This means, in the first row, deleting columns 2:300, and keeping the first column intact.

At the end, I want my df to look like this:

id id_all MMSE BL MMSE 12 MMSE 24 MMSE 36 MMSE 48 MMSE 60 MMSE 72
aaa002 aaa002 29 28 28 29 30 29 NA
aaa003 aaa003 30 29 30 30 30 NA NA
aaa005 aaa005 29 28 25 NA NA 25 NA
aaa024 aaa024 30 29 29 29 NA NA NA
aaa026 aaa026 28 30 28 27 30 30 NA
aaa048 aaa048 30 29 29 28 28 29 NA
aaa095 ...
aaa222 ...

In this example, id_all aaa000, and aaa038 dont have a match in the id column, so I would like the delete all the information about those participants. I can´t figure it out how to delete almost all the row (almost because only the first two cells remain), and to move all the cells up.

Another possibility will be, to move the first column down until id matches id_all, and then delete the whole row that has an empty cell in the id column.

Thanks!

In tidyverse:

library(dplyr)
df %>% 
  filter(id == id_all)

In base:

df[df$id == df$id_all, ]

I think this will serve the purpose

library(tidyverse)

df %>% filter(!is.na(match(id_all, df$id))) %>% select(-id)

  id_all MMSE.BL MMSE.12 MMSE.24 MMSE.36 MMSE.48 MMSE.60 MMSE.72
1 aaa002      29      28      28      29      30      29      NA
2 aaa003      30      29      30      30      30      NA      NA
3 aaa005      29      28      25      NA      NA      25      NA
4 aaa024      30      29      29      29      NA      NA      NA
5 aaa026      28      30      28      27      30      30      NA
6 aaa048      30      29      29      28      28      29      NA

Since id == id_all in the final data, there is no need for a duplicate column. If it is needed, it can be easily created by adding mutate(id = id_all) at the end of the above syntax

df %>% filter(!is.na(match(id_all, df$id))) %>% select(-id) %>% mutate(id = id_all) %>%
  select(id, everything())

      id id_all MMSE.BL MMSE.12 MMSE.24 MMSE.36 MMSE.48 MMSE.60 MMSE.72
1 aaa002 aaa002      29      28      28      29      30      29      NA
2 aaa003 aaa003      30      29      30      30      30      NA      NA
3 aaa005 aaa005      29      28      25      NA      NA      25      NA
4 aaa024 aaa024      30      29      29      29      NA      NA      NA
5 aaa026 aaa026      28      30      28      27      30      30      NA
6 aaa048 aaa048      30      29      29      28      28      29      NA

dput of df used

df <- structure(list(id = c("aaa002", "aaa003", "aaa005", "aaa024", 
"aaa026", "aaa048", "aaa095", "aaa222"), id_all = c("aaa000", 
"aaa002", "aaa003", "aaa005", "aaa024", "aaa026", "aaa038", "aaa048"
), MMSE.BL = c(22L, 29L, 30L, 29L, 30L, 28L, 29L, 30L), MMSE.12 = c(18L, 
28L, 29L, 28L, 29L, 30L, 29L, 29L), MMSE.24 = c(NA, 28L, 30L, 
25L, 29L, 28L, 29L, 29L), MMSE.36 = c(NA, 29L, 30L, NA, 29L, 
27L, 26L, 28L), MMSE.48 = c(NA, 30L, 30L, NA, NA, 30L, NA, 28L
), MMSE.60 = c(NA, 29L, NA, 25L, NA, 30L, NA, 29L), MMSE.72 = c(NA, 
NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-8L))

> df
      id id_all MMSE.BL MMSE.12 MMSE.24 MMSE.36 MMSE.48 MMSE.60 MMSE.72
1 aaa002 aaa000      22      18      NA      NA      NA      NA      NA
2 aaa003 aaa002      29      28      28      29      30      29      NA
3 aaa005 aaa003      30      29      30      30      30      NA      NA
4 aaa024 aaa005      29      28      25      NA      NA      25      NA
5 aaa026 aaa024      30      29      29      29      NA      NA      NA
6 aaa048 aaa026      28      30      28      27      30      30      NA
7 aaa095 aaa038      29      29      29      26      NA      NA      NA
8 aaa222 aaa048      30      29      29      28      28      29      NA

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM