
R - deleting rows from data.frame

I am very new to R and programming and have a basic question (my first one on Stack Overflow :) ). I want to delete some rows from a data.frame and am using an if-statement for that. My code runs, but unfortunately it is not deleting the correct rows; instead, I think it deletes every second row of my data frame.

"myDataVergleich" is the name of my data.frame. "myData$QUESTNNR" is the column that decides whether a row is supposed to stay in the data frame or not.

for(i in 1:nrow(myDataVergleich))
  {if(myData$QUESTNNR[i] != "t0_mathe" | myData$QUESTNNR[i] != "t0_bio" | myData$QUESTNNR[i] != "t0_allg2" |
     myData$QUESTNNR[i] != "t7_mathe_Version1" | myData$QUESTNNR[i] != "t7_bio_Version1") 
    {myDataVergleich <- myDataVergleich[-c(i),] }}

What am I doing wrong?

Welcome to Stack Overflow and to R. I think your intuition is correct, but there are some issues. First, you say your data is called 'myDataVergleich', but inside your loop you are accessing 'myData'. So you might need to change 'myData$QUESTNNR[i]' to 'myDataVergleich$QUESTNNR[i]' in the loop.
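Two more things are worth checking in the loop itself. A chain of != comparisons joined with | is TRUE for every row, because no value can equal all five strings at once. Also, deleting row i inside the loop shifts the remaining rows up while i keeps counting, which would explain why every second row disappears. Here is a minimal base R sketch of both points, using made-up toy data in place of your real data frame (the id column and the example values are assumptions for illustration only):

```r
# Hypothetical toy data standing in for myDataVergleich
myDataVergleich <- data.frame(
  id = 1:6,
  QUESTNNR = c("t0_mathe", "other", "t0_bio",
               "t7_bio_Version1", "other", "t0_allg2"),
  stringsAsFactors = FALSE
)

# A value can never equal two different strings at once, so at least one
# of the != comparisons is TRUE and the whole | chain is TRUE.
x <- "t0_mathe"
always_true <- x != "t0_mathe" | x != "t0_bio"

keep <- c("t0_mathe", "t0_bio", "t0_allg2",
          "t7_mathe_Version1", "t7_bio_Version1")

# Vectorised subsetting: no loop, so no row positions shift during deletion.
kept    <- myDataVergleich[myDataVergleich$QUESTNNR %in% keep, ]
dropped <- myDataVergleich[!(myDataVergleich$QUESTNNR %in% keep), ]
```

Building one logical vector for the whole column and subsetting once avoids both problems, since nothing is removed until the condition has been evaluated for every row.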

A great thing about R is that people have already figured out solutions to many common problems, and subsetting a data frame by a condition is one of them. You should look into the tidyverse family of packages, especially dplyr in this case.

install.packages('dplyr')
install.packages('magrittr')

If you want to keep the rows with these strings, this code will work:

library(dplyr)
library(magrittr)
strings <- c(
  "t0_mathe", "t0_bio", "t0_allg2", "t7_mathe_Version1", "t7_bio_Version1"
)
filtered_data <- myDataVergleich %>%
  dplyr::filter(QUESTNNR %in% strings)

If you want to keep the rows that don't match any of these strings, this code will work:

library(dplyr)
library(magrittr)
strings <- c(
  "t0_mathe", "t0_bio", "t0_allg2", "t7_mathe_Version1", "t7_bio_Version1"
)
filtered_data <- myDataVergleich %>%
  dplyr::filter(!(QUESTNNR %in% strings))

Hope that helps

I would have to know the error. QUESTNNR %in% strings returns TRUE or FALSE for each row, and adding the ! just returns the opposite, so that should work fine. You can detect part of a string with str_detect from the 'stringr' package.

library(dplyr)
library(stringr)
library(tibble)
library(magrittr)
df <- tibble(x = c('h', 'e', 'l', 'l', '0')) 
df %>% dplyr::filter(str_detect(x, 'l'))
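If you would rather not load stringr, base R can do a similar partial match. This is only a sketch on made-up toy data (the id column and the QUESTNNR values are assumptions, not your real data); grepl() and startsWith() are base R functions:

```r
# Hypothetical toy data; the real column is myDataVergleich$QUESTNNR
myDataVergleich <- data.frame(
  id = 1:5,
  QUESTNNR = c("t0_mathe", "t7_bio_Version1", "t3_bio",
               "t0_allg2", "t7_mathe_Version2"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats the pattern as a literal substring, not a regex
has_bio  <- grepl("bio", myDataVergleich$QUESTNNR, fixed = TRUE)
bio_rows <- myDataVergleich[has_bio, ]

# startsWith() matches only at the beginning of each string
t0_rows <- myDataVergleich[startsWith(myDataVergleich$QUESTNNR, "t0_"), ]
```

Negating either logical vector with ! gives the rows that do not match, the same way as with %in%.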

