
How to subset by only taking information from those who have a certain value in a column

I have a very large data set and I would like to create a new table that contains only the rows where a given column equals a certain number. This is a fake data set, but let's call it mydata. (Image: example data)

My actual data set is much larger than this, but this is basically what I want to see.

We can subset the rows of the data where 'V4' equals 0 and, at the same time, select columns 1 to 4:

subset(df1, V4 == 0, select = 1:4)
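As a minimal sketch of what that call does, here is a small made-up data frame. The name mydata is borrowed from the question, and the column names and values are assumptions, since the original example data is not shown. The bracket-indexing line is an equivalent base R form.

```r
# Hypothetical toy data standing in for the real table (values are made up)
mydata <- data.frame(
  V1 = 1:6,
  V2 = c("a", "b", "c", "d", "e", "f"),
  V3 = c(2.5, 3.1, 4.7, 1.2, 9.9, 0.4),
  V4 = c(0, 1, 0, 0, 1, 0),
  V5 = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Keep the rows where V4 is 0 and only the first four columns
res_subset <- subset(mydata, V4 == 0, select = 1:4)

# Same selection with bracket indexing; which() skips rows where V4 is NA
res_index <- mydata[which(mydata$V4 == 0), 1:4]

print(res_subset)
```

subset() is convenient interactively. Inside functions, the bracket form is often preferred because it does not rely on non-standard evaluation.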

If your data file is very large and you only want the rows that match a certain criterion, the sqldf package can filter while it reads in the data.

Here is an example use case. I will create a binary column in the built-in data set iris and write the new table to disk.

library(sqldf)

set.seed(1234)
iris1 <- iris
iris1$V4 <- rbinom(nrow(iris1), 1, 0.5)
write.table(iris1, "iris3.dat", sep = ",", quote = FALSE, row.names = FALSE)

Now read the data in, keeping only the rows where V4 == 0.

# set up file connection
iris3 <- file("iris3.dat")
df1 <- sqldf('select * from iris3 where "V4" = 0')
close(iris3)

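Compare with the result of subset. The only differences reported are in Species, which comes back from the file as character but is a factor in iris1.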

df2 <- subset(iris1, V4 == 0)
row.names(df2) <- NULL
all.equal(df1, df2)
#[1] "Component “Species”: Modes: character, numeric"                      
#[2] "Component “Species”: Attributes: < target is NULL, current is list >"
#[3] "Component “Species”: target is character, current is factor"
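The character versus factor mismatch can be reproduced without sqldf. The sketch below uses two tiny made-up data frames (from_file and in_memory are hypothetical names, not objects from the answer) and shows that converting the character column to a factor removes the difference. It assumes the factor levels end up the same on both sides.

```r
# Hypothetical stand-ins: one column read as character, one stored as factor
from_file <- data.frame(Species = c("setosa", "virginica"),
                        stringsAsFactors = FALSE)
in_memory <- data.frame(Species = factor(c("setosa", "virginica")))

# Not equal yet: character on one side, factor on the other
before <- isTRUE(all.equal(from_file, in_memory))

# Convert the character column to a factor, then compare again
from_file$Species <- factor(from_file$Species)
after <- isTRUE(all.equal(from_file, in_memory))

print(c(before = before, after = after))
```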

Final clean up.

unlink("iris3.dat")
rm(iris1, df1, df2)
