I have a very large data set and I would like to create a new table that only has information from the columns that equal a certain number. This is a fake data set but lets call it mydata. example data
My actual data set is much larger than this but this is basically what I want to see
我们可以根据 'V4' 中的 '0' 值对数据subset
的行进行subset
,同时select
列 1 到 4
subset(df1, V4 == 0, select = 1:4)
If your data file is very large and you only want the rows that match a certain criterion, package sqldf
can filter while it reads in the data.
Here is an example use case. I will create a binary column in built in data set iris
and write the new table to disk.
library(sqldf)
set.seed(1234)
iris1 <- iris
iris1$V4 <- rbinom(nrow(iris1), 1, 0.5)
write.table(iris1, "iris3.dat", sep = ",", quote = FALSE, row.names = FALSE)
Now read the data in and filter only the rows where V4 == 0
.
# set up file connection
iris3 <- file("iris3.dat")
df1 <- sqldf('select * from iris3 where "V4" = 0')
close(iris3)
Compare with the result of subset
.
df2 <- subset(iris1, V4 == 0)
row.names(df2) <- NULL
all.equal(df1, df2)
#[1] "Component “Species”: Modes: character, numeric"
#[2] "Component “Species”: Attributes: < target is NULL, current is list >"
#[3] "Component “Species”: target is character, current is factor"
Final clean up.
unlink("iris3.dat")
rm(iris1, df1, df2)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.