I have a very big data frame (121920 obs. of 7 variables). All variables are factors. The data frame looks like this (with many more rows and different levels for each variable):
metaDATA:
SITE SOIL TIME HOST TISSUE TEMP MEDIA
MSHM1 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM2 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM3 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM4 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM5 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM6 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM7 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM8 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
I want to merge every 4 rows into 1 row in a new data frame. Something like this:
MSHM1 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM4 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM8 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM12 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
Or only keep 1 of every 4 rows since they have the same level of each variable.
I tried:
S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
D[i,1]<-noquote(paste(metaDATA[S1[i]:S2[i],1]))
D[i,2]<-noquote(paste(metaDATA[S1[i]:S2[i],2]))
D[i,3]<-noquote(paste(metaDATA[S1[i]:S2[i],3]))
D[i,4]<-noquote(paste(metaDATA[S1[i]:S2[i],4]))
D[i,5]<-noquote(paste(metaDATA[S1[i]:S2[i],5]))
D[i,6]<-noquote(paste(metaDATA[S1[i]:S2[i],6]))
D[i,7]<-noquote(paste(metaDATA[S1[i]:S2[i],7]))
}
But this did not work and I got this error:
Error in D[i, 6] <- noquote(paste(metaDATA[S1[i]:S2[i], 6])) :
number of items to replace is not a multiple of replacement length
Assuming your data frame is named df, try:
newdf <- df[ c(TRUE, rep(FALSE,3) ), ]
This keeps the first row, skips three, keeps the fifth row, skips three, and so on, because the four-element logical vector is recycled along the rows.
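As a small illustration of that recycling, here is a sketch on a made-up 12-row data frame (`toy` and its columns are hypothetical stand-ins, not the asker's data). It compares the short recycled index with the same index written out in full:

```r
# Hypothetical toy data: 12 rows, so the 4-element pattern fits exactly 3 times
toy <- data.frame(id = 1:12, grp = rep(c("a", "b", "c"), each = 4))

# Short logical index: R recycles it along the rows
short_idx <- c(TRUE, rep(FALSE, 3))
kept_short <- toy[short_idx, ]

# The same index written out to full length, for comparison
full_idx <- rep(short_idx, length.out = nrow(toy))
kept_full <- toy[full_idx, ]

print(kept_short$id)
```

Both forms should select the same rows; writing the index out in full mainly makes the pattern easier to inspect.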
We can use %% (modulo) to create the row indices for subsetting:
D <- df[(1:nrow(df)%%4) == 1, ]
Output:
> (1:nrow(mtcars)%%4)
[1] 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
> mtcars[(1:nrow(mtcars)%%4)==1,]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
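The remainder you compare against decides which row of each group of four is kept. As a sketch using the same built-in `mtcars` data, comparing against 0 instead of 1 should keep the last row of every group (rows 4, 8, 12, ...), which is what the asker's final loop does:

```r
# seq_len() is used instead of 1:nrow() so an empty data frame gives an empty index
row_pos <- seq_len(nrow(mtcars))

# Remainder 1 keeps rows 1, 5, 9, ...; remainder 0 keeps rows 4, 8, 12, ...
first_of_four <- mtcars[row_pos %% 4 == 1, ]
last_of_four  <- mtcars[row_pos %% 4 == 0, ]

print(rownames(last_of_four))
```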
For your problem, say you want to take 1 row and then skip 3 rows:
take = 1
skip = 3
total = nrow(df)
reps = total %/% (skip + take)
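# start of each block, plus the offset within the block (so take > 1 also works)
index = rep(0:(reps-1), each = take) * (skip + take) + rep(seq_len(take), times = reps)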
The value of index is
# Assuming nrow(df) = 100
[1] 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97
Now, you can get your subset:
subset = df[index, ]
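The same take/skip idea can be wrapped in a small helper. This is only a sketch: the function name `take_skip_index` is made up here, and it assumes that a trailing partial block should be dropped, as the integer division above implies.

```r
# Hypothetical helper: row indices for "take `take` rows, then skip `skip` rows"
take_skip_index <- function(total, take, skip) {
  block <- take + skip
  reps <- total %/% block                        # number of complete blocks
  starts <- (seq_len(reps) - 1) * block          # offset of each block: 0, block, 2*block, ...
  rep(starts, each = take) + rep(seq_len(take), times = reps)
}

idx_one <- take_skip_index(100, take = 1, skip = 3)  # same pattern as above
idx_two <- take_skip_index(10, take = 2, skip = 3)   # two rows out of every five
print(idx_two)
```

Using `seq_len(reps) - 1` rather than `0:(reps-1)` avoids a backwards sequence when the data has fewer rows than one full block.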
Or you can simply create a vector of the row indices you want to extract:
index_<-seq(1, nrow(df), by = 4)
df[index_,]
With help from a friend, I found code that worked as I wanted. Here is the code that I used:
S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
D[i,1]<-noquote(paste(data[S2[i],1]))
D[i,2]<-noquote(paste(data[S2[i],2]))
D[i,3]<-noquote(paste(data[S2[i],3]))
D[i,4]<-noquote(paste(data[S2[i],4]))
D[i,5]<-noquote(paste(data[S2[i],5]))
D[i,6]<-noquote(paste(data[S2[i],6]))
D[i,7]<-noquote(paste(data[S2[i],7]))
}
This kept every 4th row of my original data frame and gave me a new matrix. Thanks.
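For comparison, a loop-free sketch that should select the same rows (4, 8, 12, ...) while keeping the result a data frame with factor columns rather than a character matrix. The `meta_small` data frame below is a hypothetical miniature of the data, and `droplevels()` is an optional step assumed to be wanted here:

```r
# Hypothetical miniature of the data: 8 rows, all columns factors
meta_small <- data.frame(
  SITE  = factor(paste0("MSHM", 1:8)),
  TEMP  = factor(rep(23, 8)),
  MEDIA = factor(rep("PDA", 8))
)

# Keep every 4th row in one vectorized step
every_fourth <- meta_small[seq(4, nrow(meta_small), by = 4), ]

# Optional tidy-up: drop unused factor levels and renumber the rows
every_fourth <- droplevels(every_fourth)
rownames(every_fourth) <- NULL

print(every_fourth)
```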