
How to merge rows in a dataframe in R?

I have a very large data frame (121920 obs. of 7 variables). All variables are factors. The data frame looks like this (with many more rows and different levels for each variable):

metaDATA:

        SITE        SOIL        TIME             HOST TISSUE TEMP MEDIA
MSHM1 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM2 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM3 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM4 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM5 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM6 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM7 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM8 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA

I want to merge every 4 rows into 1 row in a new data frame. Something like this:

MSHM1  Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM4  Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM8  Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA
MSHM12 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp.   Leaf   23   PDA

Or only keep 1 of every 4 rows since they have the same level of each variable.

I tried:

S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
  D[i,1]<-noquote(paste(metaDATA[S1[i]:S2[i],1]))
  D[i,2]<-noquote(paste(metaDATA[S1[i]:S2[i],2]))
  D[i,3]<-noquote(paste(metaDATA[S1[i]:S2[i],3]))
  D[i,4]<-noquote(paste(metaDATA[S1[i]:S2[i],4]))
  D[i,5]<-noquote(paste(metaDATA[S1[i]:S2[i],5]))
  D[i,6]<-noquote(paste(metaDATA[S1[i]:S2[i],6]))
  D[i,7]<-noquote(paste(metaDATA[S1[i]:S2[i],7]))
  }

But this did not work and I got this error:

Error in D[i, 6] <- noquote(paste(metaDATA[S1[i]:S2[i], 6])) : 
  number of items to replace is not a multiple of replacement length

Assuming your data is named df, try:

newdf <- df[ c(TRUE, rep(FALSE,3) ), ]

The length-4 logical vector is recycled along the rows, so this keeps the first row, skips 3, keeps the fifth row, skips 3, and so on.
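To see the recycling at work, here is a minimal sketch on a toy data frame. The columns id and site and their values are made up for illustration and only stand in for the real data.

```r
# Toy stand-in for the real data (column names and values are hypothetical)
df <- data.frame(id = paste0("MSHM", 1:8),
                 site = rep(c("A", "B"), each = 4),
                 stringsAsFactors = FALSE)

# TRUE FALSE FALSE FALSE is recycled to cover all 8 rows
keep <- c(TRUE, rep(FALSE, 3))
newdf <- df[keep, ]

newdf$id
# [1] "MSHM1" "MSHM5"
```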

We can use %% (modulo) to create the row indices for subsetting:

D <- df[(1:nrow(df)%%4) == 1, ]

Output:

> (1:nrow(mtcars)%%4)
 [1] 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0

> mtcars[(1:nrow(mtcars)%%4)==1,]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
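The remainder you compare against decides which row of each group of four survives: 1 keeps the first row and 0 keeps the fourth. A small sketch, assuming a toy data frame with a made-up id column:

```r
# Toy data frame (hypothetical); drop = FALSE keeps a one-column result a data frame
df <- data.frame(id = paste0("MSHM", 1:8), stringsAsFactors = FALSE)
pos <- seq_len(nrow(df))

first_of_four <- df[pos %% 4 == 1, , drop = FALSE]  # rows 1, 5
last_of_four  <- df[pos %% 4 == 0, , drop = FALSE]  # rows 4, 8

first_of_four$id
# [1] "MSHM1" "MSHM5"
last_of_four$id
# [1] "MSHM4" "MSHM8"
```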

For your problem, say you want to take 1 row and then skip the next 3 rows, repeating this over the whole data frame:

take = 1
skip = 3

total = nrow(df)
reps = total %/% (skip + take)
index = rep(0:(reps-1), each = take) * (skip + take) + rep(seq_len(take), times = reps)

The value of index is

# Assuming nrow(df) = 100 
[1]  1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Now, you can get your subset:

subset = df[index, ]
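The same take/skip idea can be wrapped in a small helper. This is only a sketch: take_skip_index is a hypothetical name, not a base R function, and it uses the modulo test rather than rep() so that a trailing partial group is still handled.

```r
# Hypothetical helper: row positions for "take `take` rows, then skip `skip` rows"
take_skip_index <- function(total, take, skip) {
  pos <- seq_len(total)
  # Position within each block of (take + skip) rows, counted from 0
  pos[(pos - 1) %% (take + skip) < take]
}

take_skip_index(10, take = 1, skip = 3)
# [1] 1 5 9
take_skip_index(12, take = 2, skip = 2)
# [1]  1  2  5  6  9 10
```

The result can be used directly as a row index, for example df[take_skip_index(nrow(df), 1, 3), ].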

Or you can simply create a vector of the row numbers that you want to extract:

index_<-seq(1, nrow(df), by = 4)
df[index_,]
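As a quick sanity check, the logical-recycling, modulo, and seq() approaches can be compared on a toy data frame. The data below is invented for illustration; the sketch only checks that all three select the same rows.

```r
# Toy data frame (hypothetical values)
df <- data.frame(id = paste0("MSHM", 1:12),
                 temp = rep(c(23, 25, 30), each = 4),
                 stringsAsFactors = FALSE)

by_recycling <- df[c(TRUE, rep(FALSE, 3)), ]
by_modulo    <- df[seq_len(nrow(df)) %% 4 == 1, ]
by_seq       <- df[seq(1, nrow(df), by = 4), ]

# All three pick rows 1, 5 and 9
same_rows <- identical(by_recycling$id, by_modulo$id) &&
  identical(by_modulo$id, by_seq$id)
same_rows
```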

With help from a friend, I found code that worked as I wanted. Here is the code that I used:

S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
  D[i,1]<-noquote(paste(data[S2[i],1]))
  D[i,2]<-noquote(paste(data[S2[i],2]))
  D[i,3]<-noquote(paste(data[S2[i],3]))
  D[i,4]<-noquote(paste(data[S2[i],4]))
  D[i,5]<-noquote(paste(data[S2[i],5]))
  D[i,6]<-noquote(paste(data[S2[i],6]))
  D[i,7]<-noquote(paste(data[S2[i],7]))
  }

This kept every 4th row of my original data frame and gave me a new matrix. Thanks.
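For comparison, the same character matrix can probably be built without writing out each column, by indexing every 4th row and converting with as.matrix(). This is a sketch on a toy all-factor data frame; the column values are invented, and it assumes, as in the question, that every column is a factor.

```r
# Toy stand-in: every column is a factor, as in the question (values are hypothetical)
data <- data.frame(SITE = factor(rep(c("Sorkhe", "Other"), each = 4)),
                   TISSUE = factor(rep(c("Leaf", "Root"), each = 4)))

S2 <- seq(4, nrow(data), by = 4)

# Loop version, condensed to one assignment per column
D_loop <- matrix("", length(S2), ncol(data))
for (j in seq_len(ncol(data))) {
  D_loop[, j] <- paste(data[S2, j])
}

# Vectorised version: subset the rows, then convert to a character matrix
D_vec <- as.matrix(data[S2, ])

# unname() drops the row and column names that as.matrix() carries over
identical(unname(D_vec), D_loop)
```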
