How to merge rows in a dataframe in R?
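I have a very large data frame (121920 obs. of 7 variables). All variables are factors. The data frame looks like this (with more rows and different levels for each variable):
metaDATA: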
SITE SOIL TIME HOST TISSUE TEMP MEDIA
MSHM1 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM2 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM3 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM4 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM5 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM6 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM7 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM8 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
I want to merge every 4 rows into 1 row in a new data frame. Like this:
MSHM1 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM4 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM8 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
MSHM12 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf 23 PDA
Or just keep 1 row out of every 4, since the levels of every variable are the same within those rows.
I tried this:
S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
D[i,1]<-noquote(paste(metaDATA[S1[i]:S2[i],1]))
D[i,2]<-noquote(paste(metaDATA[S1[i]:S2[i],2]))
D[i,3]<-noquote(paste(metaDATA[S1[i]:S2[i],3]))
D[i,4]<-noquote(paste(metaDATA[S1[i]:S2[i],4]))
D[i,5]<-noquote(paste(metaDATA[S1[i]:S2[i],5]))
D[i,6]<-noquote(paste(metaDATA[S1[i]:S2[i],6]))
D[i,7]<-noquote(paste(metaDATA[S1[i]:S2[i],7]))
}
But it did not work, and I got this error:
Error in D[i, 6] <- noquote(paste(metaDATA[S1[i]:S2[i], 6])) :
number of items to replace is not a multiple of replacement length
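For context, here is a minimal sketch of what this kind of error usually means, using made-up values rather than the real metaDATA. paste() applied to four rows of a column returns four strings, and four strings cannot be stored in a single matrix cell. The variable names below are illustrative assumptions.

```r
# Toy stand-in for one column across four identical rows (values are assumptions)
x <- factor(rep("Leaf", 4))

pieces <- paste(x)                    # one string per element, so length 4
D <- matrix("", nrow = 2, ncol = 1)

# Assigning four values to one cell raises an error; capture its message
err <- tryCatch({
  D[1, 1] <- pieces
  NULL
}, error = function(e) conditionMessage(e))

# Two ways to end up with a single value per cell:
D[1, 1] <- paste(unique(x), collapse = ";")  # collapse the distinct values
D[2, 1] <- as.character(x[1])                # or keep just one of the four rows
```

Because the four rows are identical here, both approaches give the same value, which is why simply keeping one row per group is enough.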
Assuming your data is named df, try:
newdf <- df[ c(TRUE, rep(FALSE,3) ), ]
This keeps the first row, skips 3, keeps the fifth row, skips 3, and so on...
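A minimal sketch of why this works, using a made-up 8-row data frame (the column names and values are illustrative, not the asker's data). R recycles the length-4 logical vector along the rows, so the keep/skip pattern repeats for every block of four.

```r
# Toy data frame standing in for the real one (values are assumptions)
df <- data.frame(
  SITE   = paste0("MSHM", 1:8),
  TISSUE = "Leaf",
  stringsAsFactors = TRUE
)

# Length-4 pattern: keep one row, drop the next three
keep <- c(TRUE, rep(FALSE, 3))

# The pattern is recycled over all rows of df, so rows 1 and 5 are kept
newdf <- df[keep, ]
print(newdf)
```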
We can use %% (modulo) to create the row index for subsetting:
D <- df[(1:nrow(df)%%4) == 1, ]
Output:
> (1:nrow(mtcars)%%4)
[1] 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
> mtcars[(1:nrow(mtcars)%%4)==1,]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
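The same idea can pick a different row from each block by changing the remainder being compared. A small sketch on mtcars, assuming you want the last row of every four (remainder 0) rather than the first. seq_len() is used instead of 1:nrow() only so that an empty data frame would give an empty index.

```r
n <- nrow(mtcars)   # 32 rows in the built-in data set

# Remainder 1 selects rows 1, 5, 9, ...
first_of_four <- mtcars[seq_len(n) %% 4 == 1, ]

# Remainder 0 selects rows 4, 8, 12, ...
last_of_four <- mtcars[seq_len(n) %% 4 == 0, ]

print(rownames(last_of_four)[1:2])
```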
Based on your question, assuming you want to take 1 row and then skip the next 3:
take = 1
skip = 3
total = nrow(df)
reps = total %/% (skip + take)
index = rep(0:(reps-1), each = take) * (skip + take) + 1
The value of index is:
# Assuming nrow(df) = 100
[1] 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97
Now you can take the subset:
subset = df[index, ]
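Note that the formula above is written for take = 1. With a larger take, repeating only the block start would return the same row several times. One possible generalisation, as a sketch, adds an offset within each block. The names block, starts and offsets are introduced here for illustration, and total stands in for nrow(df).

```r
take  <- 2
skip  <- 3
total <- 12                 # stands in for nrow(df); assumed >= take + skip
block <- take + skip
reps  <- total %/% block    # number of complete blocks

# Zero-based start of each block, repeated once per row taken from it
starts <- rep(0:(reps - 1), each = take) * block

# Position of each taken row inside its block: 1, 2, ..., take
offsets <- rep(seq_len(take), times = reps)

index <- starts + offsets
print(index)
```

With take = 1 the offsets are all 1, so this reduces to the original expression.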
Or you can simply create a vector that indexes the rows you want to extract:
index_<-seq(1, nrow(df), by = 4)
df[index_,]
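Since every column in the question is a factor, the subset still carries the original row names and any factor levels that no longer occur. A short sketch on made-up data (column names and values are assumptions) that tidies both after indexing with seq():

```r
# Toy data frame with factor columns (values are assumptions)
df <- data.frame(
  SITE = factor(paste0("MSHM", 1:8)),
  HOST = factor(rep(c("A", "B"), each = 4))
)

index_ <- seq(1, nrow(df), by = 4)   # rows 1 and 5
out <- df[index_, ]

out <- droplevels(out)      # drop factor levels that are no longer used
rownames(out) <- NULL       # renumber the rows from 1
print(out)
```

Whether to drop unused levels depends on what you do next; some analyses rely on the full set of levels being kept.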
With the help of a friend, I found code that works the way I need. This is the code I used:
S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
D[i,1]<-noquote(paste(data[S2[i],1]))
D[i,2]<-noquote(paste(data[S2[i],2]))
D[i,3]<-noquote(paste(data[S2[i],3]))
D[i,4]<-noquote(paste(data[S2[i],4]))
D[i,5]<-noquote(paste(data[S2[i],5]))
D[i,6]<-noquote(paste(data[S2[i],6]))
D[i,7]<-noquote(paste(data[S2[i],7]))
}
This keeps every 4th row of the original data frame and gives me a new matrix. Thanks.
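For comparison, a sketch of how the same result could be obtained without the loop, by indexing the rows once and converting to a character matrix. It runs on a made-up two-column data frame standing in for data, and the loop is shortened to iterate over columns. noquote() is left out because it only affects printing.

```r
# Toy data with factor columns, standing in for the real `data`
data <- data.frame(
  SITE = factor(paste0("MSHM", 1:8)),
  TEMP = factor(rep(23, 8))
)

S2 <- seq(4, nrow(data), by = 4)   # rows 4 and 8

# Loop version, following the approach in the post
D <- matrix(0, length(S2), ncol(data))
for (i in seq_along(S2)) {
  for (j in seq_len(ncol(data))) {
    D[i, j] <- paste(data[S2[i], j])
  }
}

# Vectorised version: subset the rows, then convert to a character matrix
D2 <- as.matrix(data[S2, ])

print(all(D == unname(D2)))
```

On a data frame with 121920 rows the vectorised form should be noticeably faster, and data[S2, ] on its own keeps the result as a data frame with factor columns.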