How to merge rows in a dataframe in R?

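I have a very large data frame (121920 obs. of 7 variables). All variables are factors. The data frame looks like this (with many more rows and different levels for each variable):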

metaDATA:

      SITE   SOIL        TIME        HOST             TISSUE TEMP MEDIA
MSHM1 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM2 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM3 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM4 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM5 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM6 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM7 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM8 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA

I want to merge every 4 rows into 1 row in a new data frame. Like this:

MSHM1  Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM4  Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM8  Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA
MSHM12 Sorkhe Gypsum Soil 2016-Winter Acantholimon sp. Leaf   23   PDA

Or just keep 1 row out of every 4, since the 4 rows in each group have the same level for every variable.

I tried:

S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
  D[i,1]<-noquote(paste(metaDATA[S1[i]:S2[i],1]))
  D[i,2]<-noquote(paste(metaDATA[S1[i]:S2[i],2]))
  D[i,3]<-noquote(paste(metaDATA[S1[i]:S2[i],3]))
  D[i,4]<-noquote(paste(metaDATA[S1[i]:S2[i],4]))
  D[i,5]<-noquote(paste(metaDATA[S1[i]:S2[i],5]))
  D[i,6]<-noquote(paste(metaDATA[S1[i]:S2[i],6]))
  D[i,7]<-noquote(paste(metaDATA[S1[i]:S2[i],7]))
  }

But this did not work, and I got this error:

Error in D[i, 6] <- noquote(paste(metaDATA[S1[i]:S2[i], 6])) : 
  number of items to replace is not a multiple of replacement length

Assuming your data is named df, try

newdf <- df[ c(TRUE, rep(FALSE,3) ), ]

This keeps the first row, skips 3, keeps the fifth row, skips 3, and so on...
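This works because R recycles a logical index that is shorter than the number of rows. A minimal sketch on a made-up 8-row data frame (the name toy and its columns are assumptions for illustration, not the asker's data):

```r
# Hypothetical toy data: two groups of 4 identical rows
toy <- data.frame(
  SITE = rep(c("A", "B"), each = 4),
  TEMP = rep(c(23, 30), each = 4),
  stringsAsFactors = TRUE
)

# The pattern TRUE, FALSE, FALSE, FALSE is recycled along the rows,
# so rows 1 and 5 are kept
keep <- c(TRUE, rep(FALSE, 3))
newtoy <- toy[keep, ]

print(newtoy)
```

If the number of rows is not a multiple of 4, the pattern is still recycled, so the first row of a trailing partial group would be kept as well.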

We can use %% (modulo) to create a row index for subsetting:

D <- df[(1:nrow(df)%%4) == 1, ]

Output:

> (1:nrow(mtcars)%%4)
 [1] 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0

> mtcars[(1:nrow(mtcars)%%4)==1,]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4

Based on your question, assuming you want to take 1 row and then skip 3 rows

take = 1
skip = 3

total = nrow(df)
reps = total %/% (skip + take)
index = rep(0:(reps-1), each = take) * (skip + take) + 1

The value of index is

# Assuming nrow(df) = 100 
[1]  1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97

Now you can take the subset:

subset = df[index, ]
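Note that with take = 1 the formula above yields one index per group. If take were larger than 1, the same start index would simply be repeated, so a within-group offset would have to be added. The following is a sketch of that extension, not part of the original answer; the names starts and offsets are made up here, and it assumes total is at least skip + take:

```r
take <- 2
skip <- 2
total <- 10  # stands in for nrow(df); an assumed value for illustration

reps <- total %/% (skip + take)  # number of complete groups

# Start row of each group, repeated once per row taken from that group
starts <- rep(0:(reps - 1), each = take) * (skip + take) + 1
# Position within the group: 0, 1, ..., take - 1
offsets <- rep(0:(take - 1), times = reps)

index <- starts + offsets
print(index)
```

Rows that fall in a trailing incomplete group (here rows 9 and 10) are dropped, because reps only counts complete groups.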

Or you can simply create a vector that indexes the rows to extract

index_<-seq(1, nrow(df), by = 4)
df[index_,]
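All of the subsetting approaches above rely on the 4 rows in each group really being identical, as the question states. One possible way to check that before dropping rows is sketched below on made-up data (toy, first and rebuilt are hypothetical names); it assumes the number of rows is a multiple of 4:

```r
# Hypothetical toy data: two groups of 4 identical rows
toy <- data.frame(
  SITE = rep(c("A", "B"), each = 4),
  HOST = rep(c("x", "y"), each = 4),
  stringsAsFactors = TRUE
)

# Keep the first row of every group of 4
first <- toy[seq(1, nrow(toy), by = 4), , drop = FALSE]

# Repeat each kept row 4 times; if every group was uniform,
# this reproduces the original columns
rebuilt <- first[rep(seq_len(nrow(first)), each = 4), , drop = FALSE]

# Compare column by column as character, ignoring row names
same <- mapply(
  function(a, b) identical(as.character(a), as.character(b)),
  toy, rebuilt
)
blocks_identical <- all(same)
print(blocks_identical)
```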

With a friend's help, I found code that works the way I need. This is the code I used:

S1<-seq(1,121920,4)
S2<-seq(4,121920,4)
D<-matrix(0,length(S1),7)
for (i in 1:length(S1)) {
  D[i,1]<-noquote(paste(data[S2[i],1]))
  D[i,2]<-noquote(paste(data[S2[i],2]))
  D[i,3]<-noquote(paste(data[S2[i],3]))
  D[i,4]<-noquote(paste(data[S2[i],4]))
  D[i,5]<-noquote(paste(data[S2[i],5]))
  D[i,6]<-noquote(paste(data[S2[i],6]))
  D[i,7]<-noquote(paste(data[S2[i],7]))
  }

This keeps every 4th row of the original data frame and gives me a new matrix. Thanks.
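For comparison, the loop above can likely be replaced by a single indexing step. The sketch below runs a loop of the same shape and the one-line version on made-up data (toy is a hypothetical stand-in for data) and shows that they select the same values. The one-line version returns a data frame with factor columns rather than a character matrix:

```r
# Hypothetical toy data standing in for the 121920-row data frame
toy <- data.frame(
  SITE = rep(c("A", "B"), each = 4),
  TEMP = rep(c(23, 30), each = 4),
  stringsAsFactors = TRUE
)

# Loop version, same idea as the code above: copy every 4th row
S2 <- seq(4, nrow(toy), 4)
D <- matrix("", length(S2), ncol(toy))
for (i in seq_along(S2)) {
  for (j in seq_len(ncol(toy))) {
    D[i, j] <- paste(toy[S2[i], j])
  }
}

# Vectorized version: index the rows directly
D2 <- toy[S2, ]

print(D)
print(D2)
```

as.matrix(D2) could be used if a matrix is really needed.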
