
Subset dataframe into equal subgroup chunks

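I have a data frame df and need to subset it into chunks of 2 names. In the example below there are 4 unique names: a, b, c, d. I need to subset it into 2 one-column matrices, one for a,b and one for c,d.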

Output format:

name1
item_value
item_value
...
END
name2
item_value
item_value
...
END

Example:

#dummy data
df <- data.frame(name=sort(c(rep(letters[1:4],2),"a","a","c")),
                   item=round(runif(11,1,10)),
                   stringsAsFactors=FALSE)
#tried approach - split per name. I need to split per 2 names.
lapply(split(df,f=df$name),
       function(x) 
       {name <- unique(x$name)
        as.matrix(c(name,x[,2],"END"))
       })

#expected output
[,1] 
[1,] "a"  
[2,] "8"  
[3,] "9"  
[4,] "6"  
[5,] "4"  
[6,] "END"
[1,] "b"  
[2,] "2"  
[3,] "10" 
[4,] "END"

[,2] 
[1,] "c"  
[2,] "6"  
[3,] "6"  
[4,] "2"  
[5,] "END"
[1,] "d"  
[2,] "4"  
[3,] "1"  
[4,] "END"

Note: the actual df has ~300000 rows and ~35000 unique names.

Instead of creating the list from individual names, create it from subsetted columns of the data.frame:

res <- list("a_b" = c(df[df$name == "a",2],"END",df[df$name == "b", 2],"END"),
        "c_d" = c(df[df$name == "c",2],"END", df[df$name == "d", 2],"END"))

# Build one single-column matrix per pair of consecutive names.
# lapply() always returns a list, so chunks of different lengths are safe.
nms <- unique(df$name)
res2 <- lapply(seq(1, length(nms) - 1, by = 2), function(y) {
  as.matrix(c(nms[y], df[df$name == nms[y], 2], "END",
              nms[y + 1], df[df$name == nms[y + 1], 2], "END"))
})
answer <- res2

This gives a list with one matrix per pair of names, so everything you want is in answer. It assumes an even number of unique names; with an odd count the last name would be left out.

You can try this.

# for each 'name', "pad" 'item' with 'name' and 'END'
l1 <- lapply(split(df, f = df$name), function(x){
  name <- unique(x$name)
  as.matrix(c(name, x$item, "END")) 
  })

# create a sequence of offsets (0, 2, 4, ...) to select list elements two by two
# (assumes an even number of unique names)
steps <- seq(from = 0, to = length(l1) - 2, by = 2)

# loop over 'steps' to bind together list elements, two by two. 
l2 <- lapply(steps, function(x){
  do.call(rbind, l1[1:2 + x])
})

l2
# [[1]]
#      [,1] 
# [1,] "a"  
# [2,] "6"  
# [3,] "4"  
# [4,] "10" 
# [5,] "3"  
# [6,] "END"
# [7,] "b"  
# [8,] "6"  
# [9,] "7"  
# [10,] "END"
# 
# [[2]]
#     [,1] 
# [1,] "c"  
# [2,] "2"  
# [3,] "6"  
# [4,] "10" 
# [5,] "END"
# [6,] "d"  
# [7,] "5"  
# [8,] "4"  
# [9,] "END"
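
With ~35000 unique names, it may be worth splitting the data once rather than filtering with df$name == ... for every name. The following is only a sketch of that idea, not part of either answer: chunk_by_names and its size argument are hypothetical names, and it assumes the columns are called name and item as in the dummy data. It also keeps a leftover name in a chunk of its own when the number of names is not a multiple of size.

```r
# Hypothetical helper (not from the original answers): group names into
# chunks of `size` and build one single-column character matrix per chunk.
chunk_by_names <- function(df, size = 2) {
  nms <- unique(df$name)
  # chunk id for every unique name: 1, 1, 2, 2, 3, ...
  chunk_id <- ceiling(seq_along(nms) / size)
  # split the items once per name, keeping the order of first appearance
  per_name <- split(df$item, factor(df$name, levels = nms))
  # pad each name's items with the name in front and "END" behind
  padded <- Map(function(nm, items) c(nm, items, "END"), nms, per_name)
  # concatenate the padded vectors chunk by chunk
  lapply(split(padded, chunk_id), function(p) {
    matrix(unlist(p, use.names = FALSE), ncol = 1)
  })
}

# small fixed example with 5 names, so the last chunk holds a single name
df <- data.frame(name = c("a", "a", "b", "c", "c", "c", "d", "e"),
                 item = c(8, 9, 2, 6, 6, 2, 4, 1),
                 stringsAsFactors = FALSE)
chunks <- chunk_by_names(df, size = 2)
```

Here chunks is a list of three matrices: one for a and b, one for c and d, and one for e alone. Using size = 2 reproduces the pairing asked for in the question.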
