R element-by-element computation, by group

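I'm trying to do a calculation by Id group. I'd like to use dplyr, but it's not required. In the "history" column I have a string of digits (all of the same length, 36). I want to apply a rule that takes the maximum (max) value element by element and produces a single new history for each Id. For example, for Id = 1157 the new single string would be 432400000000000000000000000000000000, because those are the maximum values at each position for that Id. I want to do this for all Ids (there are thousands).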

     Id                              history
1  1157 101000000000000000000000000000000000
2  1157 000000000000000000000000000000000000
3  1157 432100000000000000000000000000000000
4  1157 321000000000000000000000000000000000
5  1157 000400000000000000000000000000000000
6  1157 432100000000000000000000000000000000
7  1157 211000000000000000000000000000000000
26 1351 000000000000000000000000000000000000
27 1351 000000000000000000000000000000000000
45 1351 000000000000000000000000000000000000
46 1351 000000000000000000000000000000000000
47 1351 000000000000000000000000000000000000
48 1351 000000000000000000000000000000000000
49 1351 000000000000000000000000000000000000
50 1351 000000000000000000000000000000000000
51 1351 000000000000000000000000000000000000
52 1351 000000000000000000000000000000000000
53 1351 000000000000000000000000000000000000
54 1351 000000000000000000000000000000000000
55 1351 000000000000000000000000000000000000

We can split each history value into its individual characters, creating a list column, then group_by Id and use pmax to get the maximum value at each position.

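library(dplyr)
library(purrr)

df %>%
  # split each history string into a character vector, stored in a list column
  mutate(new_col = map(history, ~strsplit(., "")[[1L]])) %>%
  group_by(Id) %>%
  # element-wise maximum across all rows of the group, pasted back into one string
  summarise(temp = paste0(Reduce(pmax, new_col), collapse = ""))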

#  Id    temp                                
# <int> <chr>                               
#1 1157  432400000000000000000000000000000000
#2 1351  000000000000000000000000000000000000

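strsplit returns a list of character vectors, and because we call it inside map, which also returns a list, the output would become a nested list. Using [[1L]] avoids that nesting, so each element of new_col is a character vector rather than a list.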
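To see the nesting that [[1L]] removes, here is a small sketch on one history string. The string is built to match the first row of Id 1157 ("101" followed by 33 zeros); the variable names are only for illustration.

```r
# First history of Id 1157: "101" followed by 33 zeros (36 characters in total)
h <- paste0("101", strrep("0", 33))

# strsplit() always returns a list, even for a single input string
split_list <- strsplit(h, "")
class(split_list)   # "list"
length(split_list)  # 1

# [[1L]] pulls out the character vector stored inside that list
chars <- strsplit(h, "")[[1L]]
class(chars)        # "character"
length(chars)       # 36
head(chars, 4)      # "1" "0" "1" "0"
```

Since strsplit() is already vectorised over its input, `mutate(new_col = strsplit(history, ""))` should give the same list column without map() or [[1L]]. That is an alternative, not what the answer above uses.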

new_col is a list column. Using Reduce, we compare all the new_col values within a group (Id), and pmax picks the maximum element by element.
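A toy illustration of what Reduce(pmax, ...) does inside a single group. It uses three made-up 4-character histories rather than the full 36-character strings:

```r
# Three hypothetical short histories, already split into characters
x <- list(c("1", "0", "1", "0"),
          c("4", "3", "2", "1"),
          c("0", "0", "0", "4"))

# Reduce() folds pmax() over the list pairwise:
# pmax(pmax(x[[1]], x[[2]]), x[[3]])
res <- Reduce(pmax, x)
res                          # "4" "3" "2" "4"
paste0(res, collapse = "")   # "4324"
```

Each position keeps the largest value seen across the group, which is why the question's Id 1157 collapses to a string starting with 4324.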

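Another thing to note is that new_col is a list of character vectors, which means 1 is "1", 2 is "2", and so on. Ideally new_col would be a list of integer vectors for comparison purposes, but I don't think it matters here. Every element is a single digit, and the characters "0" to "9" compare in the same order as the digits they represent, so the element-wise result is the same as an integer comparison. Testing a few: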

"2" > "1"
#[1] TRUE
"6" < "1"
#[1] FALSE
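If you would rather compare integers than rely on the ordering of single-digit characters, a minimal base R sketch converts each split history with as.integer() before taking pmax. `df_small` is a hypothetical 4-character excerpt standing in for the question's `df`:

```r
# Hypothetical excerpt of the question's data, with shortened histories
df_small <- data.frame(
  Id = c(1157L, 1157L, 1157L, 1351L),
  history = c("1010", "4321", "0004", "0000"),
  stringsAsFactors = FALSE
)

# Split each history into characters, then convert to integer vectors
digits <- lapply(strsplit(df_small$history, ""), as.integer)

# Element-wise maximum per Id, pasted back into a single string
by_id <- lapply(split(digits, df_small$Id),
                function(x) paste0(Reduce(pmax, x), collapse = ""))
by_id
```

For single-digit data this gives the same strings as the character version. The conversion only starts to matter if a position could ever hold something other than one digit.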

Using the same logic in base R, this would be

# split the characters by Id, take the element-wise max per group, then stack into a data frame
stack(lapply(split(strsplit(df$history, ""), df$Id), function(x) 
              paste0(Reduce(pmax, x), collapse = "")))

#                                values  ind
#1 432400000000000000000000000000000000 1157
#2 000000000000000000000000000000000000 1351
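A different base R technique, not from the original answer, is to build an integer matrix with one column per character position and let aggregate() take the column-wise maximum within each Id. This is a sketch on a hypothetical 4-character excerpt (`df_small`), and no claim is made about its speed relative to the Reduce(pmax) approach:

```r
# Hypothetical excerpt of the question's data, with shortened histories
df_small <- data.frame(
  Id = c(1157L, 1157L, 1157L, 1351L),
  history = c("1010", "4321", "0004", "0000"),
  stringsAsFactors = FALSE
)

# One row per history, one integer column per character position
m <- do.call(rbind, lapply(strsplit(df_small$history, ""), as.integer))

# Column-wise maximum within each Id (groups come back sorted by Id)
agg <- aggregate(as.data.frame(m), by = list(Id = df_small$Id), FUN = max)

# Paste the per-position maxima back into one string per Id
agg$history <- apply(agg[, -1], 1, paste0, collapse = "")
agg[, c("Id", "history")]
```

This assumes every history has the same length, as stated in the question, because rbind() needs equal-length rows to form the matrix.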

Data

df <- structure(list(Id = c(1157L, 1157L, 1157L, 1157L, 1157L, 1157L, 
1157L, 1351L, 1351L, 1351L, 1351L, 1351L, 1351L, 1351L, 1351L, 
1351L, 1351L, 1351L, 1351L, 1351L), history = 
c("101000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"432100000000000000000000000000000000", 
"321000000000000000000000000000000000", 
"000400000000000000000000000000000000", 
"432100000000000000000000000000000000", 
"211000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
 "000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000")), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "26", "27", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55"), class = "data.frame")
