
Create column of a tibble (or data frame) that contains a list from a long-format tibble

I have objects that have varying numbers of events at varying times. The data are currently stored in long format (using tibbles from library(tidyverse)):

timing_tbl <- tibble(ID = c(101,101,101,102,102,103,103,103,103),
                     event_time = c(0,4,8,0,6,0,4,9,12))

The real data has thousands of objects, each with up to 50 or so events, so I want to make this process as efficient as possible.

I would like to convert this to a pseudo-wide format, where the first column is the patient ID and the second column is a list of the event times for that object. I can do this with a column of tibbles as the second column, as follows:

tmp <- lapply(unique(timing_tbl$ID),
               function(x) timing_tbl[timing_tbl$ID == x, "event_time"])

timing_tbl2 <- tibble(unique(timing_tbl$ID),tmp)

> timing_tbl2[1,2]
# A tibble: 1 x 1
  tmp             
  <list>          
1 <tibble [3 × 1]>
> timing_tbl2[[1,2]]
# A tibble: 3 x 1
  event_time
       <dbl>
1       0   
2       4.00
3       8.00

I would prefer to store these objects as lists, because I then want to find the "distance" between each pair of objects using the following function, and I worry that extracting the vector from each tibble adds unnecessary processing and slows down the calculation.

lap_exp2 <- function(x,y,tau) {
  exp(-abs(x - y)/tau)
}

distance_lap2 <- function(vec1,vec2,tau) {
  ## vec1 is first list of event times
  ## vec2 is second list of event times
  ## tau is the decay parameter
  0.5*(sum(outer(vec1,vec1,FUN=lap_exp2, tau = tau)) +
       sum(outer(vec2,vec2,FUN=lap_exp2, tau = tau))
       ) -
       sum(outer(vec1,vec2,FUN=lap_exp2, tau = tau))

}

distance_lap2(timing_tbl2[[1,2]]$event_time,timing_tbl2[[2,2]]$event_time,2)
[1] 0.8995764

If I try extracting the list instead of the tibble using [[:

tmp <- lapply(unique(timing_tbl$ID),
               function(x) timing_tbl[[timing_tbl$ID == x, "event_time"]])

I get the following error, which makes sense:

Error in col[[i, exact = exact]] : attempt to select more than one element in vectorIndex

Is there a reasonably simple way I can extract the column from the long tibble as a list and store it in the new tibble? Is this even the right way to go about this?

I've found tidyr::nest a good way to generate the 'list columns' I think you may be after (especially for storing time-series-like data). Hope the following helps!

library(dplyr)
library(tidyr)
library(purrr)

timing_tbl <- tibble(ID = c(101,101,101,102,102,103,103,103,103),
                     event_time = c(0,4,8,0,6,0,4,9,12))

ID_times <-
    timing_tbl %>%
    group_by(ID) %>%
    nest(.key = "times_df") %>%
    split(.$ID) %>%
    map(~ .$times_df %>% unlist(use.names = FALSE))

# > ID_times
# $`101`
# [1] 0 4 8

# $`102`
# [1] 0 6

# $`103`
# [1]  0  4  9 12
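
If you specifically want the vectors stored in a list column of a tibble (as the question asks), rather than in a separate named list, one option is to wrap the column in list() inside summarise(). This is only a sketch and not part of the approach above; the names timing_list_tbl and ID_times_base are made up for illustration. The last line shows that base R's split() should give the same named list as ID_times without nesting.

```r
library(dplyr)

timing_tbl <- tibble(ID = c(101,101,101,102,102,103,103,103,103),
                     event_time = c(0,4,8,0,6,0,4,9,12))

# One row per ID; event_time becomes a list column of plain numeric vectors
# (timing_list_tbl is a hypothetical name used only in this sketch)
timing_list_tbl <-
    timing_tbl %>%
    group_by(ID) %>%
    summarise(event_time = list(event_time))

# [[ on the list column returns the numeric vector itself, not a tibble
timing_list_tbl$event_time[[1]]

# Base R alternative: a named list of vectors keyed by ID
ID_times_base <- split(timing_tbl$event_time, timing_tbl$ID)
```

With this layout the vectors can be passed straight to distance_lap2() without the $event_time extraction step.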

dists_long <-
    names(ID_times) %>% 
    expand.grid(IDx = ., IDy = .) %>%
    filter(IDx != IDy) %>%
    rowwise() %>% 
    mutate(dist = distance_lap2(vec1 = ID_times[[IDx]], vec2 = ID_times[[IDy]], tau = 2))

# # A tibble: 6 x 3
#   IDx   IDy    dist
#   <fct> <fct> <dbl>
# 1 102   101   0.900
# 2 103   101   0.981
# 3 101   102   0.900
# 4 103   102   1.68 
# 5 101   103   0.981
# 6 102   103   1.68 
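
Since the table above shows the same distance for (IDx, IDy) and (IDy, IDx), it may be worth computing each unordered pair only once when there are thousands of IDs. The following is a rough sketch under that assumption, using base R's combn() and mapply(); pairs and dists_unique are hypothetical names, and the functions from the question are repeated so the block runs on its own.

```r
# Functions copied from the question
lap_exp2 <- function(x, y, tau) {
  exp(-abs(x - y) / tau)
}

distance_lap2 <- function(vec1, vec2, tau) {
  0.5 * (sum(outer(vec1, vec1, FUN = lap_exp2, tau = tau)) +
         sum(outer(vec2, vec2, FUN = lap_exp2, tau = tau))) -
    sum(outer(vec1, vec2, FUN = lap_exp2, tau = tau))
}

# Named list of event-time vectors, one element per ID
ID_times <- split(c(0, 4, 8, 0, 6, 0, 4, 9, 12),
                  c(101, 101, 101, 102, 102, 103, 103, 103, 103))

# One row per unordered pair of IDs (3 pairs here instead of 6)
pairs <- t(combn(names(ID_times), 2))

dists_unique <- data.frame(
  IDx  = pairs[, 1],
  IDy  = pairs[, 2],
  dist = mapply(function(a, b) distance_lap2(ID_times[[a]], ID_times[[b]], tau = 2),
                pairs[, 1], pairs[, 2], USE.NAMES = FALSE)
)
```

The IDs are kept as character strings here, so ID_times[[a]] looks elements up by name rather than by position.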
