How to count the occurrence of permutations in a data set in R?

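I have a question about how to count the occurrences of a specified permutation in a data set in R.

I am currently working on continuous glucose monitoring data sets. Briefly, each data set has between 1500 and 2000 observations (each observation is a plasma glucose value, measured every 5 minutes over 6 days).

I need to count, on a numeric scale, the occurrences where glucose values stay below 3.9 for 15 minutes or more and for less than 120 minutes in a row (>3 observations and <24 observations with consecutive values <3.9).

I have created a new variable, a factor coded 1 or 0, that indicates whether the plasma glucose value is below 3.9.

I then want to count the occurrences of permutations with > 3 consecutive 1s and < 24 consecutive 1s.

Is there a function for this in R, or what is the easiest way to do it?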

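I'm not sure I have understood your data structure correctly, but maybe the following code can still help.

I assume a data structure that includes the measurement, a person ID and a measurement ID.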

library(dplyr)
# create dummy data
set.seed(123)
data_test = data.frame(measure = rnorm(100, 3.5,2), person_id = rep(1:10, each = 10), measure_id = rep(1:10, 10))

data_test$below_criterion = 0 # indicator for measures below crit-value
data_test$below_criterion[which(data_test$measure < 3.9)] = 1 # indicator for measures below crit-value

# indicator that shows whether the current measurement is the first one below crit_val in a possible series
# shift the column by one within each person to compare the current value with the previous one
# (the first measurement of a person has no predecessor, so it is treated as following a value above the criterion)
data_test = data_test %>% group_by(person_id) %>% mutate(prev_below_crit = lag(below_criterion, default = 0))
data_test$start_of_run = 0 # create the indicator variable
data_test$start_of_run[which(data_test$below_criterion == 1 & data_test$prev_below_crit == 0)] = 1 # if current value is below crit and previous value is above, this is the start of a series
data_test = data_test %>% group_by(person_id) %>% mutate(grouper = cumsum(start_of_run)) # helper-variable to group all the possible series within a person

data_test = data_test %>% select(measure, person_id, measure_id, below_criterion, grouper) # get rid of the previous created helper-variables

data_results = data_test %>% group_by(person_id, grouper) %>% summarise(count_below_crit = sum(below_criterion), .groups = "drop") # count the length of each series by summing up all below_crit indicators within a person and series

data_results = data_results %>% group_by(person_id) %>% filter(count_below_crit >= 3 & count_below_crit <= 24) %>% summarise(n_runs = n()) # count all series within the desired length for each person (bounds are inclusive here; adjust them to match your episode definition)
data_results

data.frame(data_test)
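
As an alternative, the same run counting can be sketched with base R's rle(), which returns the lengths and values of consecutive identical entries. This is only a minimal sketch: count_low_runs() is a hypothetical helper, and its default bounds (3 to 23 readings, meaning at least 15 and less than 120 minutes at 5-minute sampling) are an assumption about the episode definition that you may need to adjust. It also assumes the readings are sorted by time and treats missing values as not below the threshold.

```r
# Hypothetical helper: count runs of consecutive readings below `threshold`
# whose length lies between min_len and max_len (both inclusive).
count_low_runs <- function(glucose, threshold = 3.9, min_len = 3, max_len = 23) {
  below <- !is.na(glucose) & glucose < threshold  # TRUE where the reading is low
  runs <- rle(below)                              # run lengths and their values
  sum(runs$values & runs$lengths >= min_len & runs$lengths <= max_len)
}

# Toy series with low runs of length 4, 2 and 3
glucose <- c(5.0, 3.5, 3.2, 3.0, 3.8, 4.5, 3.1, 3.6, 6.0, 3.7, 3.3, 3.0, 5.2)
n_runs <- count_low_runs(glucose)  # the runs of length 4 and 3 qualify

# Per-person counts, assuming columns `measure` and `person_id` as in the
# dummy data above, ordered by measurement time within each person
set.seed(123)
data_test <- data.frame(measure = rnorm(100, 3.5, 2),
                        person_id = rep(1:10, each = 10))
runs_per_person <- tapply(data_test$measure, data_test$person_id, count_low_runs)
```

Because rle() works on a plain vector, no helper columns are needed, and a person without any qualifying run gets a count of 0 instead of being dropped from the result.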
