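R: Calculating median value from time data series returns an error
I have run into the following R problem when calculating the median of a time data series. Can anyone explain why R behaves so strangely when asked to compute something as simple as a median?
1) Read the data into the "all_runners" data frame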
all_runners <- read.csv("NEJ_21_km_results.csv", stringsAsFactors=FALSE, strip.white = TRUE)
The "RESULT" field has data type "chr"
str(all_runners)
'data.frame': 100 obs. of 10 variables:
$ POS : int 1 2 3 4 5 6 7 8 9 10 ...
$ BIB : int 3 2 1 9 5 10 8 33 34 67 ...
$ NAME : chr "DOMINIC KIPTARUS" "TIIDREK NURME" "ROMAN FOSTI" "RAIDO MITT"...
$ YOB : int 1996 1985 1983 1991 1984 1982 1993 1992 1984 1996 ...
$ NAT : chr "KEN" "EST" "EST" "EST" ...
$ CITY : chr "" "" "" "" ...
$ RESULT : chr "01:03:55" "01:03:57" "01:06:18" "01:09:33" ...
$ BEHIND : chr "" "00:00:02" "00:02:23" "00:05:38" ...
$ NET.TIME: chr "01:03:55" "01:03:57" "01:06:18" "01:09:31"...
$ CAT : chr "MN" "M" "M" "M" ...
2) Calculate the median of all runners' results
> all_runners_median = median(all_runners$RESULT, na.rm = TRUE)
Warning message: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) : argument is not numeric or logical: returning NA
3) Convert the time values from char to numeric
> results_to_numeric <- as.numeric(all_runners$RESULT)
Warning message: NAs introduced by coercion
4) Calculate the median of all women's results ("N" => women, "M" => men)
all_womens <- all_runners %>%
  filter(str_sub(CAT, 1, 1) == "N") %>%
  select(RESULT)
The 'RESULT' field is of data type 'chr'
> str(all_womens)
'data.frame': 8 obs. of 1 variable:
$ RESULT: chr "01:18:36" "01:20:07" "01:22:52" "01:25:11" ...
Warning message: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) : argument is not numeric or logical: returning NA
> all_womens
RESULT
1 01:18:36
2 01:20:07
3 01:22:52
4 01:25:11
5 01:26:04
6 01:26:09
7 01:26:42
8 01:26:55
Here is how to apply median to time values:
# Get sample of 'Date/Time Type'
x <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")
# Convert to proper format
y <- as.POSIXct(x, format = "%H:%M:%S")
# Find the median
y <- median(y)
# Updated, no need to use strsplit and sapply, directly use format
# ys <- strsplit(as.character(y), split = " ")
# sapply(ys, function(x) x[2])
# Get the time
format(y,"%H:%M:%S" )
[1] "01:05:07"
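When you use as.POSIXct, each time is associated with a date (the current date when the string contains none), which is why format is used afterwards to extract only the time part.
Edit: As Rich Scriven suggested, we can use format directly, which removes the need for splitting and looping.
If you want to perform the analysis by group (for example, gender), you can simply use: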
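Because as.POSIXct ties every time to a calendar date, a date-free alternative is to treat each result as a number of seconds. The following is only a minimal sketch in base R; the helpers to_seconds and to_hms are hypothetical names introduced here for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer):
# convert "HH:MM:SS" strings to a numeric count of seconds.
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(p) {
    if (length(p) != 3) return(NA_real_)  # empty or malformed strings become NA
    sum(as.numeric(p) * c(3600, 60, 1))
  }, numeric(1))
}

# Hypothetical helper: convert seconds back to "HH:MM:SS",
# dropping any fractional second.
to_hms <- function(s) {
  s <- as.integer(floor(s))
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

x <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")
secs <- to_seconds(x)               # numeric, so median() works
med <- median(secs, na.rm = TRUE)   # 3907.5 seconds
med_hms <- to_hms(med)              # "01:05:07"
print(med_hms)
```

This gives the same median as the as.POSIXct approach for the sample values, and the numeric seconds can also be used directly for means, differences, or plotting.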
x <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")
df <- data.frame(Gender = rep(c("M", "F"), each = 4), time = x)
# > df
# Gender time
# 1 M 01:03:55
# 2 M 01:03:57
# 3 M 01:06:18
# 4 M 01:09:33
# 5 F 01:03:55
# 6 F 01:03:57
# 7 F 01:06:18
# 8 F 01:09:33
# Convert the column itself (not the shorter vector x, which would be silently recycled)
df$time <- as.POSIXct(df$time, format = "%H:%M:%S")
time_group_by_gender <- split(df$time, df$Gender )
# > time_group_by_gender
# $F
# [1] "2018-07-21 01:03:55 +03" "2018-07-21 01:03:57 +03" "2018-07-21 01:06:18 +03"
# [4] "2018-07-21 01:09:33 +03"
#
# $M
# [1] "2018-07-21 01:03:55 +03" "2018-07-21 01:03:57 +03" "2018-07-21 01:06:18 +03"
# [4] "2018-07-21 01:09:33 +03"
time_median <- lapply(time_group_by_gender, median)
time_median <- lapply(time_median, format, "%H:%M:%S")
# > time_median
# $F
# [1] "01:05:07"
#
# $M
# [1] "01:05:07"
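The same grouping idea can be applied to data shaped like the question's, where the group is derived from the first letter of CAT. This is a sketch under stated assumptions: the small runners data frame below is a hypothetical stand-in for the CSV, built from the RESULT values shown in the question, and the "N" category codes for the women are assumed because the question does not list them. The time zone is fixed to "UTC" here so that the attached date cannot interact with daylight-saving changes.

```r
# Hypothetical stand-in for the CSV: four men's and eight women's results
# copied from the question; the women's CAT codes ("N") are assumed.
runners <- data.frame(
  CAT = c("MN", "M", "M", "M", rep("N", 8)),
  RESULT = c("01:03:55", "01:03:57", "01:06:18", "01:09:33",
             "01:18:36", "01:20:07", "01:22:52", "01:25:11",
             "01:26:04", "01:26:09", "01:26:42", "01:26:55"),
  stringsAsFactors = FALSE
)

# Parse the character times; today's date is attached, as noted above.
runners$time <- as.POSIXct(runners$RESULT, format = "%H:%M:%S", tz = "UTC")

# First letter "N" => women, anything else => men (as in the question's filter).
runners$group <- ifelse(substr(runners$CAT, 1, 1) == "N", "women", "men")

# Median per group, formatted back to a time-only string.
medians <- sapply(split(runners$time, runners$group),
                  function(t) format(median(t), "%H:%M:%S"))
print(medians)
```

With these values, the women's median is the midpoint of 01:25:11 and 01:26:04, which format shows as "01:25:37" after dropping the half second.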