R: strip and split a column in a data frame
I have a data frame, 'data', with multiple columns. One of them, 'Runtime', holds values in two formats:
Runtime
1 h 10 min
67 min
1 h 0 min
86 min
97 min
I want to convert all of them into minutes. I have tried 'strsplit' and 'strip_split_fixed'.
Can anyone show me a way to achieve this, by splitting or any other method?
Thank you in advance!
I think I saw this kind of solution somewhere. Don't hit me.
df = data.frame(Runtime = c('1 h 10 min', '67 min', '1 h 0 min', '86 min', '97 min'))
df$exp <- gsub("h", "* 60 +", df$Runtime)
df$exp <- gsub("min", "* 1", df$exp)
sapply(df$exp, FUN = function(x) eval(parse(text = x)))
1 * 60 + 10 * 1          67 * 1  1 * 60 + 0 * 1          86 * 1          97 * 1
             70              67              60              86              97
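Note that eval(parse(text = ...)) runs whatever text is in the column, so it is only reasonable for data you trust. A minimal sketch of the same arithmetic without evaluating strings is below. The helper name runtime_to_minutes is hypothetical (it is not from the answer above), and it assumes every value looks like "H h M min" or "M min".

```r
# Hypothetical helper: convert "1 h 10 min" / "67 min" style strings to minutes
# using only regular expressions (no eval/parse).
runtime_to_minutes <- function(x) {
  x <- as.character(x)                 # also handles factor columns
  has_h <- grepl("h", x)               # which entries have an hours part?

  # Hours default to 0; fill in the digits before "h" where present.
  hours <- numeric(length(x))
  hours[has_h] <- as.numeric(sub("^\\s*(\\d+)\\s*h.*$", "\\1", x[has_h]))

  # Minutes are the digits directly before "min".
  mins <- as.numeric(sub("^.*?(\\d+)\\s*min.*$", "\\1", x, perl = TRUE))

  hours * 60 + mins
}

df <- data.frame(Runtime = c("1 h 10 min", "67 min", "1 h 0 min", "86 min", "97 min"))
df$Minutes <- runtime_to_minutes(df$Runtime)
```

Values that do not follow either of the two assumed formats would come back as NA (with a coercion warning) rather than being executed as code.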
You can do it in one call using gsubfn and a regex:
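library(gsubfn)
x <- c('1 h 10 min', '67 min', '1 h 0 min', '86 min', '97 min') # the input vector, e.g. df$Runtime
gsubfn("^(?:(\\d+)\\s*h)?\\s*(\\d+)\\s*min.*$",
~ sum(as.numeric(x) * 60, as.numeric(y), as.numeric(z), na.rm=TRUE), x)
#[1] "70" "67" "60" "86" "97"
# the result is a character vector; wrap the call in as.numeric() to get numbers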
Here's an example of how you can do it:
# setting up your data.frame of interest
df = data.frame(Runtime = c('1 h 10 min', '67 min', '1 h 0 min', '86 min', '97 min'))
df$Runtime = gsub(' min', '', df$Runtime) # remove the min labels
hrs = grepl('h', x = df$Runtime) # which values are in an "x h y min" format?
# convert the "x h y min" entries into numeric values in minutes
runtime_sub = sapply(strsplit(df[hrs, 'Runtime'], ' h '), function(i) sum(as.numeric(i) * c(60, 1)))
# convert the vector to numeric (this is expected to produce a coercion warning; the resulting NAs are overwritten on the next line)
df$Runtime = as.numeric(df$Runtime)
df[hrs, 'Runtime'] = runtime_sub # add the converted values
This results in:
Runtime
1 70
2 67
3 60
4 86
5 97
1) Read df[[1]] with read.table. If the third column is NA, then the first column gives the minutes; otherwise, 60 times the first column plus the third column gives the minutes:
with(read.table(text = as.character(df[[1]]), fill = TRUE),
ifelse(is.na(V3), V1, 60*V1 + V3))
## [1] 70 67 60 86 97
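To see why the NA check works, it may help to look at the intermediate table. The sketch below stores it in a variable named parsed (a name chosen here for illustration) and assumes the same df as in the question. With fill = TRUE, rows such as "67 min" have only two fields, so their V3 is padded with NA.

```r
df <- data.frame(Runtime = c("1 h 10 min", "67 min", "1 h 0 min", "86 min", "97 min"))

# Each string becomes one row; whitespace separates the fields.
# "1 h 10 min" -> V1 = 1,  V2 = "h",   V3 = 10, V4 = "min"
# "67 min"     -> V1 = 67, V2 = "min", V3 = NA  (padded because fill = TRUE)
parsed <- read.table(text = as.character(df[[1]]), fill = TRUE)
print(parsed)

# Rows with NA in V3 had no hours part, so V1 already holds the minutes.
minutes <- with(parsed, ifelse(is.na(V3), V1, 60 * V1 + V3))
```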
2) A variation is to paste "0 h" at the beginning of each component that does not have an h, giving hm, and then read that, computing 60 times the first column plus the third column:
hm <- paste(ifelse(grepl("h", df[[1]]), "", "0 h"), df[[1]])
with(read.table(text = hm), 60 * V1 + V3)
## [1] 70 67 60 86 97