Subset a dataset in R using part of one column
To elaborate on my previous question: I am looking to subset a large CSV dataset in R. I want to take the information in the "timestamp" column and extract only the rows whose time falls between 7 PM and 12 AM, inclusive. Below is an example of the data:
Deer ID   TimeStamp          Location
1         4/16/18 12:00AM    DMA 1
2         4/16/18 3:00AM     DMA 1
3         4/16/18 9:30AM     DMA 2
4         4/16/18 7:00PM     DMA 1
5         4/16/18 8:30PM     DMA 2
6         4/16/18 11:00PM    DMA 3
7         4/17/18 1:30AM     DMA 2
8         4/17/18 5:00AM     DMA 1
9         4/17/18 9:00PM     DMA 3
10        4/17/18 11:30PM    DMA 1
11        4/18/18 12:30AM    DMA 2
So my end goal is to end up with the following:
Deer ID   TimeStamp          Location
4         4/16/18 7:00PM     DMA 1
5         4/16/18 8:30PM     DMA 2
6         4/16/18 11:00PM    DMA 3
9         4/17/18 9:00PM     DMA 3
10        4/17/18 11:30PM    DMA 1
Any ideas on how to accomplish this? Thank you!
You could do the following:
# Convert TimeStamp to POSIXct (%y matches the two-digit year in the data)
df <- transform(df, TimeStamp = as.POSIXct(strptime(TimeStamp, "%m/%d/%y %I:%M%p")))
# Use lubridate::hour to extract the hours from the POSIXct timestamp
library(lubridate)
# hour() returns 0-23, so 7 PM up to (but not including) midnight is hour >= 19
df[hour(df$TimeStamp) >= 19, ]
# Deer.ID TimeStamp Location
#4 4 2018-04-16 19:00:00 DMA 1
#5 5 2018-04-16 20:30:00 DMA 2
#6 6 2018-04-16 23:00:00 DMA 3
#9 9 2018-04-17 21:00:00 DMA 3
#10 10 2018-04-17 23:30:00 DMA 1
# Sample data
df <- read.table(text =
"'Deer ID' TimeStamp Location
1 '4/16/18 12:00AM' 'DMA 1'
2 '4/16/18 3:00AM' 'DMA 1'
3 '4/16/18 9:30AM' 'DMA 2'
4 '4/16/18 7:00PM' 'DMA 1'
5 '4/16/18 8:30PM' 'DMA 2'
6 '4/16/18 11:00PM' 'DMA 3'
7 '4/17/18 1:30AM' 'DMA 2'
8 '4/17/18 5:00AM' 'DMA 1'
9 '4/17/18 9:00PM' 'DMA 3'
10 '4/17/18 11:30PM' 'DMA 1'
11 '4/18/18 12:30AM' 'DMA 2'", header = TRUE)
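The sample timestamps use a two-digit year ("4/16/18"), so the choice between `%y` and `%Y` in the `strptime` format string matters. Here is a minimal sketch of the difference, assuming an English-language locale for the `%p` AM/PM marker and using `tz = "UTC"` purely for illustration; the variable names are made up for this example.

```r
# A single timestamp in the same layout as the sample data
x <- "4/16/18 7:00PM"

# %y reads a two-digit year (18 -> 2018);
# %Y expects the full year and takes "18" literally as the year 18
two_digit  <- as.POSIXct(strptime(x, "%m/%d/%y %I:%M%p", tz = "UTC"))
four_digit <- as.POSIXct(strptime(x, "%m/%d/%Y %I:%M%p", tz = "UTC"))

# POSIXlt stores the year as an offset from 1900
year_two_digit  <- as.POSIXlt(two_digit)$year + 1900
year_four_digit <- as.POSIXlt(four_digit)$year + 1900

# The hour is parsed the same way in both cases (24-hour clock, 0-23)
hour_of_day <- as.POSIXlt(two_digit)$hour
```

With `%y` the year comes out as 2018, while `%Y` yields the year 18, which is where dates printed as `0018-04-16` come from. The hour is unaffected, so a filter on the hour alone still selects the same rows either way.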
The tidyverse way would be something along these lines:
library(dplyr)
df <- read.table(
text =
"id timestamp location
1 '4/16/18 12:00AM' 'DMA 1'
2 '4/16/18 3:00AM' 'DMA 1'
3 '4/16/18 9:30AM' 'DMA 2'
4 '4/16/18 7:00PM' 'DMA 1'
5 '4/16/18 8:30PM' 'DMA 2'
6 '4/16/18 11:00PM' 'DMA 3'
7 '4/17/18 1:30AM' 'DMA 2'
8 '4/17/18 5:00AM' 'DMA 1'
9 '4/17/18 9:00PM' 'DMA 3'
10 '4/17/18 11:30PM' 'DMA 1'
11 '4/18/18 12:30AM' 'DMA 2'",
header = TRUE
) %>%
as_tibble()
df %>%
# %y matches the two-digit year; hour() returns 0-23, so the upper bound is 23
mutate(timestamp = as.POSIXct(strptime(.data$timestamp, "%m/%d/%y %I:%M%p"))) %>%
filter(between(lubridate::hour(.data$timestamp), 19, 23))
#> # A tibble: 5 x 3
#> id timestamp location
#> <int> <dttm> <chr>
#> 1 4 2018-04-16 19:00:00 DMA 1
#> 2 5 2018-04-16 20:30:00 DMA 2
#> 3 6 2018-04-16 23:00:00 DMA 3
#> 4 9 2018-04-17 21:00:00 DMA 3
#> 5 10 2018-04-17 23:30:00 DMA 1
Created on 2019-02-19 by the reprex package (v0.2.1)
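If you would rather not load lubridate or dplyr, the same hour-based filter can be sketched in base R. This is only an illustration: `keep_evening()` and the small `deer` data frame are hypothetical names, the format string assumes two-digit years and an English-language locale for AM/PM, and "7 PM to 12 AM" is read as hours 19 through 23, which matches the desired output in the question (the 12:00AM row is not kept).

```r
# Hypothetical helper: keep rows whose timestamp hour is at or after start_hour.
# Rows whose timestamp cannot be parsed are dropped.
keep_evening <- function(data, column = "TimeStamp", start_hour = 19) {
  ts <- as.POSIXct(strptime(data[[column]], "%m/%d/%y %I:%M%p", tz = "UTC"))
  hrs <- as.integer(format(ts, "%H"))  # hour of day, 0-23
  data[!is.na(hrs) & hrs >= start_hour, , drop = FALSE]
}

# A few rows in the same layout as the question's data
deer <- data.frame(
  DeerID = 1:4,
  TimeStamp = c("4/16/18 12:00AM", "4/16/18 7:00PM",
                "4/16/18 11:00PM", "4/17/18 1:30AM"),
  Location = c("DMA 1", "DMA 1", "DMA 3", "DMA 2"),
  stringsAsFactors = FALSE
)

evening <- keep_evening(deer)
evening$DeerID  # 2 3
```

Because the hour of day never exceeds 23, a lower bound is all the filter needs. If rows stamped exactly 12:00AM should count as the end of the previous evening, that would need an extra condition on hour 0 and minute 0.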