Subsetting a dataset in R using part of a column

Question

To elaborate on my previous question: I want to subset a large CSV dataset in R. Using the "TimeStamp" column, I want to keep only the rows whose time falls between 7 PM and 12 AM, inclusive. Below is an example of the data:

Deer ID    TimeStamp         Location
1          4/16/18 12:00AM   DMA 1
2          4/16/18 3:00AM    DMA 1
3          4/16/18 9:30AM    DMA 2
4          4/16/18 7:00PM    DMA 1
5          4/16/18 8:30PM    DMA 2
6          4/16/18 11:00PM   DMA 3
7          4/17/18 1:30AM    DMA 2
8          4/17/18 5:00AM    DMA 1
9          4/17/18 9:00PM    DMA 3
10         4/17/18 11:30PM   DMA 1
11         4/18/18 12:30AM   DMA 2

My end goal is to end up with the following:

Deer ID     TimeStamp        Location
4           4/16/18 7:00PM   DMA 1
5           4/16/18 8:30PM   DMA 2
6           4/16/18 11:00PM  DMA 3
9           4/17/18 9:00PM   DMA 3
10          4/17/18 11:30PM  DMA 1

Any ideas on how to accomplish this? Thank you!

Answer 1

You could do the following:

# Convert TimeStamp to POSIXct; the two-digit year needs %y (with %Y, "18" is read as the year 18)
df <- transform(df, TimeStamp = as.POSIXct(strptime(TimeStamp, "%m/%d/%y %I:%M%p")))

# Use lubridate::hour to extract the hours from the POSIXct timestamp
library(lubridate)
# hour() returns 0-23, so hours 19 and above cover 7:00PM through 11:59PM
df[hour(df$TimeStamp) >= 19, ]
#   Deer.ID           TimeStamp Location
#4        4 2018-04-16 19:00:00    DMA 1
#5        5 2018-04-16 20:30:00    DMA 2
#6        6 2018-04-16 23:00:00    DMA 3
#9        9 2018-04-17 21:00:00    DMA 3
#10      10 2018-04-17 23:30:00    DMA 1

Sample data

df <- read.table(text =
    "'Deer ID'    TimeStamp         Location
1          '4/16/18 12:00AM'   'DMA 1'
2          '4/16/18 3:00AM'    'DMA 1'
3          '4/16/18 9:30AM'    'DMA 2'
4          '4/16/18 7:00PM'    'DMA 1'
5          '4/16/18 8:30PM'    'DMA 2'
6          '4/16/18 11:00PM'   'DMA 3'
7          '4/17/18 1:30AM'    'DMA 2'
8          '4/17/18 5:00AM'    'DMA 1'
9          '4/17/18 9:00PM'    'DMA 3'
10         '4/17/18 11:30PM'   'DMA 1'
11         '4/18/18 12:30AM'   'DMA 2'", header = TRUE)
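
A side note on the format string, since it is easy to get wrong: the timestamps carry a two-digit year, which calls for %y rather than %Y. The sketch below compares the two and also shows a base R way to get the hour without lubridate. The vector x and the choice of tz = "UTC" are assumptions made only to keep the example small and reproducible. Parsing %p also depends on the locale, so this assumes an English or C locale.

```r
# A few timestamps in the same layout as the question's data
x <- c("4/16/18 12:00AM", "4/16/18 7:00PM", "4/17/18 11:30PM")

# tz = "UTC" is an assumption, used only so the result does not depend on the local timezone
wrong <- as.POSIXct(x, format = "%m/%d/%Y %I:%M%p", tz = "UTC")  # %Y reads "18" as the year 18
right <- as.POSIXct(x, format = "%m/%d/%y %I:%M%p", tz = "UTC")  # %y reads "18" as 2018

# Compare the parsed years
as.POSIXlt(wrong)$year + 1900
as.POSIXlt(right)$year + 1900

# Base R alternative to lubridate::hour(): POSIXlt exposes $hour in the range 0-23
hrs <- as.POSIXlt(right)$hour
evening <- hrs >= 19
x[evening]
# [1] "4/16/18 7:00PM"  "4/17/18 11:30PM"
```

Because the hour never reaches 24, a single lower bound of 19 is enough to select 7:00PM through 11:59PM.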

Answer 2

The tidyverse way would be something along these lines:

library(dplyr)
df <- read.table(
  text =
    "id    timestamp         location
                 1          '4/16/18 12:00AM'   'DMA 1'
                 2          '4/16/18 3:00AM'    'DMA 1'
                 3          '4/16/18 9:30AM'    'DMA 2'
                 4          '4/16/18 7:00PM'    'DMA 1'
                 5          '4/16/18 8:30PM'    'DMA 2'
                 6          '4/16/18 11:00PM'   'DMA 3'
                 7          '4/17/18 1:30AM'    'DMA 2'
                 8          '4/17/18 5:00AM'    'DMA 1'
                 9          '4/17/18 9:00PM'    'DMA 3'
                 10         '4/17/18 11:30PM'   'DMA 1'
                 11         '4/18/18 12:30AM'   'DMA 2'",
  header = TRUE
) %>%
  as_tibble()

df %>%
  # %y matches the two-digit year; %Y would read "18" as the year 18
  mutate(timestamp = as.POSIXct(strptime(.data$timestamp, "%m/%d/%y %I:%M%p"))) %>%
  # hour() returns 0-23, so 19 to 23 covers 7:00PM through 11:59PM
  filter(between(lubridate::hour(.data$timestamp), 19, 23))
#> # A tibble: 5 x 3
#>      id timestamp           location
#>   <int> <dttm>              <chr>
#> 1     4 2018-04-16 19:00:00 DMA 1
#> 2     5 2018-04-16 20:30:00 DMA 2
#> 3     6 2018-04-16 23:00:00 DMA 3
#> 4     9 2018-04-17 21:00:00 DMA 3
#> 5    10 2018-04-17 23:30:00 DMA 1

Created on 2019-02-19 by the reprex package (v0.2.1)
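
One loose end is the word "inclusive" in the question. A timestamp of exactly 12:00AM has hour 0 on the following day, so an hour-based filter will not pick it up. The expected output in the question leaves out the 12:00AM row, so excluding it is probably what is wanted, but if exact midnight should count, something like the sketch below might work. The helper name in_evening_window and its include_midnight argument are hypothetical, not part of either answer, and tz = "UTC" is again just an assumption for reproducibility.

```r
# Hypothetical helper: TRUE for times from 19:00:00 up to 23:59:59,
# optionally also TRUE for a timestamp of exactly 00:00:00.
in_evening_window <- function(ts, include_midnight = FALSE) {
  lt <- as.POSIXlt(ts)
  evening <- lt$hour >= 19
  exact_midnight <- lt$hour == 0 & lt$min == 0 & lt$sec == 0
  if (include_midnight) evening | exact_midnight else evening
}

# tz = "UTC" is an assumption so the example does not depend on the local timezone
stamps <- as.POSIXct(
  c("4/16/18 12:00AM", "4/16/18 7:00PM", "4/17/18 11:30PM", "4/18/18 12:30AM"),
  format = "%m/%d/%y %I:%M%p", tz = "UTC"
)

in_evening_window(stamps)
# [1] FALSE  TRUE  TRUE FALSE
in_evening_window(stamps, include_midnight = TRUE)
# [1]  TRUE  TRUE  TRUE FALSE
```

The returned logical vector can be used directly for subsetting, for example df[in_evening_window(df$TimeStamp), ] once TimeStamp has been parsed to a date-time.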

2 answers

Answer 1
2 2019-02-18 21:18:23

Answer 2
0 2019-02-19 00:16:41
