Identify Min & Max Numeric Value within Date/Datetime range repeatedly

I am completely new to R, so this is proving too complex for me to handle right now. Any help is much appreciated.

I am analysing price action data for BTC. I have 1-minute candles from 2019-09-08 19:13:00 to 2022-03-15 00:22:00, with the open, high, low and close prices, the volume in BTC and USD, and the trade count for each of those minutes. The data source is https://www.cryptodatadownload.com/data/binance/ for anyone interested.

I cleaned up and correctly formatted the data, and now want to analyse when the BTC price made a low and a high for various date and time ranges, for example:

At what time of day, in 30-minute increments, did BTC make its low for the week?

Here is what I believe I need to do. I need to tell R that 30 minutes is a range and identify the lowest and highest values of the "Low" and "High" variables within it. Likewise, a day is a range and a week is a range, each with its own lowest and highest values of "Low" and "High". Then I'd need to mark these values. The best method I can think of is to create a new variable as a TRUE/FALSE column, like so:

btcusdt_binance_fut_1min$pa.low.of.week.30min
btcusdt_binance_fut_1min$pa.high.of.week.30min

Every minute row that falls within that 30-minute low or high will be marked TRUE, and every other minute within that week will be marked FALSE.
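A minimal sketch of that flagging idea, assuming dplyr and lubridate are available and that weeks run Monday to Sunday in UTC. The toy values, the lowercase `low`/`high` column names, and the helper columns `week_start` and `slot_start` are hypothetical and only serve the illustration:

```r
library(dplyr)
library(lubridate)

# Hypothetical toy rows; the real table has one row per minute
toy <- tibble(
  datetime = as_datetime(c("2022-01-03 00:05:00", "2022-01-03 00:20:00",
                           "2022-01-03 00:40:00", "2022-01-10 09:10:00",
                           "2022-01-10 09:45:00"), tz = "UTC"),
  low  = c(100, 95, 97, 90, 92),
  high = c(105, 99, 110, 96, 98)
)

flagged <- toy %>%
  mutate(
    week_start = floor_date(datetime, "week", week_start = 1),  # Monday 00:00
    slot_start = floor_date(datetime, "30 minutes")             # clock half hour
  ) %>%
  group_by(week_start) %>%
  mutate(
    # TRUE for every row in the half hour that contains the weekly low / high
    pa.low.of.week.30min  = slot_start == slot_start[which.min(low)],
    pa.high.of.week.30min = slot_start == slot_start[which.max(high)]
  ) %>%
  ungroup()
```

If the weekly extreme occurs more than once, `which.min()` and `which.max()` pick the first occurrence, so only one half hour per week is flagged.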

I looked at lubridate's interval() function, but as far as I know the problem is that I'd need to define each year, month, week, day and 30-minute interval individually with a start and end time, which is obviously not feasible. I believe I would run into the same problem with the subset() function.

Another option seems to be the seq() and seq.POSIXt() functions, as well as the range() function, but I haven't found a way to make them work for this.

Here is all my code. I am using this data set: https://www.cryptodatadownload.com/cdd/BTCUSDT_Binance_futures_data_minute.csv

library(plyr)       # load plyr before dplyr so it does not mask dplyr functions
library(readr)
library(lubridate)
library(tidyverse)
library(dplyr)


# IMPORT CSV FILE AS DATA SET

# Name data set & choose import file
# Skip = 1 for skipping first row of CSV
btcusdt_binance_fut_1min <-
  read.csv(
    file.choose(),
    skip = 1,
    header = TRUE,
    sep = ","
  )


# CLEAN UP & REORGANISE DATA

# Remove unix & symbol column
btcusdt_binance_fut_1min$unix = NULL
btcusdt_binance_fut_1min$symbol = NULL

# Rename date column to datetime
colnames(btcusdt_binance_fut_1min)[colnames(btcusdt_binance_fut_1min) == "date"] <-
  "datetime"

# Convert datetime column to POSIXct format
btcusdt_binance_fut_1min$datetime <-
  as_datetime(btcusdt_binance_fut_1min$datetime, tz = "UTC")

# Create variable column for each time element
btcusdt_binance_fut_1min$year <-
  year(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$month <-
  month(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$week <-
  isoweek(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$weekday <-
  wday(btcusdt_binance_fut_1min$datetime,
       label = TRUE,
       abbr = FALSE)
btcusdt_binance_fut_1min$hour <-
  hour(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$minute <-
  minute(btcusdt_binance_fut_1min$datetime)

# Reorder columns
btcusdt_binance_fut_1min <-
  btcusdt_binance_fut_1min[, c(1, 9, 10, 11, 12, 13, 14, 4, 3, 2, 5, 6, 7, 8)]
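One point worth checking in the columns above: `isoweek()` follows the ISO 8601 calendar, so a date in early January can belong to the last ISO week of the previous year. A small sketch, assuming lubridate, of why grouping by `year()` plus `isoweek()` can split or merge weeks at a year boundary, and how `isoyear()` avoids that:

```r
library(lubridate)

# 2021-01-01 was a Friday, which ISO 8601 assigns to week 53 of 2020
x <- as_datetime("2021-01-01 12:00:00", tz = "UTC")

calendar_year <- year(x)     # calendar year of the timestamp
iso_week      <- isoweek(x)  # ISO week number
iso_year      <- isoyear(x)  # year that the ISO week belongs to
```

Pairing `isoyear()` with `isoweek()` gives a key that identifies each week uniquely across the whole data set.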

Using data.table we can do the following:

library(data.table)
# Example table with one row per minute, then the start (HH:MM) of each row's 30-minute bin
btcusdt_binance_fut_1min <- data.table(datetime = seq.POSIXt(as.POSIXct("2022-01-01 0:00"), as.POSIXct("2022-01-01 2:59"), by = "1 min"))
btcusdt_binance_fut_1min[, group := format(as.POSIXct(cut(datetime, breaks = "30 min")), "%H:%M")]

The cut function "floors" each datetime to the start of its 30-minute bin. Note that the bins are counted from the earliest timestamp in the data (truncated to the minute), so they line up with clock half hours only when the data starts on one, as it does in this example. The format and as.POSIXct calls are only there to drop the date part, which makes it easy to compare the same half hour across dates; if you prefer to keep a full datetime, you can remove them.
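The data in the question starts at 19:13, so the alignment of the bins matters. The sketch below uses hypothetical timestamps to compare `cut()` with lubridate's `floor_date()`, which is swapped in here as an alternative that always snaps to clock half hours regardless of where the data starts:

```r
library(lubridate)

# Hypothetical timestamps that start off the half-hour boundary
x <- as.POSIXct(c("2022-01-01 19:13:00", "2022-01-01 19:29:00",
                  "2022-01-01 19:31:00", "2022-01-01 19:44:00"), tz = "UTC")

# cut() counts 30-minute bins from the earliest timestamp (19:13, 19:43, ...)
cut_bins <- cut(x, breaks = "30 min")

# floor_date() snaps to clock half hours (19:00, 19:30, ...)
floor_bins <- format(floor_date(x, "30 minutes"), "%H:%M")
```

With `cut()`, the first three timestamps share one bin because 19:31 is still before 19:43. With `floor_date()`, they split at 19:30.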

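After this, the next steps are pretty straightforward. The call below needs the High and Low price columns from the real data, which the example table above does not contain: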

btcusdt_binance_fut_1min[, .(High = max(High), Low = min(Low)), by=.(group)]
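To get from per-bin extremes to the question actually asked (which half hour of the day produced the low of each week), one possible extension is sketched below. It assumes data.table and lubridate, Monday-based weeks in UTC, and a `Low` column; the toy values and the names `week_start`, `slot`, `weekly_low` and `slot_counts` are hypothetical:

```r
library(data.table)
library(lubridate)

# Hypothetical toy rows; the real table has one row per minute
dt <- data.table(
  datetime = as.POSIXct(c("2022-01-03 00:05:00", "2022-01-04 14:40:00",
                          "2022-01-10 14:35:00", "2022-01-12 08:10:00"),
                        tz = "UTC"),
  Low = c(100, 95, 90, 93)
)

dt[, week_start := floor_date(datetime, "week", week_start = 1)]     # Monday 00:00
dt[, slot := format(floor_date(datetime, "30 minutes"), "%H:%M")]    # time of day

# One row per week: the row holding that week's lowest Low
weekly_low <- dt[, .SD[which.min(Low)], by = week_start]

# How often each time-of-day slot produced the weekly low
slot_counts <- weekly_low[, .N, by = slot][order(-N)]
```

The same pattern with `which.max(High)` would give the weekly highs.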
