Append new data to existing csv file in R

I am working on a project where I need to graph 10 days' worth of data from remote sites.

I download new data every 30 minutes from a remote computer via FTP (the data is also written every half hour). The onsite file path changes every month, so I build the path dynamically from the current date.

e.g.

/data/sitename/2020/July/data.csv
/data/sitename/2020/August/data.csv
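
Because the folder changes with the month, a 10-day window that starts in one month and ends in the next needs both files. The following is only a sketch of how such paths could be built in base R; `month_path` and `paths_for_window` are hypothetical helper names, and the root `/data/sitename` is taken from the example paths above.

```r
# Hypothetical helper: build the remote path for the month containing `d`.
month_path <- function(d, root = "/data/sitename") {
  # month.name is a base R constant, so the result does not depend on the locale
  month <- month.name[as.integer(format(d, "%m"))]
  sprintf("%s/%s/%s/data.csv", root, format(d, "%Y"), month)
}

# Hypothetical helper: paths needed to cover the `days` days ending at `today`.
# A window of 10 days spans at most two months, so checking both ends is enough.
paths_for_window <- function(today = Sys.Date(), days = 10) {
  start <- today - days
  unique(c(month_path(start), month_path(today)))
}

# A window that crosses from July into August needs both monthly files
paths_for_window(as.Date("2020-08-05"))
```

When the whole window falls inside one month, `paths_for_window` returns a single path.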

My problem is that at the start of each month the csv I download is in a new folder, and the new file contains only data from the new month, not from the previous months.

I need to graph the last 10 days of data. So what I'm hoping to do is download the new data every half hour and append only the newest records to the master data set. Or is there a better way altogether?

What I (think I) need to do is read the downloaded csv into R, append only the new data to a master file, and remove the oldest records so that the csv contains only 10 days' worth of data. I have searched everywhere but cannot seem to crack it.

This seems like it should be easy; maybe I am using the wrong search terms.

I would like the following, pretty please (10 lines of data shown; I'll need 480 for 10 days).

INITIAL DATA

                        DateTime   Data1 Data2    Data3  Data4   Data5
641 2020-08-26T02:31:59.999+10:00 10.00      53.4 3.101   42 20.70
642 2020-08-26T03:01:59.999+10:00 11.11      52.0 2.778   44 20.70
643 2020-08-26T03:31:59.999+10:00  2.63     105.5 2.899   45 20.70
644 2020-08-26T04:01:59.999+10:00 11.11      60.5 2.920   45 20.70
645 2020-08-26T04:31:59.999+10:00  3.03     101.3 2.899   48 20.70
646 2020-08-26T05:01:59.999+10:00  2.86     125.2 2.899   49 20.65
647 2020-08-26T05:31:59.999+10:00  2.86     132.2 2.899   56 20.65
648 2020-08-26T06:01:59.999+10:00  3.23     113.9 2.963   61 20.65
649 2020-08-26T06:31:59.999+10:00  3.45     113.9 3.008   64 20.65
650 2020-08-26T07:01:59.999+10:00  3.57     108.3 3.053   66 20.65

NEW DATA

                         DateTime   Data1 Data2    Data3  Data4   Data5
641 2020-08-26T02:31:59.999+10:00 10.00      53.4 3.101   42 20.70
642 2020-08-26T03:01:59.999+10:00 11.11      52.0 2.778   44 20.70
643 2020-08-26T03:31:59.999+10:00  2.63     105.5 2.899   45 20.70
644 2020-08-26T04:01:59.999+10:00 11.11      60.5 2.920   45 20.70
645 2020-08-26T04:31:59.999+10:00  3.03     101.3 2.899   48 20.70
646 2020-08-26T05:01:59.999+10:00  2.86     125.2 2.899   49 20.65
647 2020-08-26T05:31:59.999+10:00  2.86     132.2 2.899   56 20.65
648 2020-08-26T06:01:59.999+10:00  3.23     113.9 2.963   61 20.65
649 2020-08-26T06:31:59.999+10:00  3.45     113.9 3.008   64 20.65
650 2020-08-26T07:01:59.999+10:00  3.57     108.3 3.053   66 20.65
651 2020-08-26T07:31:59.999+10:00  3.85     109.7 3.125   70 20.65

REQUIRED DATA

                         DateTime   Data1 Data2    Data3  Data4   Data5
642 2020-08-26T03:01:59.999+10:00 11.11      52.0 2.778   44 20.70
643 2020-08-26T03:31:59.999+10:00  2.63     105.5 2.899   45 20.70
644 2020-08-26T04:01:59.999+10:00 11.11      60.5 2.920   45 20.70
645 2020-08-26T04:31:59.999+10:00  3.03     101.3 2.899   48 20.70
646 2020-08-26T05:01:59.999+10:00  2.86     125.2 2.899   49 20.65
647 2020-08-26T05:31:59.999+10:00  2.86     132.2 2.899   56 20.65
648 2020-08-26T06:01:59.999+10:00  3.23     113.9 2.963   61 20.65
649 2020-08-26T06:31:59.999+10:00  3.45     113.9 3.008   64 20.65
650 2020-08-26T07:01:59.999+10:00  3.57     108.3 3.053   66 20.65
651 2020-08-26T07:31:59.999+10:00  3.85     109.7 3.125   70 20.65

This is where I am at so far:

library(RCurl) 
library(readr)
library(ggplot2)
library(data.table) 

# Get the date parts we need
Year <- format(Sys.Date(), format="%Y")
Month <- format(Sys.Date(), format="%B")
MM <- format(Sys.Date(), format="%m")

# Create the file string and read
site_url <- glue::glue("ftp://user:passwd@99.99.99.99/path/{Year}/{Month}/site{Year}-{MM}.csv")
site <- read.csv(site_url, header = FALSE)

# Write table and create csv
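# Write the raw download to disk, then re-read only the columns we need
write.table(site, "EP.csv", col.names = FALSE, row.names = FALSE)
EP <- fread("EP.csv", header = FALSE, select = c(1, 2, 3, 5, 6, 18))
# Column names follow the sample data shown above
write.table(EP, file = "output.csv", col.names = c("DateTime", "Data1", "Data2", "Data3", "Data4", "Data5"), sep = ",", row.names = FALSE)
# working up to here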

# Append to master csv file
master <- read.csv("C:\\path\\master.csv")

You can convert the DateTime column to POSIXct class, combine the new and initial data, and keep only the rows from the last 10 days.

library(dplyr)
library(lubridate)

initial_data <- initial_data %>% mutate(DateTime = ymd_hms(DateTime))
new_data <- new_data %>% mutate(DateTime = ymd_hms(DateTime))
combined_data <- bind_rows(new_data, initial_data)

ten_days_data <- combined_data %>% 
                   filter(between(as.Date(DateTime), Sys.Date() - 10, Sys.Date()))
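
Note that filtering on calendar dates from `Sys.Date() - 10` to `Sys.Date()` can keep rows from up to 11 distinct dates. If exactly 10 days (480 half-hourly rows) are wanted, one alternative is to measure the window back from the newest timestamp. The sketch below uses base R and synthetic data standing in for `combined_data`; the variable names `end_time`, `window_secs`, and `cutoff` are illustrative assumptions, and the timestamps are assumed to be already parsed to POSIXct.

```r
# Synthetic stand-in for combined_data: half-hourly readings over 12 days (UTC)
end_time <- as.POSIXct("2020-08-26 07:30:00", tz = "UTC")
combined_data <- data.frame(
  DateTime = seq(end_time - 12 * 24 * 3600, end_time, by = "30 min")
)
combined_data$Data1 <- seq_len(nrow(combined_data))

# Keep rows strictly newer than (newest timestamp - 10 days)
window_secs <- 10 * 24 * 3600
cutoff <- max(combined_data$DateTime) - window_secs
ten_days_data <- combined_data[combined_data$DateTime > cutoff, ]

nrow(ten_days_data)
```

Anchoring the window to the newest record rather than to the system clock also keeps the result stable if a download is delayed.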

I'll try to answer this by combining the help from Ronak. I am still hopeful that a better solution can be found, where I can simply append the new data to the old data.

There were multiple parts to my question, and Ronak provided a solution for the last-10-days problem:

ten_days_data <- combined_data %>% 
                   filter(between(as.Date(DateTime), Sys.Date() - 10, Sys.Date()))

The second part, combining the data, I found in another post, How to rbind new rows from one data frame to an existing data frame in R:

combined_data <- unique(rbindlist(list(initial_data, new_data)), by = "DateTime")
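
Putting the pieces together, the half-hourly update of the master file might look roughly like the base R sketch below. `update_master` is a hypothetical helper, not part of any package. It assumes one row per half hour and that every DateTime string carries the same UTC offset, so sorting the ISO 8601 strings as text is also chronological; if the offset can change, parse the column to POSIXct first.

```r
# Hypothetical helper: merge freshly downloaded rows into the master csv
update_master <- function(new_data, master_path, max_rows = 480) {
  if (file.exists(master_path)) {
    master <- read.csv(master_path, stringsAsFactors = FALSE)
    combined <- rbind(master, new_data)
  } else {
    combined <- new_data  # first run: the download becomes the master
  }
  combined <- combined[!duplicated(combined$DateTime), ]  # drop rows already stored
  combined <- combined[order(combined$DateTime), ]        # oldest first (same offset assumed)
  combined <- tail(combined, max_rows)                    # keep only the newest rows
  write.csv(combined, master_path, row.names = FALSE)
  invisible(combined)
}

# Demo with the timestamps and Data1 values from the question, keeping 10 rows
stamps <- c("02:31", "03:01", "03:31", "04:01", "04:31", "05:01",
            "05:31", "06:01", "06:31", "07:01", "07:31")
all_rows <- data.frame(
  DateTime = sprintf("2020-08-26T%s:59.999+10:00", stamps),
  Data1 = c(10.00, 11.11, 2.63, 11.11, 3.03, 2.86, 2.86, 3.23, 3.45, 3.57, 3.85),
  stringsAsFactors = FALSE
)
master_path <- tempfile(fileext = ".csv")

update_master(all_rows[1:10, ], master_path, max_rows = 10)     # initial data
result <- update_master(all_rows, master_path, max_rows = 10)   # new download
result$DateTime
```

In the demo, the second call adds only the 07:31 row and drops the 02:31 row, which mirrors the REQUIRED DATA layout in the question. With the default `max_rows = 480`, the file holds 10 days of half-hourly records.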
