

Write multiple Excel files for each unique ID in a data.frame

I want to write an Excel file for each unique ID, restricted to a required date range. Below is a snapshot of the raw data as text:

ID,Type,PostCode,Date
15,SS,2520,2015-11-01
15,SS,2520,2015-10-01
20,SS,2520,2015-11-20
16,SS,2520,2015-11-12
16,SS,2520,2015-10-25
11,SS,2520,2015-10-14
20,SS,2520,2015-11-30
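A minimal base-R sketch of loading this snapshot, assuming the real file has the same header row. `read.csv` defaults to `header = TRUE` and `sep = ","`, so the first line becomes the column names (without a header, `x$ID` would be `NULL`), and converting `Date` to the `Date` class means later range filters compare actual dates. The inline `text =` argument only stands in for the real file path here.

```r
# Sample rows copied from the snapshot; in practice pass the file path instead of text =
raw <- "ID,Type,PostCode,Date
15,SS,2520,2015-11-01
15,SS,2520,2015-10-01
20,SS,2520,2015-11-20
16,SS,2520,2015-11-12
16,SS,2520,2015-10-25
11,SS,2520,2015-10-14
20,SS,2520,2015-11-30"

# read.csv uses header = TRUE and sep = "," by default, so x$ID exists afterwards
x <- read.csv(text = raw, stringsAsFactors = FALSE)

# Store Date as a real Date so range comparisons work on dates
x$Date <- as.Date(x$Date)

str(x)
```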

The data can have 100+ individual IDs and more than 100,000 rows. I want to read the raw data and write a separate Excel file for each ID, containing that ID's rows within the required date range, ideally named after the ID number.

My attempt:

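myfunction <- function(startdate, enddate) {
  # header = TRUE makes the first line the column names, so x$ID exists
  x <- read.table("aaa.text", sep = ",", header = TRUE)
  # startdate and enddate are not used yet; this only splits the rows by ID
  split(x, x$ID)
}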
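One way to finish this attempt, sketched in base R: keep rows in the inclusive range `[startdate, enddate]`, `split()` by ID, and write one `<ID>.xlsx` per piece. The helper name `write_id_files` and its `out_dir`/`writer` arguments are hypothetical. The default writer, `writexl::write_xlsx`, swaps in the writexl package (not mentioned in the question) because it has no Java dependency and writes no row names; any function taking `(data, path)` can be passed instead.

```r
# Hypothetical helper: write one Excel file per ID, keeping only rows with
# startdate <= Date <= enddate. `writer` must accept (data, path); the default
# needs the writexl package.
write_id_files <- function(x, startdate, enddate, out_dir = ".",
                           writer = writexl::write_xlsx) {
  d <- as.Date(x$Date)  # compare actual dates
  # which() drops rows whose Date is NA instead of producing all-NA rows
  x <- x[which(d >= as.Date(startdate) & d <= as.Date(enddate)), , drop = FALSE]

  # split() returns a named list of data frames, one per ID left after filtering
  pieces <- split(x, x$ID)

  paths <- character(0)
  for (id in names(pieces)) {
    path <- file.path(out_dir, paste0(id, ".xlsx"))  # e.g. "15.xlsx"
    writer(pieces[[id]], path)
    paths <- c(paths, path)
  }
  invisible(paths)  # paths of the files written
}
```

For example, `write_id_files(x, "2015-10-20", "2015-11-25", out_dir = "output")` would write one file per ID with rows in that window; `out_dir` has to exist already.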

Any advice or suggestion would be very much appreciated.

Using data.table and xlsx, the following will do the trick:

library(data.table)
library(xlsx)
setDT(x)  # x: the raw data, read with header = TRUE
x[ , write.xlsx(.SD, file = paste0(.BY[[1]], ".xlsx")), by = ID]

From there, it's bells and whistles. Note that this writes every row for each ID; the question's date range still has to be applied to x first.

You'll notice this writes a column of row names; write.xlsx has an option to turn this off (row.names = FALSE).

Because .SD excludes the grouping column, including ID as a column in your output takes a tiny bit more work (props to Frank for cleaning it up):

x[ , write.xlsx(c(.BY, .SD), file = paste0(.BY[[1]], ".xlsx")), by = ID]

Basically, because .SD and .BY are both lists, c() just concatenates them into a single list, and write.xlsx coerces a list to a data frame (recycling the single ID value down the rows), so it works fine on lists.
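The answer's snippets write every row for each ID. Below is a sketch that also applies the question's date range and drops row names, wrapped in a hypothetical function `write_ids_dt` so the writer can be swapped. The logical filter goes in `i`, so only in-range rows reach the `by = ID` grouping. It assumes data.table is installed; the default writer is `xlsx::write.xlsx` as in the answer (which needs Java via rJava), and any replacement must accept `file =` and `row.names =` arguments.

```r
library(data.table)

# Hypothetical wrapper: keep rows with start <= Date <= end, then write one
# file per ID. writer defaults to xlsx::write.xlsx (needs the xlsx package and
# Java) and must accept `file =` and `row.names =` arguments.
write_ids_dt <- function(x, start, end, out_dir = ".",
                         writer = xlsx::write.xlsx) {
  dt <- as.data.table(x)
  d <- as.Date(dt$Date)  # compare actual dates
  keep <- d >= as.Date(start) & d <= as.Date(end)
  keep[is.na(keep)] <- FALSE  # rows with a missing Date are skipped

  # Filtering in i means only in-range rows are grouped by ID;
  # c(.BY, .SD) puts ID back as the first column, since .SD excludes it.
  dt[keep,
     writer(c(.BY, .SD),
            file = file.path(out_dir, paste0(.BY[[1]], ".xlsx")),
            row.names = FALSE),
     by = ID]
  invisible(NULL)
}
```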
