Rearranging a dataframe in R
I have a data frame that looks like this:
created_at actor_attributes_email type
3/11/12 7:28 jeremy@asynk.ch PushEvent
3/11/12 7:28 jeremy@asynk.ch PushEvent
3/11/12 7:28 jeremy@asynk.ch PushEvent
3/11/12 7:42 jeremy@asynk.ch IssueCommentEvent
3/11/12 11:06 d.bussink@gmail.com PushEvent
3/11/12 11:06 d.bussink@gmail.com PushEvent
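Now I would like to rearrange it by month/year (still ordered by time, and still keeping each row intact). This should create 3 columns for every month, with all the data belonging to that month (created_at, actor_attributes_email and type) placed in those 3 columns, so that I end up with the following headers (for every month present in the data):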
april_2011_created_at april_2011_actor_attributes_email april_2011_type may_2011_created_at may_2011_actor_attributes_email may_2011_type
How can I accomplish this in R?
A CSV file with the whole dataset can be found here: https://github.com/aronlindberg/VOSS-Sequencing-Toolkit/blob/master/rubinius_rubinius_sequencing/rubinius_6months.csv
Here is the dput() of the first rows of the CSV:
structure(list(created_at = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L,
8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("2012-03-11 07:28:04",
"2012-03-11 07:28:19", "2012-03-11 07:42:16", "2012-03-11 11:06:13",
"2012-03-11 12:46:25", "2012-03-11 13:03:12", "2012-03-11 13:12:34",
"2012-03-11 13:14:52", "2012-03-11 13:30:14", "2012-03-11 13:30:48"
), class = "factor"), actor_attributes_email = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"d.bussink@gmail.com", "jeremy@asynk.ch"), class = "factor"),
type = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("IssueCommentEvent", "PushEvent"
), class = "factor")), .Names = c("created_at", "actor_attributes_email",
"type"), class = "data.frame", row.names = c(NA, -30L))
Some other assumptions are:
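library(plyr)
library(lubridate)
# parse the timestamps (created_at is a factor in the dput() above)
df$created_at <- ymd_hms(as.character(df$created_at), quiet = TRUE)
# abbreviated month name of each row, e.g. "Mar"
df$mname <- as.character(lubridate::month(df$created_at, label = TRUE, abbr = TRUE))
result <- dlply(df, .(mname), function(x) {
  x <- arrange(x, created_at)
  mon <- unique(x$mname)
  # drop the helper column before renaming; after renaming it would be
  # called "<month>_mname" and x$mname <- NULL would silently do nothing
  x$mname <- NULL
  names(x) <- paste0(mon, "_", names(x))
  x
}, .progress = 'text')
# stack the per-month pieces and drop the leading .id column added by ldply
final_result <- ldply(result, rbind.fill)[, -1]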
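Note that because you want the month name prepended to the 3 column names and the corresponding data filled in, every cell that has no data for a given month's columns will be filled with NA (this is the expected behavior of rbind.fill).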
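To make the NA padding concrete, here is a minimal sketch of what rbind.fill does when the pieces share no column names. The two tiny data frames (mar, apr) are made up for illustration and are not part of the dataset.

```r
library(plyr)

# Two hypothetical per-month pieces with disjoint column names
mar <- data.frame(Mar_type = c("PushEvent", "IssueCommentEvent"),
                  stringsAsFactors = FALSE)
apr <- data.frame(Apr_type = "PushEvent", stringsAsFactors = FALSE)

# rbind.fill stacks the rows and fills columns missing from a piece with NA
stacked <- rbind.fill(mar, apr)
print(stacked)
```

Each month's rows sit in their own block, so the result has as many rows as the original data, and most cells are NA.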
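Maiasaura gave an elegant way to do this with plyr and lubridate. Here is a slightly less elegant way to do it in base R. Unlike Maiasaura's, though, this approach minimizes the number of NA rows. The number of NA rows for each month is the difference between that month's row count and the largest row count of any month.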
# split df by month
by.mon <- split(df, months(as.POSIXct(df$created_at)))
# rename the columns to include the month name
by.mon <- mapply(
  function(x, mon.name) {
    names(x) <- paste(mon.name, names(x), sep='_')
    return(x)
  }, x=by.mon, mon.name=names(by.mon), SIMPLIFY=FALSE)
# add an index column for merging on
by.mon.indexed <- lapply(by.mon, function(x) within(x, index <- seq_len(nrow(x))))
# merge all of the months together
results <- Reduce(function(x, y) merge(x, y, by='index', all=TRUE, sort=FALSE),
                  by.mon.indexed)
# remove the index column
final_result <- results[names(results) != 'index']
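The claim about NA rows can be checked on a small example. The sketch below applies the same split / rename / merge-on-index idea to toy data modelled on the question's sample (the April row is made up). As an assumption of this sketch, not part of the answer above, the split key is built from month.name and the year, so the prefixes look like "march_2012" as in the question's headers and do not depend on the locale.

```r
# Toy data modelled on the question's sample; the April row is invented
df <- data.frame(
  created_at = c("2012-03-11 07:28:04", "2012-03-11 07:42:16",
                 "2012-03-11 11:06:13", "2012-04-02 10:15:00"),
  actor_attributes_email = c("jeremy@asynk.ch", "jeremy@asynk.ch",
                             "d.bussink@gmail.com", "d.bussink@gmail.com"),
  type = c("PushEvent", "IssueCommentEvent", "PushEvent", "PushEvent"),
  stringsAsFactors = FALSE
)

ct <- as.POSIXct(df$created_at, tz = "UTC")
lt <- as.POSIXlt(ct)
# Locale-independent key such as "march_2012"
key <- paste(tolower(month.name[lt$mon + 1]), lt$year + 1900, sep = "_")
# Factor levels in chronological order, so split() does not sort alphabetically
key <- factor(key, levels = unique(key[order(ct)]))

# Split by month, prefix the column names, and add a per-month row index
by.mon <- split(df, key)
by.mon <- Map(function(x, mon.name) {
  names(x) <- paste(mon.name, names(x), sep = "_")
  x$index <- seq_len(nrow(x))
  x
}, by.mon, names(by.mon))

# Merge the months side by side on the index, then drop the index
wide <- Reduce(function(x, y) merge(x, y, by = "index", all = TRUE), by.mon)
wide <- wide[names(wide) != "index"]

# NA rows per month, counted on the *_type columns
na_rows <- colSums(is.na(wide[grep("_type$", names(wide))]))
# Expected: largest month's row count minus each month's own row count
expected <- max(table(key)) - table(key)
print(na_rows)
print(expected)
```

With 3 rows in March and 1 in April, the result has 3 rows, and the April columns carry 2 NA rows while the March columns carry none.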