[英]R - Create dataframe from a list of data with different number of columns
I have a dataset showing date, name of the location and values in the same list.我有一个数据集,在同一列表中显示日期、位置名称和值。 My goal is to generate a dataframe in a loop where columns show the value for each date and the name of the location are the rows.我的目标是在循环中生成一个数据框,其中列显示每个日期的值,位置名称是行。
However, the dates don't always match.但是,日期并不总是匹配。 Missing values can be NA.缺失值可以是 NA。
Furthermore, I only need the second value (value2).此外,我只需要第二个值 (value2)。
What's the easiest way to do this?什么是最简单的方法来做到这一点?
Sample data:样本数据:
data <- structure(list(date = structure(c(7L, 4L, 6L, 15L, 3L, 1L, 13L, 2L, 16L, 11L, 7L, 14L, 8L, 4L, 6L, 15L, 3L, 9L, 10L, 1L, 12L, 5L, 7L, 14L, 8L, 4L, 6L, 15L, 9L, 10L, 1L, 13L, 2L, 16L, 11L), .Label = c("01.10.2013", "01.10.2015", "08.10.2010", "13.09.2007", "16.09.2003", "17.09.2008", "20.09.2004", "21.09.2006", "23.09.2011", "26.09.2012", "26.09.2017", "27.08.2001", "29.09.2014", "30.08.2005", "30.09.2009", "30.09.2016" ), class = "factor"), name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("P1", "P2", "P3"), class = "factor"), value = c(14L, 453L, 345L, 765L, 87L, 45L, 4L, 12L, 2L, 6L, 84L, 11L, 87L, 45L, 4L, 12L, 12L, 14L, 453L, 345L, 51L, 50L, 123L, 34L, 75L, 74L, 42L, 2L, 123L, 42L, 12L, 6L, 9L, 4L, 1L), value2 = c(0.046, 0.003, 0.022, 0.016, -0.035, 0.032, 0.013, -0.001, 0.018, -0.006, 0.017, -0.001, 0.02, 0, 0.009, 0.191, 0.169, 0.191, 0.286, 0.324, 0.426, 0.35, 0.212, 0.107, 0.081, 0.034, 0.084, 0.092, 0.054, 0.019, 0.022, 0.017, 0.018, 0.002, 0.017)), .Names = c("date", "name", "value", "value2" ), class = "data.frame", row.names = c(NA, -35L))
So far, I've tried different things I found on the internet but none has worked for me.到目前为止,我已经尝试了在互联网上找到的不同的东西,但没有一个对我有用。
First, I generated a dataframe with all unique dates found in the list首先,我生成了一个数据框,其中包含在列表中找到的所有唯一日期
data$date <- as.Date(data$date, format="%d.%m.%Y")
uniquedates <- as.data.frame(unique(sort(data$date)))
colnames(uniquedates) <- c("date")
Next, I split up the data by name.接下来,我按名称拆分数据。
split <- split(data, data$name)
Finally, I tried get the date and value2 from every split and merge them together in a loop.最后,我尝试从每个拆分中获取 date 和 value2 并将它们合并到一个循环中。
for (i in seq_along(split)) {
point <- split[[i]][,c("date","value2")]
name <- as.character(unique(split[[i]]$name))
colnames(merge)[colnames(merge) == "value2"] <- name
merge <- merge(x=uniquedates, y = point, by='date', all.x = TRUE)
}
This is the result I'm looking for:这是我正在寻找的结果:
date P1 P2 P3
27.08.2001 NA 0.426 NA
16.09.2003 NA NA 0.35
20.09.2004 0.046 0.017 0.212
30.08.2005 NA -0.001 0.107
21.09.2006 NA 0.02 0.081
13.09.2007 0.003 0 0.034
17.09.2008 0.022 0.009 0.084
30.09.2009 0.016 0.191 0.092
08.10.2010 -0.035 0.169 NA
23.09.2011 NA 0.191 0.054
26.09.2012 NA 0.286 0.019
01.10.2013 0.032 0.324 0.022
29.09.2014 0.013 NA 0.017
01.10.2015 -0.001 NA 0.018
30.09.2016 0.018 NA 0.002
26.09.2017 -0.006 NA 0.017
Look into the reshape2 package and the dcast
and melt
methods.查看 reshape2 包以及dcast
和melt
方法。
library(dplyr)
library(reshape2)
data2 = data%>%
mutate(date = as.Date(date,format = '%d.%m.%Y'))%>% #Convert date to a date time object
select(-value)%>% #Remove value because we dont need it
dcast(date~name,value.var = "value2") # Pivot the dataframe
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.