In R: Extract specific columns from data frame by date and keep basic columns at beginning

I have several big data frames with time-series data for specific coordinates from 2007 to 2019, each with 8113 rows and 301 columns. Each year is divided into 16-day time steps, which results in 23 values per year per coordinate. It looks like this:

X   longitude   latitude   label     2007-01-07   2007-01-23 ... 2019-12-10   2019-12-26  
1   -56.58652   -30.87850  cropland  0.08367160   0.07883158     0.07414120   0.08120061
2   -56.58458   -30.88260  cropland  0.07888613   0.07438400     0.07831833   0.07352642
3   -56.58429   -30.87860  cropland  0.08331446   0.07837244     0.07169452   0.07229450
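To try out the approaches discussed here, a tiny stand-in for one of these data frames can be built from the three rows printed above. This is only a sketch: it keeps just four of the date columns, the object name df is an assumption chosen to match the answer's code, and check.names = FALSE keeps the date-style names exactly as printed.

```r
# A small stand-in for one of the large data frames: the three rows shown above
# and only four of the date columns (two from 2007, two from 2019).
df <- data.frame(
  X = 1:3,
  longitude = c(-56.58652, -56.58458, -56.58429),
  latitude = c(-30.87850, -30.88260, -30.87860),
  label = "cropland",  # recycled to all three rows
  `2007-01-07` = c(0.08367160, 0.07888613, 0.08331446),
  `2007-01-23` = c(0.07883158, 0.07438400, 0.07837244),
  `2019-12-10` = c(0.07414120, 0.07831833, 0.07169452),
  `2019-12-26` = c(0.08120061, 0.07352642, 0.07229450),
  check.names = FALSE  # keep the date-style names exactly as printed
)
str(df)
```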

I would like to subset each data frame by year, keeping the first four columns the same in every subset, and then save all the subsets together in a list of data frames. The output I am looking for would look like this:

X   longitude   latitude   label     2007-01-07   2007-01-23   ...
1   -56.58652   -30.87850  cropland  0.08367160   0.07883158   ...  
2   -56.58458   -30.88260  cropland  0.07888613   0.07438400   ...  
3   -56.58429   -30.87860  cropland  0.08331446   0.07837244   ...
X   longitude   latitude   label     2008-01-10   2008-01-26   ...
1   -56.58652   -30.87850  cropland  0.08367160   0.07883158   ...  
2   -56.58458   -30.88260  cropland  0.07888613   0.07438400   ...  
3   -56.58429   -30.87860  cropland  0.08331446   0.07837244   ...

... ...

X   longitude   latitude   label     2019-01-12   2019-01-28   ...
1   -56.58652   -30.87850  cropland  0.08367160   0.07883158   ...  
2   -56.58458   -30.88260  cropland  0.07888613   0.07438400   ...  
3   -56.58429   -30.87860  cropland  0.08331446   0.07837244   ...
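The desired result can also be sketched with a plain for loop, which may be easier to follow when starting out. This is an alternative to the split.default answer, not the answerer's code: it assumes the four ID columns come first, uses toy data with the X-prefixed names that read.csv produces by default, and the names id_cols, date_cols and by_year are illustrative. Because it pulls the first four-digit run out of each column name, the same code should work whether the names look like 2007-01-07 or X2007.01.07.

```r
# Hypothetical toy data: 4 ID columns plus date columns from two years,
# named the way read.csv() would name them with default options.
df <- data.frame(
  X = 1:3,
  longitude = c(-56.58652, -56.58458, -56.58429),
  latitude = c(-30.87850, -30.88260, -30.87860),
  label = "cropland",
  X2007.01.07 = c(0.08367160, 0.07888613, 0.08331446),
  X2007.01.23 = c(0.07883158, 0.07438400, 0.07837244),
  X2019.12.10 = c(0.07414120, 0.07831833, 0.07169452),
  X2019.12.26 = c(0.08120061, 0.07352642, 0.07229450)
)

id_cols   <- 1:4                    # X, longitude, latitude, label
date_cols <- names(df)[-id_cols]
# Take the first run of four digits in each date column name as its year
years <- regmatches(date_cols, regexpr("[0-9]{4}", date_cols))

# One data frame per year: the ID columns followed by that year's columns
by_year <- list()
for (yr in unique(years)) {
  by_year[[yr]] <- df[, c(names(df)[id_cols], date_cols[years == yr])]
}
names(by_year)                      # "2007" "2019"
```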

I need to do this for 8 data frames like the example above. I know this should be quite basic, but I'm quite new to R and to programming in general, so I'm thankful for any hint! Cheers!

You can use split.default to split the date columns by year, and then use lapply to cbind the first four columns onto each element of the resulting list.

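# Split the date columns (everything after the first four) by year,
# then bind the four ID columns back onto each yearly data frame.
result <- lapply(split.default(df[-(1:4)],
                 format(as.Date(names(df)[-(1:4)], 'X%Y.%m.%d'), '%Y')),
                 function(x) cbind(df[1:4], x))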
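Since the same step has to be repeated for 8 data frames, one option is to wrap the answer's expression in a function and lapply it over a list of the data frames. A sketch under these assumptions: split_by_year, make_toy and dfs are hypothetical names, the toy data frames stand in for the real ones, and the ID columns are the first four. The stopifnot() check is an addition that stops early if the format string does not match the column names, because unparsed dates would otherwise become NA and split.default would silently drop those columns.

```r
# Hypothetical helper wrapping the answer's one-liner so it can be reused.
# fmt is the as.Date() format of the date column names.
split_by_year <- function(df, fmt = "X%Y.%m.%d") {
  dates <- as.Date(names(df)[-(1:4)], fmt)
  stopifnot(!anyNA(dates))  # fail loudly if fmt does not match the names
  lapply(split.default(df[-(1:4)], format(dates, "%Y")),
         function(x) cbind(df[1:4], x))
}

# Toy data frames standing in for the 8 real ones (random values)
make_toy <- function(n) {
  data.frame(X = seq_len(n), longitude = -56.5, latitude = -30.8,
             label = "cropland",
             X2007.01.07 = runif(n), X2007.01.23 = runif(n),
             X2008.01.10 = runif(n), X2008.01.26 = runif(n))
}
dfs <- list(a = make_toy(3), b = make_toy(5))

# A list (one entry per data frame) of lists (one data frame per year)
results <- lapply(dfs, split_by_year)
names(results$a)            # "2007" "2008"
ncol(results$b[["2008"]])   # 6
```

With the real data, dfs would be a list of the 8 data frames, and results$a[["2007"]] holds the four ID columns plus the 2007 columns of the first one.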

R discourages column names that start with a number, so if you read the data with the default options (read.csv uses check.names = TRUE), a column name such as 2007-01-07 is changed to X2007.01.07. With that in mind, I have used 'X%Y.%m.%d' in as.Date. If your column names really are as you have shown them, i.e. 2007-01-07, use '%Y-%m-%d' in as.Date instead.
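The renaming described above can be checked directly: make.names() applies the rule that read.csv() uses when check.names = TRUE, and each naming style parses with its own format string. The file name in the last comment is a placeholder.

```r
# make.names() applies the same rule read.csv() uses by default
# (check.names = TRUE) to turn "2007-01-07" into a syntactic name.
raw_names <- c("2007-01-07", "2019-12-26")
checked   <- make.names(raw_names)
checked                          # "X2007.01.07" "X2019.12.26"

# Parse each naming style with its matching format string
as.Date(checked,   "X%Y.%m.%d")  # "2007-01-07" "2019-12-26"
as.Date(raw_names, "%Y-%m-%d")   # "2007-01-07" "2019-12-26"

# Alternatively, keep the original names when reading the file:
# read.csv("your_file.csv", check.names = FALSE)   # placeholder file name
```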
