Select data in R that meet a condition and use a for loop on that condition

I have a problem selecting columns in a data frame using a for loop. I'm new to R, so it's quite possible that I missed something obvious, but I have not found anything that works for me.

I have a file with 20 climatic variables measured over 60 years in 399 different places. There is one row per day, and the columns are the 20 climatic variables for each place (with a number at the end of the name to identify the place where the measurement was taken). It looks like this:

     Temperature_1   Rain_1 .....Temperature_399   Rain_399
Date 1
Date 2
...

I want to select the 20 columns corresponding to one place, run some calculations on the variables, put the results in an empty 3D array I have created, then do the same for the next place until the last one.

My problem is that I don't know how to select the right columns automatically. I also have trouble writing the results into the array.

I tried to select the columns corresponding to one place using the number at the end of the variable names, but I don't think it is possible to change the condition automatically.

I also tried to use the positions of the columns, but I'm not doing it properly.

This is my code:

#creation of an empty array
Indice_clim=array(NA,dim = c(60,8,399),dimnames=list(c(1959:2018),c("Huglin","CNI","HD","VHD","SHS","DoF","FreqLF","SLF"),c(1:399)))

#selection of the columns corresponding to the first place using "end with"
maille=select(donnees_SAFRAN,c(1:4),ends_with(".1",ignore.case = FALSE))

# another try using the columns position which I know is really badly done
for (j in seq(from=5, to=7984,by=20)){ 
paste0("maille",j-4)=select(donnees_SAFRAN,c(1:4),c(j:j+19))
} 

#and the calculation on the selected columns, the "i loop" is working.
for(i in 1959:2018){temp=c(maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9)%>%summarise(sum(((T_moy.1-10)+(T_max.1-10))/2)*1.03),
   maille%>%filter(an==i,mois==9)%>%summarise(mean(T_min.1)),
   maille%>%filter(an==i)%>%summarise(sum(T_max.1>=30)),
   maille%>%filter(an==i)%>%summarise(sum(T_max.1>=35)),
   maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9,T_moy.1>=28)%>%summarise(sum(T_moy.1-28)),
   maille%>%filter(an==i)%>%summarise(sum(T_min.1<=0)),
   maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9)%>%summarise(sum(T_min.1<=0)),
   maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9,T_moy.1<2)%>%summarise(sum(abs(2-T_moy.1))))

   Indice_clim[[i-1958,,]]=as.numeric(temp)}

I would like to create a loop, or something similar, that runs my calculation for each place and writes the result into my array. If you have any ideas, I would very much appreciate them!

You can use the grep() function to look for each of the locations 1, 2, ..., 399 in the column names. If your big data frame containing all the data is called df, then you could do this:

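for (i in 1:399) {
  # indices of the columns whose names end in "_i"
  selected_indices <- grep(paste0('_', i, '$'), colnames(df))
  # extract the columns for location i, then do calculations on them
  place_data <- df[, selected_indices, drop = FALSE]
}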

The for loop runs through each location i from 1 to 399. The paste0() function concatenates '_' with the variable i and the dollar sign $ to create strings like "_1$", "_2$", ..., "_399$", which grep() then searches for in the column names of df. The '$' is a regular expression special character: it specifies that the patterns _1, _2, ... must appear at the end of the column names.
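A quick way to see why the '$' matters is to try the pattern with and without it. This is a minimal sketch; the column names are made up for illustration and are not the asker's real columns.

```r
# Hypothetical column names, for illustration only
cols <- c("Temperature_1", "Rain_1", "Temperature_10", "Rain_10", "Temperature_11")

i <- 1
pattern <- paste0("_", i, "$")   # builds the string "_1$"

# Anchored: only columns whose name ends in "_1"
anchored <- grep(pattern, cols)

# Unanchored: "_1" also occurs inside "_10" and "_11"
unanchored <- grep(paste0("_", i), cols)

print(cols[anchored])
print(cols[unanchored])
```

Without the anchor, location 1 would silently pick up the columns of locations 10 to 19 and 100 to 199 as well.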

The grep() function uses the above regular expressions to return the column indices required for each location. You can then extract the relevant portion of df and do whatever calculations you want.
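Putting the pieces together, the loop over locations can write straight into a 3D array. The sketch below runs on toy data and makes several assumptions: the data frame, its size, and the names `indices`, `place`, and `n_places` are invented here, and only two of the question's indices are computed (days with T_max >= 30 and days with T_min <= 0). The column names follow the question's code (`T_max.1`, `T_min.1`, ...), where the separator is a dot, so the dot is escaped as `\\.`; an unescaped `.` would match any character.

```r
# Toy data standing in for the real data frame (assumption):
# 2 years, 3 places, 2 variables per place, one row per "day"
set.seed(1)
n_places <- 3
years <- 2017:2018
df <- data.frame(an = rep(years, each = 10))
for (p in seq_len(n_places)) {
  df[[paste0("T_max.", p)]] <- runif(nrow(df), 20, 40)
  df[[paste0("T_min.", p)]] <- runif(nrow(df), -5, 15)
}

# Empty 3D array: year x index x place
indices <- array(NA_real_,
                 dim = c(length(years), 2, n_places),
                 dimnames = list(years, c("HD", "DoF"), seq_len(n_places)))

for (p in seq_len(n_places)) {
  # "\\." is a literal dot, "$" anchors the place number at the end of the name
  suffix <- paste0("\\.", p, "$")
  place <- df[, grep(suffix, colnames(df)), drop = FALSE]
  # Strip the ".p" suffix so the same code works for every place
  colnames(place) <- sub(suffix, "", colnames(place))

  for (y in years) {
    rows <- df$an == y
    # One value per [year, index, place] cell
    indices[as.character(y), "HD", p]  <- sum(place$T_max[rows] >= 30)
    indices[as.character(y), "DoF", p] <- sum(place$T_min[rows] <= 0)
  }
}

print(indices[, , 1])
```

Stripping the suffix inside the loop means the calculation can refer to `T_max` and `T_min` for every place instead of hard-coding `T_max.1`. Each cell is addressed with single brackets and three subscripts, `indices[year, index, place]`.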


 