
R function or loop for repeatedly selecting rows that meet a condition, saving as separate object, and renaming column headers

I have 16 large datasets of landcover variables around routes. Example dataset "Trial1":

RtNo     TYPE    CA      PLAND   NP      PD      LPI     TE 
2001     cls_11     996.57  6.4297  22  0.1419  6.3055  31080
2010     cls_11     56.34   0.3654  23  0.1492  0.1669  15480
18003    cls_11     141.12  0.9899  37  0.2596  0.1503  38700
18014    cls_11     797.58  5.3499  47  0.3153  1.3969  98310
2001     cls_21     1514.97 9.7744  592 3.8195  0.8443  761670
2010     cls_21     638.55  4.1414  95  0.6161  0.7489  463260
18003    cls_21     904.68  6.3463  612 4.2931  0.8769  549780
18014    cls_21     1189.89 7.9814  759 5.0911  0.4123  769650
2001     cls_22     732.33  4.7249  653 4.2131  0.7212  377430
2010     cls_22     32.31   0.2096  168 1.0896  0.0198  31470
18003    cls_22     275.85  1.9351  781 5.4787  0.0423  237390
18014    cls_22     469.44  3.1488  104 6.7345  0.1014  377580

I want to first select rows that meet a condition, for example, all rows whose "TYPE" column is cls_21. I know the following code does this:

Trial21 <- subset(Trial1, TYPE == " cls_21 ")

(Yes, the invisible space before and after the categorical value caused me a considerable headache.) There are several other ways of doing this, as shown in [https://stackoverflow.com/questions/5391124/select-rows-of-a-matrix-that-meet-a-condition].
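One way to sidestep the padding problem is to strip the whitespace once, before any filtering. The following is a minimal sketch on made-up toy data (the `trial` data frame is an assumption for illustration, not the real dataset); `trimws()` is base R and returns a character vector, so a factor column would become character.

```r
# Toy data mimicking the padded TYPE values (assumed for illustration)
trial <- data.frame(
  RtNo = c(2001, 2010, 2001, 2010),
  TYPE = c(" cls_11 ", " cls_11 ", " cls_21 ", " cls_21 "),
  CA   = c(996.57, 56.34, 1514.97, 638.55),
  stringsAsFactors = FALSE
)

# Strip leading and trailing whitespace once, up front
trial$TYPE <- trimws(trial$TYPE)

# The comparison now works without typing the padding
trial21 <- subset(trial, TYPE == "cls_21")
```

After this step, every later comparison can use the plain class label.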

I get the following output (sorry, this one has extra columns, but that shouldn't affect my question):

    RtNo    TYPE    CA     PLAND     NP  PD    LPI     TE       ED      LSI
2   18003   cls_21  904.68  6.3463  612 4.2931  0.8769  549780  38.5668 46.1194
18  18014   cls_21  1189.89 7.9814  759 5.0911  0.4123  769650  51.6255 56.2522
34  2001    cls_21  1514.97 9.7744  592 3.8195  0.8443  761670  49.1418 49.3462
50  2010    cls_21  638.55  4.1414  95  0.6161  0.7489  463260  30.0457 46.0118
62  2020    cls_21  625.5   4.1165  180 1.1846  0.5064  384840  25.3268 38.6407
85  2021    cls_21  503.55  2.7926  214 1.1868  0.1178  348330  19.3175 38.9267

I want to rename the columns in this subset so they uniquely identify the class by appending "L21" to the existing column names, which I can do using:

library(data.table)
setnames(Trial21, old = c('CA', 'PLAND', 'NP', 'PD', 'LPI', 'TE', 'ED', 'LSI'), 
         new = c('CAL21', 'PLANDL21', 'NPL21', 'PDL21', 'LPIL21', 'TEL21', 'EDL21', 'LSIL21'))
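The new names do not have to be typed out by hand; they can be derived from the old ones. A small sketch, assuming `RtNo` and `TYPE` are the only identifier columns (the toy `Trial21` below reuses two rows from the output above):

```r
library(data.table)

# Toy stand-in for the Trial21 subset (two rows from the question's output)
Trial21 <- data.table(
  RtNo  = c(18003, 18014),
  TYPE  = c("cls_21", "cls_21"),
  CA    = c(904.68, 1189.89),
  PLAND = c(6.3463, 7.9814)
)

# Every column except the identifiers gets the suffix
id_cols <- c("RtNo", "TYPE")
old <- setdiff(names(Trial21), id_cols)
setnames(Trial21, old = old, new = paste0(old, "L21"))

names(Trial21)
# "RtNo" "TYPE" "CAL21" "PLANDL21"
```

Because the suffix is a plain string, the same two lines work for any class once "L21" is replaced by a variable.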

I want help to develop a function or a loop that automates this process so I don't have to spend days repeating the same code for 15 different classes and 16 datasets (240 times). It would also reduce the risk of errors. I may have to do the same for additional datasets. Any help to speed up the process will be greatly appreciated.

You could do:

a <- split(df, df$TYPE)

b <- sapply(names(a), function(x)setNames(a[[x]],
              paste0(names(a[[x]]), sub(".*_", 'L', x))), simplify = FALSE)
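To see what the split approach produces, here is a sketch on toy data (the `df` below is made up and assumes `TYPE` has already been trimmed of its padding). As a variation on the snippet above, it suffixes only the measurement columns, so `RtNo` and `TYPE` keep their names:

```r
# Toy data with already-trimmed TYPE values (assumed for illustration)
df <- data.frame(
  RtNo = c(2001, 2010, 2001, 2010),
  TYPE = c("cls_11", "cls_11", "cls_21", "cls_21"),
  CA   = c(996.57, 56.34, 1514.97, 638.55)
)

# One data frame per class, named after the class
a <- split(df, df$TYPE)

# Suffix only the measurement columns, leaving RtNo and TYPE untouched
b <- lapply(names(a), function(x) {
  d <- a[[x]]
  suffix <- sub(".*_", "L", x)  # "cls_21" becomes "L21"
  measure <- !(names(d) %in% c("RtNo", "TYPE"))
  names(d)[measure] <- paste0(names(d)[measure], suffix)
  d
})
names(b) <- names(a)

names(b$cls_21)
# "RtNo" "TYPE" "CAL21"
```

Keeping `RtNo` unsuffixed may be convenient if the per-class tables are later joined by route.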

Here is a start that should work for your example:

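library(dplyr)

myfilter <- function(data, number) {
  # Suffix appended to the column names, e.g. "L21" for class 21
  suffix <- paste0("L", number)
  data %>%
    # TYPE values are padded with a space on each side, e.g. " cls_21 "
    filter(TYPE == sprintf(" cls_%s ", number)) %>%
    # Rename every column except the first two (RtNo and TYPE)
    rename_with(\(x) sprintf("%s%s", x, suffix), !1:2)
}

myfilter(example_data, 21)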

Given a list of numbers (here: 21 to 31) you could then automatically use them to filter a single dataframe:

multifilter <- function(data) {
  purrr::map(21:31, \(i) myfilter(data, i))
}

multifilter(example_data)

Finally, given a list of dataframes, you can automatically apply the filters to them:

purrr::map(list_of_dataframes, multifilter)

You can use ls to get the variable names of the datasets, manipulate them as you wish inside a loop with the get function, and then create new datasets with assign.

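library(dplyr)

sets <- grep("Trial", ls(), value = TRUE) # assumes every dataset has "Trial" in its name

for (i in sets) {
  classes <- unique(get(i)$TYPE)

  for (j in classes) {
    # Extract the two-digit class number from values like " cls_21 ";
    # this might be overly complicated, so look for better options if you want
    number <- gsub("(.+)([0-9]{2})( )", "\\2", j)
    # Subset the current dataset (not always Trial1) and put its name in the
    # new object name so results from different datasets do not overwrite each other
    assign(paste0(i, "_", number),
           subset(get(i), TYPE == j) %>% rename_with(function(x) paste0(x, number)))
  }
}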
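If 240 separate objects in the workspace feel unwieldy, the same loop can collect its results in one named list instead of calling assign. This is only a sketch under assumptions: the two toy datasets, the `id_cols` vector, and the `Trial1_L21` naming scheme are all made up for illustration.

```r
# Toy stand-ins for two of the datasets (values are illustrative only)
Trial1 <- data.frame(
  RtNo = c(2001, 2010, 2001, 2010),
  TYPE = c(" cls_11 ", " cls_11 ", " cls_21 ", " cls_21 "),
  CA   = c(996.57, 56.34, 1514.97, 638.55)
)
Trial2 <- Trial1
Trial2$CA <- Trial2$CA * 2

# Gather the datasets into one named list instead of looping over get()
sets <- mget(c("Trial1", "Trial2"))

id_cols <- c("RtNo", "TYPE")  # columns that keep their names
result <- list()

for (nm in names(sets)) {
  d <- sets[[nm]]
  d$TYPE <- trimws(d$TYPE)          # drop the padding around the class label
  for (cls in unique(d$TYPE)) {
    suffix <- sub(".*_", "L", cls)  # "cls_21" becomes "L21"
    piece <- d[d$TYPE == cls, , drop = FALSE]
    measure <- !(names(piece) %in% id_cols)
    names(piece)[measure] <- paste0(names(piece)[measure], suffix)
    result[[paste0(nm, "_", suffix)]] <- piece
  }
}

names(result)
# "Trial1_L11" "Trial1_L21" "Trial2_L11" "Trial2_L21"
```

Individual pieces are then available as `result$Trial1_L21`, and the whole collection can be passed to lapply for further processing.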
