[英]R function or loop for repeatedly selecting rows that meet a condition, saving as separate object, and renaming column headers
I have 16 large datasets of landcover variables around routes.我有 16 个围绕路线的土地覆盖变量的大型数据集。 Example dataset "Trial1":示例数据集“Trial1”:
RtNo TYPE CA PLAND NP PD LPI TE
2001 cls_11 996.57 6.4297 22 0.1419 6.3055 31080
2010 cls_11 56.34 0.3654 23 0.1492 0.1669 15480
18003 cls_11 141.12 0.9899 37 0.2596 0.1503 38700
18014 cls_11 797.58 5.3499 47 0.3153 1.3969 98310
2001 cls_21 1514.97 9.7744 592 3.8195 0.8443 761670
2010 cls_21 638.55 4.1414 95 0.6161 0.7489 463260
18003 cls_21 904.68 6.3463 612 4.2931 0.8769 549780
18014 cls_21 1189.89 7.9814 759 5.0911 0.4123 769650
2001 cls_22 732.33 4.7249 653 4.2131 0.7212 377430
2010 cls_22 32.31 0.2096 168 1.0896 0.0198 31470
18003 cls_22 275.85 1.9351 781 5.4787 0.0423 237390
18014 cls_22 469.44 3.1488 104 6.7345 0.1014 377580
I want to first select rows that meet a condition, for example, all rows in column "TYPE" that is cls_21.我想首先 select 满足条件的行,例如“TYPE”列中的所有行,即 cls_21。 I know the following code does this work:我知道以下代码可以做到这一点:
Trial21 <-subset(Trial1, TYPE==" cls_21 ")
(yes the invisible space before and after the categorical variable caused me a considerable headache). (是的,分类变量前后的不可见空间让我很头疼)。 And there are several other ways of doing this as shown in [https://stackoverflow.com/questions/5391124/select-rows-of-a-matrix-that-meet-a-condition]还有其他几种方法可以做到这一点,如 [https://stackoverflow.com/questions/5391124/select-rows-of-a-matrix-that-meet-a-condition]
I get the following output (sorry this one has extra columns, but shouldn't affect my question):我得到以下 output (对不起,这个有额外的列,但不应该影响我的问题):
RtNo TYPE CA PLAND NP PD LPI TE ED LSI
2 18003 cls_21 904.68 6.3463 612 4.2931 0.8769 549780 38.5668 46.1194
18 18014 cls_21 1189.89 7.9814 759 5.0911 0.4123 769650 51.6255 56.2522
34 2001 cls_21 1514.97 9.7744 592 3.8195 0.8443 761670 49.1418 49.3462
50 2010 cls_21 638.55 4.1414 95 0.6161 0.7489 463260 30.0457 46.0118
62 2020 cls_21 625.5 4.1165 180 1.1846 0.5064 384840 25.3268 38.6407
85 2021 cls_21 503.55 2.7926 214 1.1868 0.1178 348330 19.3175 38.9267
I want to rename the columns in this subset so they uniquely identify the class by adding "L21" at the back of existing column names, and I can do this using我想重命名此子集中的列,以便它们通过在现有列名的后面添加“L21”来唯一标识 class,我可以使用
library(data.table)
setnames(Trial21, old = c('CA', 'PLAND', 'NP', 'PD', 'LPI', 'TE', 'ED', 'LSI'),
new = c('CAL21', 'PLANDL21', 'NPL21', 'PDL21', 'LPIL21', 'TEL21', 'EDL21', 'LSIL21'))
I want help to develop a function or a loop that automates this process so I don't have to spend days repeating the same codes for 15 different classes and 16 datasets (240 times).我需要帮助开发 function 或自动执行此过程的循环,因此我不必花费数天时间为 15 个不同的类和 16 个数据集(240 次)重复相同的代码。 Also, decrease the risk of errors.此外,降低出错的风险。 I may have to do the same for additional datasets.对于其他数据集,我可能必须这样做。 Any help to speed the process will be greatly appreciated.任何有助于加快该过程的帮助将不胜感激。
You could do:你可以这样做:
a <- split(df, df$TYPE)
b <- sapply(names(a), function(x)setNames(a[[x]],
paste0(names(a[[x]]), sub(".*_", 'L', x))), simplify = FALSE)
Here is a start that should work for your example:这是一个应该适用于您的示例的开始:
library(dplyr)
myfilter <- function(data, number) {
data %>%
filter(TYPE == sprintf(" cls_%s ") %>%
rename_with(\(x) sprintf("%s%s", x, suffix), !1:2)
}
myfilter(example_data, 21)
Given a list of numbers (here: 21 to 31) you could then automatically use them to filter a single dataframe:给定一个数字列表(此处:21 到 31),您可以自动使用它们来过滤单个 dataframe:
multifilter <- function(data) {
purrr::map(21:31, \(i) myfilter(data, i))
}
multifilter(example_data)
Finally, given a list of dataframes, you can automatically apply the filters to them:最后,给定一个数据框列表,您可以自动将过滤器应用于它们:
purrr::map(list_of_dataframes, multifilter)
You can use ls
to get the variable names of the datasets, and manipulate them as you wish inside a loop and with get
function, then create new datasets with assign
.您可以使用ls
获取数据集的变量名称,并在循环中根据需要操作它们,并使用get
function,然后使用assign
创建新数据集。
sets = grep("Trial", ls(), value=TRUE) #Assuming every dataset has "Trial" in the name
for(i in sets){
classes = unique(get(i)$TYPE)
for(j in classes){
number = gsub("(.+)([0-9]{2})( )", "\\2", j)#this might be an overly complicated way of getting just the number, you can look for better options if you want
assign(paste0("Trial", number),
subset(Trial1, TYPE==j) %>% rename_with(function(x){paste0(x, number)}))}}
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.