How to make repetitive operations (loop) over a set of columns in a dataset in R

I need to apply a series of operations to a subset of columns. I have a set of columns that measure the same thing for different parties A, B, and C:

id var1_A var1_B var1_C var2_A var2_B var2_C var3_A var3_B var3_C

So, in the example, var1_A, var1_B, and var1_C refer to the same measurement for different parties, while var1_A, var2_A, and var3_A refer to different variables for the same party A.

I would like to accomplish two things:

I need to create multiple data frames, each referring to one specific party, and merge each of them with another data frame by id. I wrote the code for each data frame individually, as in the example below. The example is simple, but in practice I have multiple datasets like df, each containing information for multiple parties, so I end up with 50 lines of repetitive code. Is there a way to simplify this?

df_A <- df %>% select(id, var1_A, var2_A, var3_A)
df_A <- merge(df_A, df_merge, by = "id")
df_B <- df %>% select(id, var1_B, var2_B, var3_B)
df_B <- merge(df_B, df_merge, by = "id")
df_C <- df %>% select(id, var1_C, var2_C, var3_C)
df_C <- merge(df_C, df_merge, by = "id")

The second thing I would like to accomplish is to rename the variables in df. I would like to change the name of all the columns that measure the same thing while keeping the party each one refers to. For example, say var1 refers to height, var2 to weight, and var3 to gender:

id var1_A var1_B var1_C var2_A var2_B var2_C var3_A var3_B var3_C

I would like to get something like:

id height_A height_B height_C weight_A weight_B weight_C gender_A gender_B gender_C

Is there a way to accomplish this with a few lines of code? Or do I have to rename each of them individually (using the rename command, for example)?

A tidy way:

library(tidyverse)

#CREATE DATA
df <- data.frame(id = 1:10,
                 var1_A = runif(10),
                 var1_B = runif(10),
                 var1_C = runif(10),
                 var2_A = runif(10),
                 var2_B = runif(10),
                 var2_C = runif(10),
                 var3_A = runif(10),
                 var3_B = runif(10),
                 var3_C = runif(10))

df_merge<-data.frame(id = 1:10,
                     value=11:20)
#grabs current names
nam<-colnames(df)

#Create map of new names
new_names = c('var1'='height','var2'='weight','var3'='gender')

#replace the strings with new strings in map 
nam <- str_replace_all(nam, new_names)

#reassign column names to dataframe
colnames(df)<-nam

# Loop over the party letters: for each one, select id plus the columns
# ending with that letter, merge with df_merge on id, and assign the
# result to a variable named "df_<letter>" (df_A, df_B, df_C)


for (letter in c('A', "B", "C")){

  assign(paste("df", letter, sep = '_'),
         df%>%select(id, ends_with(letter))%>%
           merge(df_merge, by='id'))
}
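
Since the question mentions several datasets shaped like df, the loop above could also be wrapped in a function that returns a named list instead of calling assign(). The following is a minimal base R sketch, not part of the original answer: split_by_party, its parties argument, and the toy data are hypothetical names chosen for illustration, and it assumes every party column ends in "_<party>" and that both inputs share an id column.

```r
# Hypothetical helper: one merged data frame per party, returned as a named list.
# Assumes party columns end in "_<party>" and both inputs share an "id" column.
split_by_party <- function(df, df_merge, parties = c("A", "B", "C")) {
  out <- lapply(parties, function(p) {
    # keep id plus every column whose name ends with "_<party>"
    keep <- names(df) == "id" | endsWith(names(df), paste0("_", p))
    merge(df[, keep, drop = FALSE], df_merge, by = "id")
  })
  names(out) <- paste0("df_", parties)
  out
}

# Toy data standing in for the real datasets
toy_df <- data.frame(id = 1:3,
                     height_A = c(1.1, 1.2, 1.3), height_B = c(2.1, 2.2, 2.3),
                     weight_A = c(10, 20, 30),    weight_B = c(40, 50, 60))
toy_merge <- data.frame(id = 1:3, value = 11:13)

res <- split_by_party(toy_df, toy_merge, parties = c("A", "B"))
names(res)        # "df_A" "df_B"
names(res$df_A)   # "id" "height_A" "weight_A" "value"

# The same call can be repeated over several datasets held in a list
all_res <- lapply(list(first = toy_df, second = toy_df), split_by_party,
                  df_merge = toy_merge, parties = c("A", "B"))
```

Keeping the results in a list rather than in separate global variables makes it straightforward to apply further steps to every party with lapply().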

This is similar to @thelatemail's comment (answer) above, but with a couple of extra steps, i.e. rename the columns, pivot the data to 'long' format, split the df into groups ("df_A", "df_B", "df_C"), pivot the data back to wide, and save the dfs to your global environment:

library(tidyverse)  # attaches dplyr, tidyr, stringr, and purrr

df <- data.frame(id = 1:10,
                 var1_A = runif(10),
                 var1_B = runif(10),
                 var1_C = runif(10),
                 var2_A = runif(10),
                 var2_B = runif(10),
                 var2_C = runif(10),
                 var3_A = runif(10),
                 var3_B = runif(10),
                 var3_C = runif(10))

list_of_dfs <- df %>%
  rename_with(.cols = starts_with("var1"), ~gsub("var1", "height", .x)) %>%
  rename_with(.cols = starts_with("var2"), ~gsub("var2", "weight", .x)) %>%
  rename_with(.cols = starts_with("var3"), ~gsub("var3", "gender", .x)) %>%
  pivot_longer(-id) %>%
  mutate(group = case_when(
    str_detect(name, "_A") ~ "df_A",
    str_detect(name, "_B") ~ "df_B",
    str_detect(name, "_C") ~ "df_C"
    )) %>%
  split(., .$group)

df_list <- map(list_of_dfs, 
               \(x) pivot_wider(x, names_from = name,
                                values_from = value) %>%
                 select(-group))

list2env(df_list, envir = .GlobalEnv)
#> <environment: R_GlobalEnv>
ls()
#> [1] "df"          "df_A"        "df_B"        "df_C"        "df_list"    
#> [6] "list_of_dfs"
df_A
#> # A tibble: 10 × 4
#>       id height_A weight_A gender_A
#>    <int>    <dbl>    <dbl>    <dbl>
#>  1     1   0.417   0.693    0.320  
#>  2     2   0.387   0.879    0.00590
#>  3     3   0.882   0.805    0.861  
#>  4     4   0.611   0.246    0.336  
#>  5     5   0.795   0.185    0.680  
#>  6     6   0.274   0.00675  0.568  
#>  7     7   0.722   0.950    0.757  
#>  8     8   0.776   0.757    0.0457 
#>  9     9   0.613   0.352    0.853  
#> 10    10   0.0603  0.438    0.421

Created on 2022-10-05 by the reprex package (v2.0.1)

You can then merge/join the dfs as required. Hope this helps.
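
One way to do that merge for every element at once is to map over the list. This is a minimal base R sketch rather than part of the original answer; it assumes a lookup table like the df_merge from the question with a shared id column, and party_dfs and lookup are hypothetical stand-ins for df_list and df_merge so the snippet runs on its own:

```r
# Hypothetical stand-ins for df_list and df_merge
party_dfs <- list(
  df_A = data.frame(id = 1:3, height_A = c(1.1, 1.2, 1.3)),
  df_B = data.frame(id = 1:3, height_B = c(2.1, 2.2, 2.3))
)
lookup <- data.frame(id = 1:3, value = 11:13)

# Merge every party's data frame with the lookup table on id
merged <- lapply(party_dfs, function(d) merge(d, lookup, by = "id"))

names(merged$df_A)  # "id" "height_A" "value"
```

With the tidyverse loaded, map(df_list, \(d) left_join(d, df_merge, by = "id")) would be a close equivalent. Note that merge() keeps only ids present in both tables by default, whereas left_join() keeps every row of the left table.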
