
R: How to loop over a name-based selection of variables from a dataframe and for each create a new variable containing the column mean of the first?

I have a dataset containing a number of numeric variables whose names all start with "Ranking". For each of these variables, I want to add another variable to the dataset that contains the column mean of that variable.

So the data look something like this:

| Ranking_blah | Ranking_bleh |
| ------------ | ------------ |
| 1            | 0            |
| -1           | 1            |
| NA           | 0.5          |

and what I want is:

| Ranking_blah | Ranking_bleh | Ranking_blah_mean | Ranking_bleh_mean |
| ------------ | ------------ | ----------------- | ----------------- |
| 1            | 0            | 0                 | 0.5               |
| -1           | 1            | 0                 | 0.5               |
| NA           | 0.5          | 0                 | 0.5               |

(I am aware that this way each mean variable has the same value in every row. I need this because the data will be reshaped later.)

What I've tried so far:

# getting a list of all ranking variables I want to create a new mean variable from
ranking_variables = names(data)[grepl("Ranking", names(data))]

# creating a new variable for each base variable in the list and setting it to the mean of the respective base variable
data[paste0(ranking_variables, "_mean")] <- do.call(cbind, lapply(data[ranking_variables], function(x) mean(x, na.rm = TRUE)))

The second part is not working, though; it only yields NA values. What am I doing wrong?
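For comparison, here is a minimal base R sketch of the intended result. It assumes the toy values from the tables above (the data frame `data` below is illustrative, not the real dataset). Each mean is repeated explicitly, once per row, so the assignment does not depend on how a one-row matrix gets recycled into the new columns.

```r
# Toy data matching the example tables (an assumption, not the real dataset)
data <- data.frame(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))

# Names of all columns that start with "Ranking"
ranking_variables <- grep("^Ranking", names(data), value = TRUE)

# For each selected column, repeat its mean once per row
data[paste0(ranking_variables, "_mean")] <- lapply(
  data[ranking_variables],
  function(x) rep(mean(x, na.rm = TRUE), length(x))
)

print(data)
```

Passing a list with one full-length vector per new column keeps the mapping between source column and mean column explicit.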

An alternative approach is to use dplyr's across():

library(dplyr)

dat |>
    mutate(across(starts_with("Ranking"), ~ mean(., na.rm = TRUE), .names = "{.col}_mean"))

Output:

# A tibble: 3 × 4
  Ranking_blah Ranking_bleh Ranking_blah_mean Ranking_bleh_mean
         <dbl>        <dbl>             <dbl>             <dbl>
1            1          0                   0               0.5
2           -1          1                   0               0.5
3           NA          0.5                 0               0.5

Data:

dat <- tibble(Ranking_blah = c(1,-1,NA), Ranking_bleh = c(0,1,0.5))

The across approach is fine; here is another one:

Tidy data is less of a struggle, because R makes it easier to compute across rows than across columns.

Tidy data means that every observation has its own row and every variable its own column. Columns are designed to represent variables. I think the "Ranking…" columns are not distinct variables, but different observations of a single variable, "type". To fix this, we can use tidyr. See the tidy data chapter of R for Data Science.

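library(tidyverse)

data <- data.frame(Ranking_blah = c(1,-1,NA), Ranking_bleh = c(0,1,0.5))
# row identifier, so each observation can still be traced after reshaping
data$id <- seq_len(nrow(data))

# one row per (id, type) pair, then the mean of each type on every row
pivot_longer(data, starts_with("Ranking"), names_to = "type") %>%
  group_by(type) %>%
  mutate(mean = mean(value, na.rm = TRUE)) %>%
  ungroup()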
# A tibble: 6 × 4
     id type         value  mean
  <int> <chr>        <dbl> <dbl>
1     1 Ranking_blah   1     0  
2     1 Ranking_bleh   0     0.5
3     2 Ranking_blah  -1     0  
4     2 Ranking_bleh   1     0.5
5     3 Ranking_blah  NA     0  
6     3 Ranking_bleh   0.5   0.5

This format is less human-readable, but more R-friendly.
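If the wide layout from the question is needed again later, the long table can be pivoted back. The following is only a sketch under the same toy data; the intermediate names `long` and `wide` are introduced here for illustration. Note that the original columns come back with a `_value` suffix (for example `Ranking_blah_value`) because of the `names_glue` pattern chosen here.

```r
library(dplyr)
library(tidyr)

# Same toy data as above, with a row id
data <- data.frame(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))
data$id <- seq_len(nrow(data))

# Long format with the per-type mean on every row
long <- data %>%
  pivot_longer(starts_with("Ranking"), names_to = "type") %>%
  group_by(type) %>%
  mutate(mean = mean(value, na.rm = TRUE)) %>%
  ungroup()

# Back to one row per id; names_glue builds names such as "Ranking_blah_mean"
wide <- long %>%
  pivot_wider(
    names_from = type,
    values_from = c(value, mean),
    names_glue = "{type}_{.value}"
  )

print(wide)
```

Keeping `id` in the data is what lets `pivot_wider()` put each observation back on its own row.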

