
R: how to use apply to loop over variables in a data.frame

I'm trying to learn how to use apply (or any other member of the apply family) to loop over the variables in a data.frame.

For example, say I have the following data.frame:

    df_long <- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3), 
             country=c('a','a','a','a','b','b','b','b','c','c','c','c'),
             year=c(1,2,3,4,1,2,3,4,1,2,3,4),
             amt = c(3,4,23,5,76,5,2,3,5,4,6,2))

I want to loop through all the variables so that if a variable is numeric, I add one to it; otherwise I do nothing. I want the result to be a data.frame. This is what I have so far, but it doesn't work:

    apply(df_long, 2, function(x) x = ifelse(is.numeric(x), x+1, x))

Any insights on this question, or more generally on how to loop through the variables in a data.frame using apply and/or other methods, would be greatly appreciated.

I would first find the numeric columns using is.numeric and then add 1 to only those columns. sapply loops over each column and returns TRUE or FALSE depending on whether that column is numeric. We then use that logical index (col_ind) to subset the data frame and add 1 to the selected columns.

col_ind <- sapply(df_long, is.numeric)
df_long[col_ind] <- df_long[col_ind] + 1
df_long

#   id country year amt
#1   2       a    2   4
#2   2       a    3   5
#3   2       a    4  24
#4   2       a    5   6
#5   3       b    2  77
#6   3       b    3   6
#7   3       b    4   3
#8   3       b    5   4
#9   4       c    2   6
#10  4       c    3   5
#11  4       c    4   7
#12  4       c    5   3
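As a small variation on the answer above, vapply can stand in for sapply when building the logical index. This is only a sketch of an alternative, not part of the original answer; it reuses the question's df_long so that it runs on its own.

```r
df_long <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
                      country = c('a','a','a','a','b','b','b','b','c','c','c','c'),
                      year = c(1,2,3,4,1,2,3,4,1,2,3,4),
                      amt = c(3,4,23,5,76,5,2,3,5,4,6,2))

# vapply() behaves like sapply(), but the third argument declares the type
# and length expected from each call, so an unexpected result is an error
# instead of a silently different return shape.
col_ind <- vapply(df_long, is.numeric, logical(1))
print(col_ind)   # TRUE for id, year and amt; FALSE for country

# Same subsetting step as in the answer: add 1 to the numeric columns only.
df_long[col_ind] <- df_long[col_ind] + 1
str(df_long)
```

The result is the same data.frame as with sapply; the only difference is the stricter check on what is.numeric returns for each column.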

A possibly simpler approach is a dplyr one-liner.

library(dplyr)
df_long %>%
  # funs() is deprecated in current dplyr; a formula lambda does the same job
  mutate_if(is.numeric, ~ . + 1)

I tried to follow the approach you originally asked for, using sapply and apply, but the difficulty is that both try to coerce the result into a matrix. That either forces all variables to be returned as character, or converts the country variable to numeric, turning a into 1, b into 2, and so on.
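The coercion described above can be checked directly. The following is a minimal sketch using the question's df_long; it also shows a second reason the original attempt fails, which is my own reading rather than something stated in the answers: ifelse returns a value shaped like its test, and the test here is a single TRUE/FALSE.

```r
df_long <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
                      country = c('a','a','a','a','b','b','b','b','c','c','c','c'),
                      year = c(1,2,3,4,1,2,3,4,1,2,3,4),
                      amt = c(3,4,23,5,76,5,2,3,5,4,6,2))

# apply() converts the data.frame with as.matrix() first. A matrix holds a
# single type, so the character column forces every column to character.
m <- as.matrix(df_long)
print(typeof(m))

# As a result, is.numeric() is FALSE for every column that apply() passes in.
numeric_seen_by_apply <- apply(df_long, 2, is.numeric)
print(numeric_seen_by_apply)

# ifelse() is vectorised over its test. A length-one test yields a
# length-one result, so only the first element of the column comes back.
first_only <- ifelse(is.numeric(df_long$amt), df_long$amt + 1, df_long$amt)
print(first_only)
```

A plain if/else inside the function, rather than ifelse, avoids the second problem because it returns the whole column.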

If you prefer a single line of code using one of the apply functions, I recommend lapply. lapply returns the result as a list, which can then be converted to a data frame. A solution is below:

as.data.frame(
  lapply(
    df_long, 
    function(col) 
      if(is.numeric(col)) {col + 1} else {col}))

The result is:

   id country year amt
1   2       a    2   4
2   2       a    3   5
3   2       a    4  24
4   2       a    5   6
5   3       b    2  77
6   3       b    3   6
7   3       b    4   3
8   3       b    5   4
9   4       c    2   6
10  4       c    3   5
11  4       c    4   7
12  4       c    5   3
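A related base R idiom, offered here as a sketch rather than as part of the original answer, is to assign the lapply result into df_long[]. The empty brackets replace the columns while keeping the existing data.frame, so the as.data.frame call is not needed.

```r
df_long <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
                      country = c('a','a','a','a','b','b','b','b','c','c','c','c'),
                      year = c(1,2,3,4,1,2,3,4,1,2,3,4),
                      amt = c(3,4,23,5,76,5,2,3,5,4,6,2))

# Assigning into df_long[] fills the existing data.frame column by column
# from the list that lapply() returns; names and row names are preserved.
df_long[] <- lapply(df_long, function(col) if (is.numeric(col)) col + 1 else col)

print(class(df_long))
print(head(df_long, 3))
```

Unlike the as.data.frame version, this modifies df_long in place, so copy the data first if the original values are still needed.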

