Apply a user defined function to a list of data frames

I have a series of data frames structured similarly to this:

df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))  
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))

In order to clean them I wrote a user defined function with a set of cleaning steps:

clean <- function(df){
  colnames(df) <- df[2,]
  df <- df[grep('^[0-9]{4}', df$year),]
  return(df)
}

I'd now like to put my data frames in a list:

df_list <- list(df,df2)

and clean them all at once. I tried

lapply(df_list, clean)

and

for(df in df_list){
  clean(df)
}

But with both methods I get the error:

Error in df[2, ] : incorrect number of dimensions

What's causing this error and how can I fix it? Is my approach to this problem wrong?

You are close, but there is one problem in your code. Because the columns contain text, data.frame() creates them as factors rather than character vectors (this was the default in R versions before 4.0.0). As a result, assigning row 2 as the column names does not produce the names you expect.
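To make the factor problem concrete, here is a minimal sketch that builds the same data both ways and assigns row 2 as the column names. The names df_factor, df_char, renamed_factor and renamed_char are illustrative only, and stringsAsFactors is set explicitly in both calls so the result should not depend on the R version in use.

```r
# Same data, once with factor columns and once with character columns.
# stringsAsFactors is explicit so the R version's default does not matter.
df_factor <- data.frame(x = c('notes', 'year', 1995:2005),
                        y = c(NA, 'value', 11:21),
                        stringsAsFactors = TRUE)
df_char   <- data.frame(x = c('notes', 'year', 1995:2005),
                        y = c(NA, 'value', 11:21),
                        stringsAsFactors = FALSE)

class(df_factor$x)  # "factor"
class(df_char$x)    # "character"

# Assign row 2 as the column names, as the clean() function does
renamed_factor <- df_factor
colnames(renamed_factor) <- df_factor[2, ]
renamed_char <- df_char
colnames(renamed_char) <- df_char[2, ]

names(renamed_factor)  # not "year" and "value"
names(renamed_char)    # "year" "value"
```

In this sketch the factor version ends up without a column called year, so a later df$year lookup returns NULL and the row filter has nothing to match against.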

# Set stringsAsFactors = FALSE so the text columns stay character vectors
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)

clean <- function(df){
  # Use the second row as the column names
  colnames(df) <- df[2,]
  # Keep only the rows whose year column starts with four digits
  df <- df[grep('^[0-9]{4}', df$year),]
  # Convert every column to numeric while keeping the data frame structure
  df[] <- lapply(df, as.numeric)
  return(df)
}

df_list <- list(df,df2)
lapply(df_list, clean)
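Note that lapply() returns a new list and leaves df_list unchanged, and the for loop in the question discards whatever clean() returns. The sketch below shows one way to keep the cleaned data frames in either style. The names cleaned_list and cleaned_loop are illustrative, and the block repeats the definitions above only so that it runs on its own.

```r
df  <- data.frame(x = c('notes', 'year', 1995:2005), y = c(NA, 'value', 11:21),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes', 'year', 1995:2005), y = c(NA, 'value', 50:60),
                  stringsAsFactors = FALSE)

clean <- function(df) {
  colnames(df) <- df[2, ]                    # second row holds the headers
  df <- df[grep('^[0-9]{4}', df$year), ]     # keep rows that start with a year
  df[] <- lapply(df, as.numeric)             # convert all columns to numeric
  df
}

df_list <- list(df, df2)

# lapply() returns the cleaned copies, so assign the result
cleaned_list <- lapply(df_list, clean)

# Loop equivalent: store each result by position instead of discarding it
cleaned_loop <- vector("list", length(df_list))
for (i in seq_along(df_list)) {
  cleaned_loop[[i]] <- clean(df_list[[i]])
}
```

Both approaches should give the same list here; lapply() is usually the shorter choice when the function is applied to every element independently.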
