
Apply a function to a certain column in a list of data frames

I'm trying to convert numeric months (1, 2, 3, ..., 12) to month abbreviations (see mymonths) in a list of data frames, df_list, using lapply, but I can't get it to produce the right output. All data frames in the list have the same variables.

With the code below, the new df_list2 contains only the new month column and none of the other data from the original frames. Sorry for the poor example data; I think I'm just missing a simple command to get the whole original data set back rather than just the month column.

# create example data 
d1 <- data.frame(month = c(1:3), val = c(1,2,5))
d2 <- data.frame(month = c(1:5), val = c(1,2,5,6,8))
df_list <- list(d1, d2)

> df_list 
[[1]]
  month val
1     1   1
2     2   2
3     3   5

[[2]]
  month val
1     1   1
2     2   2
3     3   5
4     4   6
5     5   8

mymonths <- c("JAN","FEB","MAR",
              "APR","MAY","JUN",
              "JUL","AUG","SEP",
              "OCT","NOV","DEC")

df_list2 <- lapply(df_list , function(x) {
  x[,1] <- mymonths [ x[,1] ]
  })

> df_list2 
[[1]]
[1] "JAN" "FEB" "MAR"

[[2]]
[1] "JAN" "FEB" "MAR" "APR" "MAY"
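Why this happens: an R function returns the value of its last evaluated expression, and the value of a replacement assignment such as x[,1] <- ... is its right-hand side (returned invisibly), not the modified x. A minimal sketch reproducing that behaviour; the function f and the data frame d are made up for illustration:

```r
# Mirrors the body of the function in the question: the assignment is the last expression
f <- function(x) {
  x[, 1] <- c("JAN", "FEB")
}

d <- data.frame(month = 1:2, val = c(10, 20))
res <- f(d)

class(res)  # "character": the right-hand side of the assignment
res         # "JAN" "FEB"
d$month     # 1 2: the caller's data frame is untouched (R copies on modify)
```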

You just need to return the data frame at the end of the function inside your lapply call.

# create example data 
d1 <- data.frame(month = c(1:3), val = c(1,2,5))
d2 <- data.frame(month = c(1:5), val = c(1,2,5,6,8))
df_list <- list(d1, d2)

mymonths <- c("JAN","FEB","MAR",
              "APR","MAY","JUN",
              "JUL","AUG","SEP",
              "OCT","NOV","DEC")

If the month column refers to the month then...

df_list2 <- lapply(df_list , function(x) {
  x[,1] <- mymonths[ x[,1] ]
  x
})

df_list2

[[1]]
  month val
1   JAN   1
2   FEB   2
3   MAR   5

[[2]]
  month val
1   JAN   1
2   FEB   2
3   MAR   5
4   APR   6
5   MAY   8
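Base R also has a built-in constant, month.abb, holding the twelve English month abbreviations in title case ("Jan", "Feb", ...). Assuming upper-case labels are what you want, toupper(month.abb) gives the same vector as the hand-typed mymonths. A sketch of the first variant using it, with the column referenced by name rather than position:

```r
d1 <- data.frame(month = c(1:3), val = c(1, 2, 5))
d2 <- data.frame(month = c(1:5), val = c(1, 2, 5, 6, 8))
df_list <- list(d1, d2)

# month.abb is c("Jan", "Feb", ..., "Dec"); toupper() gives "JAN", "FEB", ...
mymonths <- toupper(month.abb)

df_list2 <- lapply(df_list, function(x) {
  x$month <- mymonths[x$month]  # look up each month number in the label vector
  x                             # return the whole data frame, not the assigned vector
})

df_list2
```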

If the value column refers to the month then...

df_list2 <- lapply(df_list , function(x) {
  x[,1] <- mymonths[ x[,2] ]
  x
})

df_list2

[[1]]
  month val
1   JAN   1
2   FEB   2
3   MAY   5

[[2]]
  month val
1   JAN   1
2   FEB   2
3   MAY   5
4   JUN   6
5   AUG   8
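In either variant, indexing a 12-element vector with a number outside 1 to 12 does not raise an error: mymonths[13] is NA, and a 0 is silently dropped from the result. If the input might contain such values, a checked lookup can fail loudly instead. to_month_abb below is a hypothetical helper, not part of any package:

```r
mymonths <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Plain indexing: out-of-range numbers give NA, zeros are dropped
mymonths[c(1, 13)]     # "JAN" NA
mymonths[c(1, 0, 2)]   # "JAN" "FEB"

# Hypothetical helper: stop unless every non-missing value is a whole number in 1..12
to_month_abb <- function(m) {
  bad <- !is.na(m) & (m < 1 | m > 12 | m != round(m))
  if (any(bad)) {
    stop("month numbers must be whole numbers between 1 and 12")
  }
  mymonths[m]
}

to_month_abb(c(1, 5, 12))  # "JAN" "MAY" "DEC"
```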

Either way, you have to return each data.frame from the function you pass to lapply.

There is a very minor mistake in your lapply usage. Please change the code to:

df_list2 <- lapply(df_list, function(x) {
  x[,2] <- mymonths[ x[,2] ]
  x
})

This treats the values in the second column (val) as the month numbers, so x[,2] is passed to the mymonths vector and the result is written back to that column.

Also, x must be returned from the function; that is why the extra line x has been added.

Now the output of df_list2 will be:

> df_list2
[[1]]
  month val
1     1 JAN
2     2 FEB
3     3 MAY

[[2]]
  month val
1     1 JAN
2     2 FEB
3     3 MAY
4     4 JUN
5     5 AUG
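Positional indexing such as x[,2] depends on the column order. A sketch of the same fix that refers to the column by name instead, so it keeps working if val is not the second column; label_val is a hypothetical helper name:

```r
d1 <- data.frame(month = c(1:3), val = c(1, 2, 5))
d2 <- data.frame(month = c(1:5), val = c(1, 2, 5, 6, 8))
df_list <- list(d1, d2)

mymonths <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Hypothetical helper: replace the numbers in the "val" column with labels
label_val <- function(x) {
  x[["val"]] <- mymonths[x[["val"]]]  # column chosen by name, not position
  x
}

df_list2 <- lapply(df_list, label_val)

# Still works when the columns are in the other order
swapped <- label_val(d1[, c("val", "month")])
```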

Isn't what you are looking for called a join?

library(dplyr)
library(purrr)

# create example data 
df_list <- list(data.frame(month = c(1:3), val = c(1,2,5)), 
                data.frame(month = c(1:5), val = c(1,2,5,6,8)))

mymonths <- data.frame(month_name = c("JAN", "FEB", "MAR",
                                      "APR", "MAY", "JUN",
                                      "JUL", "AUG", "SEP",
                                      "OCT", "NOV", "DEC"),
                       month = seq(12))

map(df_list, left_join, mymonths)

We get a list of data frames back:

[[1]]
  month val month_name
1     1   1        JAN
2     2   2        FEB
3     3   5        MAR

[[2]]
  month val month_name
1     1   1        JAN
2     2   2        FEB
3     3   5        MAR
4     4   6        APR
5     5   8        MAY
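For comparison, a base R sketch of the same left join using merge() with all.x = TRUE, so neither dplyr nor purrr is needed. Note that merge() sorts the result by the key column, which makes no difference for this data:

```r
df_list <- list(data.frame(month = c(1:3), val = c(1, 2, 5)),
                data.frame(month = c(1:5), val = c(1, 2, 5, 6, 8)))

mymonths <- data.frame(month_name = c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                                      "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"),
                       month = seq(12))

# Each data frame in the list becomes merge()'s x argument;
# all.x = TRUE keeps every row of x, like a left join
joined <- lapply(df_list, merge, y = mymonths, by = "month", all.x = TRUE)

joined
```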

Simply use the transform function. Depending on the name you assign, you can either overwrite the existing variable or create a new one:

Rewriting an existing variable:

lapply(df_list, transform, month = mymonths[month])
[[1]]
  month val
1   JAN   1
2   FEB   2
3   MAR   5

[[2]]
  month val
1   JAN   1
2   FEB   2
3   MAR   5
4   APR   6
5   MAY   8

Creating a new variable:

lapply(df_list, transform, newcolumn = mymonths[month])
[[1]]
  month val newcolumn
1     1   1       JAN
2     2   2       FEB
3     3   5       MAR

[[2]]
  month val newcolumn
1     1   1       JAN
2     2   2       FEB
3     3   5       MAR
4     4   6       APR
5     5   8       MAY
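within() is another base R function that evaluates an expression with the data frame's columns in scope and returns the modified data frame, so there is no need to remember to return x. A sketch (the column name month_abb is arbitrary):

```r
d1 <- data.frame(month = c(1:3), val = c(1, 2, 5))
d2 <- data.frame(month = c(1:5), val = c(1, 2, 5, 6, 8))
df_list <- list(d1, d2)

mymonths <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Inside within(), `month` refers to the column of d;
# the new column is appended after the existing ones
res <- lapply(df_list, function(d) within(d, month_abb <- mymonths[month]))

res
```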

Using the tidyverse package, the map function from purrr, and the month.abb constant from base R:

library(tidyverse)
d1 <- data.frame(month = c(1:3), val = c(1,2,5))
d2 <- data.frame(month = c(1:5), val = c(1,2,5,6,8))
df_list <- list(d1, d2)

month_abbreviation <- function(x) 
    transform(x, MonthAbb = month.abb[month])

Then use map from purrr to apply the function to each data frame without writing a for loop:

list_of_df <- map(df_list, month_abbreviation)
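map() is convenient, but for this step base lapply() returns the same list of data frames, so loading the tidyverse is optional. A sketch reusing month_abbreviation from above (repeated so the block runs on its own); note that month.abb is title case, so the new column holds "Jan", "Feb", and so on:

```r
d1 <- data.frame(month = c(1:3), val = c(1, 2, 5))
d2 <- data.frame(month = c(1:5), val = c(1, 2, 5, 6, 8))
df_list <- list(d1, d2)

# Adds a MonthAbb column looked up from base R's month.abb
month_abbreviation <- function(x)
  transform(x, MonthAbb = month.abb[month])

list_of_df <- lapply(df_list, month_abbreviation)

list_of_df[[2]]
```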

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM