Applying the same function for multiple dataframes in R

I am a new R user and I am having trouble with my code. I have 17 different data frames (one per year from 1996 to 2012) and I would like to apply the same function to each of them. Then I want to put all the results into a new data frame. I wrote this code and it works well:

    df2012<-as.data.frame(cprop(wtd.table(database2012$year,database2012$nivvie_dec,weights=database2012$wprm),total=FALSE))
    df2012$annee<-"2012"
    df2011<-as.data.frame(cprop(wtd.table(database2011$year,database2011$nivvie_dec,weights=database2011$wprm),total=FALSE))
    df2011$annee<-"2011"
    df2010<-as.data.frame(cprop(wtd.table(database2010$year,database2010$nivvie_dec,weights=database2010$wprm),total=FALSE))
    df2010$annee<-"2010"
    df2009<-as.data.frame(cprop(wtd.table(database2009$year,database2009$nivvie_dec,weights=database2009$wprm),total=FALSE))
    df2009$annee<-"2009"
    df2008<-as.data.frame(cprop(wtd.table(database2008$year,database2008$nivvie_dec,weights=database2008$wprm),total=FALSE))
    df2008$annee<-"2008"
    df2007<-as.data.frame(cprop(wtd.table(database2007$year,database2007$nivvie_dec,weights=database2007$wprm),total=FALSE))
    df2007$annee<-"2007"
    df2006<-as.data.frame(cprop(wtd.table(database2006$year,database2006$nivvie_dec,weights=database2006$wprm),total=FALSE))
    df2006$annee<-"2006"
    df2005<-as.data.frame(cprop(wtd.table(database2005$year,database2005$nivvie_dec,weights=database2005$wprm),total=FALSE))
    df2005$annee<-"2005"
    df2004<-as.data.frame(cprop(wtd.table(database2004$year,database2004$nivvie_dec,weights=database2004$wprm),total=FALSE))
    df2004$annee<-"2004"
    df2003<-as.data.frame(cprop(wtd.table(database2003$year,database2003$nivvie_dec,weights=database2003$wprm),total=FALSE))
    df2003$annee<-"2003"
    df2002<-as.data.frame(cprop(wtd.table(database2002$year,database2002$nivvie_dec,weights=database2002$wprm),total=FALSE))
    df2002$annee<-"2002"
    df2001<-as.data.frame(cprop(wtd.table(database2001$year,database2001$nivvie_dec,weights=database2001$wprm),total=FALSE))
    df2001$annee<-"2001"
    df2000<-as.data.frame(cprop(wtd.table(database2000$year,database2000$nivvie_dec,weights=database2000$wprm),total=FALSE))
    df2000$annee<-"2000"
    df1999<-as.data.frame(cprop(wtd.table(database1999$year,database1999$nivvie_dec,weights=database1999$wprm),total=FALSE))
    df1999$annee<-"1999"
    df1998<-as.data.frame(cprop(wtd.table(database1998$year,database1998$nivvie_dec,weights=database1998$wprm),total=FALSE))
    df1998$annee<-"1998"
    df1997<-as.data.frame(cprop(wtd.table(database1997$year,database1997$nivvie_dec,weights=database1997$wprm),total=FALSE))
    df1997$annee<-"1997"
    df1996<-as.data.frame(cprop(wtd.table(database1996$year,database1996$nivvie_dec,weights=database1996$wprm),total=FALSE))
    df1996$annee<-"1996"
    df19962012<-rbind(df1996,df1997,df1998,df1999,df2000,df2001,df2002,df2003,df2004,df2005,df2006,df2007,df2008,df2009,df2010,df2011,df2012)

However, the code is long, and I need to repeat it for other variables such as sex, educational level and family structure instead of year. I looked for a shorter version using lapply, but all my attempts failed. Does anyone know a way to shorten the code?

Thank you very much for your help!

Again, see my comment about generating a new example, but the following should get at the core elements of your question and is reproducible. Walk through each portion slowly to understand what's going on. In general, you should strive for DRY (don't repeat yourself) code when possible and get in the habit of writing small, simple functions any time you find yourself repeating lines of code:

Make two "fake" data.frames:

df1 <- data.frame(x = 1:10)
df2 <- data.frame(x = 11:20)

A simple "dummy" function h(df) takes a data.frame and creates a new column y by adding 10 to the data frame's existing x column.

h <- function(df) {
  df$y <- df$x + 10
  df
}

Find all objects whose names match the pattern df followed by a number, and store their names in dfs:

dfs <- ls(pattern = "^df[0-9]+$")  # anchored so that only names like df1, df2 match
dfs

Use mget to look up the objects named in dfs, run lapply over the resulting list to apply h to each data frame, and finally rbind the results via do.call.

do.call(rbind, lapply(mget(dfs), h))

#         x  y
# df1.1   1 11
# df1.2   2 12
# df1.3   3 13
# df1.4   4 14
# df1.5   5 15
# df1.6   6 16
# df1.7   7 17
# df1.8   8 18
# df1.9   9 19
# df1.10 10 20
# df2.1  11 21
# df2.2  12 22
# df2.3  13 23
# df2.4  14 24
# df2.5  15 25
# df2.6  16 26
# df2.7  17 27
# df2.8  18 28
# df2.9  19 29
# df2.10 20 30
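
The same idea can also tag every row with the data frame it came from, much as the question does with its annee column. The following is a minimal sketch under assumptions: the toy data frames are kept in a named list (dfs_list) instead of separate variables, and the column name source is a hypothetical choice, not something from the question.

```r
# Same toy inputs as above, but held in a named list rather than as
# separate global variables; the list names act as identifiers.
dfs_list <- list(
  df1 = data.frame(x = 1:10),
  df2 = data.frame(x = 11:20)
)

# The dummy function from above: add a column y = x + 10.
h <- function(df) {
  df$y <- df$x + 10
  df
}

# Apply h() to each data frame and record the list name in a new
# column ("source" is a hypothetical column name).
tagged <- Map(function(df, id) {
  out <- h(df)
  out$source <- id
  out
}, dfs_list, names(dfs_list))

# Stack the pieces; unname() keeps rbind from prefixing the row names.
combined <- do.call(rbind, unname(tagged))
combined
```

Keeping the inputs in a list from the start avoids the ls()/mget() lookup entirely, at the cost of changing how the data frames are created.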

Some posts that will help guide your understanding:

For a list of data frames:

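# Build the weighted table for one year; y is the year as a number.
# Assumes cprop() and wtd.table() are already loaded, as in the question
# (both are provided by, e.g., the questionr package).
yDF <- function(y) {
  db <- get(paste0("database", y))  # look up database1996, database1997, ...
  df <- as.data.frame(cprop(wtd.table(db$year, db$nivvie_dec, weights = db$wprm), total = FALSE))
  df$annee <- y
  df
}
years <- 1996:2012
L <- lapply(years, yDF)  # list with one data frame per year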

Normally I am not a fan of get(). You can also use rbind() to build one long data frame:

DF <- yDF(1996)
for (y in 1997:2012) DF <- rbind(DF, yDF(y))
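
The question also asks about repeating the computation for other variables such as sex. Here is a rough sketch of how the per-year function could take the variable name as an argument, assuming the yearly data sets are stored in a named list rather than fetched with get(). Everything specific in it is an assumption for illustration: make_db(), databases, col_pct(), by_year() and the sexe column are hypothetical, and base R's xtabs() and prop.table() stand in for wtd.table() and cprop() so that the example runs without extra packages.

```r
# Hypothetical stand-in data: two small yearly survey extracts.
set.seed(1)
make_db <- function(year, n = 20) {
  data.frame(
    year       = year,
    sexe       = sample(c("F", "M"), n, replace = TRUE),
    nivvie_dec = sample(1:3, n, replace = TRUE),
    wprm       = runif(n, 0.5, 2)
  )
}
databases <- list("1996" = make_db(1996), "1997" = make_db(1997))

# Weighted column percentages of `var` by nivvie_dec, in long format.
# xtabs() + prop.table() are base R stand-ins for wtd.table() + cprop().
col_pct <- function(db, var) {
  tab <- xtabs(reformulate(c(var, "nivvie_dec"), response = "wprm"), data = db)
  as.data.frame(prop.table(tab, margin = 2) * 100, responseName = "pct")
}

# Run col_pct() on every yearly data set, label each result with its
# year, and stack everything into one data frame.
by_year <- function(var) {
  pieces <- Map(function(db, y) {
    out <- col_pct(db, var)
    out$annee <- y
    out
  }, databases, names(databases))
  do.call(rbind, unname(pieces))
}

res_sexe <- by_year("sexe")
head(res_sexe)
```

Calling by_year() with another column name would then produce the same table for that variable, and do.call(rbind, ...) stacks all pieces in one step instead of growing the result inside a loop.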

You can do something like complete_dataframe <- rbind(...) to combine all your data frames, which works especially well when they have a separate column that identifies each data frame (here, annee). Then you can use either the data.table package or the dplyr package to apply a function over specific groups.

In dplyr, the workflow would be

complete_dataframe %>% group_by(annee) %>% mutate(new_var = somefunction(columns_to_pass_into_function))

to generate new variables, or

complete_dataframe %>% group_by(annee) %>% summarise(new_var = somefunction(columns_to_pass_into_function))

to create a summary table over the groups.
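
As a concrete illustration of the two patterns, here is a small sketch that assumes the dplyr package is installed. The toy data and the column names income, mean_income and share are made up for the example; only annee and wprm echo names from the question.

```r
library(dplyr)

# Hypothetical combined data: one row per respondent, with the year
# in annee and a survey weight in wprm.
complete_dataframe <- data.frame(
  annee  = rep(c("2011", "2012"), each = 3),
  income = c(10, 20, 30, 20, 40, 60),
  wprm   = c(1, 1, 2, 1, 2, 1)
)

# summarise(): collapse each year to a single row.
summary_by_year <- complete_dataframe %>%
  group_by(annee) %>%
  summarise(mean_income = weighted.mean(income, wprm))

# mutate(): keep every row and add a per-group variable, here each
# respondent's share of the total weight within their year.
with_share <- complete_dataframe %>%
  group_by(annee) %>%
  mutate(share = wprm / sum(wprm)) %>%
  ungroup()

summary_by_year
```

The difference between the two verbs is the shape of the result: summarise() returns one row per group, while mutate() returns the original rows with the new column attached.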
