
Apply function to column names using a list of data frames

I'm trying to apply a very complex function to a list of more than 50 data frames. For the sake of clarity, let's use a very simple function that lowercases names and just 3 data frames; my general approach is coded below.

[EDITED NAMES]
# Data sample. Every column name is different across data frames


quality <- data.frame(FIRST=c(1,5,3,3,2), SECOND=c(3,6,1,5,5))
thickness <- data.frame(THIRD=c(6,0,9,1,2), FOURTH=c(2,7,2,2,1))
distance <- data.frame(ONEMORE=c(0,0,1,5,1), ANOTHER=c(4,1,9,2,3))


# list of dataframes

dfs <- list(quality, thickness, distance)


# a very simple function (just for testing)
# actually a very complex one is used on real data

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}


# apply function to data frame list

dfs <- lapply(dfs, BetterNames)

# I know the expected R behaviour is to modify a copy of the object,
# instead of the original object itself. So if you get the names
# you get the original version, not the needed one

names(quality)

[1] "FIRST"  "SECOND"

Is there any way to apply a function in place, inside a loop or an "apply" call, for a large number of data frames? The result should be that every data frame in the (big) list is replaced by its modified version.

I know there's a trick using data.table, but I wonder whether this is possible in base R.

Expected results:

 names(quality)

    [1] "first"  "second"

[EDITED] I was pointed to this answer: Rename columns in multiple dataframes, R

But it does not work. I can't use a vector of string names in my case because my new names are not a fixed list of strings. [EDITED DATA]

for(df in dfs) {
  df.tmp <- get(df)
  names(df.tmp) <- BetterNames(df)
  assign(df, df.tmp)
}

> names(quality)
[1] "quality" NA  

Thanks

You already have the best-case scenario.

Let's add some names to your list:

names(dfs) <- c("quality", "thickness", "distance")
dfs <- lapply(dfs, BetterNames)

dfs[["quality"]]
#   first second
# 1     1      3
# 2     5      6
# 3     3      1
# 4     3      5
# 5     2      5

This works great. And all your data is in a list, so if there are other things you want to do to all your data frames, it is very easy.

If you are done treating these data frames similarly and really want them back in the global environment to work with individually, you can do it with

list2env(dfs, envir = .GlobalEnv)
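As a minimal sketch of the whole round trip, assuming the data frames already exist in the workspace under the names from the question: `mget()` collects them into a named list, `lapply()` applies the function, and `list2env()` writes the results back. The vector `df_names` is an illustrative name and not part of the original code.

```r
# Sample data and function from the question
quality   <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))
distance  <- data.frame(ONEMORE = c(0, 0, 1, 5, 1), ANOTHER = c(4, 1, 9, 2, 3))

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# Hypothetical helper vector: names of the objects to process
df_names <- c("quality", "thickness", "distance")

# mget() looks the objects up by name and returns a named list
dfs <- mget(df_names)

# Apply the function to every element; lapply() keeps the list names
dfs <- lapply(dfs, BetterNames)

# Overwrite the original objects with the modified copies
invisible(list2env(dfs, envir = .GlobalEnv))

names(quality)
```

Because the list is named, `list2env()` knows which object each element should replace; an unnamed list would not work here.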

I would recommend keeping them in a list, though. In most cases, if you are working with 50 data frames, a list makes it easy to use lapply or for loops on them, whereas with individual objects you will be copy/pasting code and making mistakes.

I would even consider starting with 50 data frames in your workspace a problem. See How do I make a list of data frames? for recommendations on finding an upstream fix: going straight to a list from the start.
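One way "going straight to a list" could look is sketched below. It assumes, purely for illustration, that each data frame comes from its own CSV file; the question does not say where the data originates, and the folder name `df_list_demo` and the file names are made up for this example.

```r
# Create three small CSV files in a temporary folder so the sketch is self-contained
data_dir <- file.path(tempdir(), "df_list_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(FIRST = c(1, 5), SECOND = c(3, 6)),
          file.path(data_dir, "quality.csv"), row.names = FALSE)
write.csv(data.frame(THIRD = c(6, 0), FOURTH = c(2, 7)),
          file.path(data_dir, "thickness.csv"), row.names = FALSE)
write.csv(data.frame(ONEMORE = c(0, 0), ANOTHER = c(4, 1)),
          file.path(data_dir, "distance.csv"), row.names = FALSE)

# Read every file straight into one list instead of 50 separate objects
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(files, read.csv)

# Name each element after its file (without the extension)
names(dfs) <- tools::file_path_sans_ext(basename(files))

# Any function can now be applied to all data frames in one call
dfs <- lapply(dfs, function(x) {
  names(x) <- tolower(names(x))
  x
})

names(dfs$quality)
```

With this layout the individual data frames never exist as separate workspace objects, so there is nothing to modify "in place" afterwards.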

I'd use a simple yet effective parse and eval approach.

Let's use a for loop to compose a command that suits your needs:

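# dfs must hold the object names as strings here, not the data frames
dfs <- c("quality", "thickness", "distance")

for (df in dfs) {
  # Builds e.g. "quality <- BetterNames(quality)"
  command <- paste0(df, " <- BetterNames(", df, ")")
  # print(command)
  eval(parse(text = command))
}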

names(quality)
[1] "first"  "second"

names(thickness)
[1] "third"  "fourth"

names(distance)
[1] "onemore"  "another"

This is certainly not optimal, and I hope something better comes up, but here it goes:

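# x is the data frame, y is its name as a string
BetterNames <- function(x, y) {
  names(x) <- tolower(names(x))
  # Overwrite the object called y in the global environment
  assign(y, x, envir = .GlobalEnv)
}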

dfs <- list(quality, thickness, distance)
dfs2 <- c("quality", "thickness", "distance")
mapply(BetterNames, dfs, dfs2)

> names(quality)
[1] "first"  "second"
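A variant of the same idea, sketched here under the assumption that the object names are known as strings, keeps `BetterNames()` free of side effects and moves the lookup and reassignment into the loop with base R's `get()` and `assign()`. The vector `df_names` is an illustrative name; `BetterNames()` is the single-argument version from the question.

```r
# Sample data and function from the question
quality   <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))
distance  <- data.frame(ONEMORE = c(0, 0, 1, 5, 1), ANOTHER = c(4, 1, 9, 2, 3))

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# Hypothetical helper vector: names of the objects to process
df_names <- c("quality", "thickness", "distance")

for (nm in df_names) {
  # get() fetches the data frame by name, BetterNames() returns a modified
  # copy, and assign() stores that copy back under the same name
  assign(nm, BetterNames(get(nm)))
}

names(quality)   # "first" "second"
```

The key difference from the failing loop in the question is that the function receives the data frame returned by `get(nm)`, not the name string, and the whole returned data frame is assigned back.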
