Rename columns in multiple data frames, R
I am trying to rename the columns of multiple data.frames.
To give an example, let's say I have a list of data.frames dfA, dfB and dfC. I wrote a function ChangeNames to set the names accordingly and then used lapply as follows:
dfs <- list(dfA, dfB, dfC)
ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
}
lapply(dfs, ChangeNames)
However, this doesn't work as expected. It seems that I am not assigning the new names to the data.frames, but only creating the new names. What am I doing wrong here?
Thank you in advance!
There are two things going on here:
1) You should return the value you want from your function. Otherwise, the value of the last evaluated expression is returned. In your case, that is the assignment names(x) <- c("A", "B", "C"), whose value is the character vector c("A", "B", "C") (returned invisibly), not the modified data frame. So add return(x), or simply x, as the final line. Your function would then look like:
ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
  return(x)
}
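A minimal sketch of the difference, using a made-up example data frame: the version without a return value hands back the right-hand side of the last assignment, while the fixed version hands back the renamed copy. The name ChangeNamesOld is hypothetical and only used for the comparison.

```r
# Small example data frame (made up for illustration)
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# Hypothetical copy of the original function: the last expression is the
# assignment names(x) <- ..., whose (invisible) value is c("A", "B", "C")
ChangeNamesOld <- function(x) {
  names(x) <- c("A", "B", "C")
}

# Fixed version: return the modified copy explicitly
ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
  x
}

old_result <- ChangeNamesOld(df)
new_result <- ChangeNames(df)

print(old_result)        # [1] "A" "B" "C"  (a character vector, not a data frame)
print(names(new_result)) # [1] "A" "B" "C"
print(names(df))         # [1] "x" "y" "z"  (the input itself is untouched)
```

Note that even the fixed function never touches df itself; it only returns a renamed copy, which is why the second point below matters.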
2) lapply does not modify your input objects by reference; it works on a copy. So you have to assign the result back. Alternatively, you can use a for loop instead of lapply:
# option 1
dfs <- lapply(dfs, ChangeNames)

# option 2
for (i in seq_along(dfs)) {
  names(dfs[[i]]) <- c("A", "B", "C")
}
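To make the copy semantics concrete, here is a small sketch with made-up data frames: calling lapply without keeping its result leaves the list as it was, while assigning the result back replaces the list elements. It also shows that the standalone objects dfA, dfB and dfC are separate from the list and keep their old names either way.

```r
ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
  x
}

# Three small example data frames (made up for illustration)
dfA <- data.frame(x = 1:2, y = 3:4, z = 5:6)
dfB <- data.frame(p = 1:2, q = 3:4, r = 5:6)
dfC <- data.frame(u = 1:2, v = 3:4, w = 5:6)
dfs <- list(dfA, dfB, dfC)

# The result of lapply is discarded here, so dfs is unchanged
invisible(lapply(dfs, ChangeNames))
unchanged <- names(dfs[[1]])   # still "x" "y" "z"

# Assigning the result back replaces the list with the renamed copies
dfs <- lapply(dfs, ChangeNames)
renamed <- sapply(dfs, function(d) identical(names(d), c("A", "B", "C")))

# dfA is a separate object and keeps its original names
print(names(dfA))   # [1] "x" "y" "z"
```

If you need dfA, dfB and dfC themselves renamed, work with their names as strings, as in the get/assign approach further down.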
Even with the for loop, you still make a copy (because names(.) <- . does, at least in the R version used for the output below). You can verify this with tracemem:
df <- data.frame(x=1:5, y=6:10, z=11:15)
tracemem(df)
# [1] "<0x7f98ec24a480>"
names(df) <- c("A", "B", "C")
tracemem(df)
# [1] "<0x7f98e7f9e318>"
If you want to modify by reference, you can use the setnames function from the data.table package:
df <- data.frame(x=1:5, y=6:10, z=11:15)
library(data.table)
tracemem(df)
# [1] "<0x7f98ec76d7b0>"
setnames(df, c("A", "B", "C"))
tracemem(df)
# [1] "<0x7f98ec76d7b0>"
You can see that the memory address df maps to hasn't changed: the names have been modified by reference.
If the data frames are not in a list but in the global environment, you can refer to them using a character vector of their names:
dfs <- c("dfA", "dfB", "dfC")
for(df in dfs) {
df.tmp <- get(df)
names(df.tmp) <- c("A", "B", "C" )
assign(df, df.tmp)
}
EDIT
To simplify the code above, you could use:
for(df in dfs)
assign(df, setNames(get(df), c("A", "B", "C")))
or use data.table, which doesn't require reassigning:
for(df in c("dfA", "dfB"))
data.table::setnames(get(df), c("G", "H"))
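Another base R variant of the get/assign loop, sketched here with made-up data frames, fetches all the objects at once with mget, renames them with lapply and setNames, and writes them back under their original names with list2env. It uses environment(), which is the global environment when the code runs at top level.

```r
# Example data frames (made up for illustration)
dfA <- data.frame(x = 1:2, y = 3:4, z = 5:6)
dfB <- data.frame(p = 1:2, q = 3:4, r = 5:6)
dfC <- data.frame(u = 1:2, v = 3:4, w = 5:6)

dfs <- c("dfA", "dfB", "dfC")
env <- environment()  # the global environment when run at top level

# mget() returns a named list of the objects; setNames() renames each one
renamed <- lapply(mget(dfs, envir = env), setNames, c("A", "B", "C"))

# list2env() assigns each list element back under its original name
invisible(list2env(renamed, envir = env))

print(names(dfB))   # [1] "A" "B" "C"
```

Like the assign() version, this still creates renamed copies rather than modifying the objects by reference.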
I had to import a public data set, rename each data frame, and rename every column in each data frame to trim whitespace, convert it to lowercase, and replace internal spaces with periods.
Combining the methods above gave me:
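library(stringr)

# dfs is a character vector of data frame names, e.g. c("dfA", "dfB", "dfC")
for (eachdf in dfs) {
  df.tmp <- get(eachdf)
  # Trim first, then lowercase, then replace the remaining (internal) spaces;
  # replacing before trimming would turn leading/trailing spaces into periods
  colnames(df.tmp) <- str_replace_all(str_to_lower(str_trim(colnames(df.tmp))), " ", ".")
  assign(eachdf, df.tmp)
}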
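If you'd rather avoid the stringr dependency, the same cleaning can be sketched in base R with trimws, tolower and gsub, wrapped in a function so it can be used with lapply on a list of data frames. The helper name tidy_colnames and the example column names are hypothetical.

```r
# Hypothetical helper: trim outer whitespace, lowercase, then replace the
# remaining (internal) spaces with periods, using only base R
tidy_colnames <- function(df) {
  names(df) <- gsub(" ", ".", tolower(trimws(names(df))), fixed = TRUE)
  df
}

# Example data with messy column names (made up for illustration)
raw <- data.frame(a = 1:2, b = 3:4)
names(raw) <- c("  First Name ", "AGE ")

# Apply to every data frame in a list and keep the result
dfs_list <- list(raw, raw)
dfs_list <- lapply(dfs_list, tidy_colnames)

print(names(dfs_list[[1]]))   # [1] "first.name" "age"
```

Because tidy_colnames returns the modified data frame, it also slots into the get/assign loop in place of the inline column renaming.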