
Adding a new column to each element in a list of tables or data frames

I have a list of files. I also have a list of "names", which I extract with substr() from the actual filenames of these files. I would like to add a new column to each of the files in the list. This column will contain the corresponding element of "names", repeated once for each row of the file.

For example:

df1 <- data.frame(x = 1:3, y=letters[1:3])
df2 <- data.frame(x = 4:6, y=letters[4:6])
filelist <- list(df1,df2)
ID <- c("1A","IB")

Pseudocode:

  for( i in length(filelist)){

       filelist[i]$SampleID <- rep(ID[i],nrow(filelist[i])

  }

// Basically, create a new column in each of the data frames in filelist, and fill the column with repeated corresponding values of ID.

My output should look like this:

filelist[1] should be:

   x y SampleID
 1 1 a       1A
 2 2 b       1A
 3 3 c       1A

filelist[2] should be:

   x y SampleID
 1 4 d       IB
 2 5 e       IB
 3 6 f       IB

and so on.

Any idea how this could be done?

An alternative solution is to use cbind, taking advantage of the fact that R will recycle the values of a shorter vector.

For example:

x <- df2  # from above
cbind(x, NewColumn="Singleton")
 #    x y NewColumn
 #  1 4 d Singleton
 #  2 5 e Singleton
 #  3 6 f Singleton

There is no need to use rep; R does that for you.

Therefore, you could put cbind(filelist[[i]], ID[[i]]) in your for loop or, as @Sven pointed out, use the cleaner mapply:

filelist <- mapply(cbind, filelist, "SampleID" = ID, SIMPLIFY = FALSE)
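For comparison, the for-loop form mentioned above might be sketched as follows. It rebuilds the question's example data so that it runs on its own. Two details are easy to miss: the result of cbind has to be assigned back into the list, and naming the argument (SampleID = ...) is what gives the new column its name.

```r
# Rebuild the example data from the question
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

for (i in seq_along(filelist)) {
  # ID[i] has length one; cbind recycles it down every row
  filelist[[i]] <- cbind(filelist[[i]], SampleID = ID[i])
}

filelist[[2]]
```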

This is a corrected version of your loop:

for( i in seq_along(filelist)){

  filelist[[i]]$SampleID <- rep(ID[i],nrow(filelist[[i]]))

}

There were 3 problems:

  • A final ) was missing after the command in the body.
  • Elements of lists are accessed by [[ , not by [ . [ returns a list of length one; [[ returns the element itself.
  • length(filelist) is just one value, so the loop runs for the last element of the list only. I replaced it with seq_along(filelist) .
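The second and third points can be checked directly at the console. The snippet below is a small illustration using the question's example data; the variable names on the left-hand side are only there for the demonstration.

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)

single   <- filelist[1]     # a list of length one wrapping df1
element  <- filelist[[1]]   # df1 itself

class(single)     # "list"
class(element)    # "data.frame"
nrow(single)      # NULL, because a plain list has no dimensions
nrow(element)     # 3

length(filelist)      # 2: a single value, so `for (i in 2)` runs once
seq_along(filelist)   # 1 2: one index per list element
```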

A more concise approach is to use mapply for the task:

mapply(function(x, y) "[<-"(x, "SampleID", value = y) ,
       filelist, ID, SIMPLIFY = FALSE)

The purrr way, using map2:

library(dplyr)
library(purrr)

map2(filelist, ID, ~cbind(.x, SampleID = .y))

#[[1]]
#  x y SampleID
#1 1 a       1A
#2 2 b       1A
#3 3 c       1A

#[[2]]
#  x y SampleID
#1 4 d       IB
#2 5 e       IB
#3 6 f       IB

Or you can also use:

map2(filelist, ID, ~.x %>% mutate(SampleID = .y))

If you name the list, you can use imap and add the new column based on its name.

names(filelist) <- c("1A","IB")
imap(filelist, ~cbind(.x, SampleID = .y))
# or
# imap(filelist, ~.x %>% mutate(SampleID = .y))

which is similar to using Map:

Map(cbind, filelist, SampleID = names(filelist))

This one worked for me:

Create a new column for every data frame in a list; fill the values of the new column based on an existing column (in your case, the IDs).

Example:

# Create dummy data
df1<-data.frame(a = c(1,2,3))
df2<-data.frame(a = c(5,6,7))

# Create a list
l<-list(df1, df2)

> l
[[1]]
  a
1 1
2 2
3 3

[[2]]
  a
1 5
2 6
3 7

# add new column 'b'
# create 'b' values based on column 'a' 
l2<-lapply(l, function(x) 
  cbind(x, b = x$a*4))

Results in:

> l2
[[1]]
  a  b
1 1  4
2 2  8
3 3 12

[[2]]
  a  b
1 5 20
2 6 24
3 7 28

In your case, something like this (assuming each data frame already has a SampleID column to derive the new column from):

filelist<-lapply(filelist, function(x) 
  cbind(x, b = x$SampleID))
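In the question's setup the IDs live in a separate vector rather than in a column of each data frame, so lapply over the elements alone cannot see them. One possible adaptation, sketched here under that assumption, is to lapply over the indices instead, so that each data frame can be paired with its ID:

```r
# Example data from the question
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# Iterate over positions so both filelist[[i]] and ID[i] are available
filelist <- lapply(seq_along(filelist), function(i) {
  cbind(filelist[[i]], SampleID = ID[i])
})

filelist[[1]]
```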

A tricky way:

library(plyr)

names(filelist) <- ID
result <- ldply(filelist, data.frame)

data_lst <- list(
  data_1 = data.frame(c1 = 1:3, c2 = 3:1),
  data_2 = data.frame(c1 = 1:3, c2 = 3:1)
)

f <- function (data, name){
  data$name <- name
  data
}

Map(f, data_lst , names(data_lst)) 

