
Can I ask R to identify the name of a data frame and then add that name to a column in the same data frame?

Apologies in advance, but I'm new both here and to R. What I'm trying to do is automate adding a column to a data frame, where the column is filled with the name of the data frame itself. For example, if I have the following data frame:

> q103
  a  b  c  d
d 1  4  6  9
e 2  8  3 12
f 3 12  8  16

How can I add a column to the end of it that has the character string q103 in each row (without specifically naming it as such, since I will need to repeat this for several hundred data frames), so that I end up with:

> q103
  a  b  c  d  X
d 1  4  6  9 q103
e 2  8  3 12 q103
f 3 12  8 16 q103

The problem is that there are a lot of these data frames and they are inside a list of lists (e.g., something like List[[list]][["q100277"]] is a data frame in the list). Also, their names are somewhat random, but are important to keep (I can't just rename them sequentially). So, I need a way to tell R to basically "look at the name of data frame X and add that character string to a new column in the data frame, then do this for every data frame in the list". It feels like some sort of lapply would work, but I have no idea what to actually tell it to do in order to get there.

Any help in figuring out how to add a column to each data frame that is populated with the name of that data frame, without doing so manually for each one, is greatly appreciated!

EDIT: I've tried to create a reproducible example (per comments) below. This will create something similar to what I'm looking at (except the example is a much smaller list!)

library(CTT)
library(dplyr)
library(tidyverse)
library(purrr)

## Create student response patterns for a fake test

q102 <- c("A", "B", "C", "D", "O", "A", "A", "C", "D", "A", "C", "D", "O", "D", "A", "B", "A", "C", "D", "A")
q107 <- c("C", "D", "O", "D", "A", "B", "A", "C", "D", "A", "A", "B", "C", "D", "O", "A", "A", "C", "D", "A")
q1045 <- c("B", "O", "C", "A", "D", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D")
q101 <- c("A", "B", "C", "D", "O", "A", "A", "C", "D", "A", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D")
q1064 <- c("C", "D", "O", "D", "A", "B", "A", "C", "D", "A", "A", "B", "C", "D", "O", "A", "A", "C", "D", "A")
q104 <- c("A", "B", "C", "D", "O", "A", "A", "C", "D", "A", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D")

## Create an assessment key to identify the test
AssessmentKey <- c("ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW")

## Assign response pattern to the assessment key
Students1 <- data.frame(q102, q107, q1045, q101, q1064, q104, AssessmentKey)
remove(AssessmentKey)

## Create a second assessment key to identify a different test
AssessmentKey <- c("XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ")

## Assign the response pattern to the second assessment key
Students2 <- data.frame(q102, q107, q1045, q101, q1064, q104, AssessmentKey)
remove(q102, q107, q1045, q101, q1064, q104, AssessmentKey)
## Create a data frame combining the two different assessments
StudentAnswers <- rbind(Students1, Students2)

## Create a data frame with the answer key for both tests
AnswerKey <- c("A", "B", "A", "A", "C", "D", "A", "B", "A", "A", "C", "D")
QuestionKey <- c("q102", "q107", "q1045", "q101", "q1064", "q104",
                 "q102", "q107", "q1045", "q101", "q1064", "q104")
AssessmentKey <- c("ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ")
AnswerKeys <- data.frame(QuestionKey, AnswerKey, AssessmentKey)
remove(AnswerKey, QuestionKey, AssessmentKey)



X <- c("ADW", "XYZ")
y <- lapply(
  (X), function(x) 
  {
    ## This will filter the data file to a specific assessment and 
    ## select the columns needed for analysis
    StudentResponse <- StudentAnswers %>%
      dplyr::filter(AssessmentKey == x) %>%
      dplyr::select(q102, q107, q1045, q101, q1064, q104)
  
      
      AKey <- AnswerKeys %>%
        dplyr::filter(AssessmentKey == x) %>%
        dplyr::select(AnswerKey)

      ## using safely() from the purrr package to run the distractorAnalysis()
      ## function from CTT in case of errors

      safeDA = safely(.f=distractorAnalysis)
      safeDA(StudentResponse, AKey)
      
      
    }
)

## This part keeps only the "result" element of each safely() output,
## discarding the (empty) "error" element.
Results <- lapply(y, function(res) res[["result"]])
  

If I understand correctly, you want to extract the names of a number of (nested) list members, then add a column to the data frame contained in each list member.

This is a quick and dirty solution using example data. It is not best practice, but it will do in a hurry. Note the <<-, which searches up through the enclosing environments until it finds the list in the global environment.

# Example data
data(mtcars)

first_list <- list()
first_list[["item1"]] <- list()
first_list[["item2"]] <- list()
first_list[["item1"]][["level2_item1"]] <- mtcars
first_list[["item1"]][["level2_item2"]] <- mtcars

# Iterate through the names of "item1", look up the corresponding dataframe, add a column

lapply(
  names(first_list[["item1"]]),
  function(x) {
    first_list[["item1"]][[x]]$NewCol <<- x
  }
)
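
As a hedged alternative to the <<- approach above, here is a minimal base R sketch that returns a new list and assigns it back explicitly. The small stand-in list (built from head(mtcars, 3)) is an assumption for illustration only; the element names mirror the example above.

```r
# Small stand-in for the nested list in the example above (illustrative data)
first_list <- list(
  item1 = list(
    level2_item1 = head(mtcars, 3),
    level2_item2 = head(mtcars, 3)
  ),
  item2 = list()
)

# Map() walks the data frames and their names in parallel and returns a
# new named list, so no global assignment (<<-) is needed
first_list[["item1"]] <- Map(
  function(df, nm) {
    df$NewCol <- nm  # the single name is recycled to every row
    df
  },
  first_list[["item1"]],
  names(first_list[["item1"]])
)

# Inspect the new column of one data frame
first_list[["item1"]][["level2_item2"]]$NewCol
```

Because the function returns the modified data frame instead of modifying a global object, the same call can be reused on any sub-list without relying on where the list lives.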

If I interpret this correctly, you would like to find all data frames in a list and add a column with the name of that element.

You can do this with a combination of the rlang::squash and purrr::map2 functions.

  1. squash will recursively flatten your list into a single list of data frames.
  2. Then you can map over each element and add a column with the name of the list element.

I have provided one solution that removes the hierarchy from the list and one that maintains the structure of the list.

my_list <- list(
  q0 = mtcars,
  sub_list_1 = list(
      q1 = mtcars
    , q2 = mtcars
  )
  , sub_list_2 = list(
      sub_sub_list_1 = list(
        q3 = mtcars,
        q4 = mtcars
      )
      , sub_sub_list_2 = list(
        q5 = mtcars,
        q6 = mtcars
      )
  )
)
# function to add name col
add_col <- function(table, name) {
  if(!is.data.frame(table)) return(table) # If not a dataframe just return
  
  table$X <- name # add column
  
  return(table)
}

Solution 1

# Using pipes (%>%), purrr, rlang
library(rlang)
library(purrr)

my_list %>% 
  squash() %>% 
  map2(names(.), add_col)

# Using rlang and base R
flat_list <- squash(my_list)
mapply(add_col, flat_list, names(flat_list), SIMPLIFY = FALSE)
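
If rlang is not available, the flatten-then-tag idea can be sketched in base R alone. The helper name flatten_dfs and the tiny nested list below are hypothetical, introduced only for illustration; the helper is not how squash is implemented, it only reproduces the same outcome for a named nested list of data frames.

```r
# Hypothetical helper: recursively collect every data frame in a named,
# nested list into one flat, named list
flatten_dfs <- function(x) {
  out <- list()
  for (nm in names(x)) {
    item <- x[[nm]]
    if (is.data.frame(item)) {
      # keep the data frame under its own name
      out <- c(out, setNames(list(item), nm))
    } else if (is.list(item)) {
      # descend into plain sub-lists
      out <- c(out, flatten_dfs(item))
    }
  }
  out
}

# Illustrative nested list (not from the original post)
nested <- list(
  q0 = data.frame(a = 1:2),
  sub = list(
    q1 = data.frame(a = 3:4),
    deeper = list(q2 = data.frame(a = 5:6))
  )
)

flat <- flatten_dfs(nested)

# Tag each data frame with its own name in a new column X
tagged <- Map(function(df, nm) { df$X <- nm; df }, flat, names(flat))

names(tagged)
tagged$q2
```

Note that this sketch assumes every level of the list is named; unnamed elements are skipped by the loop over names(x).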

If you want to maintain the structure of the list, you can go through it recursively and apply our add_col function.

Solution 2

library(purrr)
library(rlang)

recursive_add_col <- function(x) {
  map2(x, names(x), 
      function(x, y) if(is.list(x) && !is.data.frame(x)) recursive_add_col(x) else add_col(x, y)
      )
}

my_list %>% 
  recursive_add_col()
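
For comparison, the structure-preserving recursion can also be sketched in base R without purrr. The function name add_name_recursive and the small nested list are hypothetical, and the sketch assumes every level of the list is named.

```r
# Hypothetical base R version: walk a named, nested list and add column X
# (holding the element's name) to every data frame, keeping the nesting
add_name_recursive <- function(x) {
  Map(function(item, nm) {
    if (is.data.frame(item)) {
      item$X <- nm              # tag the data frame with its own name
      item
    } else if (is.list(item)) {
      add_name_recursive(item)  # descend into plain sub-lists
    } else {
      item                      # leave anything else untouched
    }
  }, x, names(x))
}

# Illustrative nested list (not from the original post)
nested <- list(
  q0 = data.frame(a = 1:2),
  sub = list(q1 = data.frame(a = 3:4))
)

result <- add_name_recursive(nested)

# The nesting is unchanged; only the data frames gained a column
str(result, max.level = 2)
```

The data frame check comes first because a data frame is itself a list, so testing is.list() alone would recurse into its columns.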
