
How to create new columns from existing list column values in R dynamically

I have a data frame called df. How can I create new columns from an existing list column in the data frame?

My data frame:

Policy             Item

Checked           list(Processed = "Valid", Gmail = "yy@gmail", Information = list(list(Descrption = "T1, R1", VID = "YUY")))

Sample            list(Processed = "Valid", Gmail = "tt@gmail", Information = list(list(Descrption = "D3, Y3", VID = "RT")))

Processed         list(Processed = "Valid", Gmail = "pp@gmail", Information = list(list(Descrption = "Y2, LE", VID = "UIU")))

My expected data frame:

Policy          Processed    Gmail        Descrption  VID

Checked           Valid      yy@gmail       "T1,R1"  "YUY"

Sample            Valid      tt@gmail       "D3,Y3"  "RT"

Processed         Valid      pp@gmail       "Y2,LE"  "UIU"

I'm using the code below to try to get my expected data frame:

na_if_null <- function(x) if (is.null(x)) NA else x

new_cols <- lapply(
  Filter(is.list, df),
  function(list_col) {
    names_ <- setNames(nm = unique(do.call(c, lapply(list_col, names))))
    lapply(names_, function(name) sapply(list_col, function(x) 
      trimws(na_if_null(as.list(x)[[name]]))))
  }
)

res <- do.call(
  data.frame,
  c(
    list(df, check.names = FALSE, stringsAsFactors = FALSE),
    do.call(c, new_cols)
  )
)

But I'm getting the data frame below instead. Please help me get the expected result.

Policy       Item                                                                                                          Item.Processed    Item.Gmail     Item.Information

Checked      list(Processed = "Valid", Gmail = "yy@gmail", Information = list(list(Descrption = "T1, R1", VID = "YUY")))    Processed        yy@gmail      list(Descrption = "T1, R1", VID = "YUY")

Sample       list(Processed = "Valid", Gmail = "tt@gmail", Information = list(list(Descrption = "D3, Y3", VID = "RT")))     Processed        tt@gmail      list(Descrption = "D3, Y3", VID = "RT")  

Processed    list(Processed = "Valid", Gmail = "pp@gmail", Information = list(list(Descrption = "Y2, LE", VID = "UIU")))    Processed        pp@gmail      list(Descrption = "Y2, LE", VID = "UIU")

dput output:

    structure(list(Policy = c("Checked", "Sample", "Processed"), Item = list(
    structure(list(Processed = "Valid", Gmail = "yy@gmail", Information = list(
        structure(list(Descrption = "T1, R1", VID = "YUY"), .Names = c("Descrption", 
        "VID"), class = "data.frame", row.names = 1L))), .Names = c("Processed", 
    "Gmail", "Information"), class = "data.frame", row.names = 1L), 
    structure(list(Processed = "Valid", Gmail = "tt@gmail", Information = list(
        structure(list(Descrption = "D3, Y3", VID = "RT"), .Names = c("Descrption", 
        "VID"), class = "data.frame", row.names = 1L))), .Names = c("Processed", 
    "Gmail", "Information"), class = "data.frame", row.names = 1L), 
    structure(list(Processed = "Valid", Gmail = "pp@gmail", Information = list(
        structure(list(Descrption = "Y2, LE", VID = "UIU"), .Names = c("Descrption", 
        "VID"), class = "data.frame", row.names = 1L))), .Names = c("Processed", 
    "Gmail", "Information"), class = "data.frame", row.names = 1L))), row.names = c(NA, 
3L), class = "data.frame", .Names = c("Policy", "Item"))

Sample data frame:

Policy             colval                                 Item     

Checked         list(PID="4",Bdetail ="ui,89")      list(Processed = "Valid", Gmail = "yy@gmail", Information = list(list(Descrption = "T1, R1", VID = "YUY")))

Sample          list(PID="7",Bdetail ="ju,78")      list(Processed = "Valid", Gmail = "tt@gmail", Information = list(list(Descrption = "D3, Y3", VID = "RT")))

Processed       list(PID ="8",Bdetail ="nj,45")     list(Processed = "Valid", Gmail = "pp@gmail", Information = list(list(Descrption = "Y2, LE", VID = "UIU")))

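Here is a solution in base R:

# dx is the data frame from the question (called df there)
dd <- 
cbind(
  dx$Policy,
  # flatten each row's nested Item into a named character vector, then stack the rows
  do.call(rbind,
          lapply(seq_len(nrow(dx)), function(i) unlist(dx$Item[i]))
  )
)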

colnames(dd) <- c("Policy","Processed","Gmail","Descrption","VID")

dd

#       Policy      Processed Gmail      Descrption VID  
# [1,] "Checked"   "Valid"   "yy@gmail" "T1, R1"   "YUY"
# [2,] "Sample"    "Valid"   "tt@gmail" "D3, Y3"   "RT" 
# [3,] "Processed" "Valid"   "pp@gmail" "Y2, LE"   "UIU"

Basically, I call unlist on each item and then join the results using the classic do.call(rbind, llist).

Edit

In case you want to use the same names as the original sub-lists, you can do something like:

colnames(dd) <- c("Policy",gsub(".*[.]","",colnames(dd)[-1]))
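To make the two steps above easier to follow, here is a small self-contained sketch of what unlist returns for one nested item and why the gsub renaming works. The toy list `items` and the names `flat_rows` and `mat` are made up for illustration; the items are plain nested lists rather than the data frames from the dput output, which I assume flatten the same way.

```r
# Hypothetical toy data mirroring the shape of the question's Item column.
items <- list(
  list(Processed = "Valid", Gmail = "yy@gmail",
       Information = list(list(Descrption = "T1, R1", VID = "YUY"))),
  list(Processed = "Valid", Gmail = "tt@gmail",
       Information = list(list(Descrption = "D3, Y3", VID = "RT")))
)

# unlist() flattens every nesting level into one named character vector;
# names of nested elements are joined with ".", e.g. "Information.VID".
flat_rows <- lapply(items, unlist)

# Stack the per-item vectors into a character matrix, one row per item.
mat <- do.call(rbind, flat_rows)

# Drop everything up to the last "." so only the innermost names remain.
colnames(mat) <- gsub(".*[.]", "", colnames(mat))

mat
```

Note that the result is a character matrix, so wrap it in as.data.frame(mat, stringsAsFactors = FALSE) if you need a data frame.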

data.table solution:

library(data.table)
setDT(dx)
dx[, rbindlist(lapply(.SD,function(x)data.table(t(unlist(x))))),Policy]

This is easily done with unnest from tidyr:

library(dplyr)
library(tidyr)

df %>%
  unnest() %>%
  unnest()

Result:

     Policy Processed    Gmail Descrption VID
1   Checked     Valid yy@gmail     T1, R1 YUY
2    Sample     Valid tt@gmail     D3, Y3  RT
3 Processed     Valid pp@gmail     Y2, LE UIU

Data:

df <- structure(list(Policy = c("Checked", "Sample", "Processed"), Item = list(
  structure(list(Processed = "Valid", Gmail = "yy@gmail", Information = list(
    structure(list(Descrption = "T1, R1", VID = "YUY"), .Names = c("Descrption", 
    "VID"), class = "data.frame", row.names = 1L))), .Names = c("Processed", 
  "Gmail", "Information"), class = "data.frame", row.names = 1L), 
  structure(list(Processed = "Valid", Gmail = "tt@gmail", Information = list(
    structure(list(Descrption = "D3, Y3", VID = "RT"), .Names = c("Descrption", 
    "VID"), class = "data.frame", row.names = 1L))), .Names = c("Processed", 
  "Gmail", "Information"), class = "data.frame", row.names = 1L), 
  structure(list(Processed = "Valid", Gmail = "pp@gmail", Information = list(
    structure(list(Descrption = "Y2, LE", VID = "UIU"), .Names = c("Descrption", 
    "VID"), class = "data.frame", row.names = 1L))), .Names = c("Processed", 
  "Gmail", "Information"), class = "data.frame", row.names = 1L))), row.names = c(NA, 
3L), class = "data.frame", .Names = c("Policy", "Item"))

Note:

Notice I used two passes of unnest because there are two levels of lists in your original data frame. unnest automatically flattens all list columns in the data frame and reuses the names, but it does not do so recursively, so you need as many unnest calls as there are list levels.
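As a hedged aside: since tidyr 1.0.0, unnest expects the columns to be named through its cols argument, and calling it with no arguments emits a warning. A minimal sketch of the same two-pass idea with explicit columns might look like the following. The helper `make_item` and the two-row toy `df` are made up for illustration and only mimic the structure of the question's data.

```r
library(tidyr)

# Hypothetical helper: builds a one-row data frame with a nested
# one-row data frame stored in the list column "Information".
make_item <- function(gmail, descr, vid) {
  item <- data.frame(Processed = "Valid", Gmail = gmail,
                     stringsAsFactors = FALSE)
  item$Information <- list(
    data.frame(Descrption = descr, VID = vid, stringsAsFactors = FALSE)
  )
  item
}

# Toy data shaped like the question's df (two rows instead of three).
df <- data.frame(Policy = c("Checked", "Sample"), stringsAsFactors = FALSE)
df$Item <- list(
  make_item("yy@gmail", "T1, R1", "YUY"),
  make_item("tt@gmail", "D3, Y3", "RT")
)

# One unnest() call per list level, naming the column each time.
step1 <- unnest(df, cols = Item)            # exposes Processed, Gmail, Information
flat  <- unnest(step1, cols = Information)  # exposes Descrption, VID

flat
```

Naming the columns also makes it explicit which list level each pass removes, which helps when the data frame has more than one list column.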
