R: split a data.frame but keep classes

I would like to split a data frame into a list of data frames and keep the classes of the variables.

# create sample data
df <- data.frame(
  id=c("1","2"),
  site_name = c("Zero Hedge", "Free Software Foundation"),
  site_url = c("https://www.zerohedge.com", "https://www.fsf.org")
)

# specify class for site_url 
class(df$site_url) <- "formula"

# split
dataframes <- split(df, df$id)

Now I wonder why the class changed in the split data:

class(dataframes[[1]]$site_url)
[1] "character"

My questions:

  1. Why does that happen?
  2. How can I split a data frame into a list of data frames and keep the classes of the variables?

Thank you for your help.

Additional info:

I came across this problem when I wanted to automatically write hyperlinks to Excel files with R and openxlsx, following this very helpful post: Openxlsx hyperlink output display in Excel

We can set the attributes again after splitting:

dataframes2 <- lapply(dataframes, function(x) {
  # copy the attributes (including the class) of the original column back
  attributes(x$site_url) <- attributes(df$site_url)
  x
})
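The same idea can be applied to every column at once. The following is only a sketch: restore_classes() is a hypothetical helper written for this illustration, not a base R function, and it copies only the class (no other attributes) from the original data frame onto each piece.

```r
# Sample data from the question
df <- data.frame(
  id = c("1", "2"),
  site_name = c("Zero Hedge", "Free Software Foundation"),
  site_url = c("https://www.zerohedge.com", "https://www.fsf.org")
)
class(df$site_url) <- "formula"

# restore_classes() is a hypothetical helper, not part of base R.
# For each piece returned by split(), it copies the class of every
# column in `template` onto the column of the same name.
restore_classes <- function(pieces, template) {
  lapply(pieces, function(piece) {
    for (col in names(template)) {
      class(piece[[col]]) <- class(template[[col]])
    }
    piece
  })
}

dataframes <- restore_classes(split(df, df$id), df)
class(dataframes[[1]]$site_url)
```

With the sample data, the last line should report "formula" again, while the other columns stay plain character vectors.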

The issue is not specific to split itself. The method dispatched here is split.data.frame. Its source shows that it splits the sequence of row numbers by the grouping 'f' and then extracts each group of rows with [. That row extraction is the step where the class attribute gets dropped:

split.data.frame
function (x, f, drop = FALSE, ...) 
lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), 
    function(ind) x[ind, , drop = FALSE])

But split.data.table keeps the class:

library(data.table)
library(magrittr)  # provides %>%

split(as.data.table(df), df$id) %>% str
#List of 2
# $ 1:Classes ‘data.table’ and 'data.frame':    1 obs. of  3 variables:
#  ..$ id       : chr "1"
#  ..$ site_name: chr "Zero Hedge"
#  ..$ site_url : 'formula' chr "https://www.zerohedge.com"
#  ..- attr(*, ".internal.selfref")=<externalptr> 
# $ 2:Classes ‘data.table’ and 'data.frame':    1 obs. of  3 variables:
#  ..$ id       : chr "2"
#  ..$ site_name: chr "Free Software Foundation"
#  ..$ site_url : 'formula' chr "https://www.fsf.org"

Comparing the structure of the original data with that of the extracted rows:

str(df)
'data.frame':   2 obs. of  3 variables:
 $ id       : chr  "1" "2"
 $ site_name: chr  "Zero Hedge" "Free Software Foundation"
 $ site_url : 'formula' chr  "https://www.zerohedge.com" "https://www.fsf.org"
str(df[1,]) # with one row selected
'data.frame':   1 obs. of  3 variables:
 $ id       : chr "1"
 $ site_name: chr "Zero Hedge"
 $ site_url : chr "https://www.zerohedge.com" # lost attribute
str(df[1:2,]) # with more than one row
'data.frame':   2 obs. of  3 variables:
 $ id       : chr  "1" "2"
 $ site_name: chr  "Zero Hedge" "Free Software Foundation"
 $ site_url : chr  "https://www.zerohedge.com" "https://www.fsf.org" # lost attribute
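Another way to keep a class through row extraction is to give that class its own [ method, since extracting rows from a data frame subsets each column with [. The sketch below assumes you control the class: "url_string" and new_url_string() are made-up names for this illustration, and whether a custom class suits the openxlsx use case from the question is not checked here.

```r
# "url_string" is a made-up class name used only for this illustration.
new_url_string <- function(x) structure(x, class = "url_string")

# Subsetting method: let the default `[` do the extraction,
# then put the class back on the result.
`[.url_string` <- function(x, i, ...) {
  new_url_string(NextMethod())
}

df <- data.frame(
  id = c("1", "2"),
  site_name = c("Zero Hedge", "Free Software Foundation")
)
df$site_url <- new_url_string(
  c("https://www.zerohedge.com", "https://www.fsf.org")
)

# split() extracts rows with `[`, which now dispatches to the method above
dataframes <- split(df, df$id)
class(dataframes[[1]]$site_url)
```

This is the same mechanism that lets factor and Date columns survive subsetting, so plain split() can be used without restoring anything afterwards.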
