
R: split a data.frame but keep classes

I would like to split a dataframe into a list of dataframes and keep the classes of the variables.

# create sample data
df <- data.frame(
  id=c("1","2"),
  site_name = c("Zero Hedge", "Free Software Foundation"),
  site_url = c("https://www.zerohedge.com", "https://www.fsf.org")
)

# specify class for site_url 
class(df$site_url) <- "formula"

# split
dataframes <- split(df, df$id)

Now I wonder why the class of the column changed in the split data:

class(dataframes[[1]]$site_url)
[1] "character"

My questions:

  1. Why does that happen?
  2. How can I split a dataframe into a list of dataframes and keep the classes of the variables?

Thank you for your help.

Additional info:

I came across this problem when I wanted to automatically write hyperlinks to Excel files with R and openxlsx, following this very helpful post: Openxlsx hyperlink output display in Excel

We can restore the attributes after splitting:

dataframes2 <- lapply(dataframes, function(x) {
  # copy the attributes (here, the class) of the original column back
  attributes(x$site_url) <- attributes(df$site_url)
  x
})
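
If more than one column carries a class, the same idea can be wrapped in a small helper. The following is only a sketch, assuming that the class is the only attribute that needs restoring; split_keep_class is a hypothetical name, not a function from base R or any package.

```r
# Sample data from the question
df <- data.frame(
  id = c("1", "2"),
  site_name = c("Zero Hedge", "Free Software Foundation"),
  site_url = c("https://www.zerohedge.com", "https://www.fsf.org")
)
class(df$site_url) <- "formula"

# Hypothetical helper: split, then copy the class attribute of every
# column from the original data.frame back onto each piece
split_keep_class <- function(data, f) {
  lapply(split(data, f), function(part) {
    for (col in names(part)) {
      # oldClass() is NULL for columns without a class attribute,
      # so plain columns are left as they are
      oldClass(part[[col]]) <- oldClass(data[[col]])
    }
    part
  })
}

dataframes2 <- split_keep_class(df, df$id)
class(dataframes2[["1"]]$site_url)  # expected: "formula"
```

oldClass() is used here instead of class() so that only an explicitly set class attribute is copied.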

The issue is not caused by split itself. The method dispatched here is split.data.frame. As its source code shows, it splits the sequence of row numbers by the grouping 'f' and then extracts each group of rows with [ . It is this row extraction that drops the attributes of the column.

split.data.frame
function (x, f, drop = FALSE, ...) 
lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), 
    function(ind) x[ind, , drop = FALSE])
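
To see that the class is lost in the row extraction rather than in the grouping, the two steps of split.data.frame can be run by hand. A minimal sketch with the sample data from the question; the names idx and part are only illustrative.

```r
# Sample data from the question
df <- data.frame(
  id = c("1", "2"),
  site_name = c("Zero Hedge", "Free Software Foundation"),
  site_url = c("https://www.zerohedge.com", "https://www.fsf.org")
)
class(df$site_url) <- "formula"

# Step 1: split the row numbers by the grouping variable
idx <- split(seq_len(nrow(df)), df$id)

# Step 2: extract the rows of the first group, as split.data.frame does
part <- df[idx[["1"]], , drop = FALSE]

class(df$site_url)    # "formula"   (original column)
class(part$site_url)  # "character" (class dropped by the extraction)
```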

The split method for data.table, however, keeps the class:

library(data.table)
library(magrittr)  # for %>%

split(as.data.table(df), df$id) %>% str
#List of 2
# $ 1:Classes ‘data.table’ and 'data.frame':    1 obs. of  3 variables:
#  ..$ id       : chr "1"
#  ..$ site_name: chr "Zero Hedge"
#  ..$ site_url : 'formula' chr "https://www.zerohedge.com"
#  ..- attr(*, ".internal.selfref")=<externalptr> 
# $ 2:Classes ‘data.table’ and 'data.frame':    1 obs. of  3 variables:
#  ..$ id       : chr "2"
#  ..$ site_name: chr "Free Software Foundation"
#  ..$ site_url : 'formula' chr "https://www.fsf.org"

Comparing the structure of the original data with that of the extracted rows:

str(df)
'data.frame':   2 obs. of  3 variables:
 $ id       : chr  "1" "2"
 $ site_name: chr  "Zero Hedge" "Free Software Foundation"
 $ site_url : 'formula' chr  "https://www.zerohedge.com" "https://www.fsf.org"
str(df[1,]) # with one row selected
'data.frame':   1 obs. of  3 variables:
 $ id       : chr "1"
 $ site_name: chr "Zero Hedge"
 $ site_url : chr "https://www.zerohedge.com" # lost attribute
str(df[1:2,]) # with more than one row
'data.frame':   2 obs. of  3 variables:
 $ id       : chr  "1" "2"
 $ site_name: chr  "Zero Hedge" "Free Software Foundation"
 $ site_url : chr  "https://www.zerohedge.com" "https://www.fsf.org" # lost attribute
