
Downsize the object memory by subsetting a data frame in R

I'm using the dataset from https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe/downloads/515k-hotel-reviews-data-in-europe.zip/1 and I don't understand why subsetting the data frame barely reduces the object size.

df = read.csv('Hotel_Reviews.csv')
object.size(df)

200503848 bytes

object.size(df[sample(1:nrow(df),500),])

157225848 bytes

By taking about 0.1% of the rows, I only reduced the object to roughly 78% of its original size. I don't understand why.

OK, after looking at it more closely, it seems the cause is that the columns of my data frame were factors, and subsetting a factor keeps all of its original levels, including the ones that are no longer used.

df = read.csv('Hotel_Reviews.csv',stringsAsFactors = FALSE)
object.size(df)

210584168 bytes

object.size(df[sample(1:nrow(df),500),])

394464 bytes
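
If the factor columns need to stay factors, dropping the unused levels after subsetting should have a similar effect. Below is a minimal sketch on a synthetic data frame rather than the Kaggle file; `toy`, `sub` and `sub_dropped` are made-up names for illustration, and the exact byte counts will differ from the ones above.

```r
# Synthetic stand-in for the reviews data (an assumption, not the real file):
# one factor column in which every row has its own distinct level.
set.seed(1)
n <- 10000
toy <- data.frame(
  id = seq_len(n),
  review = factor(paste("review text number", seq_len(n)))
)

# Take a small random subset of rows, as in the question.
sub <- toy[sample(seq_len(n), 10), ]

# The subset has 10 rows, but its factor column still carries all 10000 levels.
nlevels(sub$review)

# droplevels() discards the levels that no longer occur in the subset.
sub_dropped <- droplevels(sub)
nlevels(sub_dropped$review)

# Compare the memory footprint before and after dropping unused levels.
object.size(sub)
object.size(sub_dropped)
```

Note that from R 4.0.0 onward, `read.csv` defaults to `stringsAsFactors = FALSE`, so the behaviour in the question shows up only on older versions or when factors are requested explicitly.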
