Downsize the object memory by subsetting a data frame in R
I'm using the dataset from https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe/downloads/515k-hotel-reviews-data-in-europe.zip/1 and I don't understand why subsetting the data frame doesn't shrink the object's size:
df = read.csv('Hotel_Reviews.csv')
object.size(df)
200503848 bytes
object.size(df[sample(1:nrow(df),500),])
157225848 bytes
By taking about 0.1% of the rows, I only reduced the object to roughly 78% of its original size. I don't understand why...
OK, after looking into it more closely, it seems this happens because the columns of my data frame were factors, and a subset of a factor keeps every level, including the ones that are no longer used. Reading the text columns as plain character strings avoids this:
df = read.csv('Hotel_Reviews.csv',stringsAsFactors = FALSE)
object.size(df)
210584168 bytes
object.size(df[sample(1:nrow(df),500),])
394464 bytes
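If the columns need to stay as factors, another option is to drop the unused levels after subsetting. Here is a minimal sketch of that idea; it uses a small synthetic data frame (the `id` and `review` columns are made up for illustration) rather than the Kaggle file, so the sizes will not match the numbers above.

```r
# Synthetic stand-in for a text-heavy column: 20,000 distinct strings stored as a factor.
set.seed(1)
n <- 20000
df <- data.frame(
  id = seq_len(n),
  review = factor(paste("review text number", seq_len(n)))
)

# Take a small random subset of the rows.
sub <- df[sample(nrow(df), 50), ]

# The subset has 50 rows, but the factor still carries all 20,000 levels.
nlevels(sub$review)
object.size(sub)

# droplevels() discards the unused levels, so only the sampled values remain.
sub_dropped <- droplevels(sub)
nlevels(sub_dropped$review)
object.size(sub_dropped)
```

Note that since R 4.0.0, `read.csv` defaults to `stringsAsFactors = FALSE`, so on a recent R version the original problem should only appear if factors are requested explicitly.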