Create new class from data.frame in R

I am toying around with functions, classes and methods in R. To have a hands-on exercise that could also be useful, I have decided to create my own "package" for managing my household budget. Simply put, I want a series of functions, classes and methods to calculate things, plot different kinds of charts and so on. The first thing I wanted to do is create a "Budget" class: it should take a csv with certain columns and return a "Budget" object that inherits the methods of a data frame, but to which I can also apply a set of "Budget" methods. Here is my take:

prepareData = function (csv, type = 1) {

  # type 1: comma-separated with "." decimals; type 2: semicolon-separated with "," decimals
  if (type == 1) {
    Data = read.csv(csv, dec = ".")
  } else if (type == 2) {
    Data = read.csv2(csv, dec = ",")
  } else {
    stop("Acceptable values for type are 1 and 2")
  }

  NamesToHave = c("Date", "Title", "Amount", "Category")

  if (sum(as.numeric(colnames(Data) %in% NamesToHave)) < 4) {
    stop("The csv file does not have the mandatory columns (Date, Title, Amount, Category)")
  }

  # tolower() fails on invalid multibyte strings (e.g. an unexpected file encoding)
  if (inherits(try(tolower(Data$Title), silent = TRUE), "try-error") ||
      inherits(try(tolower(Data$Category), silent = TRUE), "try-error")) {
    stop("Are you sure there are no special characters in your csv file?")
  }

  # Split the dd/mm/yyyy strings once and reuse the parts
  DateParts = strsplit(as.character(Data$Date), "/")
  Data$Day = sapply(DateParts, "[[", 1)
  MonthNum = as.numeric(sapply(DateParts, "[[", 2))
  Data$Month = month.abb[MonthNum]
  Data$Year = sapply(DateParts, "[[", 3)

  # Sort on the numeric month: sorting on the month.abb names would be alphabetical
  Data = Data[order(Data$Year, MonthNum, Data$Day), ]

  # as.character() first, so a factor column yields its labels rather than level codes
  Data$Amount = as.numeric(as.character(Data$Amount))

  class(Data) <- append(class(Data), "Budget")
  return(Data)
}

Now, this returns a data frame with all the necessary modifications, and overall it works fine as a function, but if I take the following csv (shown as dput() output)

structure(list(Date = structure(c(22L, 1L, 1L, 1L, 1L, 1L), .Label = c("01/10/2016", 
"01/11/2016", "02/10/2016", "04/10/2016", "04/11/2016", "05/10/2016", 
"05/11/2016", "06/10/2016", "06/11/2016", "07/10/2016", "08/10/2016", 
"08/11/2016", "09/10/2016", "09/11/2016", "10/10/2016", "10/11/2016", 
"11/10/2016", "12/11/2016", "14/10/2016", "16/10/2016", "18/10/2016", 
"20/09/2016", "20/10/2016", "21/10/2016", "22/09/2016", "22/10/2016", 
"23/09/2016", "23/10/2016", "25/09/2016", "25/10/2016", "26/09/2016", 
"26/10/2016", "27/10/2016", "28/10/2016", "29/10/2016", "30/10/2016"
), class = "factor"), Title = structure(c(20L, 6L, 36L, 29L, 
30L, 11L), .Label = c("Bagpiper", "beer debaser", "Br", "brewdog", 
"Burger King", "Clas", "coop", "Coop", "Eriksdalbadet", "etc", 
"ETC", "Flippin", "Fotografiska", "Gateau Agneta", "Grekisk fastfood", 
"Grill", "Gunnarson", "Gunnarsson", "hemkop", "HK", "Hotorhallen", 
"ICA", "ICA Skinnskat", "Igor Sport", "Intersport", "Kak", "klattercentret", 
"LullesFagel", "Mae Thai", "MamaWolf", "Material", "Matrerial", 
"Oriental Supermarket", "Paradiset", "Pendeltag Uppsala", "PGW", 
"Pressbyran", "Primeburger", "Primo Ciao ciao", "R Asia", "Systembolaget", 
"taxi Skinnskat", "The Cure drinks", "Udden pensionat", "Ugglan", 
"Wentzels hobby"), class = "factor"), Amount = c(167.27, 331, 
971, 99, 192, 3289), Category = structure(c(10L, 3L, 3L, 6L, 
6L, 3L), .Label = c("Drink", "extra", "Extra", "Extra_Fede", 
"extra_food", "Extra_food", "extra_laure", "Extra_Laure", "food", 
"Food"), class = "factor")), .Names = c("Date", "Title", "Amount", 
"Category"), row.names = c(NA, 6L), class = "data.frame")

and I run

Data = prepareData("name.csv")
class(Data)

The output is just "data.frame". But if I then run the second-to-last line of the function again from the console

class(Data) <- append(class(Data),"Budget")
class(Data)

I get "data.frame" and "Budget" as output.

What am I doing wrong?

Your problem was here:

if (as.numeric(colnames(Data) %in% NamesToHave) != 4) {}

The first comparison is vectorized and returns TRUE TRUE TRUE TRUE, which becomes 1 1 1 1 after going through as.numeric(). This vector is then compared with != 4, which is also vectorized and returns TRUE TRUE TRUE TRUE (every 1 is different from 4). The if() statement does not evaluate the whole vector, only its first element, and gives you a warning message (since R 4.2.0, a condition of length greater than one is an error instead). Either way, that first element is TRUE, so the error branch runs, prepareData() never reaches the line that adds the "Budget" class, and Data = prepareData(...) never reassigns Data.
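The chain of coercions described above can be reproduced in a few standalone lines. This is a minimal sketch: the column names are typed in by hand (an assumption standing in for colnames(Data) on the sample file), and it only inspects the intermediate vectors instead of passing them to if(), because what if() does with a length-4 condition depends on the R version.

```r
# Column names as read.csv() would return them for the sample file (assumed here)
cols <- c("Date", "Title", "Amount", "Category")
NamesToHave <- c("Date", "Title", "Amount", "Category")

matches <- cols %in% NamesToHave   # logical vector, one element per column
as_num  <- as.numeric(matches)     # TRUE/FALSE coerced to 1/0
cond    <- as_num != 4             # element-wise comparison, still length 4

print(matches)       # TRUE TRUE TRUE TRUE
print(as_num)        # 1 1 1 1
print(cond)          # TRUE TRUE TRUE TRUE
print(length(cond))  # 4: too long to be a single if() condition

# Reducing the vector to a single number first gives a valid condition
print(sum(matches))        # 4
print(sum(matches) != 4)   # FALSE, so the error branch is skipped
```

The key point is that as.numeric() only changes the type of the vector, not its length, while sum() collapses it to one value.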

To solve this issue, you just have to replace the as.numeric() call with sum():

if (sum(colnames(Data) %in% NamesToHave) != 4) {}

When you sum a logical vector, R coerces it to numeric: every TRUE becomes 1 and every FALSE becomes 0. Now the sum is 4, the condition evaluates to FALSE in the if statement, and the function runs smoothly. Once I fixed it, the object had both classes on the first run.
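Having both classes is not quite enough if the goal is to write "Budget" methods. S3 dispatch walks the class vector from left to right, so with append() the vector is c("data.frame", "Budget") and any generic that already has a data.frame method (print, summary, ...) uses that method before ever looking for a Budget one. The sketch below compares appending with prepending; as_budget() and summary.Budget() are hypothetical names used only for illustration, not part of the code in the question.

```r
# Hypothetical helper: tag an existing data frame as a "Budget" object,
# either after "data.frame" (as in the question) or before it.
as_budget <- function(df, prepend = TRUE) {
  stopifnot(is.data.frame(df))
  class(df) <- if (prepend) c("Budget", class(df)) else append(class(df), "Budget")
  df
}

# Hypothetical Budget method: total amount per category
summary.Budget <- function(object, ...) {
  tapply(object$Amount, object$Category, sum)
}

df <- data.frame(Amount = c(167.27, 331, 971),
                 Category = c("Food", "Extra", "Extra"))

appended  <- as_budget(df, prepend = FALSE)
prepended <- as_budget(df)

print(class(appended))    # "data.frame" "Budget"
print(class(prepended))   # "Budget" "data.frame"

print(summary(appended))  # summary.data.frame is found first: the Budget method is shadowed
print(summary(prepended)) # summary.Budget: per-category totals
print(nrow(prepended))    # 3: generics without a Budget method fall back to data.frame
```

Putting the more specific class first is the usual S3 convention for subclasses, and it also lets a Budget method call NextMethod() when it wants the data.frame behaviour.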

As this article says, it is good practice to restart R before posting your question and make sure you are still having the problem you are reporting.

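Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.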

 
© 2020-2024 STACKOOM.COM