I can't find any good documentation about dataset(), so I'm asking here. I'll keep the question short:
Can I set headers (column titles) in a dataset without entering any data yet? I guess not, so the second part of the question is:
Can I make a one-row dataset in which I name the headers, with empty data, and overwrite it later?
Let me show you what I tried, which did not work:
dmsdb = dataset({ 'John','Name'},{'Amsterdam','City'},{10,'number' });
produces:
Name City number
John Amsterdam 10 --> Headers are good!
The problem is that when I add more data to the dataset, it expects all strings to be the same length. So I use cellstr():
dmsdb(1,1:3) = dataset({ cellstr('John'),'Name'},{cellstr('Amsterdam'),'City'},{10,'number' });
Produces:
Var1 Var2 Var3
'John' 'Amsterdam' 10
Where did my headers go? How do I solve this issue, and what is causing this?
You can set up an empty dataset with either
data = dataset({[], 'Name'}, {[], 'City'}, {[], 'number'});
or
data = dataset([], [], [], 'VarNames', {'Name', 'City', 'number'});
Both will give you:
>> data
data =
[empty 0-by-3 dataset]
But we can see that the column names are set by checking
>> get(data, 'VarNames')
ans =
'Name' 'City' 'number'
Now we can add rows to the dataset:
>> data = [data; dataset({'John'}, {'Amsterdam'}, 10, 'VarNames', get(data, 'VarNames'))]
data =
Name City number
'John' 'Amsterdam' 10
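If you need to append several rows this way, the same concatenation can go in a loop. This is only a minimal sketch built from the calls shown above; the `people` cell array and the loop variable are made-up sample data for illustration, not anything from the question.

```matlab
% Start from an empty dataset that already carries the column names.
data  = dataset({[], 'Name'}, {[], 'City'}, {[], 'number'});
names = get(data, 'VarNames');

% Hypothetical sample records: one row per person.
people = {'John', 'Amsterdam', 10; ...
          'Bob',  'Paris',     20};

for k = 1:size(people, 1)
    % people(k,1) and people(k,2) are 1-by-1 cells, so the strings stay in cells;
    % people{k,3} is the plain numeric value.
    newRow = dataset(people(k,1), people(k,2), people{k,3}, 'VarNames', names);
    data   = [data; newRow];   % vertical concatenation needs matching VarNames
end
```

Growing a dataset row by row is fine for small data; for many rows it is probably cheaper to build the columns first and create the dataset once.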
You had the basic idea, but just needed to put your string data in cells. This replacement for your first line works:
>> dmsdb = dataset({ {'John'},'Name'},{{'Amsterdam'},'City'},{10,'number' });
dmsdb =
Name City number
'John' 'Amsterdam' 10
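To illustrate why the cells matter: character arrays are matrices, so stacking strings of different lengths fails, while a cell array of strings has no such restriction. A small sketch using the same `{values, 'name'}` syntax, with made-up extra rows (the second person and the numbers are only sample values):

```matlab
% ['John'; 'Alice'] would raise an error, because the two rows differ in length.
% A cell array of strings has no such restriction.
names  = {'John'; 'Alice'};        % column cell array, one string per row
cities = {'Amsterdam'; 'Boston'};
nums   = [10; 30];                 % plain numeric column

dmsdb = dataset({names, 'Name'}, {cities, 'City'}, {nums, 'number'});
```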
The built-in help for dataset() is actually really good at laying out the details of these and other ways of constructing datasets. Also check out the online documentation with examples at:
http://www.mathworks.com/help/toolbox/stats/dataset.html
One of the Mathworks blogs has a nice post too:
http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/
Good luck!
Here is an example:
%# create dataset with no rows
ds = dataset(cell(0,1),cell(0,1),zeros(0,1));
ds.Properties.VarNames = {'Name', 'City', 'number'};
%# adding one row at a time
for i=1:3
row = {{'John'}, {'Amsterdam'}, 10}; %# construct new row each iteration
ds(i,:) = dataset(row{:});
end
%# adding a batch of rows all at once
rows = {{'Bob';'Alice'}, {'Paris';'Boston'}, [20;30]};
ds(4:5,:) = dataset(rows{:});
The dataset at the end looks like:
>> ds
ds =
Name City number
'John' 'Amsterdam' 10
'John' 'Amsterdam' 10
'John' 'Amsterdam' 10
'Bob' 'Paris' 20
'Alice' 'Boston' 30
Note: if you want to use concatenation instead of indexing, you have to specify the variable names:
vars = {'Name', 'City', 'number'};
ds = [ds ; dataset(rows{:}, 'VarNames',vars)]
I agree, the help for dataset is hard to understand, mainly because there are so many ways to create a dataset and most methods involve a lot of cell arrays. Here are my two favorite ways to do it:
% 1) Create the 3 variables of interest, then make the dataset.
% Make sure they are column vectors!
>> Name = {'John' 'Joe'}'; City = {'Amsterdam' 'NYC'}'; number = [10 1]';
>> dataset(Name, City, number)
ans =
Name City number
'John' 'Amsterdam' 10
'Joe' 'NYC' 1
% 2) More compact than doing 3 separate cell arrays
>> dataset({{'John' 'Amsterdam' 10} 'Name' 'City' 'number'})
ans =
Name City number
'John' 'Amsterdam' [10]
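One thing to watch with the second form: the `[10]` in the display suggests that `number` ends up as a cell column rather than a numeric one, since every value came from the same cell array. If that is the case on your version, a conversion along these lines should give a plain numeric column. Treat it as a sketch rather than the one right way to do it:

```matlab
ds = dataset({{'John' 'Amsterdam' 10} 'Name' 'City' 'number'});

% Check how the column is stored before converting.
if iscell(ds.number)
    ds.number = cell2mat(ds.number);   % replace the cell column with a numeric one
end
```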