Headers in dataset (MATLAB)

I can't find any good documentation about dataset(), so I'm asking here. I'll keep the question short:

Can I set headers (column titles) in a dataset without entering any data into it yet? I guess not, so the second part of the question is:
Can I make a one-row dataset in which I name the headers, with empty data, and overwrite it later?

Let me show you what I tried, which did not work:

dmsdb = dataset({ 'John','Name'},{'Amsterdam','City'},{10,'number' });  
produces:  
    Name    City         number  
    John    Amsterdam    10 --> Headers are good!  

The problem is that when I add more data to the dataset, it expects all strings to be the same length. So I use cellstr():

dmsdb(1,1:3) = dataset({ cellstr('John'),'Name'},{cellstr('Amsterdam'),'City'},{10,'number' });  
Produces:  
    Var1          Var2               Var3  
    'John'        'Amsterdam'        10  

Where did my headers go? What is causing this, and how do I solve it?

You can set up an empty dataset with either

data = dataset({[], 'Name'}, {[], 'City'}, {[], 'number'});

or

data = dataset([], [], [], 'VarNames', {'Name', 'City', 'number'});

Both will give you:

>> data

data = 

[empty 0-by-3 dataset]

But we can see that the column names are set by checking

>> get(data, 'VarNames')                                             

ans = 

    'Name'    'City'    'number'

Now we can add rows to the dataset:

>> data = [data; dataset({'John'}, {'Amsterdam'}, 10, 'VarNames', get(data, 'VarNames'))]

data = 

    Name          City               number
    'John'        'Amsterdam'        10    
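
The same pattern extends to several rows whose strings have different lengths. The following is only a sketch, assuming the Statistics Toolbox dataset class is available; `people`, `names`, and `newRow` are made-up names for illustration and are not from the question.

```matlab
% Sketch only: assumes the Statistics Toolbox dataset class is available.
% 'people', 'names' and 'newRow' are hypothetical names for illustration.
names = {'Name', 'City', 'number'};

% Empty dataset that already carries the column headers
data = dataset({[], 'Name'}, {[], 'City'}, {[], 'number'});

% Rows to add; note that the strings differ in length
people = {'John',      'Amsterdam', 10; ...
          'Alexandra', 'Rome',      25};

for k = 1:size(people, 1)
    % people(k,1) is a 1-by-1 cell, i.e. the string stays wrapped in a cell
    newRow = dataset(people(k,1), people(k,2), people{k,3}, 'VarNames', names);
    data = [data; newRow];   % concatenation needs matching variable names
end
```

Because each string stays inside a cell, the columns are cell arrays of strings, so the "all strings must be the same length" restriction from the question should not apply.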

You had the basic idea, but just needed to put your string data in cells. This replacement for your first line works:

>> dmsdb = dataset({ {'John'},'Name'},{{'Amsterdam'},'City'},{10,'number' }); 

dmsdb = 

    Name          City               number
    'John'        'Amsterdam'        10    

The built-in help for dataset() is actually really good at laying out the details of these and other ways of constructing datasets. Also check out the online documentation with examples at:

http://www.mathworks.com/help/toolbox/stats/dataset.html

One of the MathWorks blogs has a nice post too:

http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/

Good luck!

Here is an example:

%# create dataset with no rows
ds = dataset(cell(0,1),cell(0,1),zeros(0,1));
ds.Properties.VarNames = {'Name', 'City', 'number'};

%# adding one row at a time
for i=1:3
    row = {{'John'}, {'Amsterdam'}, 10};  %# construct new row each iteration
    ds(i,:) = dataset(row{:});
end

%# adding a batch of rows all at once
rows = {{'Bob';'Alice'}, {'Paris';'Boston'}, [20;30]};
ds(4:5,:) = dataset(rows{:});

The dataset at the end looks like:

>> ds
ds = 
    Name           City               number
    'John'         'Amsterdam'        10    
    'John'         'Amsterdam'        10    
    'John'         'Amsterdam'        10    
    'Bob'          'Paris'            20    
    'Alice'        'Boston'           30    

Note: if you want to use concatenation instead of indexing, you have to specify the variable names:

vars = {'Name', 'City', 'number'};
ds = [ds ; dataset(rows{:}, 'VarNames',vars)]
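
Once the headers are in place, they can be used to read columns back by name. This is a rough sketch rather than anything from the answer above; it assumes the Statistics Toolbox dataset class is available, and the variables `headers`, `numbers`, `allNames`, `secondName`, and `parisRows` are made up for illustration.

```matlab
% Sketch only: assumes the Statistics Toolbox dataset class is available.
% All variable names on the left-hand side are hypothetical.
ds = dataset({{'John'; 'Bob'; 'Alice'},        'Name'}, ...
             {{'Amsterdam'; 'Paris'; 'Boston'}, 'City'}, ...
             {[10; 20; 30],                     'number'});

headers    = ds.Properties.VarNames;   % cell array holding the column names
numbers    = ds.number;                % numeric column, accessed by header
allNames   = ds.Name;                  % cell array of strings
secondName = allNames{2};              % contents of the second Name entry

% Logical indexing on a column selects whole rows and keeps the headers
parisRows = ds(strcmp(ds.City, 'Paris'), :);
```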

I agree, the help for dataset is hard to understand, mainly because there are so many ways to create a dataset and most methods involve a lot of cell arrays. Here are my two favorite ways to do it:

% 1) Create the 3 variables of interest, then make the dataset.  
% Make sure they are column vectors!
>> Name = {'John' 'Joe'}';  City = {'Amsterdam' 'NYC'}'; number = [10 1]';
>> dataset(Name, City, number)

ans = 

    Name          City               number
    'John'        'Amsterdam'        10    
    'Joe'         'NYC'               1    

% 2) More compact than doing 3 separate cell arrays
>> dataset({{'John' 'Amsterdam' 10} 'Name' 'City' 'number'})

ans = 

    Name          City               number  
    'John'        'Amsterdam'        [10]    
