
Using an ARFF file for storing data

I am using this example to create the .arff file for my Weka project.

double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0}, 
                       {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}};

int numInstances = data[0].length;

FastVector atts = new FastVector();
ArrayList<Instance> instances = new ArrayList<Instance>();
for (int dim = 0; dim < 2; dim++) {
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);

    // Create an instance for each data object
    if (dim == 0) {
        for (int obj = 0; obj < numInstances; obj++) {
            instances.add(new SparseInstance(0));
        }
    }

    // Fill the value of dimension "dim" into each object
    for (int obj = 0; obj < numInstances; obj++) {
        instances.get(obj).setValue(current, data[dim][obj]);
        System.out.println(instances.get(obj));
    }

    // Add attribute to total attributes
    atts.addElement(current);
}

// Create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());

// Fill in data objects
for (Instance inst : instances) {
    newDataset.add(inst);
}

BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(newDataset.toString());
writer.flush();
writer.close();

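I have noticed that the resulting format puts the row elements of the vector into the columns of the .arff file. I want each whole row to appear as a single line of the .arff file instead. How can I do that? In my case, the last column of the two-dimensional vector represents the label of the row's data.

Expected result for my arff file: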

4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0, 1 // for example the first row
19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0, 0 // the second row

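The code in that example treats each column of your table as an instance (so there are 29 instances, each with two attributes). It sounds like you want to treat each row as an instance instead (giving two instances, each with 29 attributes):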

double[][] data = {
                    {4058.0, 4059.0, ... }, /* first instance */
                    {19.0, 20.0, ... }      /* second instance */
                  };

int numAtts = data[0].length;
FastVector atts = new FastVector(numAtts);
for (int att = 0; att < numAtts; att++)
{
    atts.addElement(new Attribute("Attribute" + att, att));
}

int numInstances = data.length;
Instances dataset = new Instances("Dataset", atts, numInstances);
for (int inst = 0; inst < numInstances; inst++)
{
    dataset.add(new Instance(1.0, data[inst]));
}

BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(dataset.toString());
writer.flush();
writer.close();

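I replaced SparseInstance with Instance because almost all of the attribute values are non-zero. Note that in Weka 3.7, Instance has become an interface and DenseInstance should be used instead. Also, FastVector is deprecated in favor of Java's ArrayList.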
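To make the row-per-instance layout concrete, here is a minimal sketch that builds the ARFF text by hand using only the Java standard library. It is not Weka's own writer, and it is not the code from the linked example. The class name `ArffRowSketch`, the helper `toArff`, and the shortened data rows are hypothetical and only for illustration. It also assumes every column, including the trailing label, is written as a numeric attribute.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of Weka.
class ArffRowSketch {

    // Builds ARFF text in which each row of `rows` becomes one line of the
    // @data section, and each column becomes one numeric attribute.
    static String toArff(String relation, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");

        // One attribute per column; names mirror the "Attribute" + index scheme.
        int numAtts = rows[0].length;
        for (int att = 0; att < numAtts; att++) {
            sb.append("@attribute Attribute").append(att).append(" numeric\n");
        }

        // One comma-separated line per row (instance).
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int att = 0; att < numAtts; att++) {
                if (att > 0) {
                    sb.append(',');
                }
                sb.append(row[att]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Shortened, illustrative rows; the last value stands in for the label.
        double[][] data = {
            {4058.0, 4059.0, 4060.0, 1.0},
            {19.0, 20.0, 21.0, 0.0}
        };
        String arff = toArff("Dataset", data);
        Files.write(Paths.get("test.arff"), arff.getBytes(StandardCharsets.UTF_8));
    }
}
```

If the label should be a class attribute rather than a number, the last `@attribute` line would need a nominal declaration such as `{0,1}` instead of `numeric`. That choice is left open here because the question does not specify it.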
