
Using an ARFF file for storing data

I am using this example to create the .arff file for my Weka project.

double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0}, 
                       {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}};

    int numInstances = data[0].length;

    FastVector atts = new FastVector();
    ArrayList<Instance> instances = new ArrayList<Instance>();
    for (int dim = 0; dim < 2; dim++) {
        // Create new attribute / dimension
        Attribute current = new Attribute("Attribute" + dim, dim);
        // Create an instance for each data object


        if (dim == 0) {
            for (int obj = 0; obj < numInstances; obj++) {
                instances.add(new SparseInstance(0));

            }
        }

        // Fill the value of dimension "dim" into each object
        for (int obj = 0; obj < numInstances; obj++) {
            instances.get(obj).setValue(current, data[dim][obj]);
            System.out.println(instances.get(obj));
        }

        // Add attribute to total attributes
        atts.addElement(current);

    }

     // Create new dataset
    Instances newDataset = new Instances("Dataset", atts, instances.size());

    // Fill in data objects
    for (Instance inst : instances) {
        newDataset.add(inst);       
    }

    BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
    writer.write(newDataset.toString());
    writer.flush();
    writer.close();
}

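I have noticed that the resulting format puts the elements of each row of the array into a column of the .arff file. I want each whole row of the array to appear as a single row of the .arff file instead. How can I do that? In my case, the last column of the 2D array represents the label for the row's data.

Expected result for my .arff file: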

4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0, 1 // for example the first row
19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0, 0 // the second row

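The code in that example treats each column of the table as an instance (so there are 29 instances, each with two attributes). It sounds like you want to treat each row as an instance instead (giving two instances, each with 29 attributes):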

double[][] data = {
                    {4058.0, 4059.0, ... }, /* first instance */
                    {19.0, 20.0, ... }      /* second instance */
                  };

int numAtts = data[0].length;
FastVector atts = new FastVector(numAtts);
for (int att = 0; att < numAtts; att++)
{
    atts.addElement(new Attribute("Attribute" + att, att));
}

int numInstances = data.length;
Instances dataset = new Instances("Dataset", atts, numInstances);
for (int inst = 0; inst < numInstances; inst++)
{
    dataset.add(new Instance(1.0, data[inst]));
}

BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(dataset.toString());
writer.flush();
writer.close();

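I replaced SparseInstance with Instance because almost all of the attribute values are non-zero. Note that in Weka 3.7, Instance became an interface, and DenseInstance should be used instead. Also, FastVector is deprecated in favor of Java's ArrayList.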
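
To make the row-per-instance layout concrete, here is a minimal plain-Java sketch that builds the ARFF text by hand, including the label column from the question. It deliberately does not use Weka, so it runs without Weka on the classpath. The class ArffLayoutSketch and its methods toArff and write are hypothetical names introduced only for illustration, and it assumes the label is binary and declared as a nominal attribute named "class" with values {0,1}. It is not the output of Instances.toString(), only an approximation of the file shape to aim for.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, NOT part of Weka: writes ARFF text by hand so that
// the "one row = one instance" layout is easy to see.
class ArffLayoutSketch {

    /**
     * Renders every row of rows as one line of the @data section,
     * followed by that row's label.
     * Assumptions: all rows have the same length, labels[i] belongs to
     * rows[i], and every label is 0 or 1.
     */
    static String toArff(String relation, double[][] rows, int[] labels) {
        if (rows.length != labels.length) {
            throw new IllegalArgumentException("need exactly one label per row");
        }
        int numAtts = rows.length == 0 ? 0 : rows[0].length;

        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // One numeric attribute per column of the input array.
        for (int att = 0; att < numAtts; att++) {
            sb.append("@attribute Attribute").append(att).append(" numeric\n");
        }
        // Assumed nominal class attribute holding the row label.
        sb.append("@attribute class {0,1}\n\n");
        sb.append("@data\n");

        for (int i = 0; i < rows.length; i++) {
            if (rows[i].length != numAtts) {
                throw new IllegalArgumentException("row " + i + " has a different length");
            }
            if (labels[i] != 0 && labels[i] != 1) {
                throw new IllegalArgumentException("label " + i + " is not 0 or 1");
            }
            for (double value : rows[i]) {
                sb.append(value).append(',');
            }
            sb.append(labels[i]).append('\n');
        }
        return sb.toString();
    }

    /** Writes the text to a file; try-with-resources closes the writer even if write() fails. */
    static void write(Path target, String arffText) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target)) {
            writer.write(arffText);
        }
    }

    public static void main(String[] args) {
        // Shortened rows for illustration; the real data has 29 values per row.
        double[][] rows = { {4058.0, 4059.0, 4060.0}, {19.0, 20.0, 21.0} };
        int[] labels = {1, 0};
        System.out.print(toArff("Dataset", rows, labels));
    }
}
```

With the shortened rows in main, the @data section contains the two lines 4058.0,4059.0,4060.0,1 and 19.0,20.0,21.0,0, that is, one line per row of the array with the label last. The write helper is a separate step so the same try-with-resources pattern could also be used for the string returned by dataset.toString().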
