
How can I validate an ARFF data file?

As part of our data ingestion and processing, we need to validate the data against constraints (such as type, permissible values, etc.). ARFF seems to be a format that can at least represent this kind of metadata. WEKA appears to do some form of validation, since it reports an error when loading an "invalid" ARFF file.

I'd like to do a similar check in a standalone program. Looking at the WEKA API, it's not evident if the validation is exposed as a public interface in their Java code.

Alternative suggestions to ARFF are also welcome.

I could do it in SQL, I guess, with CHECK constraints declared in the table definitions. But ARFF would be a neat solution, as it's self-contained and easily convertible.

There is no separate validation of ARFF files available. If the parser fails to read a file, that file is invalid. It is usually recommended to generate ARFF files through the Weka API (by creating weka.core.Instances objects) rather than by string manipulation. That way, you automatically build valid data structures, which in turn produce valid ARFF files (using the ArffSaver).
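To illustrate the "if it does not parse, it is invalid" idea in a standalone program, here is a rough sketch that uses only the Java standard library. It is not Weka's parser and not a complete ARFF implementation: the class name ArffCheck and its validate method are hypothetical, and it assumes dense (non-sparse) rows, unquoted values, and attribute names without spaces. It only checks numeric and nominal columns; string and date columns are accepted without inspection. With Weka on the classpath, the equivalent check would simply be to attempt the load and treat any exception as "invalid".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical minimal checker; an illustration only, not Weka's ARFF parser. */
final class ArffCheck {
    private enum Kind { NUMERIC, NOMINAL, UNCHECKED }

    /** Returns the problems found; an empty list means the text passed these limited checks. */
    static List<String> validate(String arffText) {
        List<String> errors = new ArrayList<>();
        List<Kind> kinds = new ArrayList<>();          // one entry per declared attribute
        List<Set<String>> allowed = new ArrayList<>(); // permitted labels (used for NOMINAL only)
        boolean inData = false;
        int lineNo = 0;
        for (String raw : arffText.split("\\R")) {
            lineNo++;
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank line or comment
            String lower = line.toLowerCase();
            if (inData) {
                checkRow(line, lineNo, kinds, allowed, errors);
            } else if (lower.equals("@data")) {
                inData = true;
            } else if (lower.startsWith("@attribute")) {
                // Assumed form: @attribute <name> <type>
                String[] parts = line.split("\\s+", 3);
                String type = parts.length == 3 ? parts[2].trim() : "";
                String t = type.toLowerCase();
                Set<String> values = new HashSet<>();
                if (type.startsWith("{") && type.endsWith("}")) {
                    for (String v : type.substring(1, type.length() - 1).split(",")) {
                        values.add(v.trim());
                    }
                    kinds.add(Kind.NOMINAL);
                } else if (t.equals("numeric") || t.equals("real") || t.equals("integer")) {
                    kinds.add(Kind.NUMERIC);
                } else {
                    // string/date are accepted unchecked; anything else is reported
                    if (!t.equals("string") && !t.startsWith("date")) {
                        errors.add("line " + lineNo + ": unsupported or missing attribute type");
                    }
                    kinds.add(Kind.UNCHECKED);
                }
                allowed.add(values);
            } else if (!lower.startsWith("@relation")) {
                errors.add("line " + lineNo + ": unexpected header line");
            }
        }
        if (!inData) errors.add("missing @data section");
        return errors;
    }

    private static void checkRow(String line, int lineNo, List<Kind> kinds,
                                 List<Set<String>> allowed, List<String> errors) {
        String[] cells = line.split(",", -1); // naive split: quoted commas are not handled
        if (cells.length != kinds.size()) {
            errors.add("line " + lineNo + ": expected " + kinds.size()
                    + " values, found " + cells.length);
            return;
        }
        for (int i = 0; i < cells.length; i++) {
            String cell = cells[i].trim();
            if (cell.equals("?")) continue; // "?" marks a missing value in ARFF
            if (kinds.get(i) == Kind.NUMERIC) {
                try {
                    Double.parseDouble(cell); // more lenient than a strict numeric grammar
                } catch (NumberFormatException e) {
                    errors.add("line " + lineNo + ", column " + (i + 1)
                            + ": '" + cell + "' is not numeric");
                }
            } else if (kinds.get(i) == Kind.NOMINAL && !allowed.get(i).contains(cell)) {
                errors.add("line " + lineNo + ", column " + (i + 1)
                        + ": '" + cell + "' is not a permitted value");
            }
        }
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "@relation weather",
                "@attribute outlook {sunny, overcast, rainy}",
                "@attribute temperature numeric",
                "@data",
                "sunny,85",
                "foggy,hot");
        validate(sample).forEach(System.out::println);
    }
}
```

Collecting messages in a list, rather than throwing on the first problem, makes it possible to report every bad row in one pass, which tends to be more useful in an ingestion pipeline. For anything beyond simple files, relying on a real parser is likely the safer route.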

A long time ago, an ANTLR grammar was contributed (see the bottom of this wiki page: https://waikato.github.io/weka-wiki/formats_and_processing/arff_stable/ ). But I don't think that grammar covers the relational attribute type.

