Using Weka: Can Training Data be of Multiple Length?

Being relatively new to Weka, I'm wondering whether it's possible to train a classifier on a CSV file containing rows of variable length. For example, a CSV file that looks like the following:

1, 2, 3, 4, 3, 2, 1
1, 2, 4, 3, 2, 1
...

Whilst basic, both of these lines show a clear pattern. Will a Weka classifier trained on a CSV file like this work effectively when it receives a similar pattern?

No. You need to state explicitly which feature has a missing value. For example, if

1,2,3,4,3,2,1 is a row with all data; then
1,,2,4,3,2,1  is another row in which the 2nd feature has a missing value.
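
To illustrate the point about fixed-length rows, here is a minimal Python sketch (not Weka code) that pads shorter rows so every row has the same number of fields. It assumes the missing values are at the end of each short row, which may not be true for your data; if a value is missing in the middle, as in the second row above, the marker has to go at that position instead. The `?` marker is the missing-value symbol in Weka's ARFF format; `pad_rows` and `MISSING` are names made up for this example.

```python
import csv
import io

MISSING = "?"  # missing-value marker used in Weka's ARFF format


def pad_rows(rows, missing=MISSING):
    """Pad every row on the right so all rows share the longest row's length.

    Assumption: the absent values are trailing ones. This is a guess about
    the data, not something the file itself can tell you.
    """
    width = max(len(row) for row in rows)
    return [row + [missing] * (width - len(row)) for row in rows]


# The two rows from the question, read as CSV text.
raw = "1, 2, 3, 4, 3, 2, 1\n1, 2, 4, 3, 2, 1\n"
rows = [[cell.strip() for cell in line] for line in csv.reader(io.StringIO(raw))]

padded = pad_rows(rows)
for row in padded:
    print(",".join(row))
```

Whether right-padding is meaningful depends entirely on what each column represents; for sequence-like data it usually just hides the problem rather than solving it.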

In short, no: this is a difficult case which cannot be approached simply with the default Weka models. Such data requires either preprocessing to obtain fixed-length representations that Weka can handle (these can contain missing values), or more complex models that can work with variable-length data directly. It looks like a time series, so you should look for tools/models designed for that. I would suggest looking at DTW (Dynamic Time Warping) and classifiers that work with a custom distance measure (for example KNN) instead of the raw data representation.
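
As a rough sketch of the DTW suggestion, the following plain-Python example (it does not use Weka) computes a textbook DTW distance with an absolute-difference cost and uses it in a 1-nearest-neighbour classifier. The function names, the second training sequence, and the labels `"peak"` and `"valley"` are invented for illustration; the question does not say what the classes are.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def predict_1nn(train, query):
    """Return the label of the training sequence closest to query under DTW."""
    _, label = min(train, key=lambda pair: dtw_distance(pair[0], query))
    return label


# Hypothetical labelled training data; sequences may differ in length.
train = [
    ([1, 2, 3, 4, 3, 2, 1], "peak"),
    ([4, 3, 2, 1, 2, 3, 4], "valley"),
]
query = [1, 2, 4, 3, 2, 1]  # the shorter row from the question

label = predict_1nn(train, query)
print(label)
```

Because DTW aligns the sequences before comparing them, the 7-element and 6-element rows from the question can be compared directly, with no padding or missing values needed. The quadratic cost per comparison means this naive version only suits small datasets.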

