Tuesday, July 31, 2007

Divide my dataset into subsets to perform some experiments on each

You can add an ID attribute to all of your data using the AddID filter in
weka.filters.unsupervised.attribute. You can then create your splits
explicitly using filters in the weka.filters.unsupervised.instance package
(e.g. RemovePercentage, RemoveRange and RemoveFolds), or use the
cross-validation (or percentage split) evaluation options in the Explorer.
To make sure that the ID attribute is not used by the learned models, use
weka.classifiers.meta.FilteredClassifier with your chosen classifier and
the weka.filters.unsupervised.attribute.Remove filter. This removes the ID
attribute just before the classifier is built (and at testing time too).
With the current snapshot of the developer version of Weka, you can also
output additional attributes alongside the predictions (in your case, the
ID attribute).
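
To make the idea concrete, here is a minimal sketch in plain Java of why the ID helps: it survives the split, is hidden from the model, and can be printed next to each prediction. This is not the Weka API. The class IdSplitSketch, its method names, the toy data and the threshold "model" are all made up for illustration, and the split rule (first N percent of rows) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain Java, no Weka classes involved.
public class IdSplitSketch {

    // Plays the role of AddID: prepend a 1-based row ID as column 0.
    static List<double[]> addId(List<double[]> data) {
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            double[] row = data.get(i);
            double[] withId = new double[row.length + 1];
            withId[0] = i + 1;
            System.arraycopy(row, 0, withId, 1, row.length);
            out.add(withId);
        }
        return out;
    }

    // Plays the role of a percentage-based instance filter. In this sketch it
    // drops the first `percent` of the rows, or keeps only those rows when
    // `invert` is true.
    static List<double[]> removePercentage(List<double[]> data, double percent, boolean invert) {
        int cut = (int) Math.round(data.size() * percent / 100.0);
        List<double[]> kept = invert ? data.subList(0, cut) : data.subList(cut, data.size());
        return new ArrayList<>(kept);
    }

    // Plays the role of the Remove filter: strip the ID column before the
    // model sees the row.
    static double[] dropId(double[] row) {
        return Arrays.copyOfRange(row, 1, row.length);
    }

    // Stand-in for a real classifier: the "model" is just the mean of the
    // first feature over the training rows, computed without the ID.
    static double trainThreshold(List<double[]> train) {
        double sum = 0;
        for (double[] row : train) {
            sum += dropId(row)[0];
        }
        return sum / train.size();
    }

    // Predict on the test rows and report "id,prediction" for each one.
    static List<String> predictWithIds(List<double[]> test, double threshold) {
        List<String> out = new ArrayList<>();
        for (double[] row : test) {
            int id = (int) row[0];
            int predicted = dropId(row)[0] > threshold ? 1 : 0;
            out.add(id + "," + predicted);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy data: five rows with a single feature each.
        List<double[]> raw = Arrays.asList(
            new double[]{5.0}, new double[]{1.0}, new double[]{2.0},
            new double[]{3.0}, new double[]{4.0});

        List<double[]> data = addId(raw);
        List<double[]> train = removePercentage(data, 40, false); // remaining 60%
        List<double[]> test = removePercentage(data, 40, true);   // first 40%

        double threshold = trainThreshold(train);
        for (String line : predictWithIds(test, threshold)) {
            System.out.println(line);
        }
    }
}
```

In Weka itself you would not write any of this by hand: AddID, the instance filters and FilteredClassifier with Remove do the corresponding work.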
