Sample generation

Sampling generates learning and test samples from a data file.

Java interface: Data menu, Generate sample option.

The files are generated in the FisPro working directory, or in the bin subdirectory of the FisPro root directory, depending on the operating system.

Command line, sample program:

Argument: the name of the data file.

Optional arguments:

 -nNs where Ns is the number of sample pairs (default value: 1, which creates one learning sample and one test sample)
 -pApp where App is the ratio of the learning sample size to the data file size (default value: 0.75, i.e. 75% of the data file goes to the learning sample)
 -sSeed where Seed is the random seed; the same Seed value reproduces the same samples (default value: 0, which gives new random samples at each call)
 -c to create samples which respect the class proportion in the data file
 -oNumC used with the -c option, where NumC is the number of the column used to assign classes in the data file (default value: last column)
 -eTol used with the -c option, where Tol is the tolerance used to assign classes (default value: 1e-2)
 -a for detailed display
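The options above describe a random split of the data file into learning and test samples, optionally stratified by class. The sketch below is a minimal Python illustration of that idea, not FisPro's own code. Several details are assumptions: the learning sample size is rounded with Python's round(), a row joins the first class whose value lies within Tol of its class-column value, and the names make_pairs, assign_classes and split_indices are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def assign_classes(rows: List[Row], col: int = -1, tol: float = 1e-2) -> List[int]:
    """Assumed rule: a row joins the first class whose value is within tol of its own."""
    class_values: List[float] = []
    labels: List[int] = []
    for row in rows:
        v = row[col]
        for k, cv in enumerate(class_values):
            if abs(v - cv) <= tol:
                labels.append(k)
                break
        else:
            # No existing class is close enough: open a new class.
            class_values.append(v)
            labels.append(len(class_values) - 1)
    return labels


def split_indices(idx: List[int], ratio: float,
                  rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle a group of row indices and cut it at round(ratio * size)."""
    shuffled = idx[:]
    rng.shuffle(shuffled)
    n_learn = round(ratio * len(shuffled))  # rounding rule is an assumption
    return shuffled[:n_learn], shuffled[n_learn:]


def make_pairs(rows: List[Row], n_pairs: int = 1, ratio: float = 0.75,
               seed: int = 0, stratify: bool = False, col: int = -1,
               tol: float = 1e-2) -> List[Tuple[List[int], List[int]]]:
    """Return n_pairs (learning, test) lists of row indices (hypothetical helper)."""
    # Mimics -s: seed 0 draws fresh randomness, any other seed is reproducible.
    rng = random.Random(None if seed == 0 else seed)
    if stratify:
        # Mimics -c: split each class separately to keep class proportions.
        groups: Dict[int, List[int]] = {}
        for i, lab in enumerate(assign_classes(rows, col, tol)):
            groups.setdefault(lab, []).append(i)
        parts = list(groups.values())
    else:
        parts = [list(range(len(rows)))]
    pairs = []
    for _ in range(n_pairs):
        learn: List[int] = []
        test: List[int] = []
        for part in parts:
            l, t = split_indices(part, ratio, rng)
            learn += l
            test += t
        pairs.append((sorted(learn), sorted(test)))
    return pairs
```

For example, make_pairs(rows, n_pairs=4, stratify=True, seed=7) plays the role of sample with -n4 -c -s7. Splitting each class separately is one simple way to preserve class proportions; with 50 rows per class and ratio 0.75, every class contributes the same number of rows to each learning sample.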

Command line example:

sample iris -n4 -c

generates the following 8 files (4 learning-test pairs) from the iris data file:

iris.lrn.sample.0, iris.lrn.sample.1, ..., iris.tst.sample.0, iris.tst.sample.1, ..., with the same class proportions as the iris data file (species 1, 2, 3).

The iris.lrn.sample.0 and iris.tst.sample.0 files form one learning-test pair, and so on.
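The naming scheme pairs files by their common index. The following Python sketch builds these name pairs and writes selected rows to them. It is illustrative only: sample_file_names and write_pairs are hypothetical helpers, and writing comma-separated values with no header is an assumption that may not match FisPro's data file format.

```python
import os
from typing import List, Sequence, Tuple


def sample_file_names(base: str, n_pairs: int = 1) -> List[Tuple[str, str]]:
    """(learning, test) names: base.lrn.sample.i pairs with base.tst.sample.i."""
    return [(f"{base}.lrn.sample.{i}", f"{base}.tst.sample.{i}")
            for i in range(n_pairs)]


def write_pairs(base: str, rows: List[Sequence[float]],
                pairs: List[Tuple[List[int], List[int]]],
                out_dir: str = ".", sep: str = ",") -> List[str]:
    """Write each (learning, test) index pair to its two files; return the paths."""
    written = []
    names = sample_file_names(base, len(pairs))
    for (lrn_name, tst_name), (learn_idx, test_idx) in zip(names, pairs):
        for name, idx in ((lrn_name, learn_idx), (tst_name, test_idx)):
            path = os.path.join(out_dir, name)
            # Assumed format: one row per line, values joined by sep, no header.
            with open(path, "w") as f:
                for i in idx:
                    f.write(sep.join(str(v) for v in rows[i]) + "\n")
            written.append(path)
    return written
```

With 4 pairs, sample_file_names("iris", 4) yields the 8 names of the example, from iris.lrn.sample.0 / iris.tst.sample.0 up to iris.lrn.sample.3 / iris.tst.sample.3.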