Sample generation

This option creates sample files by random sampling from a data file. Two modes are available: generating learning/test pairs, or generating K sample files. Let N be the number of rows in the data file.

In the first case, each pair consists of a sample file and its complement. One can choose the number of pairs, the relative size of the sample file, and the random sampling seed. The procedure creates as many file pairs as requested. Each file takes the data file name, followed by lrn.sample.n for one file of the pair and by tst.sample.n for the other, where n runs from 0 (first pair) to P-1 (last pair), P being the number of pairs requested.
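The pair generation described above can be sketched as follows. This is only an illustration, not the tool's actual code: the function names make_pairs and pair_names are hypothetical, the sample size is assumed to be the relative size times the number of rows truncated to an integer, and the "." separator between the data file name and the suffix is an assumption.

```python
import random

def make_pairs(rows, n_pairs, fraction, seed=1):
    """Return n_pairs (learning, test) pairs; each test set is the complement."""
    rng = random.Random(seed)
    size = int(len(rows) * fraction)  # assumed rounding rule
    pairs = []
    for _ in range(n_pairs):
        # Draw the learning rows without replacement; the rest form the test set.
        picked = set(rng.sample(range(len(rows)), size))
        learn = [r for i, r in enumerate(rows) if i in picked]
        test = [r for i, r in enumerate(rows) if i not in picked]
        pairs.append((learn, test))
    return pairs

def pair_names(data_name, n_pairs):
    """File names for each pair, numbered from 0 (the '.' separator is assumed)."""
    return [(f"{data_name}.lrn.sample.{n}", f"{data_name}.tst.sample.{n}")
            for n in range(n_pairs)]
```

Since each test set is the complement of its learning set, the two files of a pair always add up to the full data file.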

In the second case, the procedure splits the data file into K blocks. If the constant size option is checked, every block has size floor(N/K), which leaves N - K*floor(N/K) rows out of the blocks. Otherwise, the first K-1 files have size floor(N/K) and the last one receives the remaining N - (K-1)*floor(N/K) rows.
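The block sizes can be checked with a short sketch. The function name block_sizes is hypothetical, and the sketch assumes that, when the constant size option is off, the last file takes every row not placed in the first K-1 files.

```python
def block_sizes(n_rows, k, constant_size):
    """Return the list of K block sizes for a data file of n_rows rows."""
    base = n_rows // k  # floor(N/K)
    if constant_size:
        # All blocks share the same size; leftover rows are not assigned.
        return [base] * k
    # Assumption: the last block absorbs the rows left after the first K-1 blocks.
    return [base] * (k - 1) + [n_rows - (k - 1) * base]
```

For instance, calling block_sizes(10, 3, False) yields blocks whose sizes sum to the 10 rows of the file, whereas block_sizes(10, 3, True) covers only 9 of them.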

Other options (valid in both cases):

Setting the seed to zero (0) requests a new sampling on each run; any other value (1, for instance) fixes the seed, so that a given sampling can be reproduced.
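The seed convention can be illustrated with Python's standard random module. This is a sketch of the same idea, not the tool's own generator; make_rng is a hypothetical helper.

```python
import random

def make_rng(seed):
    """Seed 0: fresh generator seeded by the system; otherwise: fixed seed."""
    if seed == 0:
        return random.Random()  # new sampling on each call
    return random.Random(seed)  # repeatable sampling

# Two generators built with the same non-zero seed draw the same sample.
first = make_rng(1).sample(range(100), 5)
second = make_rng(1).sample(range(100), 5)
```

Here first and second are identical lists, while two calls with seed 0 would in general draw different samples.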

The classif. checkbox forces the sampling to respect the class proportions found in one column of the data file, by default the last one. In that case, a tolerance value (default 0.01) can be set; it is used to determine the classes from the data.
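A sketch of sampling that respects class proportions is given below. The text does not say how the tolerance is applied, so the reading used here is an assumption: numeric values of the class column that lie within the tolerance of an already seen value are treated as the same class. The names group_by_class and stratified_sample are hypothetical, as is the rounding of each class's share.

```python
import random
from collections import defaultdict

def group_by_class(rows, column=-1, tolerance=0.01):
    """Group row indices by class; values within tolerance share a class (assumed)."""
    reps = []                    # one representative value per class
    groups = defaultdict(list)   # representative value -> row indices
    for i, row in enumerate(rows):
        value = float(row[column])
        for rep in reps:
            if abs(value - rep) <= tolerance:
                groups[rep].append(i)
                break
        else:
            # No existing class is close enough: start a new one.
            reps.append(value)
            groups[value].append(i)
    return groups

def stratified_sample(rows, fraction, seed=1, column=-1, tolerance=0.01):
    """Pick the same fraction of rows from every class; return sorted row indices."""
    rng = random.Random(seed)
    picked = []
    for indices in group_by_class(rows, column, tolerance).values():
        picked.extend(rng.sample(indices, round(len(indices) * fraction)))
    return sorted(picked)
```

Drawing the same fraction inside each class keeps the class proportions of the sample close to those of the full data file, up to rounding.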