Reducing the output vocabulary

With a crisp regression output, the rule conclusions are typically all different from each other. Reducing the output vocabulary, that is, the set of distinct conclusion values, improves the readability of the rule base.

Two choices are available. With the first one (the default), the clustering is performed on the rule conclusions; with the second one, it is performed on the output values of the data file. The clustered values become the new rule conclusions.
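As an illustration of both choices, the sketch below clusters scalar values with a plain one-dimensional k-means and replaces each rule conclusion by the nearest cluster center. The clustering algorithm actually used by the software is not described on this page, so k-means, the function names kmeans_1d and reduce_vocabulary, and the sample values are all assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups with plain Lloyd iterations."""
    ordered = sorted(values)
    n = len(ordered)
    # Spread the initial centers over the sorted values (deterministic start).
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def reduce_vocabulary(conclusions, k, source=None):
    """Replace each conclusion by the nearest of k clustered values.

    source holds the values to cluster. By default these are the rule
    conclusions themselves (first choice); pass the data file output
    values instead to mimic the second choice.
    """
    centers = kmeans_1d(conclusions if source is None else source, k)
    return [min(centers, key=lambda c: abs(v - c)) for v in conclusions]


# Hypothetical rule conclusions, one per rule.
conclusions = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.1]
reduced = reduce_vocabulary(conclusions, k=3)
print(len(set(conclusions)), "->", len(set(reduced)), "distinct conclusions")
```

Each rule keeps its position in the list; only its conclusion value changes, so several rules end up sharing the same conclusion.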

Either the number of distinct conclusions or the tolerated loss of performance can be set. Indeed, reducing the vocabulary usually comes with a loss of accuracy.
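The trade-off between vocabulary size and accuracy can be sketched as a search for the smallest number of conclusions whose relative performance loss stays within the tolerated value. This page does not state how the software measures performance or conducts the search, so everything below is an assumption: the performance index is a toy RMSE in which each sample is predicted by a single rule, a simple grouping of the sorted values stands in for the clustering, and the names group_means, rmse and smallest_vocabulary are hypothetical.

```python
import math


def group_means(values, k):
    """Stand-in for the clustering step: split the sorted values into k
    contiguous chunks and map every value to the mean of its chunk."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    reduced = [0.0] * n
    for g in range(k):
        chunk = order[g * n // k:(g + 1) * n // k]
        if chunk:
            mean = sum(values[i] for i in chunk) / len(chunk)
            for i in chunk:
                reduced[i] = mean
    return reduced


def rmse(conclusions, samples):
    """Toy performance index: each sample is (rule index, observed output)."""
    total = sum((conclusions[r] - y) ** 2 for r, y in samples)
    return math.sqrt(total / len(samples))


def smallest_vocabulary(conclusions, samples, perf_loss=0.1, max_conc=10000):
    """Return (k, reduced conclusions) for the smallest k within perf_loss.

    Assumption: if no k up to max_conc meets the tolerated loss, the
    result for the largest k tried is returned.
    """
    base = rmse(conclusions, samples)
    limit = min(max_conc, len(conclusions))
    candidate = list(conclusions)
    for k in range(1, limit + 1):
        candidate = group_means(conclusions, k)
        if rmse(candidate, samples) - base <= perf_loss * base:
            return k, candidate
    return limit, candidate


# Hypothetical conclusions and noisy observations, one sample per rule.
conclusions = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
samples = [(0, 1.3), (1, 0.8), (2, 1.2), (3, 4.6), (4, 5.5), (5, 5.1)]
k, reduced = smallest_vocabulary(conclusions, samples, perf_loss=0.1)
print(k, "distinct conclusions are enough for a 10% tolerated loss")
```

With a large upper bound on the number of conclusions, the tolerated loss alone decides the size of the reduced vocabulary.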

Java interface:

FIS menu, Reduce the output vocabulary option.

Command line, vocreduc program:

Argument:

Options:

 -oNumOutput where NumOutput is the output number (default: 0=first output)
 -dType where Type is the data used for the vocabulary reduction
  • -d1: rule conclusions are generated by clustering the initial rule conclusions (default)
  • -d0: rule conclusions are generated by clustering the output values of the data file
 -lPerfLoss where PerfLoss is the relative performance loss allowed by the vocabulary reduction (default: 0.1)
 -cConc where Conc is the number of elements in the reduced vocabulary (default: 10000)

Remark: the default values of these two options are 0.1 for PerfLoss and 10000 for Conc, which yields an automatic determination of Conc.

 -sMuMin where MuMin is the activity threshold for an item not to be blank (default: 0.2)
 -a: detailed output

Command line example

vocreduc rice.fis rice

The number of distinct conclusions drops from 25 to 6.
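When the program is called from a script, the command line can be assembled from the options listed above. The helper below is only a sketch: the name vocreduc_command is hypothetical, and placing the options after the two file names is an assumption, since this page does not state the expected order.

```python
def vocreduc_command(fis_file, data_file, output=None, data_type=None,
                     perf_loss=None, conc=None, mu_min=None, detailed=False):
    """Build an argument list for vocreduc.

    A value of None leaves the option out, so the program default applies.
    Assumption: options are appended after the two file names.
    """
    args = ["vocreduc", fis_file, data_file]
    if output is not None:
        args.append(f"-o{output}")      # output number
    if data_type is not None:
        args.append(f"-d{data_type}")   # 1: rule conclusions, 0: data file
    if perf_loss is not None:
        args.append(f"-l{perf_loss}")   # tolerated relative performance loss
    if conc is not None:
        args.append(f"-c{conc}")        # size of the reduced vocabulary
    if mu_min is not None:
        args.append(f"-s{mu_min}")      # activity threshold
    if detailed:
        args.append("-a")               # detailed output
    return args


cmd = vocreduc_command("rice.fis", "rice", perf_loss=0.05, detailed=True)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where vocreduc is installed and on the search path.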