Architecture

Rule Learner integrates 3 major components:

  • (Optional) A rule engine that analyses large volumes of data in accordance with “training rules” created by an SME and uses this data to generate new training sets
  • An ML algorithm that extracts classification rules from the training sets and represents them in an executable format
  • A rule engine that can execute the generated rules against new instances. It is usually a component of a larger decision management system

The architecture is shown in the figure above.

  • Training Sets. Rule Learner implements a supervised machine learning approach that requires training sets. A training set usually consists of examples in which the desired result was achieved (positive examples) and counterexamples in which it was not (negative examples). Rule Learner uses training sets to discover and represent new rules and to measure the accuracy and effectiveness of the rules once they have been learned. If the results are satisfactory, the rules can then be used to predict results for new, previously unseen cases.
Optional Trainer. A Trainer is usually a subject matter expert (SME) who has extensive experience with the historical enterprise data and the skills to establish goals, concepts, and criteria for detecting patterns and rules. The Trainer function can also be automated and made an integral part of the system architecture: it can be implemented as a special rule engine that automatically analyses large volumes of data in accordance with “training rules” created by an SME and uses this data to generate new training sets. This is especially important for rules-based applications whose enterprise data changes frequently and whose rule learner must keep up with the latest changes.
The Trainer allows domain experts to incorporate their knowledge into Rule Learner by presenting it in the form of domain-specific training rules. Training rules usually cover the following common data pre-processing tasks:
    • Selection of issues and attributes to be considered by a Rule Learner
    • Generation of new attributes that generalize the existing attributes by adding nominal attributes, ratios, etc.
    • Preliminary classification of issues
    • Instance filtering rules.
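
The four pre-processing tasks above can be pictured as a small pipeline. The following Python sketch is only an illustration under stated assumptions: the record fields (`age`, `income`, `debt`, `defaulted`), the thresholds, and the helper names are hypothetical and are not part of Rule Learner, where training rules are authored by an SME in the product's own format.

```python
# Hypothetical raw records; all field names and values are invented for illustration.
RAW = [
    {"customer_id": 1, "age": 34, "income": 60000, "debt": 12000, "defaulted": False},
    {"customer_id": 2, "age": 51, "income": 40000, "debt": 30000, "defaulted": True},
    {"customer_id": 3, "age": 17, "income": 0, "debt": 0, "defaulted": False},
    {"customer_id": 4, "age": 45, "income": 80000, "debt": 8000, "defaulted": False},
]

# Task 1: selection of the attributes the learner is allowed to see.
SELECTED = ("age", "income", "debt")


def keep_instance(rec: dict) -> bool:
    """Task 4: instance filter (assumed rule: adults with a positive income)."""
    return rec["age"] >= 18 and rec["income"] > 0


def derive(rec: dict) -> dict:
    """Task 2: add a ratio and a nominal attribute built from existing ones."""
    # Safe division: keep_instance() has already rejected zero incomes.
    ratio = rec["debt"] / rec["income"]
    band = "senior" if rec["age"] >= 50 else "adult"
    return {"debt_ratio": round(ratio, 2), "age_band": band}


def classify(rec: dict) -> str:
    """Task 3: preliminary classification into positive / negative examples."""
    return "negative" if rec["defaulted"] else "positive"


def build_training_set(records: list[dict]) -> list[dict]:
    rows = []
    for rec in records:
        if not keep_instance(rec):
            continue
        row = {name: rec[name] for name in SELECTED}
        row.update(derive(rec))
        row["label"] = classify(rec)
        rows.append(row)
    return rows


training_set = build_training_set(RAW)
```

Keeping each task in its own small function mirrors the idea that each training rule can be reviewed and changed by a domain expert independently of the others.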

Note that business specialists (not software developers) are normally responsible for maintaining training sets and for evaluating the automatically generated rules.

Rule Learner. The second component applies a selected ML algorithm to extract classification rules from the training sets. After the rules are generated, it shows the results in the format specific to the ML implementation tool; for example, here is the output of WEKA’s ML system using the C4.5 algorithm:

Along with the generated rules, it shows statistical metrics – read more here. It also generates an Excel file (“GeneratedRules.xls” by default) in which the generated rules are presented as a decision table ready to be executed by a rule engine.
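
To show the shape of this step (training set in, rules plus an accuracy figure out), here is a minimal Python sketch. It does not use WEKA or C4.5: it swaps in the much simpler OneR (one-rule) technique, which picks the single attribute whose values best predict the label. The training data, the attribute names, and the list-of-rows table layout are assumptions made for illustration and do not reflect the actual format of the generated Excel file.

```python
from collections import Counter, defaultdict

# Hypothetical training set with nominal attributes; values are invented.
TRAINING_SET = [
    {"age_band": "adult", "debt_level": "low", "label": "positive"},
    {"age_band": "adult", "debt_level": "high", "label": "negative"},
    {"age_band": "senior", "debt_level": "low", "label": "positive"},
    {"age_band": "senior", "debt_level": "high", "label": "negative"},
    {"age_band": "adult", "debt_level": "low", "label": "positive"},
    {"age_band": "senior", "debt_level": "low", "label": "negative"},
]


def one_r(rows, target="label"):
    """Return (attribute, {value: predicted_label}, training accuracy)."""
    best = None
    attributes = [a for a in rows[0] if a != target]
    for attr in attributes:
        # Count how often each label occurs for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # For each attribute value, predict its most frequent label.
        mapping = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c[mapping[value]] for value, c in counts.items())
        accuracy = correct / len(rows)
        if best is None or accuracy > best[2]:
            best = (attr, mapping, accuracy)
    return best


def to_decision_table(attr, mapping):
    """Represent the learned rule as rows: one condition column, one conclusion."""
    header = [f"If {attr} is", "Then label is"]
    return [header] + [[value, label] for value, label in sorted(mapping.items())]


attr, mapping, accuracy = one_r(TRAINING_SET)
table = to_decision_table(attr, mapping)
```

The accuracy computed here is measured on the training set itself, so it is an optimistic estimate; a real evaluation would hold out separate test instances.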

Rule Engine. The third component is a rule engine that can execute the generated rules against new instances. It is usually a component of a larger decision management system such as OpenRules Decision Manager.
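
To make the execution step concrete, here is a minimal Python sketch of evaluating a decision table against new instances. It assumes a single-hit policy (the first matching row wins) and an invented in-memory table layout; it is not the OpenRules engine or its API, and the attribute names are hypothetical.

```python
# Each rule is (conditions, conclusion). A condition maps an attribute name to
# the value it must equal; an empty dict matches everything (default rule).
DECISION_TABLE = [
    ({"debt_level": "high"}, "negative"),
    ({"debt_level": "low", "age_band": "adult"}, "positive"),
    ({}, "unknown"),
]


def matches(conditions: dict, instance: dict) -> bool:
    """True when every condition of the rule holds for the instance."""
    return all(instance.get(attr) == value for attr, value in conditions.items())


def execute(table, instance: dict) -> str:
    """Return the conclusion of the first rule whose conditions all hold."""
    for conditions, conclusion in table:
        if matches(conditions, instance):
            return conclusion
    raise LookupError("no rule matched and the table has no default row")


new_instances = [
    {"debt_level": "high", "age_band": "senior"},
    {"debt_level": "low", "age_band": "adult"},
    {"debt_level": "low", "age_band": "senior"},
]
results = [execute(DECISION_TABLE, inst) for inst in new_instances]
```

With a single-hit policy the order of the rows matters, which is why the catch-all default row is placed last.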