Module learnrdr_adhoc :: Class RDRLearner

Class RDRLearner

Class for learning ripple-down rules (RDR) from data using greedy set covering.
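
A ripple-down rule draws its conclusion when its condition fires, unless one of its exceptions also fires. In that case the exception's conclusion wins. The sketch below shows how such rules classify an example. It assumes a hypothetical Rule class whose condition is "feature occurs as a substring of the example", and it assumes that the first firing rule at each level decides. Neither detail is taken from this module.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    # Hypothetical rule shape: the condition fires when `feature`
    # occurs as a substring of the example.
    feature: str
    label: str
    exceptions: "list[Rule]" = field(default_factory=list)

    def covers(self, example: str) -> bool:
        return self.feature in example


def classify(rules: "list[Rule]", example: str, default: str) -> str:
    """Classify `example`, falling back to `default` if no rule fires."""
    for rule in rules:  # assumed: the first firing rule decides
        if rule.covers(example):
            # The rule's label holds unless one of its exceptions fires.
            # Exceptions are checked recursively, so they can have exceptions too.
            return classify(rule.exceptions, example, rule.label)
    return default
```

For example, take rules = [Rule("ab", "+", exceptions=[Rule("abc", "-")])] with default "-". Under these rules, "xaby" is labelled "+", while "xabcy" triggers the exception and is labelled "-".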

Instance Methods

learn(self, data_file, features_file, max_depth, cutoff, pos_symbol, neg_symbol, min_length, max_length, regenerate)
    Learn RDR from data using greedy set covering and display the results.

check_data(self, data, depth)
    Check whether any data points need to be classified at a particular depth.
    Returns: boolean

count_errors(self, data, depth)
    Count the number of elements wrongly classified at a given depth.
    Returns: integer

pos_data_for_depth(self, data, depth)
    Find data points to be classified at a particular depth.
    Returns: dict

data_for_rule(self, data, rule)
    Find the data points covered by a rule.
    Returns: dict

greedy_set_cover(self, data, pos_data, ruleset, possible_rules, covered_data)
    Find a greedy set cover.
    Returns: Ruleset

find_possible_rules(self, features, data, depth, covered_data, pos_length)
    Find possible rules given a set of data points and features.
    Returns: (list of Rules, Rule, integer)

find_rules(self, data, features, depth)
    Find rules from data.
    Returns: Ruleset

Method Details

learn(self, data_file, features_file, max_depth, cutoff, pos_symbol, neg_symbol, min_length, max_length, regenerate)

Learn RDR from data using greedy set covering and display the results.

Parameters:
  • data_file (string) - file containing the training data.
  • features_file (string) - file containing the features, or "all" to generate every substring with a length between min_length and max_length.
  • max_depth (integer) - maximum depth of rules.
  • cutoff (float) - cutoff value for the cost of sets added to the covering.
  • pos_symbol (string) - symbol marking a positive example in the data.
  • neg_symbol (string) - symbol marking a negative example in the data.
  • min_length (integer) - minimum length of generated substrings.
  • max_length (integer) - maximum length of generated substrings.
  • regenerate (boolean) - whether to regenerate the possible rules each time a rule is added to the ruleset.
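
When features_file is "all", the features are generated from the data instead of being read from a file. The following is a minimal sketch of that step. It assumes both length bounds are inclusive and that substrings are taken from the data point strings themselves. The helper generate_features is hypothetical and is not part of this class.

```python
def generate_features(examples, min_length, max_length):
    """Collect every distinct substring with min_length <= len <= max_length."""
    features = set()
    for text in examples:
        for length in range(min_length, max_length + 1):
            # Slide a window of this length across the string.
            for start in range(len(text) - length + 1):
                features.add(text[start:start + length])
    return sorted(features)  # sorted only to make the output deterministic
```

Using a set removes duplicates when the same substring occurs in several data points. For example, generate_features(["abc"], 1, 2) yields a, ab, b, bc and c.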

check_data(self, data, depth)

Check if any of the data points need to be classified at a particular depth.

Parameters:
  • data (dict) - data points and their classifications.
  • depth (integer) - depth to look at.
Returns: boolean
True if at least one data point is currently classified incorrectly at that depth.

count_errors(self, data, depth)

Count the number of elements wrongly classified at a given depth.

Parameters:
  • data (dict) - data points and their classifications.
  • depth (integer) - depth to look at.
Returns: integer
Number of wrongly classified elements.

pos_data_for_depth(self, data, depth)

Find data points to be classified at a particular depth.

Parameters:
  • data (dict) - data points and their classifications.
  • depth (integer) - depth to look at.
Returns: dict
Data points to be classified at a particular depth.
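
check_data, count_errors and pos_data_for_depth all depend on what "classified at a depth" means. One plausible reading is sketched below. It is an assumption, not taken from the source, and rests on three points:

- data maps each data point string to its true symbol.
- Only the points that reach the current level are passed in.
- Conclusions alternate by depth: a negative default, then positive rules at depth 1, negative exceptions at depth 2, and so on.

Under that reading, all three helpers reduce to a filter. The "+" and "-" symbols are hypothetical stand-ins for pos_symbol and neg_symbol.

```python
POS, NEG = "+", "-"  # hypothetical pos_symbol / neg_symbol values


def target_label(depth):
    # Assumed alternation: a NEG default at depth 0, POS rules at depth 1,
    # NEG exceptions at depth 2, and so on.
    return POS if depth % 2 == 1 else NEG


def pos_data_for_depth(data, depth):
    """Points whose true label is the conclusion that rules at `depth` draw."""
    target = target_label(depth)
    return {x: label for x, label in data.items() if label == target}


def count_errors(data, depth):
    # Before rules at `depth` apply, every point here carries the parent's
    # conclusion (the opposite of the target), so these are exactly the errors.
    return len(pos_data_for_depth(data, depth))


def check_data(data, depth):
    return count_errors(data, depth) > 0
```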

data_for_rule(self, data, rule)

Find the data points covered by a rule.

Parameters:
  • data (dict) - data points to search.
  • rule (Rule) - rule whose coverage is computed.
Returns: dict
Data points covered by the rule.
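
The following is a sketch of data_for_rule. It assumes, hypothetically, that a rule carries a single substring feature and covers every data point containing it. Labels are carried over unchanged.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    feature: str  # hypothetical: the substring the rule tests for


def data_for_rule(data, rule):
    """Return the part of `data` (point -> label) whose points contain rule.feature."""
    return {x: label for x, label in data.items() if rule.feature in x}
```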

greedy_set_cover(self, data, pos_data, ruleset, possible_rules, covered_data)

Find a greedy set cover.

Parameters:
  • data (dict) - data points and their classifications.
  • pos_data (dict) - data points to be classified by the rules and their classifications.
  • ruleset (Ruleset) - rules already found.
  • possible_rules (list) - possible rules that can be used for classification.
  • covered_data (dict) - data points already covered by a rule.
Returns: Ruleset
Ruleset that was found.
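
A greedy set cover repeatedly picks the candidate set with the lowest cost per newly covered point. The sketch below is the textbook weighted greedy algorithm, not this module's implementation, and its signature is simplified. It assumes each candidate rule is given as the set of points it covers. The cost is assumed to be (negatives covered + 1) / (uncovered positives newly covered). Selection stops once the cheapest cost exceeds cutoff. The +1 keeps error-free rules comparable, so among them the one covering more positives is cheaper.

```python
def greedy_set_cover(pos_data, neg_data, candidates, cutoff):
    """Return (chosen rule names, positives left uncovered).

    pos_data, neg_data: sets of positive and negative data points.
    candidates: dict mapping a rule name to the set of points it covers.
    """
    uncovered = set(pos_data)
    remaining = dict(candidates)  # copy so the caller's dict is not modified
    chosen = []
    while uncovered and remaining:
        best, best_cost = None, float("inf")
        for name, covered in remaining.items():
            gain = len(covered & uncovered)
            if gain == 0:
                continue  # adds nothing new to the cover
            cost = (len(covered & neg_data) + 1) / gain
            if cost < best_cost:
                best, best_cost = name, cost
        if best is None or best_cost > cutoff:
            break  # nothing useful left, or the cheapest set is too expensive
        chosen.append(best)
        uncovered -= remaining.pop(best)
    return chosen, uncovered
```

Costs are recomputed against the shrinking uncovered set on every pass, so a rule's cost can rise after other rules cover part of its positives.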

find_possible_rules(self, features, data, depth, covered_data, pos_length)

Find possible rules given a set of data points and features.

Parameters:
  • data (dict) - data points and their classifications.
  • depth (integer) - maximum depth for exceptions.
  • covered_data (dict) - data points already covered by a rule.
  • pos_length (integer) - number of unclassified elements.
  • features (list)
Returns: (list of Rules, Rule, integer)
Possible rules, best rule and its index.
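
find_possible_rules returns the candidate rules together with the best one and its index. The sketch below makes several assumptions:

- There is one candidate per feature.
- A candidate is kept only if it covers a target point not yet in covered_data.
- "Best" means the highest count of new correct hits minus wrongly covered points.

The scoring rule, the (feature, covered_points) tuple representation and the target parameter are hypothetical. depth and pos_length are omitted.

```python
def find_possible_rules(features, data, target, covered_data):
    """Return (possible_rules, best_rule, best_index); best_index is -1 if none."""
    possible = []
    best, best_index, best_score = None, -1, float("-inf")
    for feature in features:
        covered = {x for x in data if feature in x}
        # Target points this rule would classify that no earlier rule covers.
        new_hits = sum(1 for x in covered if data[x] == target and x not in covered_data)
        if new_hits == 0:
            continue
        errors = sum(1 for x in covered if data[x] != target)
        rule = (feature, covered)
        possible.append(rule)
        score = new_hits - errors  # assumed scoring: net correct coverage
        if score > best_score:
            best, best_index, best_score = rule, len(possible) - 1, score
    return possible, best, best_index
```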

find_rules(self, data, features, depth)

Find rules from data.

Parameters:
  • data (dict) - data points and their classifications.
  • features (list) - possible features for classification.
  • depth (integer) - maximum depth for exceptions for the rules to be found.
Returns: Ruleset
Ruleset that was found.
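
Learning is recursive. Rules at one level cover the points that need the target conclusion. Each rule's covered points are then searched for exceptions with the opposite conclusion and one less level of depth. The following compact sketch of that recursion rests on several assumptions. Rules are hypothetical (feature, exceptions) pairs with a substring-matching condition. A simple hits-minus-misses greedy choice stands in for the full set cover. The top level is assumed to conclude positive over a negative default.

```python
POS, NEG = "+", "-"  # hypothetical pos_symbol / neg_symbol values


def flip(label):
    return NEG if label == POS else POS


def find_rules(data, features, depth, target=POS):
    """Return a list of (feature, exceptions) pairs whose conclusion is `target`.

    data: data point -> true label, limited to the points reaching this level.
    depth: how many further levels of exceptions may be added below these rules.
    """
    to_cover = {x for x, label in data.items() if label == target}

    def score(f):
        # Target points still to cover that f would hit, minus the points
        # with the other label that f would wrongly sweep in.
        hits = sum(1 for x in to_cover if f in x)
        misses = sum(1 for x, label in data.items() if label != target and f in x)
        return hits - misses

    rules = []
    while to_cover:
        candidates = [f for f in features if any(f in x for x in to_cover)]
        if not candidates:
            break  # the remaining points cannot be covered by any feature
        best = max(candidates, key=score)
        covered = {x: label for x, label in data.items() if best in x}
        to_cover.difference_update(covered)
        exceptions = []
        if depth > 0:
            # Points this rule mislabels become the targets one level down.
            exceptions = find_rules(covered, features, depth - 1, flip(target))
        rules.append((best, exceptions))
    return rules
```

For example, take data {"ab": "+", "abx": "-", "cd": "+", "zz": "-"}, features ["ab", "x", "cd"] and depth 1. The sketch returns [("cd", []), ("ab", [("x", [])])]. The exception "x" corrects the point "abx", which the "ab" rule would otherwise mislabel.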