Assignment 5
1. Consider a binary classification problem with the following set of attributes and attribute values:
- Air Conditioner = {Working, Broken}
- Engine = {Good, Bad}
- Mileage = {High, Medium, Low}
- Rust = {Yes, No}
Suppose a rule-based classifier produces the following rule set:
1. Mileage = High -> Value = Low
2. Mileage = Low -> Value = High
3. Air Conditioner = Working, Engine = Good -> Value = High
4. Air Conditioner = Working, Engine = Bad -> Value = Low
5. Air Conditioner = Broken -> Value = Low
(a) Are the rules mutually exclusive?
No, the rules are not mutually exclusive. For example, a car could match either of the two mileage rules and also have a working air conditioner, matching one of the third or fourth rules.
(b) Is the rule set exhaustive?
Two of the mileage attributes are covered by the first two rules; a car with Medium mileage will fall through to the next rules. In that case, when the mileage is Medium, the next 3 rules cover all possible cases of air conditioning status. All possible engine scenarios have been covered as well. Therefore, the rule set is exhaustive - there are no cases which do not fit at least one rule.
(c) Is ordering needed for this set of rules?
As I hinted at in part (b), yes, some ordering is needed. For example, by falling through to rule 3, we're implicitly specifying the mileage as Medium.
(d) Do you need a default class for the rule set?
Since the rule set is exhaustive, a default rule is not required.
2. The RIPPER algorithm (by Cohen [170]) is an extension of an earlier algorithm called IREP (by Fürnkranz and Widmer [184]). Both algorithms apply the reduced-error pruning method to determine whether a rule needs to be pruned. The reduced error pruning method uses a validation set to estimate the generalization error of a classifier. Consider the following pair of rules:
R1: A -> C
R2: A ^ B -> C
R2 is obtained by adding a new conjunct, B, to the left-hand side of R1. For this question, you will be asked to determine whether R2 is preferred over R1 from the perspectives of rule-growing and rule-pruning. To determine whether a rule should be pruned, IREP computes the following measure:
vIREP = (p + (N - n)) / (P + N)
where P is the total number of positive exmaples in the validation set, N is the total number of negative examples in the validation set, p is the number of positive examples in the validation set covered by the rule, and n is the number of negative examples in the validation setcovered by the rule. vIREP is actually similar to classification accuracy for the whole set. IREP favors rules that have higher values of vIREP. On the other hand, RIPPER applies the following measure to determine whether a rule should be pruned:
vRIPPER = (p - n) / (p + n)
(a) Suppose R1 is covered by 350 positive examples and 150 negative examples, while R2 is covered by 300 positive examples and 50 negative examples. Compute the FOIL's information gain for the rule R2 with respect to R1.
FOIL = p1 * (log2 (p1 / (p1 + n1)) - (log2 (p0 / (p0 + n0))))
= 300 * ( ln(6/7) / ln(2) - ln(7/10) / ln(2)) = 87.6542
(b) Consider a validation set that contains 500 positive examples and 500 negative examples. For R1, suppose the number of positive examples covered by the rule is 200, and the number of negative examples coverd by the rule is 50. For R2, suppose the number of positive examples covered by the rule is 100 and the number of negative examples is 5. Compute vIREP for both rules. Which rule does IREP prefer?
R1: (200 + (500 - 50)) / (500 + 500) = 0.65
R2: (100 + (500 - 5)) / (500 + 500) = 0.595
Since IREP favors rules that have higher values of vIREP, R1 is preferred.
(c) Compute vRIPPER for the previous problem. Which rule does RIPPER prefer?
R1: (200 - 50) / (200 + 50) = 0.6
R2: (100 - 5) / (100 + 5) = 19/21 (about 0.905)
RIPPER prefers R2, with it's much higher vRIPPER.
7. Consider the data set shown in Table 5.10.
(a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-), and P(C|-).
P(A=0|+) = 2/5
P(A=1|+) = 3/5
P(B=0|+) = 4/5
P(B=1|+) = 1/5
P(C=0|+) = 1/5
P(C=1|+) = 4/5
P(A=0|-) = 3/5
P(A=1|-) = 2/5
P(B=0|-) = 3/5
P(B=1|-) = 2/5
P(C=0|-) = 0
P(C=1|-) = 1
(b) Use the estimate of conditional probabilities given in the previous question to predict the class label for a test sample (A = 0, B = 1, C = 0) using the naïve Bayes approach.
Prior probabilities:
P(+) = 5/10 = 0.5
P(-) = 5/10 = 0.5
Class-conditional probabilites:
P(A=0|+) * P(B=1|+) * P(C=0|+) = (2/5) * (1/5) * (1/5) = 2/125 = 0.016
P(A=0|-) * P(B=1|-) * P(C=0|-) = (3/5) * (2/5) * 0 = 0
Posterior probabilities:
P(+|X) = a * 0.5 * 0.016 = 0.008a
P(-|X) = a * 0.5 * 0 = 0
Therefore, since P(+|X) > P(-|X), the test sample belongs in the group '-'.
(c) Estimate the conditional probabilities using the m-estimate approach, with p = 1/2 and m = 4.
Still working on these calculations...
(d) Repeat part (b) using the conditional probabilities given in part (c).
(e) Compare the two methods for estimating probabilities. Which method is better and why?