Active Learning

2020-03-01

1.ACTIVE LEARNING 张涛张、张张春阳、王张张张张张欣、阳文张张张张张、王张张成 2018/5/2
2.Outline
• Active Learning Introduction
• Query Strategy Frameworks
• Theoretical Guarantees
• Advanced Methods
• Related machine learning algorithms
3.Active Learning Introduction 张涛 51174500159
4.Motivation
5.Supervised Passive Learning
[Diagram: a Data Source provides unlabeled examples; an Expert / Oracle turns them into labeled examples for the Learning Algorithm; the algorithm outputs a classifier.]
6.Incorporating Unlabeled Data in the process
• In many settings, unlabeled data is cheap and easy to obtain, while labeled data is much more expensive:
  - Web page and document classification
  - OCR and image classification
7.Semi-Supervised Passive Learning
[Diagram: the Learning Algorithm receives unlabeled examples directly from the Data Source, plus labeled examples produced by the Expert / Oracle from other unlabeled examples; the algorithm outputs a classifier.]
8.Semi-Supervised Passive Learning
• Several methods have been developed to use unlabeled data to improve performance, e.g.:
  - Transductive SVM [Joachims ’98]
  - Co-training [Blum & Mitchell ’98], [BBY04]
  - Graph-based methods [Blum & Chawla01], [ZGL03]
Maria-Florina Balcan
9.Active Learning
[Diagram: the Data Source supplies unlabeled examples to the Learning Algorithm; the algorithm repeatedly sends the Expert / Oracle a request for the label of an example and receives a label for that example; finally, the algorithm outputs a classifier.]
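The request/label cycle in the diagram can be sketched as a pool-based loop. This is only an illustration under stated assumptions: the function names (`active_learning_loop`, `fit_threshold`, `closeness_to_boundary`, `oracle`), the 1-D threshold model, and the hidden boundary at 0.605 are all hypothetical and do not come from the slides.

```python
import numpy as np

def active_learning_loop(X_pool, oracle, fit, informativeness, seed_idx, n_queries):
    """Pool-based loop: fit a model, query the most informative unlabeled example, repeat."""
    labels = {i: oracle(X_pool[i]) for i in seed_idx}      # pool index -> label
    for _ in range(n_queries):
        idx = sorted(labels)
        model = fit(X_pool[idx], np.array([labels[i] for i in idx]))
        unlabeled = [i for i in range(len(X_pool)) if i not in labels]
        if not unlabeled:
            break
        scores = informativeness(model, X_pool[unlabeled])
        query = unlabeled[int(np.argmax(scores))]           # request for the label of an example
        labels[query] = oracle(X_pool[query])               # a label for that example
    idx = sorted(labels)
    return fit(X_pool[idx], np.array([labels[i] for i in idx])), labels

# Toy setting (assumed): points on [0, 1], true label is 1 iff x > 0.605.
def fit_threshold(x, y):
    # Midpoint between the largest labeled negative and the smallest labeled positive.
    return (x[y == 0].max() + x[y == 1].min()) / 2.0

def closeness_to_boundary(threshold, x):
    # Higher score = closer to the current decision threshold.
    return -np.abs(x - threshold)

def oracle(x):
    return int(x > 0.605)

X_pool = np.linspace(0.0, 1.0, 101)
threshold, labels = active_learning_loop(
    X_pool, oracle, fit_threshold, closeness_to_boundary,
    seed_idx=[0, 100], n_queries=8,
)
print(len(labels), round(threshold, 3))
```

In this toy setting the loop behaves like a binary search: with 2 seed labels and 8 queries it should locate the boundary between two neighbouring pool points, whereas labeling the pool passively could require many more labels.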
10.What Makes a Good Algorithm?
• Guaranteed to output a relatively good classifier for most learning problems.
• Doesn’t make too many label requests.
  - Chooses the label requests carefully, to get informative labels.
11.Can It Really Do Better Than Passive?
• YES! (sometimes)
• We often need far fewer labels for active learning than for passive.
• This is predicted by theory and has been observed in practice.
12.Scenarios
• Membership Query Synthesis
• Stream-Based Selective Sampling
• Pool-Based Sampling
13.Membership Query Synthesis
14.Stream-Based Selective Sampling
15.Pool-Based Sampling
16.Query Strategy Frameworks 赵春扬 51174500164
17.Query Strategy Frameworks
Key question: choosing a suitable query strategy.
• Feature
  - Class label uncertainty
  - Disagreement between different learners
  - Outlier
• Focus
  - Focus directly on the error itself
  - Try to find samples that are representative of the underlying data
18.Categories
• Heterogeneity-based models
  - Uncertainty sampling
  - Query-by-Committee
  - Expected Model Change
• Performance-based models
  - Expected Error Reduction
  - Expected Variance Reduction
• Representativeness-based models
  - Density-Based Models
19.Uncertainty sampling
Query the instances about which the learner is least certain how to label.
• For binary classification: query the instance whose posterior probability of being positive is nearest 0.5.
• For problems with three or more class labels:
  - The least confident
  - Margin sampling
  - Entropy
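For the binary case, "nearest 0.5" can be written in one line. A minimal sketch, assuming the model's posterior probabilities of the positive class are already available; the function name and the probability values are made up for illustration.

```python
import numpy as np

def most_uncertain_binary(p_positive):
    """Return the index of the instance whose P(positive) is closest to 0.5."""
    p = np.asarray(p_positive, dtype=float)
    return int(np.argmin(np.abs(p - 0.5)))

# Hypothetical posteriors for four unlabeled instances.
p_positive = [0.95, 0.10, 0.55, 0.80]
query_index = most_uncertain_binary(p_positive)
print(query_index)  # 2, since |0.55 - 0.5| is the smallest distance
```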
20.Uncertainty sampling The least confident where ‘-  In this strategy, the learner selects the instance for which it has the least confidence in its most likely label.  缺点： It only takes into consideration the most probable label and disregards the other label probabilities. 20
21.Uncertainty sampling Margin sampling  selects the instance that has the smallest difference between the first and second most probable labels. ‘-  corrects for a shortcoming in the least confident strategy,by incorporating the posterior of the second most likely label.  For problems with very large label sets, the margin approach still ignores much of the output distribution for the remaining classes. 21
22.Uncertainty sampling Entropy （熵）    Entropy is an information-theoretic measure that represents the amount of information needed to “encode” a distribution. ‘So it is often thought of as a measure of uncertainty or impurity in machine learning measure The entropy based approach generalizes easily to probabilistic multi-label classifiers and probabilistic models for more complex structured instances 22
23.Examples
Least Confidence: 0.9<0.5
Margin Sampling:
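The numbers of the original example are not recoverable, so the sketch below uses made-up posteriors over four classes. It is only meant to show that the three uncertainty measures can each select a different instance from the same pool.

```python
import numpy as np

# Hypothetical posteriors for three unlabeled instances (each row sums to 1).
P = np.array([
    [0.36, 0.32, 0.31, 0.01],   # lowest top probability
    [0.48, 0.47, 0.04, 0.01],   # top two labels almost tied
    [0.40, 0.30, 0.15, 0.15],   # probability mass spread over all classes
])

sorted_p = np.sort(P, axis=1)
lc_score = 1.0 - sorted_p[:, -1]                # least confident: 1 - best probability
margin = sorted_p[:, -1] - sorted_p[:, -2]      # best minus second-best probability
entropy = -(P * np.log(P)).sum(axis=1)          # all entries are > 0 here

picks = {
    "least_confident": int(np.argmax(lc_score)),
    "margin": int(np.argmin(margin)),
    "entropy": int(np.argmax(entropy)),
}
print(picks)  # {'least_confident': 0, 'margin': 1, 'entropy': 2}
```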