Active Learning
- 1. ACTIVE LEARNING (张涛、赵春扬、王欣、阳文、王成, 2018/5/2)
- 2. Outline: Active Learning Introduction; Query Strategy Frameworks; Theoretical Guarantees; Advanced Methods; Related Machine Learning Algorithms
- 3. Active Learning Introduction (张涛, 51174500159)
- 4. Motivation
- 5. Supervised Passive Learning: the data source supplies unlabeled examples, an expert/oracle labels all of them, and the learning algorithm trains on the labeled examples and outputs a classifier.
- 6. Incorporating Unlabeled Data in the Process • In many settings, unlabeled data is cheap and easy to obtain, while labeled data is much more expensive. • Examples: web page and document classification, OCR, image classification.
- 7. Semi-Supervised Passive Learning: the learning algorithm receives many unlabeled examples from the data source plus a small set of examples labeled by the expert/oracle, and outputs a classifier.
- 8. Semi-Supervised Passive Learning • Several methods have been developed to use unlabeled data to improve performance, e.g.: Transductive SVM [Joachims '98]; Co-training [Blum & Mitchell '98], [BBY04]; Graph-based methods [Blum & Chawla '01], [ZGL03]. (Slide credit: Maria-Florina Balcan)
- 9. Active Learning: the learning algorithm draws unlabeled examples from the data source, repeatedly requests the label of a chosen example from the expert/oracle and receives that label, and finally outputs a classifier.
- 10. What Makes a Good Algorithm? • Guaranteed to output a relatively good classifier for most learning problems. • Doesn't make too many label requests: choose the label requests carefully, so that the labels obtained are informative.
- 11. Can It Really Do Better Than Passive? • Yes, sometimes: active learning often needs far fewer labels than passive learning. • This is predicted by theory and has been observed in practice.
- 12. Scenarios • Membership Query Synthesis • Stream-Based Selective Sampling • Pool-Based Sampling
- 13. Membership Query Synthesis: the learner may synthesize a brand-new instance anywhere in the input space and request its label from the oracle.
- 14. Stream-Based Selective Sampling: instances arrive from the source one at a time, and the learner decides for each whether to request its label or discard it (see the sketch below).
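The stream-based slide itself is diagram-only here, so the following is a minimal Python sketch of the idea rather than the deck's own code; the scikit-learn-style `model.predict_proba` interface, the `oracle` callable, and the 0.2 threshold are all illustrative assumptions:

```python
# Stream-based selective sampling (sketch): instances arrive one at a
# time and the learner decides on the spot whether to pay for a label.
def stream_selective_sampling(stream, model, oracle, threshold=0.2):
    labeled = []
    for x in stream:
        probs = model.predict_proba([x])[0]   # assumes a fitted sklearn-style model
        if 1.0 - probs.max() > threshold:     # least-confident score high enough?
            labeled.append((x, oracle(x)))    # query the expert/oracle
    return labeled
```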
- 15. Pool-Based Sampling: the learner ranks an entire pool of unlabeled instances and queries the most informative ones (see the sketch below).
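Likewise for the pool-based scenario, a hedged sketch of the query-retrain loop; the estimator interface, the `oracle` callable, and the `budget` parameter are assumptions, not from the deck:

```python
import numpy as np

# Pool-based sampling (sketch): rank the whole unlabeled pool, query the
# least-confident instance, retrain, and repeat until the label budget is spent.
def pool_based_loop(model, X_pool, X_init, y_init, oracle, budget=10):
    X_train, y_train = list(X_init), list(y_init)
    pool = list(X_pool)
    for _ in range(budget):
        model.fit(np.array(X_train), np.array(y_train))
        probs = model.predict_proba(np.array(pool))
        idx = int(np.argmin(probs.max(axis=1)))  # most uncertain instance in the pool
        x = pool.pop(idx)
        X_train.append(x)
        y_train.append(oracle(x))                # ask the expert/oracle for its label
    model.fit(np.array(X_train), np.array(y_train))
    return model
```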
- 16. Query Strategy Frameworks (赵春扬, 51174500164)
- 17. Query Strategy Frameworks. Key question: choosing an appropriate query strategy. Possible features to exploit: class-label uncertainty, disagreement between different learners, outlier status. Possible focuses: focus directly on the error itself, or try to find samples that are representative of the underlying data.
- 18. Categories: Heterogeneity-based models (Uncertainty Sampling, Query-by-Committee, Expected Model Change); Performance-based models (Expected Error Reduction, Expected Variance Reduction); Representativeness-based models (Density-Based Models).
- 19. Uncertainty Sampling: query the instances about which the learner is least certain how to label. For binary classification, query the instance whose posterior probability of being positive is nearest to 0.5. For problems with three or more class labels, use least confident, margin sampling, or entropy.
- 20. Uncertainty Sampling, least confident: x*_LC = argmax_x [1 − P_θ(ŷ | x)], where ŷ = argmax_y P_θ(y | x). In this strategy, the learner selects the instance for which it has the least confidence in its most likely label. Drawback: it only takes into consideration the most probable label and disregards the other label probabilities.
- 21. Uncertainty Sampling, margin sampling: x*_M = argmin_x [P_θ(ŷ₁ | x) − P_θ(ŷ₂ | x)], i.e. select the instance with the smallest difference between the first and second most probable labels ŷ₁ and ŷ₂. This corrects a shortcoming of the least confident strategy by incorporating the posterior of the second most likely label. For problems with very large label sets, however, the margin approach still ignores much of the output distribution over the remaining classes.
- 22. Uncertainty Sampling, entropy: x*_H = argmax_x [−Σ_i P_θ(y_i | x) log P_θ(y_i | x)]. Entropy is an information-theoretic measure of the amount of information needed to "encode" a distribution, so it is often thought of as a measure of uncertainty or impurity in machine learning. The entropy-based approach generalizes easily to probabilistic multi-label classifiers and to probabilistic models for more complex structured instances.
- 23. Examples: Least Confidence vs. Margin Sampling (a runnable sketch follows below).
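As a stand-in for the worked example, here is a small self-contained Python sketch computing all three uncertainty measures from slides 20-22 on a made-up 3-class posterior; the 0.5/0.3/0.2 probabilities are illustrative, not from the slides:

```python
import numpy as np

def least_confident(p):
    return 1.0 - p.max()              # 1 - P_theta(y_hat | x); query the argmax

def margin(p):
    top2 = np.sort(p)[::-1][:2]
    return top2[0] - top2[1]          # smaller margin = more uncertain; query the argmin

def entropy(p):
    return float(-np.sum(p * np.log(p + 1e-12)))  # natural-log entropy; query the argmax

p = np.array([0.5, 0.3, 0.2])         # illustrative posterior over 3 labels
print(least_confident(p))             # 0.5
print(margin(p))                      # 0.2 (= 0.5 - 0.3)
print(entropy(p))                     # ~1.03 nats
```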