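VotingClassifier with pipelines as estimators
I want to build an sklearn VotingClassifier ensemble out of several different models (a decision tree, an SVC and a Keras network). Each of them needs a different kind of data preprocessing, which is why I created a pipeline for each one.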
# Define pipelines
# DTC pipeline
featuriser = Featuriser()
dtc = DecisionTreeClassifier()
dtc_pipe = Pipeline([('featuriser',featuriser),('dtc',dtc)])
# SVC pipeline
scaler = TimeSeriesScalerMeanVariance(kind='constant')
flattener = Flattener()
svc = SVC(C = 100, gamma = 0.001, kernel='rbf')
svc_pipe = Pipeline([('scaler', scaler),('flattener', flattener), ('svc', svc)])
# Keras pipeline
cnn = KerasClassifier(build_fn=get_model())
cnn_pipe = Pipeline([('scaler',scaler),('cnn',cnn)])
# Make an ensemble
ensemble = VotingClassifier(estimators=[('dtc', dtc_pipe),
                                        ('svc', svc_pipe),
                                        ('cnn', cnn_pipe)],
                            voting='hard')
The Featuriser, TimeSeriesScalerMeanVariance and Flattener classes are custom-made transformers that all implement the fit, transform and fit_transform methods.
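When I call ensemble.fit(X, y) to fit the whole ensemble, I get this error message:
ValueError: The estimator list should be a classifier.
I can understand that, since the individual estimators are not classifiers as such but pipelines. Is there a way to make it work anyway?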
-
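The problem is with KerasClassifier. It does not provide the _estimator_type attribute, which is checked in _validate_estimator. Using pipelines is not the problem: Pipeline exposes this information as a property, taken from its final step, so a pipeline that ends in a KerasClassifier fails the check only because the wrapper itself lacks the attribute.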
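To make the mechanism concrete, here is a toy model of the check in plain Python. It is only a sketch under stated assumptions: ToyTree, ToyKerasWrapper, ToyPipeline and looks_like_classifier are hypothetical stand-ins written for this illustration, not scikit-learn's or Keras's actual code. It assumes the check boils down to reading _estimator_type and that the pipeline forwards that attribute from its last step.

```python
# Toy model of the classifier check. All names here are illustrative
# stand-ins (assumptions), not the real scikit-learn or Keras classes.

def looks_like_classifier(estimator):
    """Return True only if the object reports _estimator_type == 'classifier'."""
    # getattr with a default also covers a property that raises AttributeError.
    return getattr(estimator, "_estimator_type", None) == "classifier"


class ToyTree:
    # Stand-in for an estimator that declares itself a classifier.
    _estimator_type = "classifier"


class ToyKerasWrapper:
    # Stand-in for a wrapper that declares no estimator type at all.
    pass


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps

    @property
    def _estimator_type(self):
        # Forward the estimator type of the final step.
        return self.steps[-1][1]._estimator_type


tree_pipe = ToyPipeline([("scaler", object()), ("dtc", ToyTree())])
wrapper = ToyKerasWrapper()
keras_pipe = ToyPipeline([("scaler", object()), ("cnn", wrapper)])

tree_ok = looks_like_classifier(tree_pipe)   # pipeline ending in a classifier
before = looks_like_classifier(keras_pipe)   # final step lacks the attribute

# Setting the attribute on the final step changes what the pipeline reports.
wrapper._estimator_type = "classifier"
after = looks_like_classifier(keras_pipe)

print(tree_ok, before, after)
```

In this toy version the pipeline around the tree passes the check, while the pipeline around the wrapper passes it only once the attribute has been set on the wrapper.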
Hence, a quick fix is to set _estimator_type = 'classifier' on the KerasClassifier instance. A reproducible example:
# Define pipelines
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.ensemble import VotingClassifier
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.datasets import make_classification
from keras.layers import Dense
from keras.models import Sequential

X, y = make_classification()

# DTC pipeline
featuriser = MinMaxScaler()
dtc = DecisionTreeClassifier()
dtc_pipe = Pipeline([('featuriser', featuriser), ('dtc', dtc)])

# SVC pipeline
scaler = Normalizer()
svc = SVC(C=100, gamma=0.001, kernel='rbf')
svc_pipe = Pipeline([('scaler', scaler), ('svc', svc)])

# Keras pipeline
def get_model():
    # create model
    model = Sequential()
    model.add(Dense(10, input_dim=20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

cnn = KerasClassifier(build_fn=get_model)
cnn._estimator_type = "classifier"
cnn_pipe = Pipeline([('scaler', scaler), ('cnn', cnn)])

# Make an ensemble
ensemble = VotingClassifier(estimators=[('dtc', dtc_pipe),
                                        ('svc', svc_pipe),
                                        ('cnn', cnn_pipe)],
                            voting='hard')
ensemble.fit(X, y)