Memo: LightGBM and CatBoost: weighting the objective function for binary classification
In binary classification tasks, when the classes are imbalanced, weighting the objective function can be important.
For LightGBM
import lightgbm as lgb
import numpy as np
import pandas as pd

params = {
    'objective': objective,
    'metric': metric,
    'boosting_type': boosting_type,
    'device': device,
    'random_state': 39,
    'is_unbalance': True,  # built-in imbalance handling; note this overlaps with the manual sample weights below, so in practice use one or the other
    'verbose': -1,
}

# Compute inverse class-frequency weights: w_k = 1 / (n_k / N)
def calc_log_loss_weight(y_true):
    nc = np.bincount(y_true)
    w0, w1 = 1 / (nc[0] / y_true.shape[0]), 1 / (nc[1] / y_true.shape[0])
    return w0, w1

# Compute the weights
train_w0, train_w1 = calc_log_loss_weight(t_train)
valid_w0, valid_w1 = calc_log_loss_weight(t_test)
print(train_w0, train_w1)

# Attach the weights to the datasets
lgb_train = lgb.Dataset(X_train, t_train,
                        weight=pd.Series(t_train).map({0: train_w0, 1: train_w1}))
lgb_eval = lgb.Dataset(X_test, t_test,
                       weight=pd.Series(t_test).map({0: valid_w0, 1: valid_w1}))
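To see what these inverse class-frequency weights do, here is a minimal self-contained sketch on toy 9:1 imbalanced labels (the array `y` is made up for illustration): after mapping each sample to its class weight, both classes contribute the same total weight to the loss.

```python
import numpy as np
import pandas as pd

def calc_log_loss_weight(y_true):
    # w_k = 1 / (n_k / N): rarer class gets a proportionally larger weight
    nc = np.bincount(y_true)
    w0, w1 = 1 / (nc[0] / y_true.shape[0]), 1 / (nc[1] / y_true.shape[0])
    return w0, w1

# toy labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)
w0, w1 = calc_log_loss_weight(y)
print(w0, w1)  # 1.111..., 10.0

# per-sample weight vector, as passed to lgb.Dataset(weight=...)
weights = pd.Series(y).map({0: w0, 1: w1})

# each class now contributes an equal total weight (100.0 each)
print(weights[y == 0].sum(), weights[y == 1].sum())
```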
For CatBoost
The approach is essentially the same.
from catboost import Pool
import numpy as np
import pandas as pd

# Compute inverse class-frequency weights (same function as in the LightGBM section)
def calc_log_loss_weight(y_true):
    nc = np.bincount(y_true)
    w0, w1 = 1 / (nc[0] / y_true.shape[0]), 1 / (nc[1] / y_true.shape[0])
    return w0, w1

# Compute the weights
train_w0, train_w1 = calc_log_loss_weight(y_train)
valid_w0, valid_w1 = calc_log_loss_weight(y_eval)
print(train_w0, train_w1)

# Attach the weights to the Pools (cf is the list of categorical feature columns)
cb_train = Pool(data=X_train, label=y_train, cat_features=cf,
                weight=pd.Series(y_train).map({0: train_w0, 1: train_w1}).values)
cb_eval = Pool(data=X_eval, label=y_eval, cat_features=cf,
               weight=pd.Series(y_eval).map({0: valid_w0, 1: valid_w1}).values)

# Predict probabilities
pred = model.predict(cb_eval, prediction_type='Probability',
                     ntree_end=model.best_iteration_)
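One way to see why this weighting scheme is reasonable: with w_k = 1/(n_k/N), the weighted log loss reduces to the macro average of the per-class mean log losses, i.e. both classes count equally regardless of imbalance. A quick numpy check (labels and predicted probabilities below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)            # 9:1 imbalanced toy labels
p = rng.uniform(0.05, 0.95, size=y.size)     # hypothetical predicted P(y=1)

# per-sample binary log loss
ll = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# inverse class-frequency weights, as in calc_log_loss_weight
nc = np.bincount(y)
w = np.where(y == 0, 1 / (nc[0] / y.size), 1 / (nc[1] / y.size))

# weighted mean log loss vs. macro-averaged (per-class mean) log loss
weighted_ll = np.sum(w * ll) / np.sum(w)
macro_ll = (ll[y == 0].mean() + ll[y == 1].mean()) / 2
print(weighted_ll, macro_ll)  # identical up to floating point
```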
See also
www.kaggle.com