Fast Train with TransTab
transtab accepts tables with varying columns for both training and prediction, and doing so takes only a few lines of code. The full code is available in the Notebook Example.
import transtab
# load multiple datasets by passing a list of data names
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data(['credit-g','credit-approval'])
# build transtab classifier model
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)
# specify training arguments, take validation loss for early stopping
training_arguments = {
    'num_epoch':5,
    'eval_metric':'val_loss',
    'eval_less_is_better':True,
    'output_dir':'./checkpoint'
}
One can compute the validation loss on the validation data of the first dataset (credit-g) only:
transtab.train(model, trainset, valset[0], **training_arguments)
or take the macro-average loss over the validation sets of both datasets:
transtab.train(model, trainset, valset, **training_arguments)
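To make the term concrete, here is a minimal sketch of macro averaging, assuming it means "average the loss within each dataset first, then average those per-dataset means". The helper names and the loss values are hypothetical and only illustrate the idea; this is not transtab's internal code.

```python
from statistics import mean

def macro_average_loss(per_dataset_losses):
    """Average losses within each dataset, then average the dataset means."""
    dataset_means = [mean(losses) for losses in per_dataset_losses]
    return mean(dataset_means)

def pooled_average_loss(per_dataset_losses):
    """Average all per-sample losses together, ignoring dataset boundaries."""
    all_losses = [loss for losses in per_dataset_losses for loss in losses]
    return mean(all_losses)

# Hypothetical per-sample validation losses for two datasets of different sizes.
losses_a = [0.2, 0.4, 0.6, 0.8]  # larger dataset, lower loss
losses_b = [1.0, 2.0]            # smaller dataset, higher loss

macro = macro_average_loss([losses_a, losses_b])
pooled = pooled_average_loss([losses_a, losses_b])
print(f"macro average:  {macro:.4f}")
print(f"pooled average: {pooled:.4f}")
```

With a macro average, each dataset contributes equally to the early-stopping signal regardless of its size, whereas a pooled average would be dominated by the larger dataset.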
After training completes, we can load the best checkpoint, as judged by validation loss, from the predefined output_dir and make predictions.
model.load('./checkpoint')
x_test, y_test = testset[0]
# pass the trained model together with the test features
ypred = transtab.predict(model, x_test)
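Once predictions are available, they can be compared against y_test. The sketch below is a plain-Python illustration that assumes the predictions are positive-class probabilities for a binary task. The function name, the 0.5 threshold, and the demo values are hypothetical stand-ins, not part of transtab.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the true label."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of predictions")
    # Turn probabilities into hard 0/1 labels.
    y_hat = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(t == h) for t, h in zip(y_true, y_hat))
    return correct / len(y_true)

# Hypothetical stand-ins for y_test and ypred.
y_true_demo = [1, 0, 1, 1, 0]
y_prob_demo = [0.9, 0.2, 0.4, 0.7, 0.6]

acc = binary_accuracy(y_true_demo, y_prob_demo)
print(f"accuracy: {acc:.2f}")
```

Accuracy is used here only because it is easy to verify by hand; a ranking metric such as AUC is often a better fit for imbalanced tabular tasks.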
Warning
In this purely supervised setting, all datasets passed in must have the same number of label classes. For instance, credit-g and credit-approval are both binary classification tasks. This is because the transtab classifier keeps only one classification head during training and prediction.
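This constraint can be checked before training. The following is a minimal sketch, assuming each dataset is an (x, y) pair as suggested by the unpacking x_test, y_test = testset[0] above. The helper check_same_num_classes is hypothetical and not part of transtab, and the demo data is made up.

```python
def check_same_num_classes(datasets):
    """Return the shared number of label classes, or raise if datasets disagree.

    `datasets` is a list of (features, labels) pairs; `labels` is any
    iterable of hashable label values.
    """
    counts = [len(set(labels)) for _, labels in datasets]
    if len(set(counts)) > 1:
        raise ValueError(f"datasets have differing numbers of label classes: {counts}")
    return counts[0] if counts else 0

# Hypothetical stand-ins for two binary classification datasets.
demo_a = (None, [0, 1, 1, 0])
demo_b = (None, [1, 0, 0])
num_classes = check_same_num_classes([demo_a, demo_b])
print(f"all datasets share {num_classes} label classes")
```

Under the stated assumption about the data layout, the same helper could be called as check_same_num_classes(trainset) right before transtab.train.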