TransTabClassifier

class transtab.modeling_transtab.TransTabClassifier(categorical_columns=None, numerical_columns=None, binary_columns=None, feature_extractor=None, num_class=2, hidden_dim=128, num_layer=2, num_attention_head=8, hidden_dropout_prob=0, ffn_dim=256, activation='relu', device='cuda:0', **kwargs)

The classifier model, a subclass of transtab.modeling_transtab.TransTabModel.

Parameters

categorical_columns (list) – a list of categorical feature names.
numerical_columns (list) – a list of numerical feature names.
binary_columns (list) – a list of binary feature names; accepts binary indicators such as (yes, no), (true, false), and (0, 1).
feature_extractor (TransTabFeatureExtractor) – a feature extractor that tokenizes the input tables. If not passed, the model builds its own.
num_class (int) – number of output classes to be predicted.
hidden_dim (int) – the dimension of hidden embeddings.
num_layer (int) – the number of transformer layers used in the encoder.
num_attention_head (int) – the number of heads in the multi-head self-attention layer of each transformer layer.
hidden_dropout_prob (float) – the dropout ratio in the transformer encoder.
ffn_dim (int) – the dimension of the feed-forward layer in each transformer layer.
activation (str) – the name of the activation function; supports "relu", "gelu", "selu", and "leakyrelu".
device (str) – the device, "cpu" or "cuda:0".
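To illustrate what the three column lists and the accepted binary indicators mean, here is a minimal plain-Python sketch. to_binary and split_columns are hypothetical helpers written for this example, not transtab functions, and transtab's own preprocessing may normalize binary values differently (for instance, it may accept other spellings).

```python
from typing import Dict, Iterable, Mapping

# Hypothetical helpers for illustration only; these are not transtab APIs.
TRUE_TOKENS = {"yes", "true", "1"}
FALSE_TOKENS = {"no", "false", "0"}


def to_binary(value) -> int:
    """Map a (yes, no), (true, false) or (0, 1) indicator to 1 or 0."""
    # str() plus lower() handles bools and ints too: True -> "true", 0 -> "0".
    token = str(value).strip().lower()
    if token in TRUE_TOKENS:
        return 1
    if token in FALSE_TOKENS:
        return 0
    raise ValueError(f"not a binary indicator: {value!r}")


def split_columns(
    row: Mapping[str, object],
    categorical: Iterable[str],
    numerical: Iterable[str],
    binary: Iterable[str],
) -> Dict[str, Dict[str, object]]:
    """Group one raw sample's fields according to the three column lists."""
    return {
        "categorical": {c: str(row[c]) for c in categorical},
        "numerical": {c: float(row[c]) for c in numerical},
        "binary": {c: to_binary(row[c]) for c in binary},
    }
```

For example, split_columns({"age": "42", "job": "admin", "has_loan": "Yes"}, ["job"], ["age"], ["has_loan"]) groups the fields as {"categorical": {"job": "admin"}, "numerical": {"age": 42.0}, "binary": {"has_loan": 1}}.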

Returns

A TransTabClassifier model.

Return type

TransTabClassifier

forward(x, y=None)

Make a forward pass given the input features x and, optionally, the labels y.

Parameters

x (pd.DataFrame or dict) – pd.DataFrame: a batch of raw tabular samples; dict: the output of TransTabFeatureExtractor.
y (pd.Series) – the corresponding labels for each sample in x. If labels are given, the model also returns the classification loss computed by self.loss_fn.

Returns

logits (torch.Tensor) – the classification logits computed from the [CLS] embedding at the end of the transformer encoder.
loss (torch.Tensor or None) – the classification loss, or None if y is not given.
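The relationship between the returned logits, the labels y, and the loss can be sketched in plain Python. This assumes, based on common PyTorch practice rather than on anything stated above, that for num_class=2 the model emits one logit per sample scored with binary cross-entropy (as torch.nn.BCEWithLogitsLoss does), that for more classes it emits num_class logits per sample scored with softmax cross-entropy (as torch.nn.CrossEntropyLoss does), and that per-sample losses are averaged. predict_proba is a hypothetical helper showing how logits would map to probabilities; check the transtab source before relying on any of these details.

```python
import math
from typing import List, Sequence, Union

# Plain-Python analogue of the loss, ASSUMING self.loss_fn behaves like
# BCEWithLogitsLoss (num_class == 2, one logit per sample) or
# CrossEntropyLoss (num_class > 2, num_class logits per sample), averaged.


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit, in the numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def classification_loss(logits, labels: Sequence[int], num_class: int) -> float:
    """Mean per-sample loss over a batch."""
    if num_class == 2:
        losses = [bce_with_logits(z, float(y)) for z, y in zip(logits, labels)]
    else:
        losses = [cross_entropy(row, int(y)) for row, y in zip(logits, labels)]
    return sum(losses) / len(losses)


def sigmoid(z: float) -> float:
    # Branch on the sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(logits, num_class: int) -> List[Union[float, List[float]]]:
    """Hypothetical helper: sigmoid for binary, softmax for multi-class."""
    if num_class == 2:
        return [sigmoid(z) for z in logits]
    probs = []
    for row in logits:
        m = max(row)
        exps = [math.exp(z - m) for z in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs
```

Running classification_loss([0.0, 0.0], [1, 0], num_class=2) gives log 2 (about 0.693), the loss of a model that is maximally uncertain about both samples.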