TransTabForCL

class transtab.modeling_transtab.TransTabForCL(categorical_columns=None, numerical_columns=None, binary_columns=None, feature_extractor=None, hidden_dim=128, num_layer=2, num_attention_head=8, hidden_dropout_prob=0, ffn_dim=256, projection_dim=128, overlap_ratio=0.1, num_partition=2, supervised=True, temperature=10, base_temperature=10, activation='relu', device='cuda:0', **kwargs)[source]

The contrastive learning model, a subclass of transtab.modeling_transtab.TransTabModel.

Parameters
  • categorical_columns (list) – a list of categorical feature names.

  • numerical_columns (list) – a list of numerical feature names.

  • binary_columns (list) – a list of binary feature names; accepts binary indicators such as (yes, no), (true, false), or (0, 1).

  • feature_extractor (TransTabFeatureExtractor) – a feature extractor to tokenize the input tables. If not passed, the model builds one itself.

  • hidden_dim (int) – the dimension of hidden embeddings.

  • num_layer (int) – the number of transformer layers used in the encoder.

  • num_attention_head (int) – the number of heads in the multi-head self-attention layer of each transformer layer.

  • hidden_dropout_prob (float) – the dropout ratio in the transformer encoder.

  • ffn_dim (int) – the dimension of feed-forward layer in the transformer layer.

  • projection_dim (int) – the output dimension of the projection head on top of the encoder.

  • overlap_ratio (float) – the overlap ratio of columns of different partitions when doing subsetting.

  • num_partition (int) – the number of partitions made for vertical-partition contrastive learning (VPCL).

  • supervised (bool) – whether to use supervised VPCL; if False, self-supervised VPCL is used.

  • temperature (float) – temperature used to compute logits for contrastive learning.

  • base_temperature (float) – base temperature used to normalize the temperature.

  • activation (str) – the name of the activation function to use; supported values are "relu", "gelu", "selu", and "leakyrelu".

  • device (str) – the device to run on, e.g. "cpu" or "cuda:0".

Returns

A TransTabForCL model.
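To make the roles of num_partition and overlap_ratio more concrete, the following is a minimal sketch of one way to split a table's columns into overlapping vertical partitions. The helper name vertical_partitions and the exact splitting rule (contiguous groups, each borrowing a few columns from a neighbour) are assumptions made for illustration; they are not taken from the TransTab source and may differ from what the library does internally.

```python
import math

def vertical_partitions(columns, num_partition=2, overlap_ratio=0.1):
    """Hypothetical helper: split column names into overlapping groups."""
    if num_partition < 1:
        raise ValueError("num_partition must be >= 1")
    # Spread columns as evenly as possible; earlier groups get the remainder.
    base, extra = divmod(len(columns), num_partition)
    groups, start = [], 0
    for i in range(num_partition):
        size = base + (1 if i < extra else 0)
        groups.append(list(columns[start:start + size]))
        start += size
    # Number of columns each group borrows from a neighbouring group.
    overlap = math.ceil(len(groups[0]) * overlap_ratio)
    if overlap == 0 or num_partition == 1:
        return groups
    partitions = []
    for i, group in enumerate(groups):
        if i < num_partition - 1:
            borrowed = groups[i + 1][:overlap]   # head of the next group
        else:
            borrowed = groups[i - 1][-overlap:]  # tail of the previous group
        partitions.append(group + borrowed)
    return partitions

cols = ["age", "income", "gender", "married", "city", "job"]
parts = vertical_partitions(cols, num_partition=2, overlap_ratio=0.1)
for p in parts:
    print(p)
```

Under this assumed rule, each partition is a different "view" of the same rows, and a larger overlap_ratio makes the views share more columns.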

forward(x, y=None)[source]

Make forward pass given the input feature x and label y (optional).

Parameters
  • x (pd.DataFrame or dict) – pd.DataFrame: a batch of raw tabular samples; dict: the output of TransTabFeatureExtractor.

  • y (pd.Series) – the corresponding labels for each sample in x. If labels are given and supervised is True, the model returns the supervised VPCL loss; otherwise it returns the self-supervised VPCL loss.

Returns

  • logits (None) – this CL model does NOT return logits.

  • loss (torch.Tensor) – the supervised or self-supervised VPCL loss.

self_supervised_contrastive_loss(features)[source]

Compute the self-supervised VPCL loss.

Parameters

features (torch.Tensor) – the encoded features of multiple partitions of input tables, with shape (bs, n_partition, proj_dim).

Returns

loss – the computed self-supervised VPCL loss.

Return type

torch.Tensor
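As an illustration of the idea, here is a minimal pure-Python sketch of a self-supervised contrastive loss over features of shape (bs, n_partition, proj_dim). It assumes a standard InfoNCE-style formulation in which the positives of a view are the other partitions of the same row, and it assumes the loss is scaled by temperature / base_temperature. The function name self_supervised_vpcl_loss and these details are assumptions for the sake of the example, not a copy of the library's implementation, which operates on torch.Tensor inputs.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_supervised_vpcl_loss(features, temperature=10.0, base_temperature=10.0):
    """features: nested list with shape (bs, n_partition, proj_dim)."""
    # Flatten into a list of views, remembering the row each view came from.
    views = [(row_id, vec) for row_id, row in enumerate(features) for vec in row]
    losses = []
    for i, (row_i, anchor) in enumerate(views):
        # Logits against every other view; the anchor itself is excluded.
        others = [(row_j, dot(anchor, vec) / temperature)
                  for j, (row_j, vec) in enumerate(views) if j != i]
        if not others:
            continue
        max_logit = max(l for _, l in others)  # subtract max for stability
        log_denom = max_logit + math.log(
            sum(math.exp(l - max_logit) for _, l in others))
        # Log-probabilities of the positives: other partitions of the same row.
        positives = [l - log_denom for r, l in others if r == row_i]
        if positives:
            scale = temperature / base_temperature
            losses.append(-scale * sum(positives) / len(positives))
    if not losses:
        raise ValueError("need at least two partitions per row")
    return sum(losses) / len(losses)

# Two rows, two partitions each, unit-length 2-d embeddings.
aligned = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
loss_aligned = self_supervised_vpcl_loss(aligned, 0.1, 0.1)
loss_mixed = self_supervised_vpcl_loss(mixed, 0.1, 0.1)
print(loss_aligned < loss_mixed)
```

In this sketch the loss is low when partitions of the same row map to similar embeddings and high when they resemble partitions of other rows.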

supervised_contrastive_loss(features, labels)[source]

Compute the supervised VPCL loss.

Parameters
  • features (torch.Tensor) – the encoded features of multiple partitions of input tables, with shape (bs, n_partition, proj_dim).

  • labels (torch.Tensor) – the class labels to be used for building positive/negative pairs in VPCL.

Returns

loss – the computed supervised VPCL loss.

Return type

torch.Tensor
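The way labels define positive and negative pairs can be sketched with a small mask-building helper. The function positive_pair_mask is hypothetical, and the partition-major ordering of views (all rows of partition 0, then all rows of partition 1, and so on) is an assumption made for this example rather than a documented property of the library.

```python
def positive_pair_mask(labels, n_partition):
    """Hypothetical helper: 0/1 mask over all (sample, partition) views.

    Entry [i][j] is 1 when views i and j are different views whose
    samples share a class label (a positive pair), and 0 otherwise.
    """
    # Assumed view order: every row of partition 0, then partition 1, ...
    view_labels = list(labels) * n_partition
    n_views = len(view_labels)
    return [[1 if i != j and view_labels[i] == view_labels[j] else 0
             for j in range(n_views)]
            for i in range(n_views)]

mask = positive_pair_mask([0, 1, 0], n_partition=2)
for row in mask:
    print(row)
```

If every sample is given its own unique label, the only positives left are the other partitions of the same row, which corresponds to the self-supervised setting.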