build_contrastive_learner

transtab.build_contrastive_learner(categorical_columns=None, numerical_columns=None, binary_columns=None, projection_dim=128, num_partition=3, overlap_ratio=0.5, supervised=True, hidden_dim=128, num_layer=2, num_attention_head=8, hidden_dropout_prob=0, ffn_dim=256, activation='relu', device='cuda:0', checkpoint=None, ignore_duplicate_cols=True, **kwargs)

Build a contrastive learner for pretraining based on TransTab. If no categorical, numerical, or binary columns are specified, the model treats all columns as categorical, which may significantly degrade performance.

If a column is assigned to more than one type (e.g., the feature age is listed as both a categorical and a binary column), the model raises an error. Set ignore_duplicate_cols=True to avoid this error; the model will then ignore the duplicated feature.
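To see ahead of time whether any column would trigger the duplicate-type behavior described above, the three lists can be checked before they are passed in. The following is a minimal sketch in plain Python; find_duplicate_columns is a hypothetical helper written for illustration and is not part of transtab.

```python
from collections import Counter


def find_duplicate_columns(categorical_columns=None, numerical_columns=None, binary_columns=None):
    """Hypothetical helper (not a transtab function): names listed under more than one type."""
    counts = Counter()
    for cols in (categorical_columns, numerical_columns, binary_columns):
        # set() ensures a name repeated inside a single list is counted once for that list.
        counts.update(set(cols or []))
    return sorted(name for name, count in counts.items() if count > 1)


dups = find_duplicate_columns(
    categorical_columns=["gender", "age"],
    numerical_columns=["income"],
    binary_columns=["age", "married"],
)
print(dups)  # ['age']
```

An empty result means no column is assigned to more than one type, so the setting of ignore_duplicate_cols should not matter for that input.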

Parameters
  • categorical_columns (list) – a list of categorical feature names.

  • numerical_columns (list) – a list of numerical feature names.

  • binary_columns (list) – a list of binary feature names; accepts binary indicators such as (yes, no), (true, false), and (0, 1).

  • feature_extractor (TransTabFeatureExtractor) – a feature extractor used to tokenize the input tables. If not passed, the model builds one itself.

  • hidden_dim (int) – the dimension of hidden embeddings.

  • num_layer (int) – the number of transformer layers used in the encoder.

  • num_attention_head (int) – the number of heads in the multi-head self-attention layers of the transformer.

  • hidden_dropout_prob (float) – the dropout ratio in the transformer encoder.

  • ffn_dim (int) – the dimension of the feed-forward layer in each transformer layer.

  • projection_dim (int) – the dimension of the projection head on top of the encoder.

  • overlap_ratio (float) – the ratio of columns shared between different partitions when subsetting the columns.

  • num_partition (int) – the number of partitions made for vertical-partition contrastive learning.

  • supervised (bool) – whether to use supervised VPCL (vertical-partition contrastive learning); if False, self-supervised VPCL is used.

  • temperature (float) – temperature used to compute logits for contrastive learning.

  • base_temperature (float) – base temperature used to normalize the temperature.

  • activation (str) – the name of the activation function; supported values are "relu", "gelu", "selu", and "leakyrelu".

  • device (str) – the device, "cpu" or "cuda:0".

  • checkpoint (str) – the directory of the pretrained transtab model.

  • ignore_duplicate_cols (bool) – if a column is assigned to more than one type (e.g., the feature age is listed as both a categorical and a binary column), the model raises an error. Set to True to avoid this error; the model will then ignore the duplicated feature.
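The interaction between num_partition and overlap_ratio can be illustrated with a small sketch of vertical partitioning. It assumes the columns are split into contiguous groups and that each group borrows a fraction of a neighboring group's columns; partition_columns, its rounding rule, and the choice of neighbor are assumptions made for illustration and are not confirmed to match TransTab's own implementation.

```python
import math


def partition_columns(columns, num_partition=3, overlap_ratio=0.5):
    """Hypothetical helper: split columns into overlapping vertical partitions."""
    if num_partition < 1:
        raise ValueError("num_partition must be at least 1")
    columns = list(columns)
    # Split into num_partition contiguous, nearly equal groups.
    base, extra = divmod(len(columns), num_partition)
    groups, start = [], 0
    for i in range(num_partition):
        size = base + (1 if i < extra else 0)
        groups.append(columns[start:start + size])
        start += size
    # Assumption: overlap size is a fraction of the first group's size, rounded up.
    overlap = math.ceil(len(groups[0]) * overlap_ratio)
    if overlap == 0 or num_partition == 1:
        return groups
    views = []
    for i, group in enumerate(groups):
        if i < num_partition - 1:
            borrowed = groups[i + 1][:overlap]   # leading columns of the next group
        else:
            borrowed = groups[i - 1][-overlap:]  # trailing columns of the previous group
        views.append(group + borrowed)
    return views


views = partition_columns(["a", "b", "c", "d", "e", "f"], num_partition=3, overlap_ratio=0.5)
print(views)  # [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'd']]
```

In this sketch, each partition of a row acts as a separate view of that row for contrastive learning. A larger overlap_ratio makes the views share more columns, and overlap_ratio=0 yields disjoint partitions.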

Returns

A TransTabForCL model.