from pathlib import Path
from embeddings.config.lightning_config import LightningBasicConfig
from embeddings.pipeline.lightning_classification import LightningClassificationPipeline
We recommend reading our NeurIPS paper (Augustyniak et al. 2022), where you can find the lessons we learned while designing and compiling the LEPISZCZE benchmark.
We will start by training a text classifier using embeddings.pipeline.lightning_classification.LightningClassificationPipeline.
LightningClassificationPipeline
LightningClassificationPipeline(
    embedding_name_or_path: Union[str, pathlib.Path],
    dataset_name_or_path: Union[str, pathlib.Path],
    input_column_name: Union[str, Sequence[str]],
    target_column_name: str,
    output_path: Union[str, pathlib.Path],
    evaluation_filename: str = 'evaluation.json',
    config: Union[embeddings.config.lightning_config.LightningBasicConfig, embeddings.config.lightning_config.LightningAdvancedConfig] = LightningBasicConfig(use_scheduler=True, optimizer='Adam', warmup_steps=100, learning_rate=0.0001, adam_epsilon=1e-08, weight_decay=0.0, finetune_last_n_layers=-1, classifier_dropout=None, max_seq_length=None, batch_size=32, max_epochs=None, early_stopping_monitor='val/Loss', early_stopping_mode='min', early_stopping_patience=3, tokenizer_kwargs={}, batch_encoding_kwargs={}, dataloader_kwargs={}),
    devices: Union[List[int], str, int, NoneType] = 'auto',
    accelerator: Union[str, pytorch_lightning.accelerators.accelerator.Accelerator, NoneType] = 'auto',
    logging_config: embeddings.utils.loggers.LightningLoggingConfig = LightningLoggingConfig(output_path='.', loggers_names=[], tracking_project_name=None, wandb_entity=None, wandb_logger_kwargs={}, loggers=None),
    tokenizer_name_or_path: Union[pathlib.Path, str, NoneType] = None,
    predict_subset: embeddings.data.dataset.LightingDataModuleSubset = <LightingDataModuleSubset.TEST: 'test'>,
    load_dataset_kwargs: Optional[Dict[str, Any]] = None,
    model_checkpoint_kwargs: Optional[Dict[str, Any]] = None,
    compile_model_kwargs: Optional[Dict[str, Any]] = None,
)
Helper class that provides a standard way to create an ABC using inheritance.
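The config argument defaults to a LightningBasicConfig instance. To see exactly which hyperparameters it carries, we can instantiate it directly and print it (a minimal sketch, assuming LightningBasicConfig can be constructed without arguments and that its repr lists its fields as in the signature above):

default_config = LightningBasicConfig()
# The printed repr should show optimizer, learning_rate, batch_size, early-stopping settings, etc.
print(default_config)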
We want to store submission data in a specific directory.
LEPISZCZE_SUBMISSIONS = Path("../lepiszcze-submissions")
LEPISZCZE_SUBMISSIONS.mkdir(exist_ok=True, parents=True)
Then we create a pipeline object. We will use LightningClassificationPipeline with a dataset related to sentiment analysis and a very small transformer model.
We only want to run training for testing purposes, so it would be good not to generate too many greenhouse gases; hence we limit max epochs to only 1. In real training code it would be good to customize the training procedure with more configuration.
config = LightningBasicConfig(max_epochs=1)
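For a real run, more of the options listed in the signature above can be set explicitly. Below is a sketch of a more customized configuration; the values are purely illustrative, not tuned recommendations:

custom_config = LightningBasicConfig(
    max_epochs=10,              # train longer than the 1-epoch smoke test used here
    batch_size=16,
    learning_rate=5e-5,
    early_stopping_patience=5,  # stop early if the monitored val/Loss stops improving
    finetune_last_n_layers=-1,  # default value, kept here for visibility
)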
pipeline = LightningClassificationPipeline(
    dataset_name_or_path="clarin-pl/polemo2-official",
    embedding_name_or_path="hf-internal-testing/tiny-albert",
    input_column_name="text",
    target_column_name="target",
    output_path=".",
    devices="auto",
    accelerator="cpu",
    config=config,
)
No config specified, defaulting to: polemo2-official/all_text
Found cached dataset polemo2-official (/root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70)
100%|██████████| 3/3 [00:00<00:00, 625.58it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70/cache-2e61085076a665b0.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70/cache-ac057aeafd577fd0.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70/cache-502164b331496757.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70/cache-13cbbe9129f685fa.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70/cache-b1c5d1c8fe129da7.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70/cache-1f1e81ef3032c906.arrow
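The pipeline above is pinned to the CPU so the example runs anywhere. If a GPU is available, the same constructor accepts a GPU setup via the standard PyTorch Lightning accelerator/devices values; a hypothetical variant in which only the two hardware arguments change:

pipeline_gpu = LightningClassificationPipeline(
    dataset_name_or_path="clarin-pl/polemo2-official",
    embedding_name_or_path="hf-internal-testing/tiny-albert",
    input_column_name="text",
    target_column_name="target",
    output_path=".",
    devices=1,           # use a single device
    accelerator="gpu",   # standard PyTorch Lightning accelerator name
    config=config,
)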
It took a couple of seconds, but finally we have a pipeline object ready and we only need to run it.
results = pipeline.run()
Some weights of the model checkpoint at hf-internal-testing/tiny-albert were not used when initializing AlbertForSequenceClassification: ['predictions.decoder.bias', 'predictions.decoder.weight', 'predictions.LayerNorm.bias', 'predictions.LayerNorm.weight', 'predictions.dense.bias', 'predictions.bias', 'predictions.dense.weight']
- This IS expected if you are initializing AlbertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of AlbertForSequenceClassification were not initialized from the model checkpoint at hf-internal-testing/tiny-albert and are newly initialized: ['classifier.weight', 'albert.pooler.bias', 'classifier.bias', 'albert.pooler.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
No config specified, defaulting to: polemo2-official/all_text
Found cached dataset polemo2-official (/root/.cache/huggingface/datasets/clarin-pl___polemo2-official/all_text/0.0.0/2b75fdbe5def97538e81fb120f8752744b50729a4ce09bd75132bfc863a2fd70)
100%|██████████| 3/3 [00:00<00:00, 663.31it/s]
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
/opt/conda/envs/embeddings/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1579: UserWarning: GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`.
rank_zero_warn(
| Name | Type | Params
------------------------------------------------------------------
0 | model | AlbertForSequenceClassification | 352 K
1 | metrics | MetricCollection | 0
2 | train_metrics | MetricCollection | 0
3 | val_metrics | MetricCollection | 0
4 | test_metrics | MetricCollection | 0
------------------------------------------------------------------
352 K Trainable params
0 Non-trainable params
352 K Total params
1.410 Total estimated model params size (MB)
/opt/conda/envs/embeddings/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:623: UserWarning: Checkpoint directory /app/nbs/lepiszcze/checkpoints exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
/opt/conda/envs/embeddings/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/opt/conda/envs/embeddings/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/opt/conda/envs/embeddings/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Restoring states from the checkpoint path at /app/nbs/lepiszcze/checkpoints/epoch=0-step=205.ckpt
Loaded model weights from checkpoint at /app/nbs/lepiszcze/checkpoints/epoch=0-step=205.ckpt
/opt/conda/envs/embeddings/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, predict_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/opt/conda/envs/embeddings/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/opt/conda/envs/embeddings/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/opt/conda/envs/embeddings/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Epoch 0: 100%|██████████| 232/232 [00:37<00:00, 6.26it/s, loss=1.35, v_num=, train/BaseLR=0.000, train/LambdaLR=0.000, val/MulticlassAccuracy=0.369, val/MulticlassPrecision=0.0923, val/MulticlassRecall=0.250, val/MulticlassF1Score=0.135]
Testing: 92%|█████████▏| 24/26 [00:00<00:00, 34.68it/s]--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test/Loss': 1.341328501701355,
'test/MulticlassAccuracy': 0.4134146273136139,
'test/MulticlassF1Score': 0.1462467610836029,
'test/MulticlassPrecision': 0.10335365682840347,
'test/MulticlassRecall': 0.25}
--------------------------------------------------------------------------------
Testing: 100%|██████████| 26/26 [00:00<00:00, 34.61it/s]
Predicting: 206it [00:00, ?it/s]
As we trained the model for only 1 epoch, the metrics are not very high; they are presented mainly to show that the pipeline works.
results.metrics
{'accuracy': 0.41341463414634144,
'f1_macro': 0.1462467644521139,
'f1_micro': 0.41341463414634144,
'f1_weighted': 0.2418422104842274,
'recall_macro': 0.25,
'recall_micro': 0.41341463414634144,
'recall_weighted': 0.41341463414634144,
'precision_macro': 0.10335365853658536,
'precision_micro': 0.41341463414634144,
'precision_weighted': 0.17091165972635333,
'classes': {0: {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'support': 118},
1: {'precision': 0.41341463414634144,
'recall': 1.0,
'f1': 0.5849870578084556,
'support': 339},
2: {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'support': 227},
3: {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'support': 136}}}
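Finally, the metrics can be written to the submissions directory we created at the beginning. A minimal sketch using the standard library, assuming results.metrics is the plain dictionary printed above (the file name is arbitrary):

import json

with open(LEPISZCZE_SUBMISSIONS / "polemo2-tiny-albert-metrics.json", "w") as f:
    json.dump(results.metrics, f, indent=2, default=str)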