The Trainer class features an extensible logging mechanism that can be used to log metrics to various backends.

On this page, you will learn how to set up logging to the console via the Logging class, as well as how to add Logger classes to the Trainer.

Prerequisites

Please ensure that you have read through the Cerebras Model Zoo Trainer Overview beforehand. The rest of this page assumes that you already have at least a cursory understanding of what the Cerebras Model Zoo Trainer is and how to use the Python API.

Also, make sure that you’ve read through Customizing the Trainer with Callbacks, as this page assumes that you are familiar with the Callback mechanism.

Logging to Console

The Trainer exposes a logger attribute that returns a Python logger object, which can be used to log messages to the console at different levels.

For example,

from cerebras.modelzoo import Trainer

trainer = Trainer(...)

trainer.logger.info("This is an INFO message")
trainer.logger.debug("This is a DEBUG message")
trainer.logger.warning("This is a WARNING message")
trainer.logger.error("This is an ERROR message")

The logger can be configured by passing a Logging object to the Trainer’s constructor, for example so that INFO messages are printed to the console by default.

The Logging object also accepts a log_steps argument. See Control Logging Frequency for an explanation of that argument.
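
As a rough illustration of what a level setting means for a Python logger, the following stdlib-only sketch attaches a handler to a logger and shows which messages survive at a given level. The helper name emit_at_level, the logger names, and the in-memory stream are assumptions made for the demo. The Trainer sets up its own logger and handlers, and this sketch does not use the Logging class itself.

```python
import io
import logging


def emit_at_level(level):
    """Log one message per level and return the lines that were emitted."""
    stream = io.StringIO()  # stand-in for the console
    logger = logging.getLogger(f"trainer_demo_{level}")  # hypothetical name
    logger.setLevel(level)
    logger.propagate = False  # keep the demo isolated from the root logger

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.handlers = [handler]

    logger.debug("This is a DEBUG message")
    logger.info("This is an INFO message")
    logger.warning("This is a WARNING message")
    logger.error("This is an ERROR message")

    handler.flush()
    return stream.getvalue().splitlines()


info_lines = emit_at_level(logging.INFO)    # DEBUG is filtered out
debug_lines = emit_at_level(logging.DEBUG)  # all four messages are kept
```

At the INFO level, only the INFO, WARNING, and ERROR messages reach the handler. Lowering the level to DEBUG lets all four through.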

Logging Metrics

Metrics are logged through the Trainer by constructing Logger subclasses and passing them in.

A few Logger subclasses are included out of the box, including the TensorBoardLogger used in the example below.

These Logger subclasses can be constructed and passed into the trainer via the loggers argument. With these loggers, you can then call trainer.log_metrics to log a metric to all loggers:

from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.loggers import TensorBoardLogger

trainer = Trainer(
    ...,
    loggers=[TensorBoardLogger()],
    ...,
)

trainer.log_metrics(loss=...)

In the above example, the loss is being logged to the TensorBoardLogger at the current global step.
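
Conceptually, log_metrics fans each metric out to every configured logger together with the current step. The sketch below mimics that flow with stand-in classes so it runs without Model Zoo: MiniTrainer, ListLogger, and the global_step attribute are hypothetical names, and only the log_metrics(metrics, step) shape is taken from this page.

```python
class ListLogger:
    """Hypothetical logger that stores (step, name, value) records in memory."""

    def __init__(self):
        self.records = []

    def log_metrics(self, metrics, step):
        for name, value in metrics.items():
            self.records.append((step, name, value))


class MiniTrainer:
    """Hypothetical stand-in for the Trainer, reduced to metric fan-out."""

    def __init__(self, loggers):
        self.loggers = list(loggers)
        self.global_step = 0  # assumed step counter

    def log_metrics(self, **metrics):
        # Forward the same metrics and step to every configured logger.
        for logger in self.loggers:
            logger.log_metrics(metrics, self.global_step)


first, second = ListLogger(), ListLogger()
trainer = MiniTrainer(loggers=[first, second])

trainer.global_step = 3
trainer.log_metrics(loss=0.25)
```

Both stand-in loggers end up with the same record for step 3, which is the behavior the loggers argument is meant to provide across backends.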

Logging Name Scope

The Trainer also provides a name_scope mechanism for logging, which is intended to group related logs together. For example:

from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.loggers import TensorBoardLogger

trainer = Trainer(
    ...,
    loggers=[TensorBoardLogger()],
    ...,
)

with trainer.name_scope("train"):
    trainer.log_metrics(loss=...)
    trainer.log_metrics(accuracy=...)

In the above example, the metrics get recorded in the log as train/loss and train/accuracy.
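
One way to picture the name scope is as a context manager that prefixes metric names while it is active. The following is a minimal sketch under that assumption. ScopedMetrics and its attributes are hypothetical, and the "/"-joined scope stack is only a guess that is consistent with the train/loss naming shown above; it is not the Trainer's actual implementation.

```python
from contextlib import contextmanager


class ScopedMetrics:
    """Hypothetical stand-in that prefixes metric names with the active scope."""

    def __init__(self):
        self.scopes = []  # currently active scope names
        self.logged = {}  # full metric name -> value

    @contextmanager
    def name_scope(self, name):
        self.scopes.append(name)
        try:
            yield
        finally:
            self.scopes.pop()  # always leave the scope, even on error

    def log_metrics(self, **metrics):
        for name, value in metrics.items():
            full_name = "/".join(self.scopes + [name])
            self.logged[full_name] = value


demo = ScopedMetrics()
with demo.name_scope("train"):
    demo.log_metrics(loss=0.5)
    demo.log_metrics(accuracy=0.9)
demo.log_metrics(lr=0.01)  # outside any scope, so no prefix is added
```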

Control Logging Frequency

In very long runs, logging metrics at every step is often undesirably verbose. To remedy this, you can pass log_steps to the Logging class.

For example, with log_steps set to 10, the trainer is configured to log metrics every 10 steps. This means that even if log_metrics is called every step, the metric is only actually logged on every 10th step.

To query whether the current step is a log step, you can use trainer.is_log_step.
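
The gating described here can be sketched as a simple check on the step counter. In the stand-in below, StepGate, global_step, and the modulo rule are assumptions made for illustration. The Trainer's actual rule may differ, for instance in how the first or last step of a run is treated.

```python
class StepGate:
    """Hypothetical stand-in for log_steps gating."""

    def __init__(self, log_steps):
        self.log_steps = log_steps
        self.global_step = 0
        self.logged = []  # (step, metrics) pairs that passed the gate

    @property
    def is_log_step(self):
        # Assumed rule: log on every step that is a multiple of log_steps.
        return self.global_step % self.log_steps == 0

    def log_metrics(self, **metrics):
        # Calls on non-log steps are silently dropped.
        if self.is_log_step:
            self.logged.append((self.global_step, metrics))


gate = StepGate(log_steps=10)
for step in range(1, 31):
    gate.global_step = step
    gate.log_metrics(loss=1.0 / step)  # called on every step

logged_steps = [step for step, _ in gate.logged]
```

Although log_metrics is called 30 times in this sketch, only the calls made on steps 10, 20, and 30 are recorded.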

Writing a Custom Logger

Now that you know all about the Logger class and how it’s integrated into the Trainer class, it is fairly straightforward to write your own custom loggers.

To write your own custom Logger class, all you need to do is inherit from the base Logger class and override the following methods:

  • log_metrics: Logs the provided metrics at the provided step.

  • flush: Flushes the logs.

For example, let’s implement a simple logger that just logs the metrics to the console:

from cerebras.modelzoo.trainer.loggers import Logger

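class ConsoleLogger(Logger):
    def setup(self, trainer):
        # Keep a reference to the trainer so its Python logger can be reused.
        self.trainer = trainer

    def flush(self):
        # Flush every handler attached to the trainer's Python logger.
        for handler in self.trainer.logger.handlers:
            handler.flush()

    def log_metrics(self, metrics, step):
        # Emit one INFO line per metric.
        for name, value in metrics.items():
            self.trainer.logger.info(f"Step={step}, {name}={value}")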

Note

All Logger instances inherit from Callback. This means that loggers may override any of the hooks that are exposed via the Callback mechanism too.

That is all there is to it. This logger can now be passed to the Trainer via the loggers argument, in the same way as the built-in loggers.

For the custom logger class to exist in the Python global namespace, the Python interpreter must have seen it at some point. Implementing your custom logger in run.py or in the same file as the model class are two ways to ensure that the logger is seen by the Python interpreter and loaded into the Python global namespace.
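
As a further illustration of the log_metrics and flush interface described above, here is a sketch of a logger that buffers metrics and writes them out as JSON Lines on flush. The Logger base class in this snippet is a local stand-in so that the code runs without Model Zoo; in real use you would inherit from the Model Zoo Logger class instead. JsonLinesLogger and its stream argument are hypothetical names.

```python
import io
import json
from abc import ABC, abstractmethod


class Logger(ABC):
    """Local stand-in for the Model Zoo Logger base class (assumption)."""

    @abstractmethod
    def log_metrics(self, metrics, step): ...

    @abstractmethod
    def flush(self): ...


class JsonLinesLogger(Logger):
    """Hypothetical logger that buffers metrics and writes them as JSON Lines."""

    def __init__(self, stream):
        self.stream = stream  # any writable text stream, e.g. an open file
        self.buffer = []

    def log_metrics(self, metrics, step):
        # Buffer one record per call; nothing is written until flush().
        self.buffer.append({"step": step, **metrics})

    def flush(self):
        for record in self.buffer:
            self.stream.write(json.dumps(record) + "\n")
        self.buffer.clear()
        self.stream.flush()


out = io.StringIO()
logger = JsonLinesLogger(out)
logger.log_metrics({"loss": 0.5}, step=10)
logger.log_metrics({"loss": 0.4}, step=20)
pending = out.getvalue()  # still empty: records are only buffered so far
logger.flush()
written = out.getvalue().splitlines()
```

Separating log_metrics from flush in this way lets a logger batch its writes, which can matter when the destination is a file or a remote service.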

Conclusion

By this point, you should have a cursory understanding of how Loggers are integrated into the Trainer. There are a few useful loggers that come pre-packaged inside the Model Zoo. If there is somewhere you wish to write logs to that is not covered, you should be comfortable writing your own logger to implement that functionality.