Logging
The Trainer class features an extensible logging mechanism that can be used to log metrics to various backends.
On this page, you will learn how to set up logging to the console via the Logging class and how to add Logger classes to the Trainer.
Prerequisites
Please ensure that you have read through the Cerebras Model Zoo Trainer Overview beforehand. The rest of this page assumes that you already have at least a cursory understanding of what the Cerebras Model Zoo Trainer is and how to use the Python API.
Also, make sure that you’ve read through Customizing the Trainer with Callbacks, as this page assumes that you are familiar with the Callback mechanism.
Logging to Console
The Trainer exposes a logger attribute that returns a Python logger object, which can be used to log messages to the console at different levels.
For example,
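As a minimal sketch of what logging at different levels looks like, the snippet below uses a plain standard-library logger as a stand-in for trainer.logger. The logger name, the message format, and the in-memory stream are assumptions made so the snippet runs on its own; they are not the Trainer's actual configuration.

```python
import io
import logging

# Stand-in for `trainer.logger`: an ordinary stdlib logger (hypothetical name).
logger = logging.getLogger("trainer_sketch")
logger.setLevel(logging.INFO)  # INFO and above are emitted, DEBUG is dropped
logger.propagate = False       # keep the sketch isolated from the root logger

# An in-memory stream is used so the output can be inspected;
# a real console setup would write to sys.stderr or sys.stdout instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.debug("not shown at INFO level")
logger.info("starting training")
logger.warning("learning rate is unusually high")

console_output = stream.getvalue()
print(console_output, end="")
```

With the level set to INFO, the debug call is filtered out while the info and warning calls are emitted.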
The logger can be configured by passing a Logging object to the Trainer’s constructor.
In the above example, the logger has been configured to print INFO messages to the console by default.
See Control Logging Frequency for an explanation of the log_steps argument.
Logging Metrics
The way to log metrics using the Trainer is to construct and pass in Logger subclasses.
Included out-of-the-box are:
- ProgressLogger: Logs progress metrics to the console.
- TensorBoardLogger: Logs metrics to a TensorBoard event file.
These Logger subclasses can be constructed and passed into the trainer via the loggers argument.
With these loggers, you can now call trainer.log_metrics to log a metric to all loggers.
In the above example, the loss is being logged to the TensorBoardLogger at the current global step.
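The dispatch described in this section can be pictured with a small stand-alone sketch. RecordingLogger and TrainerSketch are hypothetical stand-ins, not Model Zoo classes, and the keyword-argument form of log_metrics and the (metrics, step) logger signature are assumptions. The sketch only models the idea that one log_metrics call reaches every registered logger at the current global step.

```python
from typing import Any, Dict, List, Tuple


class RecordingLogger:
    """Hypothetical stand-in for a Logger subclass; it only records what it receives."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, Any]]] = []

    def log_metrics(self, metrics: Dict[str, Any], step: int) -> None:
        self.records.append((step, dict(metrics)))

    def flush(self) -> None:
        pass  # nothing is buffered in this sketch


class TrainerSketch:
    """Hypothetical stand-in for the Trainer; models only metric dispatch."""

    def __init__(self, loggers: List[RecordingLogger]) -> None:
        self.loggers = loggers
        self.global_step = 0

    def log_metrics(self, **metrics: Any) -> None:
        # Every registered logger receives the same metrics at the current step.
        for logger in self.loggers:
            logger.log_metrics(metrics, self.global_step)


console_like = RecordingLogger()
tensorboard_like = RecordingLogger()
trainer = TrainerSketch(loggers=[console_like, tensorboard_like])

trainer.global_step = 3
trainer.log_metrics(loss=0.25)
```

After the call, both stand-in loggers hold the same record for step 3.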
Logging Name Scope
The trainer also features a name_scope mechanism for logging, which is intended to group related logs together.
In the above example, the metrics get recorded in the log as train/loss and train/accuracy.
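To illustrate how a scope name ends up as a prefix such as train/loss, here is a minimal sketch. ScopedMetricsSketch is a hypothetical stand-in, and the context-manager usage shown here is an assumption about how name_scope is used; only the resulting train/ prefix comes from the text above.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class ScopedMetricsSketch:
    """Hypothetical stand-in that prefixes metric names with the active scopes."""

    def __init__(self) -> None:
        self._scopes: List[str] = []
        self.logged: Dict[str, Any] = {}

    @contextmanager
    def name_scope(self, name: str) -> Iterator[None]:
        # Push the scope on entry and always pop it on exit.
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()

    def log_metrics(self, **metrics: Any) -> None:
        prefix = "/".join(self._scopes)
        for key, value in metrics.items():
            full_name = f"{prefix}/{key}" if prefix else key
            self.logged[full_name] = value


trainer = ScopedMetricsSketch()
with trainer.name_scope("train"):
    trainer.log_metrics(loss=0.5, accuracy=0.9)
trainer.log_metrics(lr=0.001)  # logged outside any scope, so no prefix
```

Metrics logged inside the scope are stored under train/loss and train/accuracy, while the one logged outside keeps its bare name.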
Control Logging Frequency
In very long runs, logging metrics every step is often undesirably verbose. To remedy this, you can pass log_steps to the Logging class.
In the above example, the trainer is configured to log metrics every 10 steps. This means that even if log_metrics is called every step, the metric is only actually logged every 10 steps.
To query whether the current step is a log step, you can call trainer.is_log_step.
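The gating behaviour can be sketched as follows. LogStepGateSketch is a hypothetical stand-in, and the simple modulo rule is an assumption; the real Trainer may treat some steps, such as the first or last one, differently. Modelling is_log_step as a property is also an assumption.

```python
from typing import Any, Dict, List, Tuple


class LogStepGateSketch:
    """Hypothetical stand-in that drops metrics on steps that are not log steps."""

    def __init__(self, log_steps: int) -> None:
        if log_steps < 1:
            raise ValueError("log_steps must be a positive integer")
        self.log_steps = log_steps
        self.global_step = 0
        self.logged: List[Tuple[int, Dict[str, Any]]] = []

    @property
    def is_log_step(self) -> bool:
        # Assumed rule: a step is a log step when it is a multiple of log_steps.
        return self.global_step % self.log_steps == 0

    def log_metrics(self, **metrics: Any) -> None:
        if self.is_log_step:
            self.logged.append((self.global_step, metrics))


trainer = LogStepGateSketch(log_steps=10)
for step in range(1, 31):
    trainer.global_step = step
    trainer.log_metrics(loss=1.0 / step)  # called every step, kept every 10th

logged_steps = [step for step, _ in trainer.logged]
```

Although log_metrics is called on all 30 steps, only steps 10, 20, and 30 are recorded.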
Writing a Custom Logger
Now that you know about the Logger class and how it’s integrated into the Trainer class, it is fairly straightforward to write your own custom loggers.
To write your own custom Logger class, all you need to do is inherit from the base Logger class and override the following methods:
- log_metrics: Logs the provided metrics at the provided step.
- flush: Flushes the logs.
For example, let’s implement a simple logger that just logs the metrics to the console.
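A minimal sketch of such a logger is shown here. To keep the snippet runnable on its own, it defines a small stand-in Logger base class with the two methods listed above; in real use you would inherit from the Model Zoo's base Logger instead. The (metrics, step) signature, the ConsoleLogger name, and the stream parameter are assumptions made for illustration.

```python
import abc
import io
import sys
from typing import Any, Dict, Optional, TextIO


class Logger(abc.ABC):
    """Stand-in for the Model Zoo base Logger so this sketch runs on its own."""

    @abc.abstractmethod
    def log_metrics(self, metrics: Dict[str, Any], step: int) -> None:
        """Log the provided metrics at the provided step."""

    @abc.abstractmethod
    def flush(self) -> None:
        """Flush any buffered logs."""


class ConsoleLogger(Logger):
    """Hypothetical custom logger that prints one line of metrics per call."""

    def __init__(self, stream: Optional[TextIO] = None) -> None:
        # Default to the console; a different stream can be injected for testing.
        self.stream = stream if stream is not None else sys.stdout

    def log_metrics(self, metrics: Dict[str, Any], step: int) -> None:
        parts = ", ".join(f"{name}={value}" for name, value in metrics.items())
        print(f"step {step}: {parts}", file=self.stream)

    def flush(self) -> None:
        self.stream.flush()


# Usage: write to an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
demo_logger = ConsoleLogger(stream=buffer)
demo_logger.log_metrics({"loss": 0.25, "accuracy": 0.9}, step=3)
demo_logger.flush()
```

Accepting an optional stream keeps the logger easy to test while still defaulting to the console.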
That is all there is to it. This logger can now be passed to the Trainer through the loggers argument, in the same way as the built-in loggers.
In order for the custom logger class to exist in the Python global namespace, the Python interpreter must have seen it at some point. Implementing your custom logger in run.py or in the same file as the model class are two ways to ensure that the logger is seen by the Python interpreter and loaded into the Python global namespace.
Conclusion
By this point, you should have a cursory understanding of how Loggers are integrated into the Trainer. There are a few useful loggers that come pre-packaged inside the Model Zoo. If there is a destination you wish to write logs to that is not covered, you should be comfortable writing your own logger to implement that functionality.