Writing summaries requires that the Trainer is configured with a TensorBoardLogger callback, as this callback is what writes summaries to disk.
The summarize_scalar API allows you to summarize scalar model tensors. These summaries are written to TensorBoard events files and can be visualized using TensorBoard.
The API is available in the cerebras.modelzoo.trainer package. To summarize a scalar tensor S, add the following statement to the model definition code:
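The snippet below is only a minimal sketch: the import path and the (name, value) call signature of summarize_scalar are assumptions to be checked against your Model Zoo version.

```python
import torch

# Assumption: summarize_scalar is importable from the cerebras.modelzoo.trainer
# package and is called as summarize_scalar(name, scalar_tensor).
from cerebras.modelzoo.trainer import summarize_scalar

class MyModel(torch.nn.Module):
    def forward(self, x):
        S = x.mean()              # S: any scalar tensor computed in the model
        summarize_scalar("S", S)  # record S under the name "S"
        return S
```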
The value of S will be periodically written to the TensorBoard events file and can be visualized in TensorBoard.
If the Trainer is not configured with a TensorBoardLogger callback, this method is a no-op and no summaries will be written.

The summarize_tensor API is similar to the summarize_scalar API, but is used for summarizing tensors of arbitrary shapes.
It is also available in the cerebras.modelzoo.trainer package. To summarize a tensor T, add the following statement to the model definition code:
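As above, this is a sketch; the import path and the (name, tensor) signature of summarize_tensor are assumptions.

```python
import torch

# Assumption: summarize_tensor mirrors summarize_scalar but accepts tensors
# of arbitrary shape.
from cerebras.modelzoo.trainer import summarize_tensor

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x):
        T = self.linear(x)        # T: a tensor of arbitrary shape
        summarize_tensor("T", T)  # record the full tensor under the name "T"
        return T
```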
The summarized values can later be retrieved using the SummaryReader API (see below).
Here’s a simple example where we’d like to summarize the input features and last layer’s logits of a fully connected network:
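The following sketch shows one way such a model could be written; the layer sizes, loss function, and import path are illustrative assumptions, while the summary names features and last_layer_logits match the ones used below.

```python
import torch
import torch.nn as nn

# Assumption: both helpers live in the cerebras.modelzoo.trainer package.
from cerebras.modelzoo.trainer import summarize_scalar, summarize_tensor

class FullyConnectedModel(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=256, num_classes=10):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_classes)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, batch):
        features, labels = batch

        # Summarize the raw input features.
        summarize_tensor("features", features)

        logits = self.out(torch.relu(self.hidden(features)))

        # Summarize the last layer's logits.
        summarize_tensor("last_layer_logits", logits)

        loss = self.loss_fn(logits, labels)

        # Scalars such as the loss can be summarized as well.
        summarize_scalar("loss", loss)
        return loss
```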
Summarized tensors can be retrieved after the run using the SummaryReader API, which supports listing all available tensor names and fetching a tensor by name for a given step. A SummaryReader object takes a single argument: the path to a TensorBoard events file or a directory containing TensorBoard events files. The location of the tensor summaries is inferred from these events files, as there is a one-to-one mapping between TensorBoard events files and tensor summary directories.
In the example above, we added summaries for features and last_layer_logits. We can then use the SummaryReader API to load the summarized values of these tensors at a given step:
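A hedged sketch of that lookup; the import path and the read_tensor(name, step) argument order are assumptions based on the description above.

```python
# Assumption: SummaryReader is importable from cerebras.pytorch.utils.tensorboard;
# if not, import it from wherever your installation exposes it.
from cerebras.pytorch.utils.tensorboard import SummaryReader

# Point the reader at a TensorBoard events file or a directory containing them,
# e.g. the model directory used for the run.
reader = SummaryReader("model_dir")

# Fetch the values that were summarized at step 100.
features_at_100 = reader.read_tensor("features", 100)
logits_at_100 = reader.read_tensor("last_layer_logits", 100)
```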
SummaryReader.read_tensor() returns one or more TensorDescriptor objects. TensorDescriptor is a plain data structure which holds:
step: The step at which this tensor was summarized.
utctime: The UTC time at which the value was saved.
tensor: The summarized value.
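Continuing the sketch above, the returned descriptors can be inspected like this; normalizing the one-or-more return value to a list is an assumption about the API's behavior.

```python
descriptors = reader.read_tensor("features", 100)

# read_tensor() may return a single TensorDescriptor or several; wrap a single
# result in a list so both cases are handled the same way.
if not isinstance(descriptors, (list, tuple)):
    descriptors = [descriptors]

for d in descriptors:
    print(d.step)     # the step at which the tensor was summarized
    print(d.utctime)  # the UTC time at which the value was saved
    print(d.tensor)   # the summarized value itself
```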
By leveraging the SummaryWriter and DataExecutor classes, along with the summarize_scalar and summarize_tensor APIs, users can effectively log and monitor key metrics and tensor values throughout the training lifecycle. These summaries aid in the immediate analysis and debugging of models and also support long-term monitoring and evaluation of model performance. With the ability to retrieve and inspect summarized values after a run, practitioners have a comprehensive toolset for optimizing and refining their models, improving the overall efficiency and effectiveness of their PyTorch training workflows.