Writing a Custom Training Loop
Profiling the Executor
The DataExecutor provides tools to profile its performance during a run. The following activities are currently supported:
Activity | Description |
---|---|
total_samples | Total number of samples processed so far |
total_time | Elapsed time so far, in seconds |
rate | Client-side smoothed samples/second, computed over all samples added since the rate was last queried |
global_rate | Non-smoothed samples/second since the executor context was entered. For a more detailed explanation, see "Measure throughput of your model" |
samples_per_sec | Non-smoothed samples/second since the executor context was entered. This value is the same as global_rate |
flops_utilization | Measured FLOPS utilization for the run |
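To make the difference between the smoothed `rate` and the non-smoothed `global_rate` concrete, here is a minimal, self-contained sketch of how such counters could be maintained. It is not the DataExecutor profiler: the class `RateTrackerSketch`, its methods, the explicit `now` argument, and the use of an exponential moving average for smoothing are all assumptions made for illustration. The table above only states that `rate` is smoothed, not how.

```python
class RateTrackerSketch:
    """Hypothetical stand-in for illustration; not the DataExecutor profiler."""

    def __init__(self, start_time=0.0, smoothing=0.4):
        self.start_time = start_time          # when the (imagined) context was entered
        self.total_samples = 0                # analogous to total_samples
        self.total_time = 0.0                 # analogous to total_time, in seconds
        self._last_query_time = start_time    # when rate() was last called
        self._samples_since_query = 0         # samples added since that call
        self._smoothed = None                 # running smoothed rate
        self._smoothing = smoothing           # assumed EMA weight on the old value

    def add(self, num_samples, now):
        """Record a processed batch of num_samples at timestamp now."""
        self.total_samples += num_samples
        self._samples_since_query += num_samples
        self.total_time = now - self.start_time

    def global_rate(self):
        """Non-smoothed samples/second since the start."""
        if self.total_time <= 0:
            return 0.0
        return self.total_samples / self.total_time

    def rate(self, now):
        """Smoothed samples/second over samples added since the last query."""
        elapsed = now - self._last_query_time
        if elapsed <= 0:
            return self._smoothed or 0.0
        instantaneous = self._samples_since_query / elapsed
        if self._smoothed is None:
            self._smoothed = instantaneous
        else:
            # Exponential moving average (an assumption, see the lead-in).
            self._smoothed = (self._smoothing * self._smoothed
                              + (1 - self._smoothing) * instantaneous)
        self._last_query_time = now
        self._samples_since_query = 0
        return self._smoothed


if __name__ == "__main__":
    tracker = RateTrackerSketch()
    tracker.add(100, now=1.0)
    print("rate:", tracker.rate(now=1.0))
    tracker.add(300, now=2.0)
    print("rate:", tracker.rate(now=2.0))
    print("global_rate:", tracker.global_rate())
```

In this sketch the smoothed rate reacts to the most recent window of samples, while the global rate averages over the whole run, which is why the two can differ noticeably when throughput changes mid-run.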
You can track the performance of each activity by name through the DataExecutor profiler.
For example: