Observed Error

A custom PyTorch training/evaluation script spawns multiple compile jobs, or recursively executes itself in an infinite loop.

Explanation

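This usually happens because the top-level code in the Python script is not guarded by an if __name__ == "__main__": block. At various points during execution, subprocesses are spun off (e.g., weight transfer, creating surrogate jobs, etc.), and each of them can cause the whole module to be executed again. Without the guard, each re-execution runs the training or evaluation code from the top, which can start further compile jobs and subprocesses.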

Workaround

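Add an if __name__ == "__main__": guard to your Python script and move the top-level training/evaluation code under it, so that the code runs only when the script is launched directly.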
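
A minimal sketch of a guarded script is shown below. The file name and the build_model, train, and main helpers are hypothetical placeholders, not part of any library; a real script would put its own PyTorch model and training loop in their place. The point is only the layout: definitions at module level, and the work itself started from inside the guard.

```python
# train.py (hypothetical file name, for illustration only)


def build_model():
    """Placeholder for model construction (hypothetical helper)."""
    return {"weights": [0.0, 0.0]}


def train(model, steps):
    """Placeholder training loop (hypothetical helper).

    Returns the number of steps that were run.
    """
    completed = 0
    for _ in range(steps):
        model["weights"] = [w + 1.0 for w in model["weights"]]
        completed += 1
    return completed


def main():
    """Entry point: build the model and run the training loop."""
    model = build_model()
    return train(model, steps=3)


# This block runs only when the file is launched directly
# (e.g. `python train.py`). A process that imports the module instead
# of running it directly sees a different __name__ and skips the block,
# so it does not start training or compilation again.
if __name__ == "__main__":
    main()
```

Keeping only function and class definitions at module level means that importing or re-executing the module has no side effects, regardless of how many subprocesses are created.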