Troubleshooting
Training Fails When Logged In As Root
Observed Error
2023-05-04 10:15:40,465 ERROR: Uncaught exception:
Traceback (most recent call last):
...
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "User{username='root',UID=0} is not allowed to execute this method /cluster.cluster_mgmt_pb.ClusterManagement/InitCompileJob. Please use a non-root user."
Explanation
Running jobs as root is a significant security risk, so the Cerebras cluster does not allow the root user (UID 0) to run jobs. As the error message instructs, log in as a non-root user and launch training from that account.
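If you start training from your own wrapper script, a pre-flight check can surface this problem locally, before the cluster rejects the request with PERMISSION_DENIED. The sketch below is a hypothetical helper: `ensure_non_root` and `is_root_uid` are illustrative names, not part of the Cerebras Model Zoo or cluster tooling, and it assumes a POSIX system where `os.geteuid()` is available.

```python
import os
from typing import Optional


def is_root_uid(uid: int) -> bool:
    """Return True if the given user ID belongs to root (UID 0)."""
    return uid == 0


def ensure_non_root(uid: Optional[int] = None) -> None:
    """Abort early if the process would run as root.

    Hypothetical pre-flight helper (not Cerebras tooling). The cluster
    itself rejects UID 0; this only reports the problem before any job
    is submitted.
    """
    if uid is None:
        # Effective UID of the current process; POSIX-only.
        uid = os.geteuid()
    if is_root_uid(uid):
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(
            "Refusing to launch training as root (UID 0). "
            "Log in as a non-root user and rerun the command."
        )
```

Calling `ensure_non_root()` at the top of a launch script, before any training or compile step, makes the script exit with a readable message instead of a gRPC traceback. The check is deliberately local and does not replace the cluster-side enforcement.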