    Troubleshooting

    • Cannot load Cerebras checkpoints in GPUs
      • Workaround
    • Custom PT training script spawns multiple compile jobs
      • Observed Error
      • Explanation
      • Workaround
    • Loss compilation issues with Autogen
      • Custom loss functions with AutoGen
      • Improving loss function performance
    • Error parsing metadata
      • Observed Error
      • Explanation
      • Workaround
    • Error Receiving Activation
      • cerebras.appliance.errors.ApplianceUnknownError: Ran into error while receiving activation tensor <custom-call …>
    • Failed mount directory during execution
      • Observed Error
      • Workaround
    • Failing to automatically load checkpoints
      • Explanation
      • Workaround
    • Failure to trace due to functionalization error
      • Observed Error
      • Explanation
      • Workaround
    • Input Starvation
    • Out of memory errors and system resources
      • Determining if your job is queued
      • Determining if your job failed because of an OOM error
      • Determining if your job failed because the system could not fit the requested memory
      • Troubleshooting OOM errors
    • Model is too large to fit in memory
      • Observed Error
      • Causes and Possible Solutions
    • ModuleNotFoundError
      • ModuleNotFoundError: No module named <'_bz2', '_sqlite3'>
      • ModuleNotFoundError: No module named <…>
    • Numerical issues
      • Observed Error
      • Explanation
      • Workaround
    • Throughput spike after saving checkpoints
    • Training fails when logged in as root
      • Observed Error
      • Explanation
    • Vocabulary Size Troubleshooting
      • Large vocabulary size
      • Small vocabulary size