Grafana Dashboards
Use Grafana to monitor your Cerebras cluster.
Cerebras provides multiple Grafana dashboards to help you monitor your cluster.
Depending on your role, you may see some or all of the dashboards described here.
Cluster Admin Dashboards
These dashboards are intended for administrators and cluster operators.
You’ll see a cluster-wide view of available resources (CSx systems, nodes, switches, links) and their health. Additionally, these dashboards highlight cluster problems at a glance, categorizing them as errors (requiring immediate attention) or warnings (potential issues).
Click a resource link to view that resource’s dashboard and its associated metrics.
Session Admin Dashboards
These dashboards are intended for session owners who need to monitor the utilization and health of a session’s resources and job statuses.
You’ll see a session-specific view of the job queue, available resources (CSx systems, nodes, switches, links), and their health.
Click on resource links to navigate to specific Resource Details dashboards or Job Admin dashboards, where you can access detailed metrics and job status information.
Job Admin Dashboards
These dashboards are intended for users who want to monitor the status of a job and troubleshoot, if needed.
You’ll see a single job in the dashboard with metrics on system utilization and the resources in use. The Job Debug dashboard highlights potential causes of failures, and the Job Network Debug dashboard drills down into the network resources in use by the job.
Click on resource links to navigate to Resource Details dashboards and their associated metrics.
Resource Details Dashboards
These dashboards are intended for administrators and cluster operators who want to view specific cluster resources and the ports/links that connect them.
You’ll see detailed metrics for cluster resources including CSx systems, nodes, servers, and connecting ports. The metrics cover CPU utilization, memory availability, thermal readings, network performance, link states, and device error counters.
Access Grafana
Ensure you have access to the user node in the Cerebras Wafer-Scale cluster. If you encounter system configuration issues, contact Cerebras Support.
Set Up Port Forwarding
Run the following command to start a port-forwarding SSH session through the user node from your machine:
This command forwards traffic through local port 8443. You can use any unoccupied port on your machine.
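As a minimal sketch of such a port-forwarding session, the snippet below assembles a standard SSH local port forward. The user name, user node address, and remote port 443 are assumptions for illustration (they are not taken from the Cerebras documentation), so substitute the values for your cluster.

```shell
# Hypothetical placeholders: replace with the values for your cluster.
LOCAL_PORT=8443                                     # any unoccupied port on your machine
GRAFANA_HOST="grafana.mb-systemf102.cerebras.com"   # grafana.<CLUSTER-NAME>.<DOMAIN_NAME>
REMOTE_PORT=443                                     # assumption: Grafana is served over HTTPS on 443
USER_NODE="myuser@usernode.example.com"             # hypothetical user and user node address

# Build the SSH command.
# -N: do not run a remote command, only forward ports.
# -L: forward LOCAL_PORT on this machine to GRAFANA_HOST:REMOTE_PORT via the user node.
port_forward_cmd() {
  printf 'ssh -N -L %s:%s:%s %s\n' "$LOCAL_PORT" "$GRAFANA_HOST" "$REMOTE_PORT" "$USER_NODE"
}

# Print the command for review; run the printed line to open the tunnel.
port_forward_cmd
```

Keep the SSH session open for as long as you use Grafana. If the local port is already taken, change LOCAL_PORT to another free port.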
Get Access Credentials
- Ask your system administrator to set up the Grafana database. URLs follow this format: grafana.<CLUSTER-NAME>.<DOMAIN_NAME>. For example: grafana.mb-systemf102.cerebras.com.
- Obtain authentication credentials (username and password) from your system administrator.
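To make the host name format concrete, the following sketch composes it from its two parts. The variable names are illustrative only; the example values come from the sample host name above.

```shell
# Illustrative variable names; values taken from the example host name above.
CLUSTER_NAME="mb-systemf102"
DOMAIN_NAME="cerebras.com"

# Compose the Grafana host name: grafana.<CLUSTER-NAME>.<DOMAIN_NAME>
GRAFANA_HOST="grafana.${CLUSTER_NAME}.${DOMAIN_NAME}"

echo "$GRAFANA_HOST"   # prints grafana.mb-systemf102.cerebras.com
```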
Add the Grafana TLS Certificate
The Grafana TLS certificate is located at /opt/cerebras/certs/grafana_tls.crt on the user node. This certificate is copied during the user node installation.
Download it to your local machine and add it to your browser’s keychain.
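One common way to download the certificate is scp. The sketch below assumes you have SSH access to the user node; the user name and user node address are hypothetical placeholders, while the certificate path is the one given above.

```shell
# Hypothetical placeholder: replace with your user name and user node address.
USER_NODE="myuser@usernode.example.com"
# Certificate location on the user node, as stated above.
CERT_PATH="/opt/cerebras/certs/grafana_tls.crt"

# Build an scp command that copies the certificate into the current directory.
download_cert_cmd() {
  printf 'scp %s:%s .\n' "$USER_NODE" "$CERT_PATH"
}

# Print the command for review; run the printed line to download the file.
download_cert_cmd
```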
On Chrome with macOS
- Open Preferences > Privacy and Security > Security > Manage Certificates.
- Add grafana-tls.crt to the System keychain and set it to Always Trust.
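If you prefer the terminal, macOS can also add a trusted certificate with the built-in security tool. This is a sketch of an alternative to the steps above, not a procedure taken from the Cerebras documentation; the file name is an assumption based on the certificate path on the user node, so adjust it to match your downloaded file.

```shell
# Assumption: name of the certificate file you downloaded; adjust if yours differs.
CERT_FILE="grafana_tls.crt"
# Standard location of the macOS System keychain.
SYSTEM_KEYCHAIN="/Library/Keychains/System.keychain"

# Build the command.
# -d: add to the admin (system-wide) trust settings.
# -r trustRoot: trust the certificate as a root.
# -k: keychain to which the certificate is added.
trust_cert_cmd() {
  printf 'sudo security add-trusted-cert -d -r trustRoot -k %s %s\n' "$SYSTEM_KEYCHAIN" "$CERT_FILE"
}

# Print the command for review; run the printed line on macOS to apply it.
trust_cert_cmd
```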
Update the Hosts File
Edit your local machine’s /etc/hosts file to point the user node’s IP address to Grafana:
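An /etc/hosts entry has the form of an IP address followed by a host name. The sketch below generates such a line; the IP address is a hypothetical placeholder from a documentation-only range, and the host name is the example used earlier. Confirm the correct values with your system administrator.

```shell
# Hypothetical placeholder: replace with the user node's actual IP address.
USER_NODE_IP="192.0.2.10"
# Example host name in the grafana.<CLUSTER-NAME>.<DOMAIN_NAME> format.
GRAFANA_HOST="grafana.mb-systemf102.cerebras.com"

# Produce a hosts-file line: <IP address><TAB><host name>
hosts_entry() {
  printf '%s\t%s\n' "$USER_NODE_IP" "$GRAFANA_HOST"
}

# Print the line for review.
hosts_entry

# To append it to /etc/hosts (requires administrator rights), you could run:
#   hosts_entry | sudo tee -a /etc/hosts
```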
Open Grafana
Navigate to the Grafana dashboards using the following URL:
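The exact URL depends on your cluster. As a sketch, assuming Grafana is reached over HTTPS through the forwarded local port 8443 and uses the example host name from earlier, the URL can be composed like this:

```shell
# Example host name in the grafana.<CLUSTER-NAME>.<DOMAIN_NAME> format.
GRAFANA_HOST="grafana.mb-systemf102.cerebras.com"
# Assumption: the local port chosen for port forwarding.
LOCAL_PORT=8443

# Compose the URL (assumes HTTPS on the forwarded port).
grafana_url() {
  printf 'https://%s:%s\n' "$GRAFANA_HOST" "$LOCAL_PORT"
}

grafana_url   # prints https://grafana.mb-systemf102.cerebras.com:8443
```

Open the printed URL in your browser and sign in with the credentials from your system administrator.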