Removing an Offline Node from a Proxmox VE Cluster
Affected Components
- Proxmox VE (All versions with clustering)
Summary
A Proxmox VE node has gone permanently offline and must be removed from the cluster to maintain a healthy cluster state. The standard removal command may leave behind stale entries, causing the node to persist in the UI with errors.
Symptoms
- An unresponsive node is marked as “down” or shows a red ‘X’ in the Proxmox web interface.
- pvecm status shows the node as offline.
- After attempting removal, the node may still appear in the UI, sometimes with a “pve-ssl.pem' does not exist! (500)” error.
When a node goes offline due to hardware failure or other issues, it must be formally evicted from the cluster. The standard pvecm delnode command may not always clean up all configuration entries, particularly in the underlying Corosync configuration, leading to a desynchronized state.
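To see which leftovers are actually present, the two places named in this article (the per-node directory and the Corosync node list) can be checked from a healthy node. The following is a minimal sketch, not a Proxmox tool: the function name find_stale_entries is hypothetical, and it assumes node entries in corosync.conf use the usual "name: <node>" line format.

```shell
#!/bin/sh
# Hypothetical helper: reports leftovers of a removed node under a pmxcfs root.
# $1 = node name, $2 = root directory (defaults to /etc/pve).
find_stale_entries() {
    node="$1"
    root="${2:-/etc/pve}"
    # Leftover per-node directory.
    if [ -d "$root/nodes/$node" ]; then
        echo "directory: $root/nodes/$node"
    fi
    # Leftover nodelist entry; assumes a "name: <node>" line (an assumption).
    if [ -f "$root/corosync.conf" ] &&
       grep -Eq "^[[:space:]]*name:[[:space:]]*$node[[:space:]]*\$" "$root/corosync.conf"; then
        echo "corosync.conf: node entry for $node"
    fi
}

# Example use on a healthy cluster node (replace the placeholder first):
#   find_stale_entries "<offline_node_name>"
```

The function only reads; it prints nothing when neither leftover is found.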
Diagnosis
- Verify Node Status: On a healthy cluster node, confirm the target node is offline.
    pvecm nodes
- Attempt Standard Removal: Try the standard node removal command. If this completes without error and the node disappears from the UI, no further action is needed.
    pvecm delnode <offline_node_name>
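The membership check can also be scripted. The sketch below assumes that pvecm nodes prints one row per member, starting with a numeric node ID and carrying the node name in the third whitespace-separated column; the function name node_in_membership is hypothetical and not part of Proxmox VE.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE): reads `pvecm nodes` output on
# stdin and succeeds if the given node name appears in the Name column.
# Assumption: member rows start with a numeric node ID and the name is the
# third whitespace-separated field.
node_in_membership() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+$/ && $3 == name { found = 1 }
        END { exit !found }
    '
}

# Example use on a healthy cluster node (replace the placeholder first):
#   if pvecm nodes | node_in_membership "<offline_node_name>"; then
#       echo "node is still listed as a member"
#   fi
```

Matching on the exact column value rather than a plain substring avoids false positives when one node name is a prefix of another.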
Resolution/Workaround
This procedure outlines the forceful removal of a node that is permanently offline.
Prerequisites:
- The offline node must be powered down and must not come back online with its current cluster configuration.
- SSH and root access to a healthy node in the cluster.
Steps to Resolve:
- Remove the Node from the Cluster: From a healthy node, run the delnode command.
    pvecm delnode <offline_node_name>
- Clean Up the Node’s Directory: The pvecm command may leave behind the node’s configuration directory, which must be removed manually to clean up the UI. This directory also holds the configuration files of the guests that were defined on that node (under qemu-server and lxc), so move any you still need to another node’s directory first.
    rm -rf /etc/pve/nodes/<offline_node_name>
- Verify Removal: Check that the node is no longer listed in the cluster’s node list.
    pvecm nodes
  At this point, also perform a hard refresh of the Proxmox web UI. If the node is gone, the process is complete. If it persists with an error, the Corosync configuration must be edited manually (see Advanced Steps).
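Because the node directory stores VM and container definitions (as qemu-server/<VMID>.conf and lxc/<VMID>.conf), it can be worth listing them before running rm -rf. The following is a minimal sketch; the function name list_guest_configs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: lists guest configuration files stored under a node's
# directory so they can be reviewed or moved before the directory is deleted.
# $1 = node name, $2 = root directory (defaults to /etc/pve).
list_guest_configs() {
    dir="${2:-/etc/pve}/nodes/$1"
    for sub in qemu-server lxc; do
        for f in "$dir/$sub"/*.conf; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -f "$f" ]; then
                echo "$f"
            fi
        done
    done
}

# Example use on a healthy cluster node (replace the placeholder first):
#   list_guest_configs "<offline_node_name>"
```

Empty output means no guest definitions would be lost by deleting the directory.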
Advanced Steps (If Node Persists in UI):
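- Confirm the Cluster Filesystem Is Running: /etc/pve is a FUSE mount provided by the pve-cluster service (pmxcfs). Stopping the service unmounts it, so /etc/pve/corosync.conf can only be edited while pve-cluster is running and the cluster is quorate. Confirm both on the local node before editing.
    systemctl status pve-cluster
    pvecm status
- Edit the Corosync Configuration: Back up /etc/pve/corosync.conf, then edit a working copy rather than the live file, because changes to the live file are propagated to the other nodes as soon as it is saved.
    cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
    cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
    nano /etc/pve/corosync.conf.new
  - In the nodelist section, delete the entire node { ... } block corresponding to the offline node.
  - In the quorum section, if an expected_votes value is set, decrease it by 1.
  - In the totem section, increase the config_version value by 1. Without this, the other nodes do not accept the new configuration.
- Apply the Changes: Move the edited copy into place. Corosync normally picks up the new configuration automatically. If the node still appears afterwards, restart the cluster filesystem service on the local node.
    mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
    systemctl restart pve-cluster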
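The manual edit can be error-prone, so the node-block removal can also be sketched as a filter that writes the edited configuration to standard output and leaves the input file untouched. This is an illustration under stated assumptions, not a Proxmox tool: the function name drop_node_from_conf is hypothetical, node blocks are assumed to contain no nested braces, and "name:" and "config_version:" are each assumed to sit on their own line. It also increments config_version, which the Proxmox VE documentation requires after any change to corosync.conf, and it does not touch expected_votes.

```shell
#!/bin/sh
# Hypothetical helper: prints a copy of a corosync.conf-style file with the
# node { ... } block for one node removed and config_version increased by 1.
# $1 = node name to drop, $2 = path to the file to read (never modified).
drop_node_from_conf() {
    awk -v target="$1" '
        # Start buffering when a node block opens.
        /^[[:space:]]*node[[:space:]]*[{][[:space:]]*$/ {
            inblock = 1; buf = $0 "\n"; drop = 0; next
        }
        inblock {
            buf = buf $0 "\n"
            if ($1 == "name:" && $2 == target) drop = 1
            # On the closing brace, emit the block unless it names the target.
            if ($0 ~ /^[[:space:]]*[}][[:space:]]*$/) {
                if (!drop) printf "%s", buf
                inblock = 0
            }
            next
        }
        # Bump the version so other nodes accept the new configuration.
        $1 == "config_version:" { sub(/[0-9]+/, $2 + 1) }
        { print }
    ' "$2"
}

# Example use (replace the placeholder first); review the result before use:
#   drop_node_from_conf "<offline_node_name>" /etc/pve/corosync.conf > /tmp/corosync.conf.new
#   diff -u /etc/pve/corosync.conf /tmp/corosync.conf.new
```

Writing to a separate file and reviewing the diff keeps the live configuration unchanged until the result has been checked by hand.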
Important Considerations:
- This procedure is for permanently offline nodes only. If the node might return, it should be properly shut down and brought back online instead.
- Manually editing corosync.conf is dangerous. An incorrect edit can break the entire cluster.
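One way to reduce the risk of a bad edit is a quick sanity check on the edited copy before it replaces the live file. The sketch below is only a rough pre-flight check with hypothetical function names (get_config_version, check_candidate_conf). It assumes config_version appears once as "config_version: <n>" and verifies only that braces balance and that the version number went up, so it does not validate the configuration as a whole.

```shell
#!/bin/sh
# Hypothetical pre-flight check for an edited corosync.conf copy.
# Prints the first config_version value found in the file given as $1.
get_config_version() {
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# $1 = current file, $2 = edited candidate. Succeeds only if the braces in the
# candidate balance and its config_version is higher than the current one.
check_candidate_conf() {
    opens=$(( $(tr -cd '{' < "$2" | wc -c) ))
    closes=$(( $(tr -cd '}' < "$2" | wc -c) ))
    if [ "$opens" -ne "$closes" ]; then
        echo "unbalanced braces: $opens opening, $closes closing" >&2
        return 1
    fi
    old=$(get_config_version "$1")
    new=$(get_config_version "$2")
    if [ -z "$old" ] || [ -z "$new" ] || [ "$new" -le "$old" ]; then
        echo "config_version must increase (current: ${old:-?}, candidate: ${new:-?})" >&2
        return 1
    fi
}

# Example use with a hypothetical path for the edited copy:
#   check_candidate_conf /etc/pve/corosync.conf /tmp/corosync.conf.new && echo "looks sane"
```

A passing check is not proof that the file is correct; comparing the two files line by line is still advisable.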
Keywords
proxmox, pve, cluster, remove node, delnode, offline, corosync, pvecm