146. Longhorn Node Maintenance Guide
This article describes how to handle planned node maintenance.

Resolution

The following steps shut down all volumes on a node without modifying the scale of the end-user/application deployments in the cluster.

1. Cordon the node. Longhorn automatically disables node scheduling when a Kubernetes node is cordoned.

2. Drain the node to move the workload to another node. Use the documentation that matches your version for the exact command:
   - For Longhorn before 1.4.x (doc)
   - For Longhorn 1.4.x (doc), where the drain command has been simplified

   Draining evicts the workloads from the node.

   The replica processes on the node are stopped at this stage, and replicas on the node are shown as Failed.

   Note: By default, if the node holds the last healthy replica of a volume, Longhorn prevents the drain from completing, to protect that replica and avoid disrupting the workload. You can either override this behavior in the settings or evict the replica to another node before draining.

   The engine processes on the node are migrated with the Pod to other nodes.

   Note: If Longhorn volumes were manually attached to the node through the Longhorn UI, Longhorn prevents the drain from completing. Detach these volumes using the Longhorn UI.

   After the drain is completed, there should be no engine or replica process running on the node. Two instance managers will still be running on the node, but they are stateless and will not interrupt the existing workload.

3. Perform the necessary maintenance, including shutting down or rebooting the node.

4. Uncordon the node. Longhorn automatically re-enables node scheduling.

Important note 1: Always refer to the respective product documentation for clear steps and details. The KB content is only a pointer to the main aspects of the issue being addressed.

Important note 2: Always refer to the documentation for the specific product version. Many features and functions have changed, so for the most accurate information, make sure you are reading the documentation for the correct version.

Additional Information

Documentation about cordoning and draining a node:
- Cordoning a Node - SUSE Rancher documentation
- Draining a Node - SUSE Rancher documentation
- Safely Drain a Node - Kubernetes documentation
- Node Maintenance - Longhorn
- Node Drain Policy recommendations - Longhorn

GitHub issue with more context: https://github.com/longhorn/longhorn/issues/3304

Environment

SUSE Storage 1.4
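
The cordon, drain, and uncordon steps from the Resolution section can be sketched as a small Python helper that builds the kubectl commands and, by default, only prints them. This is an illustrative sketch, not the documented procedure: the function names and the node name worker-1 are hypothetical, and the drain flags shown (only --ignore-daemonsets, which kubectl needs because DaemonSet-managed Pods remain on the node) are an assumption. Take the exact drain flags for your Longhorn version from the Node Maintenance documentation and pass them through extra_drain_flags.

```python
import shlex
import subprocess


def maintenance_commands(node, extra_drain_flags=()):
    """Build kubectl commands to run before and after node maintenance.

    `extra_drain_flags` is a hypothetical hook for the version-specific
    drain flags listed in the Longhorn documentation.
    """
    before = [
        # Step 1: cordon, so no new Pods are scheduled on the node.
        ["kubectl", "cordon", node],
        # Step 2: drain, so workloads are evicted to other nodes.
        ["kubectl", "drain", node, "--ignore-daemonsets", *extra_drain_flags],
    ]
    after = [
        # Final step: uncordon once maintenance is finished.
        ["kubectl", "uncordon", node],
    ]
    return before, after


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence if kubectl reports a failure,
            # for example when the drain does not complete.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    before, after = maintenance_commands("worker-1")  # placeholder node name
    run(before)  # dry run: prints the cordon and drain commands
    # ... perform the maintenance (shutdown or reboot) here ...
    run(after)   # dry run: prints the uncordon command
```

Keeping the default as a dry run lets the command sequence be reviewed before anything touches the cluster. Call run(before, dry_run=False) only after confirming that no volume has its last healthy replica on the node and that no volume is manually attached there.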