HPA not scaling up: common causes

The Kubernetes Horizontal Pod Autoscaler (HPA) can fail to scale a deployment even when everything looks correctly configured. The usual culprits:

- Missing readiness probes. New pods should not receive traffic until they are ready to serve it; implement readiness probes so that a scale-up does not route requests to pods that are still starting.
- Inaccurate resource requests. The HPA uses a container's resource requests as the baseline for its utilization calculation, and that utilization is what the HPA controller uses to decide whether to scale the target up or down. Set requests accurately, or the utilization percentages are meaningless.
- The scaling formula itself. By the formula in the HPA documentation, the desired replica count is the current count multiplied by the ratio of the current metric value to the target, rounded up. Small metric changes therefore may not trigger a scale-up: with a queue-length metric and a target of 10, for example, the queue must exceed 10 before the HPA scales from 1 pod to 2. Setting a target CPU utilization of 50% with a stabilization window of 0 makes the controller react immediately, but the formula still governs when a new replica is warranted.
- minReplicas conflicts. With minReplicas: 1, the HPA configuration can apply cleanly, yet whenever the HPA controller deems it necessary to scale up, the additional pods/replicas are either immediately terminated or no scale-up happens at all. This symptom is worth checking against anything else that writes the Deployment's replica count.
- GPU and custom metrics. The HPA is blind to GPU utilization, which makes it unable to scale LLM inference deployments effectively on built-in metrics alone. One remedy is to feed it custom metrics: WVA, for example, connects its internal scaling decisions to Kubernetes workload scaling by emitting metrics either to the Kubernetes HPA (via the Prometheus Adapter) or to KEDA (via a ScaledObject); each backend requires its own configuration and comes with its own tradeoffs.
- Scale-down expectations. An HPA scales between minReplicas and maxReplicas. If a deployment scales up to its maximum of 6 and load then drops, the HPA brings replicas down to whatever the metrics justify (5, say), not back to the replica count the Deployment originally specified; the original spec.replicas is irrelevant once the HPA owns the workload.
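A minimal sketch of the readiness-probe fix mentioned above. The endpoint path, port, and image name are assumptions; adjust them to your service:

```yaml
# Fragment of a Deployment's pod template. The pod is only added to the
# Service's endpoints once the probe succeeds, so traffic waits for readiness.
containers:
  - name: app
    image: my-app:latest        # placeholder image
    readinessProbe:
      httpGet:
        path: /healthz          # assumed health endpoint
        port: 8080              # assumed container port
      initialDelaySeconds: 5
      periodSeconds: 10
```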
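The CPU-target configuration described above can be expressed with an autoscaling/v2 HPA manifest. This is a hypothetical example (the Deployment name and replica bounds are assumptions), not a drop-in config:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa              # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # placeholder Deployment
  minReplicas: 1
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # target 50% of the CPU *request*
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # react immediately on scale-up
```

Note that averageUtilization is measured against the container's CPU request, which is why inaccurate requests distort every scaling decision.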
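For the KEDA backend, scaling is configured through a ScaledObject rather than an HPA manifest. A hedged sketch, assuming a Prometheus queue-length metric (the server address, query, and threshold are all placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler           # placeholder name
spec:
  scaleTargetRef:
    name: my-app                # placeholder Deployment
  minReplicaCount: 1
  maxReplicaCount: 6
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # assumed address
        query: sum(queue_length)                          # hypothetical metric
        threshold: "10"
```

KEDA creates and manages an HPA under the hood, so the same scaling formula applies; the difference is who feeds the metric into the pipeline.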
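The scaling formula from the HPA documentation can be sketched in a few lines of Python. This ignores the controller's built-in tolerance around a ratio of 1.0, so it slightly overstates how eagerly a real HPA scales:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """HPA formula: desiredReplicas =
    ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# With 1 replica and a target queue length of 10 (an assumed target),
# a queue of exactly 10 does not scale; the queue must exceed the target:
print(desired_replicas(1, 10, 10))  # → 1
print(desired_replicas(1, 12, 10))  # → 2
```

This is why a workload can sit under steady moderate load without ever adding a replica: the ratio never pushes the ceiling past the current count.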