
Release notes

See what’s new in the product.


  • Private Cluster Optimization & Workload Optimization Improvements

    Workload rightsizing
    Rebalancer
    Node configuration
    Evictor
    • Now, private EKS clusters—those without direct access to the internet—can be fully optimized by CAST AI via private link setup.
    • Support for Kubelet config and Init script has been added to the AKS cluster’s Node configuration.
    • Rebalancing now runs in 2 phases instead of 3. Each node is deleted as soon as its drain completes, instead of waiting for all nodes to drain.
    • Evictor and Rebalancer now support well-known safe-to-evict annotations. Check the documentation for more details.
    • One-click migration from Karpenter now supports v0.32+ objects.
    • Self-Service SSO now supports OIDC connections.
    • CAST AI’s node template name is now added as a tag on AWS nodes, a label on GCP VMs, and a tag on Azure VM scale sets.
    • Workload optimization settings now support constraints: minimum and maximum values for resources. The workload autoscaler will not scale CPU or memory above the maximum or below the minimum.
    • Introduced an event log for Workload Optimization, providing users with increased visibility into actions taken. Additionally, Workload Optimization now supports Argo rollouts.
    • Added the ability to filter by label to the Security Best Practice report.
    • The Image Security page now displays the image scan status.
    • Minor quality-of-life improvements have been made to make cluster and node list tables easier to use.
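To illustrate the resource constraints introduced in this release: the workload autoscaler's recommendation is effectively clamped between the configured minimum and maximum. The following is a minimal sketch, assuming a hypothetical `apply_constraints` helper and plain numeric values (for example CPU cores). It is not CAST AI's implementation or API.

```python
from typing import Optional


def apply_constraints(recommended: float,
                      minimum: Optional[float] = None,
                      maximum: Optional[float] = None) -> float:
    """Clamp a recommended resource value to optional min/max constraints.

    Hypothetical helper for illustration only; not part of any CAST AI API.
    """
    if minimum is not None and maximum is not None and minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    if minimum is not None and recommended < minimum:
        return minimum  # never scale below the configured minimum
    if maximum is not None and recommended > maximum:
        return maximum  # never scale above the configured maximum
    return recommended
```

For example, with a minimum of 0.25 and a maximum of 2.0 cores, a recommendation of 4.0 cores would be applied as 2.0.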
  • PrecisionPacker beta, Node OS updates, Organization-level allocation reports, GPU dimension in Cost Monitoring

    Workload rightsizing
    GPU
    Reservations
    PrecisionPacker
    Node configuration
    Security
    Evictor
    Cost monitoring
    Organization
    • PrecisionPacker (also known as Pod Pinner) has been released in beta with limited availability. It addresses misalignment between the actions of the CAST AI Autoscaler and the Kubernetes cluster scheduler: the Autoscaler bin-packs pods and creates nodes in a cost-optimized manner, but the Kubernetes cluster scheduler determines the actual placement of pods on nodes. Read more in the documentation. Customers who want to test this functionality should contact the support team.
    • Ubuntu SNAP-based AMIs for EKS are now supported.
    • We have updated the Advanced Evictor targeting logic so pods in the namespace can be targeted without the need to match labels. Check the documentation.
    • AKS users can now choose compatible OS disk type in the node configuration.
    • For GKE clusters, we have added support for Init script and Kubelet configuration. Custom instances with attached GPUs are now also supported.
    • We have released Organization-level Allocation group reporting, where users can create Allocation groups that span multiple clusters in the organization. Read more here.
    • A GPU cost dimension was introduced to all reports in the Cost monitoring suite.
    • Data about allocated and requested GPUs are now presented in the Cluster list and dashboard.
    • The Cluster list, Cluster dashboard, and Cluster efficiency screens will now display the amount of used computing resources in addition to allocated and requested ones.
    • Workload screens in the Workload Optimization menu now show labels and annotations. If a workload is managed via annotations set in the manifest file, it can’t be managed via the Workload optimization interface.
    • We updated the Azure Reserved Instance upload flow to support only the native Azure RI export document format. Support for a custom CAST AI format was removed.
    • A new Security product feature, Node OS updates, has been released. According to security best practices, the node OS should be patched frequently. Users can now identify nodes that are out of compliance and set a Rebalancer schedule that targets nodes for replacement based on their age. All details are available here.
    • We introduced a new image scanning status, “Pending.” It indicates either that an image is queued for scanning or that kVisor encountered an error and the user needs to take action.
    • To unify the user journey, cluster-level Security best practice and Image security screens were removed. These reports are available from the organizational level menu.
    • We have released a new version of our Terraform provider and updated some Terraform examples. You can find the list of changes here. The provider and modules are already in Terraform’s registry.
    • We have released a new version of the CAST AI agent. You can find the complete list of changes here. To update the agent in your cluster, please follow these steps.
  • Windows node support on AKS, Self-service SSO, GCP resource-based CUDs usable in autoscaling, the launch of CAST AI EU instance

    Reservations
    Autoscaler
    Rebalancer
    Workload rightsizing
    • We have launched support for Windows nodes in AKS clusters. Clusters running Windows 2019, as well as Linux machines, can now be fully managed by CAST AI.

    • Support for GCP custom instances with extended memory settings has been implemented.

    • We have launched a self-service SSO functionality for enterprises using Azure AD and Okta OIDC. More details and a setup guide are available in the documentation.

    • GCP resource-based CUDs can now be uploaded and assigned to clusters via APIs. This is an early release. Customers interested in using this feature should contact the support team.

    • From now on, when CAST AI initiates a drain as part of a spot interruption management event, it will label the node with the key autoscaling.cast.ai/draining, and the value will represent the reason. An additional taint autoscaling.cast.ai/draining=true will also be added. All details can be found in the documentation.

    • The CAST AI EU instance is now live. Customers who require all data to be local in the EU can onboard clusters to the CAST AI EU instance.

    • Topology spread constraints that use the kubernetes.io/hostname key are now supported. A full list of supported labels can be found here.

    • We’ve added an export functionality to the CPU usage report.

    • The cluster dashboard now shows used resources in addition to allocatable and requested resources.

    • Updated the audit log event that is triggered when an instance type or family gets blacklisted. It now includes the reason for blacklisting and the expiration date.

    • We’ve updated the Security Best Practice Report to use CIS Benchmarks for GKE 1.4.

    • We’ve introduced workload autoscaler management by workload annotations. Check the documentation for more details.

    • The workload autoscaler user interface has been updated.

    • We have released a new version of our Terraform provider and updated some Terraform examples. You can find the list of changes here. The provider and modules are already in Terraform’s registry.

    • We have released a new version of the CAST AI agent. You can find the complete list of changes here. To update the agent in your cluster, please follow these steps.
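The draining label and taint described in this release can be detected from a node object. The following is a minimal sketch that works on a node represented as a plain dictionary in the usual Kubernetes shape. The helper names and the reason value `example-reason` are illustrative assumptions. The release note does not state the taint effect, so it is omitted here.

```python
from typing import Any, Dict, Optional

DRAINING_KEY = "autoscaling.cast.ai/draining"  # key named in the release note


def draining_reason(node: Dict[str, Any]) -> Optional[str]:
    """Return the drain reason stored in the node label, or None if absent."""
    labels = (node.get("metadata") or {}).get("labels") or {}
    return labels.get(DRAINING_KEY)


def has_draining_taint(node: Dict[str, Any]) -> bool:
    """Return True if the node carries the draining taint with value "true"."""
    taints = (node.get("spec") or {}).get("taints") or []
    return any(
        t.get("key") == DRAINING_KEY and t.get("value") == "true"
        for t in taints
    )


# Hypothetical node; the reason string is a placeholder, not a documented value.
example_node = {
    "metadata": {"labels": {DRAINING_KEY: "example-reason"}},
    "spec": {"taints": [{"key": DRAINING_KEY, "value": "true"}]},
}
```

A monitoring script could call both helpers on each node it lists and report the nodes that are being drained along with the reason.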

  • One click migration from Karpenter, Workload optimization settings, Security exceptions and more

    Autoscaler
    Security
    Organization
    Workload rightsizing
    • We have implemented a functionality that enables customers to migrate from Karpenter to CAST AI with just a few clicks. CAST AI now recognizes whether Karpenter is installed, its version, and configuration objects (provisioners and AWS node templates). Users can migrate these objects to CAST AI Node templates and Node configurations with a single click. Check the documentation.
    • We have introduced configurable settings to the Workload Optimization feature, allowing users to adjust overhead, recommendation percentile, and set a threshold trigger for applying recommendations to better fit their use case. Refer to the documentation for more details.
    • The available savings report has been enhanced to identify potential savings achievable by right-sizing workloads’ CPU and MEM requests.
    • Improved visibility into pending user invitations: Once a user is invited to an organization, they will appear in the organization’s member list with the status ‘Invite pending.’
    • The Autoscaler now supports bare metal instances on EKS clusters.
    • The Cluster dashboard has been enhanced to differentiate between CAST AI and non-CAST AI provisioned nodes, OS, and overprovisioning levels.
    • A new role, ‘Analyst’, has been introduced to the platform. Users who only need to work with the cost monitoring feature set can view all reports and create Allocation groups without needing Member-level edit access rights.
    • In Cost Monitoring, the Workload Network cost tab now provides granular information about the amount of network traffic associated with the workload, destination workload, and associated costs. Refer to the documentation for more details.
    • In the Security domain, users can now exclude image repositories and resources from scanning for vulnerabilities or against the best practice framework.
    • Starting now, an outdated CAST AI agent version (earlier than v0.49.1) will be detected, and users will be informed before attempting to use advanced evictor configuration.
    • Workloads that have at least one pod on a node are now visible in the NodeList’s detailed node view.
    • We have released a new version of our Terraform provider and updated some Terraform examples. You can find the list of changes here. The provider and modules are already in Terraform’s registry.
    • We have released a new version of the CAST AI agent. You can find the complete list of changes here. To update the agent in your cluster, please follow these steps.
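The Workload Optimization settings in this release (overhead, recommendation percentile, and threshold trigger) can be pictured with a small calculation. The following is a sketch under stated assumptions: the recommendation is taken as a nearest-rank percentile of usage samples plus a fractional overhead, and it is applied only when it differs from the current request by at least the threshold. The function names and the formula are hypothetical and may differ from what CAST AI actually computes.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], percentile: float) -> float:
    """Nearest-rank percentile of the samples (percentile in (0, 100])."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


def recommend(samples: Sequence[float], percentile: float, overhead: float) -> float:
    """Assumed formula: usage percentile plus a fractional overhead (0.1 = 10%)."""
    return nearest_rank_percentile(samples, percentile) * (1 + overhead)


def should_apply(current_request: float, recommendation: float, threshold: float) -> bool:
    """Apply only if the relative change reaches the threshold (0.1 = 10%)."""
    if current_request <= 0:
        return True
    return abs(recommendation - current_request) / current_request >= threshold
```

With usage samples of 0.1 to 0.5 cores, the 80th percentile and a 10% overhead give a recommendation of about 0.44 cores. A current request of 1.0 core would be adjusted, while a request of 0.45 cores would be left alone at a 10% threshold.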
  • Workload Autoscaler Enters Beta

    Workload rightsizing
    • We are excited to announce that Workload Autoscaler is now available in Beta mode. This innovative feature automatically scales your workload requests up or down based on the demand, ensuring optimal performance and cost-effectiveness. Join our early adopters today and experience the benefits of intelligent autoscaling firsthand. For more details of Workload Autoscaler please refer to the documentation.
  • New Image Security Featureset

    Metrics
    Security
    Node templates
    Cost monitoring
    • We are thrilled to introduce our new Image Security featureset. This feature empowers you to monitor all images running within your clusters, identify problematic ones, obtain detailed information about vulnerabilities, and prioritize your tasks. You can assess vulnerabilities within your organization or limit the assessment to your team’s scope. To experience Image Security in action, simply install CAST AI kVisor (Security Agent) on your cluster. For more details, please refer to the documentation.
    • When handling AWS rebalancing recommendations for spot nodes, CAST AI will now greylist only the impacted zone, rather than affecting all zones in the cluster.
    • We have added new scrapable node and workload level Prometheus metrics. Please consult the documentation for detailed guidance on how to use them.
    • In cases where instances are unavailable in the cloud, the Rebalancer will dynamically replace planned instances with the best alternatives. As a result, the rebalancing completion screen will now display actually provisioned instances, planned instances, and achieved savings. Previously, only planned instances were shown.
    • The Network Cost feature is now available for AKS clusters. Additionally, you can use Cost Allocation groups to track network costs in addition to compute costs.
    • Daily vulnerability report is now available in the Notifications view.
    • The Notifications webhook can now be utilized to send security-related notifications as well.
    • Improved the Node Template’s inventory table to provide information about the availability of GPUs and their respective counts for specific instance types. This enhancement enables users to easily identify and select the instance type that best suits their GPU requirements.
    • Released a new API designed for CAST AI partners. This API is designed to streamline the onboarding experience for both partners and their customers.
    • We have released a new version of our Terraform provider and updated some Terraform examples. You can find the list of changes here. The provider and modules are already in Terraform’s registry.
    • We have released a new version of the CAST AI agent. You can find the complete list of changes here. To update the agent in your cluster, please follow these steps.
  • Autoscaling using ARM nodes in AKS, Graceful node eviction during rebalancing & Enhanced Workload efficiency report

    ARM
    Rebalancer
    Cost monitoring
    Evictor
    Autoscaler
    • Users can now select in the rebalancing setup if they want CAST AI to evict pods gracefully. Graceful eviction means that CAST AI will not forcefully drain nodes that fail to drain in time. Instead, they will get cordoned and annotated so the user can take corrective action and adjust pod disruption budgets. All rebalancing settings can be found here.
    • During rebalancing, CAST AI will delete nodes as soon as they have been drained. Previously, all nodes had to be drained before the node deletion phase would start.
    • Autoscaling using ARM nodes is now supported for AKS customers. The user has to have quotas for ARM nodes and be in the supported Azure region. That’s why we ask users to engage with our support team before enabling this feature.
    • The workload efficiency report has been uplifted and now provides information about funds wasted due to poorly set requests. On top of that, we have added the ability to take patching commands from the console and apply them to your workload, so resources are adjusted based on CAST AI recommendations.
    • GCP node configuration now supports the boot disk type selection. Check the documentation for more details.
    • Node Templates now support the “NoExecute” taint effect.
    • The advanced Evictor configuration now enables more granular protection and targeting of pods. Read more in our documentation.
    • We have further reduced the permissions levels required in the customer’s cloud account to run CAST AI. For more details, refer to our documentation.
    • The cluster dashboard now displays CAST AI autoscaler-generated events to highlight why pods are pending instead of relying on standard Kubernetes events. This change helps pinpoint the exact reason a pod is not scheduled.
    • Bottlerocket images are now supported in EKS clusters. This improvement is available in API and Terraform only for now.
    • We have released a new version of our Terraform provider and updated some Terraform examples. You can find the list of changes here. The provider and modules are already in Terraform’s registry.
    • We have released a new version of the CAST AI agent. You can find the complete list of changes here. To update the agent in your cluster, please follow these steps.
  • GPU Support on GKE & Launch of Network Cost Monitoring

    Rebalancer
    Node configuration
    Node templates
    Cost monitoring
    Evictor
    • CAST AI now supports autoscaling with GPU-attached nodes on GKE. This feature can be enabled and managed through the node template menu. The Autoscaler responds to pending pods that require GPU resources, upscaling the cluster as necessary. By using a manifest file, a workload can request specific GPU models and more. For detailed guidance, check our documentation.
    • Added support for Advanced Evictor configuration. Using node or pod selectors users can control what workloads should be targeted by or protected from Evictor. Read more in our documentation.
    • For EKS clusters, Node configuration now supports the use of a customer-provided KMS key for the encryption of EBS volumes.
    • The rollout of the Default node template has been completed, and the feature is now generally available. Further details can be found in our documentation.
    • We’ve enhanced our Cost monitoring suite to report on network costs. This feature is accessible to EKS and GKE customers but requires egressd to be installed on the cluster. Once it is installed, users can view network costs and traffic quantities, aggregated by cluster, namespace, or individual workload. Check the documentation for more details.
    • We’ve updated the Rebalancer screens to showcase both projected and actual savings.
    • When setting up a schedule for Scheduled rebalancing, users can now set a value for Guaranteed minimum savings. This will ensure that rebalancing terminates after the node creation phase if minimum savings can’t be achieved. This setting safeguards the cluster from unnecessary rebalancing if planned nodes are not available from the cloud provider. All configuration options are listed in the documentation.
    • Introduced a new Notification status labeled ‘Obsolete’ and enhanced our filtering capabilities.
    • Terraform support has been added for the Reserved instance management feature for AKS clusters.
    • Clusters managed through Terraform can now be distinguished in the cluster dashboard via the ‘managed by’ field. Disconnection of such clusters through the UI is no longer permissible; it must be carried out via Terraform or API.
    • The audit log events for node addition or removal now display the node ID from the cloud provider and an exhaustive list of labels.
    • Deprecated Features:
      • CAST AI no longer offers optimization for kOps clusters. While this change won’t affect current users, this feature will be unavailable to newcomers.
    • We have released a new version of our Terraform provider and updated some Terraform examples; the list of changes can be found here. The provider, together with modules, can be found in the registry.
    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.
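The Guaranteed minimum savings setting for Scheduled rebalancing can be pictured as a check made after the node creation phase. The following is a minimal sketch, assuming the setting is a percentage and that savings are measured against the hourly cost of the nodes being replaced. Both function names are hypothetical.

```python
def savings_percent(current_hourly_cost: float, new_hourly_cost: float) -> float:
    """Savings of the new nodes relative to the current ones, in percent."""
    if current_hourly_cost <= 0:
        raise ValueError("current_hourly_cost must be positive")
    return (current_hourly_cost - new_hourly_cost) / current_hourly_cost * 100


def should_continue_rebalancing(current_hourly_cost: float,
                                new_hourly_cost: float,
                                minimum_savings_percent: float) -> bool:
    """Continue past node creation only if the minimum savings are reached."""
    achieved = savings_percent(current_hourly_cost, new_hourly_cost)
    return achieved >= minimum_savings_percent
```

If the planned cheaper nodes are unavailable and the nodes actually created cut the cost from 10.0 to only 9.5 per hour, a 20% minimum would stop the rebalancing in this sketch.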
  • Default node template, Scheduled rebalancing, Organizational level security reporting, Cluster efficiency report and much more

    Rebalancer
    Cost monitoring
    Autoscaler
    Reservations
    • We have launched an Organizational level view of Security reporting, enabling customers to see their organization’s compliance posture through the Best practices report, which is now generated across all clusters. Additionally, the organizational level Image security report flags vulnerable images and affected clusters.
    • We have created a new CAST AI Audit Logs receiver using Opentelemetry. This component can read audit logs and seamlessly send them to the customer’s central logging system. The best part is that it’s open-sourced and available on GitHub.
    • A beta version of the Reservations feature is now available for Azure customers. With this update, customers can upload their Reservations of Virtual Machine Instances, allowing CAST AI to prioritize them during upscaling decisions. For more details, please refer to the documentation.
    • We are replacing Default Autoscaler settings with the Default Node template capability. This update offers greater flexibility in setting up the behavior of the Autoscaler when custom node templates are not used. Now users can create ‘spot-only’ clusters without using tainted nodes, set inventory limits using various constraints, apply custom labels and taints, and more. The full configuration list can be found here. This change is being gradually rolled out to a limited set of customers first.
    • Deprecated Features:
      • The Cluster Headroom feature has been deprecated as it had very low adoption. The speed of the CAST AI Autoscaler made extra headroom capacity wasteful in the vast majority of use cases.
      • We have also deprecated the AWS reliability score feature from spot instance configuration options. This feature was reliant on old AWS behavior that did not translate into favorable performance. After introducing the interruption prediction model, this feature is no longer necessary.
    • Launched user interface for Scheduled rebalancing. Customers can now set up schedules and run rebalancing automatically using the UI. The Rebalancer’s log also indicates when a scheduled rebalancing was executed and the achieved savings.
    • We have added an updated Cluster efficiency tab to the Cost Monitoring suite, providing data on the current and historical state of CPU/MEM overprovisioning in the cluster, as well as the cost of each provisioned/requested CPU or GiB of RAM.
    • The Audit log improvements:
      • The Audit log now offers improved usability with added filtering options and an hour picker.
      • The ‘Unscheduled pods policy applied’ event now includes details about the node template utilized by the pending pods.
      • The ‘Node was added’ and ‘Node was deleted’ events now include the cloud provider ID as well as label details.
    • For GKE clusters, we have added support for regional GCP volumes. Previously, we only supported single availability zone volumes.
    • We have implemented multiple improvements to the Nodelist, including changes in the presentation of CPU/RAM data, the ability to see not only Labels applied to the nodes but also annotations, taints and IP.
    • We have released a new version of our Terraform provider and updated some Terraform examples; the list of changes can be found here. The provider, together with modules, can be found in the registry.
    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.
  • New Autoscaler Engine, Predictive Rebalancing, and ARM Node Support for GKE Clusters

    ARM
    Evictor
    Spot Instance
    Autoscaler
    Rebalancer
    • We’ve released a new version of the CAST AI Autoscaler engine, which introduces a host of improvements. Now, CAST AI can consider an even more diverse set of instance types. This update not only improves GPU support but also enhances node distribution across zones and subnets for our EKS and AKS customers. The new engine significantly bolsters the Rebalancer, enabling customers to achieve greater savings and overcome previous limitations when rebalancing multi-zone clusters. We’re now offering comprehensive support for both NodeAffinity and NodeSelectors, which includes the integration of Affinities like NotIn. This update ensures that customers can easily utilize their preferred labelling method, or even both, with minimal friction. There’s no need for customers to take any action to benefit from the updated engine—it works straight out of the box.
    • CAST AI now fully supports optimization of GKE clusters running ARM nodes. For more information, please refer to our documentation.
    • On the cluster dashboard, users can now view both the count of unscheduled pods and the reasons for pods remaining in the pending state.
    • The AWS Node configuration now cross-references the cloud provider for configured subnets and security groups, allowing users to easily select them.
    • We’ve launched a beta version of our predictive rebalancing feature set for AWS customers. To proactively manage spot interruptions, users can now select one of two interruption prediction models. CAST AI will handle notifications about upcoming interruptions, identifying impacted nodes and rebalancing them in advance. More information is available in our documentation.
    • We’ve improved our Notifications functionality. Now, notifications older than 24 hours will automatically expire, and those followed by a successful operation will automatically resolve.
    • We’ve enhanced Evictor logging. The logs now record the pods present on the node before the drain operation as well as those that remained on the node if draining failed.
    • The user interface for Node templates now supports the setup of custom taints. Previously, this functionality was only accessible via the API / Terraform.
    • In addition to the Azure container offering launched last month, CAST AI is now also available as a SaaS offering on the Azure Marketplace.
    • We have released a new version of our Terraform provider and updated some Terraform examples; the list of changes can be found here. The provider, together with modules, can be found in the registry.
    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.
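The `NotIn` affinity mentioned in this release uses the standard Kubernetes `nodeAffinity` structure. The following is a minimal sketch that builds such a pod spec fragment as a plain Python dictionary. The helper name `not_in_affinity` and the zone value `example-zone-a` are illustrative assumptions, not taken from the release note.

```python
import json
from typing import Any, Dict, List


def not_in_affinity(label_key: str, excluded_values: List[str]) -> Dict[str, Any]:
    """Build a pod-spec `affinity` fragment that excludes nodes whose
    label `label_key` has one of `excluded_values`."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": label_key,
                                "operator": "NotIn",
                                "values": list(excluded_values),
                            }
                        ]
                    }
                ]
            }
        }
    }


# Hypothetical example: keep the pod out of one zone.
affinity = not_in_affinity("topology.kubernetes.io/zone", ["example-zone-a"])
print(json.dumps(affinity, indent=2))
```

The resulting dictionary can be placed under `spec.affinity` of a pod template. A plain `nodeSelector` can be used alongside it for simple equality matches.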
  • Redesigned CAST AI Console, Azure Kubernetes Marketplace Offering, and More Updates

    ARM
    Evictor
    Spot Instance
    Workload rightsizing
    Rebalancer
    • We’ve launched a new design for the CAST AI console, making it more modern, sharp, and user-friendly.
    • The available savings report for EKS clusters now provides recommendations for optimal configurations using Graviton (ARM) nodes, even if the current configuration doesn’t include such nodes. This change allows users to simulate how their cluster would appear if migrated to nodes with ARM processors.
    • The available savings report for clusters in read-only mode will no longer display the full recommended optimal configuration. This feature is now exclusively available for clusters managed by CAST AI.
    • Modified the workload efficiency calculation to account for deliberately under-provisioned workloads. Efficiency is now capped at 100%. For instance, heavily memory under-provisioned workloads will show 100% efficiency, but they risk OOM kills.
    • Updated the workload efficiency calculation for multi-container pods – depending on the request size, pods will contribute proportionally to the overall efficiency score.
    • Fixed a bug that prevented the evictor from continuing optimization of a cluster when it encountered a faulty pod.
    • We’ve enhanced the detection of the Metrics server on a cluster to prevent instances where the workload efficiency page isn’t displayed even though the Metrics server is installed.
    • Created APIs for the upcoming AWS Spot rebalance recommendation handling and preventive rebalancing features. Users interested in testing these features should contact our support team.
    • We’ve created APIs and implemented Terraform changes for the scheduled rebalancing feature that will allow partial or full rebalance of a cluster based on a cron type schedule. Users interested in testing this feature should contact our support team.
    • For AKS users, we’ve created APIs for the upcoming Reserved Instance management feature. Users interested in testing this feature should contact our support team.
    • Added support for Google’s newly introduced G2 VMs.
    • AKS Govcloud and Azure CNI Overlay networking are now supported.
    • We’ve modified the partial rebalancing savings calculation logic. Previously, during partial rebalancing, savings were calculated based on total cluster cost. Now, it will be calculated based solely on the selected nodes.
    • Enhanced the spot diversity feature. It now includes a user-defined safeguard: a permitted percentage price increase that the Autoscaler should adhere to but not surpass when making autoscaling decisions that enhance spot instance family diversity on the cluster.
    • Now, in non-aggressive mode, the Evictor will no longer target nodes that have running jobs.
    • We’ve released the CAST AI agent as an Azure container offering on the Azure Kubernetes Marketplace.
    • We’ve included the ability to override the problematic pod check during the rebalancing process, even if normally such a pod would prevent a node from being rebalanced. For more details, please refer to our documentation.
    • Fixed several bugs in the Security report and kVisor agent.
    • We have released a new version of our Terraform provider and updated some Terraform examples; the list of changes can be found here. The provider, together with modules, can be found in the registry.
    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.
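The change to the partial rebalancing savings calculation in this release is easiest to see with numbers. The following is a small worked sketch, assuming savings are reported as a percentage of a baseline cost. The helper name and all cost figures are hypothetical.

```python
def savings_share(saved_cost: float, baseline_cost: float) -> float:
    """Savings as a percentage of a baseline cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return saved_cost / baseline_cost * 100.0


# Hypothetical hourly costs, chosen only to make the arithmetic easy to follow.
cluster_cost = 10.0            # all nodes in the cluster
selected_nodes_cost = 4.0      # nodes chosen for partial rebalancing
replacement_nodes_cost = 3.0   # nodes that replace the selected ones
saved = selected_nodes_cost - replacement_nodes_cost

previous_style = savings_share(saved, cluster_cost)        # relative to the whole cluster
current_style = savings_share(saved, selected_nodes_cost)  # relative to the selected nodes
```

In this example the same saving of 1.0 per hour is 10% when measured against the whole cluster and 25% when measured against the selected nodes only.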
  • CAST AI is now listed on AWS marketplace

    Node configuration
    Node templates
    Terraform
    Rebalancer
    ARM
    • CAST AI can now be purchased via the AWS marketplace, offering seamless integration and an easy procurement process.
    • We have released Scheduled rebalancing APIs and updated the Terraform provider allowing users to run rebalancing on a schedule and limit the scope, e.g., rebalance a full cluster or only spot nodes.
    • GCP Custom instances are supported in the Node templates UI.
    • Autoscaling with storage-optimized nodes is now supported for AKS users when the workload requests ephemeral storage in the pod definition (i.e., via nodeSelector and toleration). Read more in our docs.
    • Node template UI now supports multiple custom labels.
    • We have added support for preferred affinity: pods are scheduled on templated nodes when available, and otherwise on nodes added by the default autoscaler.
    • Node configuration UI for GKE clusters now has a setting to pass Network tag values as well as to set a value for the MaxPods per node.
    • In the EKS Cluster Node Template UI, users can now select between ARM and/or x86_64 architectures.
    • For EKS clusters, the Node configuration UI now allows the selection of EBS disk type and specification of required IOPS values.
    • Workload Efficiency Improvements. Previously, under-provisioned workloads were shown as equally inefficient as over-provisioned ones. As under-provisioning might be a deliberate customer strategy to save resources while accepting risk, such workloads are now capped at 100% efficiency.
    • We have released a new version of our Terraform provider and updated some Terraform examples; the list of changes can be found here. The provider, together with modules, can be found in the registry.
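The efficiency cap described in this release can be sketched as follows. This assumes, for illustration only, that efficiency is the ratio of used to requested resources. The release note does not give the exact formula, and the function name is hypothetical.

```python
def capped_efficiency(used: float, requested: float) -> float:
    """Efficiency in percent, assumed to be used/requested and capped at 100."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return min(used / requested, 1.0) * 100.0
```

An over-provisioned workload using 0.5 of 1.0 requested cores scores 50%. An under-provisioned workload using 3.0 GiB against a 1.0 GiB request scores 100% rather than being penalized, although it still risks OOM kills.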
  • Release of Spot diversity feature and pod affinity support

    Node configuration
    Node templates
    Cost monitoring
    Spot Instance
    ARM
    • Implemented pod affinity support for well-known Kubernetes (and some cloud provider-specific) labels. Check the documentation.
    • We have released the Spot diversity feature for community feedback (available across all supported cloud providers). When turned on, the Autoscaler will try to balance between the most diverse and cheapest instance types. By using a wider array of instance types, the overall node interruption rate in a cluster will be lowered, increasing the uptime of your workloads. API and Terraform support is already available; UI support will follow. More in our documentation.
    • In Cost monitoring workload-level reporting, we have optimized the workload grouping algorithm for large clusters with a significant number of workloads that use repetitive naming patterns. Handling of short-lived workloads was also improved.
    • Updated Viewer role permissions so that viewers in an organization can now generate rebalancing plans.
    • The Available savings report now recommends ARM nodes if the cluster’s current config has at least one ARM node.
    • AWS customers can now use nodes with ARM or x86_64 processors when creating Node templates. More in the documentation. This change is currently implemented in API and Terraform; UI changes will follow.
    • Multiple arbitrary labels are now supported in Node templates. Check our docs. This change is currently implemented in API and Terraform; UI changes will follow.
    • The unscheduled pods policy audit log event was expanded to contain more data about the trigger, giving customers more insight.
    • We have updated Terraform examples. Released a new version of our Terraform provider; the list of changes can be found here. The provider, together with modules, can be found in the registry.
    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.
  • Beta launch of Workload rightsizing recommendations, ARM processor support in EKS

    Node configuration
    Node templates
    Evictor
    RedHat Openshift
    Terraform
    ARM
    • Not sure what CPU and MEM requests to set on your application containers? We are here to help! In our Cost monitoring suite, we have launched the Workload rightsizing feature. Users can now access the Efficiency tab, where the list view displays the workloads and their respective requested and used resources (calculated in resource hours). Every workload can be opened for more resource request and usage data. On top of that, CAST AI now provides CPU and RAM rightsizing recommendations for every container. The entire feature set is currently available in beta access; we are actively gathering feedback and making incremental improvements.

    • CAST AI agent image and Helm chart are now RedHat certified and available in the partner software catalog.

    • The Available savings report for RedHat Openshift clusters was adjusted to identify master and infra-worker nodes as not optimizable.

    • AWS customers who are using ARM nodes (Graviton processors) can now connect and autoscale such clusters using CAST AI. To initiate autoscaling, the workload must have a nodeSelector or affinity with the label kubernetes.io/arch: "arm64". Check the documentation for an example.

    • In GKE clusters, the MaxPods value for CAST AI-added nodes can now be passed via the Node configuration API. GCP Network tags can now be set via the same API as well. UI updates to follow.

    • We added node affinity support to Node templates; previously, only nodeSelector was supported. More details can be found in the documentation. On top of that, the Node templates API now supports multiple arbitrary taints.

    • Removed permissions to create Security Groups from CAST AI ‘user’ in AWS. These permissions are no longer required as the creation of the Security Group was transferred to the onboarding script. Currently, required permissions can be checked here.

    • Added Node template support to Terraform. It is available from v.4.5.0 of eks-module and 2.1.1 of the CAST AI Terraform provider.

    • Reworked how the Evictor policy is managed via Terraform. Until the user sets Evictor policies, the Evictor installation is not modified in the cluster; users can still manage Evictor via Helm if they choose to. Policy defaults will always be the same and are no longer synced to Evictor Helm changes. When the Evictor policy is modified in the UI, the changes will be synced to Evictor as usual.

    • Updated metrics collection so widgets in the cluster dashboard take into account pods that are scheduled on the node but can’t start due to containers failing to initialize properly.

    • Node configuration API for EKS clusters now supports additional storage volume parameters: volume type, IOPS, and throughput. In addition, the API now supports the specification of the IMDS version to be used on CAST AI provisioned nodes. Documentation.
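
The ARM scheduling requirement in this release (a nodeSelector or affinity on kubernetes.io/arch: "arm64") can be sketched as pod spec fragments built in Python. The affinity structure follows the standard Kubernetes pod spec fields; the helper function names are hypothetical and chosen only for illustration.

```python
import json

ARCH_LABEL = "kubernetes.io/arch"  # well-known Kubernetes node label


def arm64_node_selector() -> dict:
    """Pod spec fragment that selects arm64 nodes via nodeSelector."""
    return {"nodeSelector": {ARCH_LABEL: "arm64"}}


def arm64_node_affinity() -> dict:
    """Equivalent pod spec fragment expressed as required node affinity."""
    return {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": ARCH_LABEL,
                                    "operator": "In",
                                    "values": ["arm64"],
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Either fragment can be merged into a workload's pod template spec.
    print(json.dumps(arm64_node_selector(), indent=2))
    print(json.dumps(arm64_node_affinity(), indent=2))
```
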

  • Deeper security insights, improved AKS image management, and OpenShift support

    Node configuration
    Node templates
    Security
    Autoscaler
    • CAST AI kVisor security agent has been released. Customers can now get even deeper security insights into their Kubernetes clusters by enabling the CAST AI kVisor agent. There is no need to wait until vulnerabilities and best practices reports are refreshed as kVisor assesses public and private container images for vulnerabilities once they appear in your cluster. It also provides a more thorough analysis of your cluster configuration based on CIS Kubernetes benchmarks and DevOps best practices.

    • Improved the way CAST AI manages AKS images used when creating new nodes. When onboarding an AKS cluster to CAST AI-managed mode, we will create and store in the customer’s gallery the image to be used when creating CAST AI-managed nodes. This new solution drastically reduces the time required to create and join new nodes to a cluster. Rebalancing execution times are also reduced.

    • Added ‘Read only’ support for Red Hat OpenShift Service on AWS (ROSA) clusters. Now customers running ROSA clusters can experience all the CAST AI reporting features for free.

    • CPU usage report was completely reworked. It now provides running CPU usage data as well as billable CPU counts on a daily basis. Billable CPU count is the foundation of CAST AI billing and now customers will be able to see the current as well as the forecasted end-of-month numbers.

    • Improved Autoscaling. Previously, if a required instance type was unavailable due to an ‘Insufficient Capacity Error’ received from the Cloud provider, it took CAST AI a considerable amount of time to find an alternative or initiate the creation of Spot Fallback. Now, CAST AI will choose the next best option straight away from the ‘candidate list’ without waiting for the next snapshot.

    • Reworked workload grouping algorithm, so Workload cost reports for clusters with a huge amount of workloads load faster.

    • Adjusted Autoscaler settings, Node template, and Node configuration UX by making minor changes to the user interface elements.

    • Set the default disk-to-CPU ratio to 0 (instead of 5). By default, CAST AI-added nodes now have a root volume ratio of 1 CPU to 0 GiB, so the default disk size is 100 GiB. Users can change this setting in Node configuration properties.

    • Users can now trigger Cluster reconcile for any cluster that was onboarded to CAST AI-managed mode. This functionality aligns the actual cluster state with the state in the CAST AI central system, so any issues related to credentials or inconsistencies in cluster state can be resolved (or flagged).

    • Users can also now retrieve our credentials onboarding script when the previously onboarded cluster is in the ‘Failed’ or other states. Re-running this script would update CAST AI components and solve any IAM issues (e.g., missing permissions).

    • We have updated Terraform examples. Released a new version of our Terraform provider; the list of changes can be found here. The provider, together with modules, can be found in the registry.

    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.

    • Fixed various bugs and introduced minor UI/UX improvements.

  • Node template and Configuration support for AKS and GKE clusters

    Node configuration
    Node templates
    • Released Node template and Node configuration functionality for GKE and AKS clusters. Customers can now create required node pools and apply specific configuration parameters on CAST AI-created nodes. Over upcoming sprints, support for advanced configuration parameters will be added.

    • Enriched AKS cluster onboarding, so now the script returns detailed information about encountered errors, even if they are retriable. 

    • AWS Node configuration – added ability to pass kubelet configuration parameters in the JSON format.

    • Exposed the Kubernetes version in the cluster dashboard.

    • Removed the ability to set disk size in the Autoscaler policies page and transferred it to Node configuration. The default ratio for calculating disk size based on CPU is 1CPU:5GiB.

    • Added the ability for customers to modify helm values in our Terraform modules. Released a new version of our Terraform provider; the list of changes can be found here. The provider, together with modules, can be found in the registry.

    • Expanded our inventory to support all AKS regions. 

    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.

    • Fixed various bugs and introduced minor UI/UX improvements.

  • New user onboarding experience and many other improvements

    Evictor
    ARM
    Rebalancer
    Node templates
    • We have reworked the user onboarding flow to improve the experience. Now, users can explore a demo cluster available immediately after registration, providing a guided tour through CAST AI features.

    • When users connect an EKS cluster with GPU-attached nodes, the Savings report now displays the GPU count as a separate dimension.

    • Users can now switch between the Nodes and Workloads views when preparing to rebalance the cluster. This makes it easier to identify problematic workloads in one go.

    • The minimum node count figure is now exposed in the Rebalancing plan screen so that users can configure the minimum desired number of nodes in the post-rebalanced cluster state. That way, customers have more control over the rebalancing outcome to align with their goal for high availability / compute resource distribution.

    • For EKS users, we have added support for autoscaling using Storage optimized nodes when the workload requests ephemeral storage in the pod definition (i.e., via nodeSelector and toleration; read more in our docs).

    • Node templates now support the Fallback feature, so users who create node pools using templates consisting of spot nodes can benefit from CAST AI’s ability to guarantee capacity even when spot nodes are temporarily not available.

    • Fixed a bug that caused Evictor not to shut down when the Empty node policy is turned off.

    • Added ARM node support to the CAST AI provisioner for EKS and GKE clusters. ARM nodes can now be added to the cluster via the API; autoscaling support is coming up next.

    • Made Evictor more cluster context-aware, so when it is used in the ‘aggressive mode,’ it will not remove single replica pods in big batches, to avoid downtime.

    • Improved Autoscaler logic: when the autoscaler has a choice of AZ (pods don’t require specific zone via selector or attached volumes), it will choose the zone where there’s less provisioned capacity (CPU) and fewer blacklisted instances. Both factors are taken into account – heavy underprovisioning will win against slightly higher blacklist count, and vice versa.

    • Fixed bugs and released several user experience improvements in the Security report.

    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.

    • Released a new version of our Terraform provider; the list of changes can be found here. The provider, together with modules, can be found in the registry.

  • Notifications are here!

    Evictor
    Autoscaler
    Node configuration
    Notifications
    • We have launched the Notifications feature to inform customers via UI or webhook about various issues, upgrades, new releases etc., affecting their clusters. Currently, the feature supports a single scenario: customers will be informed if CAST AI credentials are invalidated. In the upcoming weeks, more scenarios will be added. A detailed guide about how to set up a notification webhook can be found here.

    • Improved the performance of the binpacking algorithm (Evictor) to ensure that it is capable of quickly downscaling large / volatile clusters. Instead of targeting and draining nodes one node at a time, Evictor will validate that affected pods are reschedulable and then target multiple nodes in parallel in the same cycle.

    • Workload cost report now supports filtering by labels so customers can easily find cost information of specific workloads based on the labels applied. Furthermore, users can now see the cost over time information for every workload.

    • Made taint an optional setting when creating a Node template. Users can also now specify a custom nodeSelector. These improvements enable more flexible use of Node templates based on the customer’s use case.

    • Improved the CAST AI autoscaler algorithm by adding multiple optimizations. CAST AI now considers additional cost efficiency scenarios before satisfying pending pods. For example, in the past CAST AI used to prefer bigger nodes, which was not always the cheapest option. Now our algorithm also considers a combination of smaller nodes. This new approach contributes to more cost savings and additional stability when handling spot instances.

    • Updates to Node configuration feature:

      • Added ability to specify containerd or Docker as a default container runtime engine to be installed in CAST AI provisioned nodes.

      • Created functionality to provide a set of values that will be overwritten in the Docker daemon configuration (available values).

      • Added support for Node configuration functionality to CAST AI Terraform modules.

    • The CAST AI cluster controller is now more resilient and will restart on failure instead of failing silently.

    • Re-arranged UI elements in the console and made various changes to the onboarding flow for a better user experience.

    • Added support for AWS ap-southeast-3 (Jakarta) and me-central-1 (UAE) regions.

    • Introduced multiple stability and performance improvements to the platform.

    • Released a new version of the CAST AI agent; the list of changes can be found here. To update the agent in your cluster, please follow these steps.

    • Released a new version of our Terraform provider; the list of changes can be found here. The provider, together with modules, can be found in the registry.
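
The autoscaler change in this release (also considering a combination of smaller nodes instead of always preferring a bigger one) can be illustrated with a small cost comparison. This is only a sketch of the general idea, not CAST AI's algorithm: it looks at CPU alone, and the instance names and hourly prices in the catalog are hypothetical.

```python
def cheapest_node_mix(required_cpus: int, catalog: dict) -> tuple:
    """Find the cheapest set of nodes providing at least required_cpus.

    catalog maps a (hypothetical) instance name to (cpus, hourly_price).
    Uses dynamic programming over the number of CPUs still needed.
    """
    inf = float("inf")
    # best[c] = (cost, node names) for the cheapest way to cover c CPUs
    best = [(0.0, [])] + [(inf, [])] * required_cpus
    for cpus_needed in range(1, required_cpus + 1):
        for name, (cpus, price) in catalog.items():
            prev_cost, prev_nodes = best[max(0, cpus_needed - cpus)]
            cost = prev_cost + price
            if cost < best[cpus_needed][0]:
                best[cpus_needed] = (cost, prev_nodes + [name])
    return best[required_cpus]


if __name__ == "__main__":
    # Hypothetical catalog: name -> (CPUs, price per hour)
    catalog = {"small": (4, 0.10), "large": (16, 0.35)}
    print(cheapest_node_mix(8, catalog))   # two small nodes beat one large
    print(cheapest_node_mix(16, catalog))  # one large node beats four small
```

With these made-up prices, the pending pods needing 8 CPUs are served more cheaply by two small nodes, while 16 CPUs are cheaper on a single large node, which is why both options need to be compared.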

  • Launch of the Free Kubernetes Security report

    Security
    Node configuration
    • We have released a Free Kubernetes Security Report that contains:

      • Overview page – gives an overview of the historical trends in vulnerabilities and best practices configuration within the cluster and how those vulnerabilities are distributed across cluster resources.

      • Best practices page – gives insights into the cluster’s alignment with security and DevOps best practices. The insights are provided in the form of checks with a short description of the issue and remediation guidelines. This release only covers insights based on the read-only data collected by the CAST AI agent.

      • Vulnerabilities page – provides a list of vulnerable objects with detailed information about vulnerabilities found and information about available fixes. This release covers only vulnerability assessment of the images downloaded from public repositories.

    • Cost over time graphs in the Available savings report now react to the chosen configuration preference (i.e., Spot only, Spot-friendly, or only on-demand configuration settings).

    • Our recently released Node configuration functionality for EKS now also supports kubelet configuration parameters and the ability to pass user data in the form of a bash script.

    • GCP custom instances can now be provisioned with a lower CPU to RAM ratio of 1:0.5, instead of 1:1 as previously.

    • The latest version of the CAST AI agent is v0.32.1; the list of changes can be found here. To update the agent in your cluster, please follow these steps.

    • Released a new version of our Terraform provider (v0.26.0); the list of changes can be found here. The provider, together with modules, can be found in the registry.

  • Launch of GPU support, Node configurations, and templates for EKS clusters

    Autoscaler
    Node configuration
    Node templates
    GPU
    • CAST AI can now autoscale workloads that require GPU-attached nodes. Currently, this feature supports Nvidia GPU-attached EKS nodes. To use this functionality, the workload needs to have defined GPU limits and toleration nvidia.com/gpu. For more information, please refer to documentation.

    • We released a new feature called Node configuration! It allows users to define configuration settings for each CAST AI provisioned node. This feature is currently enabled for EKS clusters only.

    • Also for EKS clusters, we have released a new feature called Node template, which allows the creation of node pools. Node pools can be used to run specific workloads on a pre-defined list of nodes only. Cost-wise, this behaviour leads to a sub-optimal cluster state, but it gives users more control.

    • Another new feature! The Cost comparison report captures the state of the cluster (i.e. number of requested CPUs and cost) prior to the enablement of CAST AI optimization and extrapolates the savings by comparing the cluster’s historical versus current state. The report clearly shows the value of the CAST AI node selector and bin packer algorithms.

    • We launched a CAST AI offering in the GCP marketplace, so Google customers can purchase CAST AI directly from the well-known GCP platform.

    • For GKE clusters, the CAST AI autoscaler can now be instructed to scale the cluster with instances that have locally attached SSD disks. To do so, a workload must have a node selector and toleration for the label scheduling.cast.ai/storage-optimized defined in the spec. For more details, please refer to the documentation.

    • We have introduced temporary taints to prevent pods from being scheduled until the node creation phase is completely finished during rebalancing.

    • Revamped the design of the cluster list, added more details about CPU and memory resources used by individual clusters, as well as the organization as a whole.

    • We uplifted the design of the Autoscaler policies page.
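
The GPU autoscaling requirement in this release (defined GPU limits plus a toleration for nvidia.com/gpu) can be sketched as a check over a pod spec. The helper name, container name, and image are hypothetical, and the toleration operator shown is an assumption, since the notes only name the toleration key.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def is_gpu_workload(pod_spec: dict) -> bool:
    """Check both conditions from the note: a GPU limit and a matching toleration."""
    has_limit = any(
        GPU_RESOURCE in container.get("resources", {}).get("limits", {})
        for container in pod_spec.get("containers", [])
    )
    has_toleration = any(
        toleration.get("key") == GPU_RESOURCE
        for toleration in pod_spec.get("tolerations", [])
    )
    return has_limit and has_toleration


# Example pod spec; the name and image are placeholders.
example_pod_spec = {
    "containers": [
        {
            "name": "gpu-job",
            "image": "example.com/gpu-job:latest",
            "resources": {"limits": {GPU_RESOURCE: 1}},
        }
    ],
    # Assumption: "Exists" tolerates the taint regardless of its value.
    "tolerations": [{"key": GPU_RESOURCE, "operator": "Exists"}],
}

if __name__ == "__main__":
    print(is_gpu_workload(example_pod_spec))
```
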

  • Cost Report launch

    Cost monitoring
    Evictor
    • The Cost Reporting solution is now live. The report displays the compute costs of the cluster and cost allocation per workload or namespace. Customers can quickly assess the compute costs associated with an application, service, or team. Additional reporting dimensions like Cost per CPU were introduced to help customers analyze the cost. Further enhancements are coming in Q3.

    • Changes in user interface settings in the Available savings or Cost reports now persist when switching to another cluster.

    • The cluster list can now be filtered based on the cluster name or status.

    • The CAST AI agent adjusts the resources it needs to operate based on the size of the cluster. We have improved the memory size scaling logic to consider the CPU count and the node count.

    • Users can now set a custom replica count for the CAST AI agent, by simply scaling the deployment. At a point in time, only one replica will be running, and others will be in a passive mode. However, if an active replica crashes, another replica will become active, thus ensuring service availability.

    • Evictor now respects Pod Disruption Budgets (PDB) and won’t try to evict the pod if it would violate the PDB.

    • Exposed the blacklist information via API. Instances that can’t be used for autoscaling are now visible via the API. Instances affected by an insufficient capacity error in the cloud service provider are not visible yet; this improvement is in the works.

    • During the installation of our agent, a customer-managed secret can now be specified in CAST AI Helm charts; check the documentation for guidance.

    • Added support for kOps versions 1.21 and 1.22.

    • The latest version of the CAST AI agent is v0.31.0; the list of changes can be found here. To update the agent in your cluster, follow these steps.

    • Released a new version of our Terraform provider (v0.24.2); the list of changes can be found here. The provider, together with modules, can be found in the registry.
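
The Pod Disruption Budget behaviour in this release can be illustrated with a minimal check. This sketch covers only an integer minAvailable budget and is not Evictor's implementation; the function name is hypothetical.

```python
def eviction_allowed(healthy_pods: int, min_available: int) -> bool:
    """Return True if evicting one pod still leaves min_available healthy pods.

    Simplification: only an integer minAvailable is modelled; percentage
    values and maxUnavailable budgets are out of scope for this sketch.
    """
    return healthy_pods - 1 >= min_available


if __name__ == "__main__":
    print(eviction_allowed(healthy_pods=3, min_available=2))  # budget has room
    print(eviction_allowed(healthy_pods=2, min_available=2))  # would violate it
```
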

  • Partial rebalancing, pod topology spread constraint support, and Terraform module for AKS clusters

    Autoscaler
    Rebalancer
    Evictor
    • Introduced the partial rebalancing capability. Instead of rebalancing the whole cluster, customers can now select specific nodes to rebalance (replace with more optimal configuration). The user experience was reworked to focus on the nodes instead of the workloads.

    • Implemented temporary tainting on the new nodes created during rebalancing, so no workloads can land on them before the node creation phase is finished.

    • When autoscaling GKE clusters, CAST AI can now assess the workloads and pick custom instances if it is more beneficial than using standard instances. This is an optional policy setting customers can enable.

    • Added a search bar to the cluster list.

    • Improved the calculation for system memory overhead on GKE and EKS nodes, to ensure that pods always fit in the provisioned nodes.

    • Added support for scheduling.cast.ai/compute-optimized: "true" label. If this node selector is used, CAST AI will provision compute optimized nodes. The list of all supported labels can be found in the documentation.

    • Made the CAST AI agent more robust by ensuring that health check fails if it can’t deliver snapshots for a period of time.

    • Updated Evictor to not evict static pods.

    • Added automatic handling of CPU quota errors during autoscaling for GCP and Azure customers.

    • Added support for workloads on AKS that use persistent volumes and the topology label topology.disk.csi.azure.com/zone.

    • Improved the logic of recognizing not ready/unreachable nodes and removing them from the cluster.

    • The autoscaler now supports pod topology spread constraints on the topology.kubernetes.io/zone label. More information can be found in the documentation.

    • Removed ability to delete control plane nodes on kOps clusters directly from the node list.

    • If a node is cordoned, its status in the node list will change to “SchedulingDisabled”.

    • The latest version of the CAST AI agent is v0.28; the list of changes can be found here. To update the agent in your cluster, follow these steps.

    • Released a new version of our Terraform provider (v0.24.0) and module to support AKS cluster onboarding. Provider and the module can be found in the registry.
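
The pod topology spread constraint support on topology.kubernetes.io/zone mentioned in this release can be sketched as follows. The constraint fields are standard Kubernetes pod spec fields; the app label, zone names, and helper function are hypothetical, and the skew check mirrors the general Kubernetes definition rather than CAST AI's autoscaler logic.

```python
ZONE_KEY = "topology.kubernetes.io/zone"

# Pod spec fragment asking the scheduler to spread pods evenly across zones.
spread_constraint = {
    "topologySpreadConstraints": [
        {
            "maxSkew": 1,
            "topologyKey": ZONE_KEY,
            "whenUnsatisfiable": "DoNotSchedule",
            "labelSelector": {"matchLabels": {"app": "example"}},
        }
    ]
}


def placement_allowed(pods_per_zone: dict, zone: str, max_skew: int = 1) -> bool:
    """Would placing one more pod in `zone` keep the skew within max_skew?

    Skew is the pod count in the target zone (after placement) minus the
    smallest pod count across all zones.
    """
    global_min = min(pods_per_zone.values())
    return pods_per_zone[zone] + 1 - global_min <= max_skew


if __name__ == "__main__":
    counts = {"zone-a": 2, "zone-b": 1}
    print(placement_allowed(counts, "zone-a"))
    print(placement_allowed(counts, "zone-b"))
```
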

  • Evictor as a policy and many more improvements

    Autoscaler
    Evictor
    • Evictor – you can now install our algorithm that continuously bin packs pods into the smallest number of nodes via Autoscaler policy (in UI and Terraform). Previously, users had to follow a documented guide and install it via the command line. Please note: if you already have Evictor installed and configured, it will continue to run, even though the Autoscaler page might indicate that it can’t be enabled. To correct this, remove the current Evictor installation manually and enable it from the Autoscaler page.

    • The ‘Cost per CPU’ reporting dimension was added to the Available Savings report, as well as the Node list and Rebalancing screens. This cost is calculated by dividing the compute cost by the number of provisioned CPUs. It is also exposed as a scrapeable metric for the whole cluster or per-instance life cycle: spot, on-demand, fallback. A full list of currently available metrics and the setup guide are available here.

    • We reacted to community feedback and improved the user interface and experience of our Cost report. The user interface of the Autoscaler policies page was also uplifted.

    • Added functionality that allows users to remove a team member from the organization.

    • Added the concept of a “Project” to the CAST AI console. Previously, if users had clusters with the same name across different GCP projects (or Azure Resource groups, AWS accounts) in the CAST AI console, there was no way to differentiate between these clusters. Now each cluster record also indicates the name of the GCP project / Azure Resource group / AWS account ID.

    • The Node list view now displays the total node CPU and Memory capacity instead of allocatable values. We have also fixed an issue that prevented the status of a cordoned node from being presented accurately in the node list.

    • We have updated the calculation formula for the root volume that is added to each CAST AI provisioned node. Before the change, nodes had a 100 GiB disk as a minimum, or a larger disk based on the CPU to Storage (GiB) ratio. This ratio could not be less than 1 CPU : 25 GiB. Now, a 100 GiB disk is the base, and we add additional storage based on the CPU to Storage ratio, which can be as low as 1 CPU : 0 GiB.

    • For GKE clusters, the logic of the ‘Node constraints’ setting in the Autoscaler policy is now more flexible. We have removed CPU to RAM ratio validations so users can choose more flexible configurations.

    • For AWS EKS users who are using AWS CNI, we have improved the autoscaler logic, so when considering what node to add, autoscaler would react to an error if the target subnet is full and choose another available subnet (if any).

    • For AWS EKS users who are using the CSI driver for Amazon EBS, we have added support for the topology.ebs.csi.aws.com/zone label in the autoscaler, so the new node is created in the correct zone, respecting the specification of the storage class.

    • We have optimized the permissions required to onboard and run CAST AI in EKS clusters. More information can be found in the documentation.

    • If an EKS cluster is being onboarded using Terraform, we now allow passing AWS credentials via environment variables to the agent Helm chart.

    • Added Terraform examples for onboarding EKS cluster using custom IAM policies, creating EKS cluster with NAT Gateway and Application load balancer. All examples can be found here.

    • The latest version of the CAST AI agent is v0.27; the list of changes can be found here. To update the agent in your cluster, follow these steps.

    • The latest version of our Terraform provider is v0.23; the list of changes can be found here. The provider and the modules can be found in the registry.
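
The root volume sizing change in this release can be written out as a small calculation. This is a sketch based only on the description in the note (previously the larger of 100 GiB and the ratio-based size, now 100 GiB plus the ratio-based size); the function names are hypothetical.

```python
BASE_DISK_GIB = 100


def root_volume_before(cpus: int, gib_per_cpu: float) -> float:
    """Old behaviour as described: 100 GiB minimum, or larger if the ratio demands it."""
    return max(BASE_DISK_GIB, cpus * gib_per_cpu)


def root_volume_after(cpus: int, gib_per_cpu: float = 0) -> float:
    """New behaviour as described: 100 GiB base plus storage from the CPU ratio."""
    return BASE_DISK_GIB + cpus * gib_per_cpu


if __name__ == "__main__":
    # An 8-CPU node under the old minimum ratio of 1 CPU : 25 GiB
    print(root_volume_before(8, 25))
    # The same node under the new formula with ratios of 5 and 0
    print(root_volume_after(8, 5))
    print(root_volume_after(8, 0))
```

For an 8-CPU node, the old formula with the minimum ratio of 25 gives 200 GiB, while the new formula gives 140 GiB at a ratio of 5 and 100 GiB at a ratio of 0.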

  • Terraform support for GKE, Cost report, AWS Cross account role support, and more

    Cost monitoring
    Terraform
    Rebalancer
    • Implemented an additional way to onboard an EKS cluster. Now users can delegate access to CAST AI using the AWS cross-account IAM role.

    • Added scrapable metrics for Fallback nodes to display the requested and provisioned CPU and RAM resources. A full list of currently available metrics and the setup guide are available here.

    • Released the Cost report for a public preview. This report allows customers to track historical cost data of the cluster to understand how the cost fluctuated over the time period, what was the normalized cost per provisioned CPU, what is the forecasted cost at the end of the month, and more.

    • We have released a new version of our Terraform provider (v0.17.0) and modules to support GKE cluster onboarding. Provider and the modules can be found in the registry.

    • Updated our External clusters API so EKS customers can check and, if necessary, update security groups.

    • Added the ability for a GKE cluster that was paused using the GCP console to automatically return to the ‘Ready’ state after being resumed.

    • Introduced a new node status called “Detached,” so that nodes that are no longer part of the K8s cluster but are still running in the customer’s cloud account can be identified for removal.

    • Cluster dashboard will now display spot, on-demand, and Fallback node counts separately.

    • If AWS custom tags were added into the cluster config, they will be replicated to the underlying volume attached to the EC2 instance.

    • Optimized performance of the Evictor in very large clusters and, as a consequence, users can bring costs down faster.

    • Exposed more details about the error when encountered during rebalancing operation.

    • Enhanced ‘Unscheduled pods policy applied’ audit log event to display the trigger (i.e. pods that caused autoscaling), as well as the filters that Autoscaler was working with to pick up a new node.

    • Also in the audit log, the event that indicates the Node addition failure is now followed by a rollback event.

    • Implemented enhanced JSON viewer, making it much easier to read JSON output when presented in the console.

    • If the cluster is in the ‘Warning’ state, it will now display a reason for it.

    • The latest version of the CAST AI agent is now v0.25.1. To update the agent in your cluster, follow these steps.

  • GCP network tags, ssh key support, and audit log improvements

    Autoscaler
    Evictor
    • Added support for GCP network tags (a concept used in the GCP world to manage network routing). Users can pass the tag 'network-tag.gcp.cast.ai/{network-tag-name}' as a label and newly created nodes will be tagged accordingly.
    • Now if CAST AI fails to add a node due to reasons outside of our control (e.g., the customer’s quota is too low) a specific event called “Adding node failed” will occur in the audit log to provide additional context about the failed operation.
    • Added support for ssh public keys. Using the updated cluster configuration API, users can set the public key (base64 encoded) or, in the case of AWS, also use the AWS key pair ID ("key-0123456789EXAMPLE") and connect to CAST AI provisioned nodes.

    • Spot nodes can now be interrupted directly from the node list. The interrupted node will change its status and eventually be removed from the cluster while the new spot node is provisioned instead. The whole process takes a few minutes to complete.

    • Added numerical values of requested and allocatable resources in CPU and GiB to the detailed node view in the Node list.

    • We have added an additional sheet that lists all workloads and their CPU & RAM requests in the Excel extract of the Available savings report.

    • If Evictor fails the leader election process, it will now restart automatically. Previously, users may have encountered a situation where the leader election process has failed, causing Evictor to fail silently.

    • “Unscheduled pods policy applied” audit log event JSON now has more context about what information was considered and led to the addition of a specific node, i.e. which nodes were skipped and why, which workload triggered the autoscaling event, what were the node constraints, etc. This feature greatly improves transparency into the decision-making process employed by CAST AI.

    • Introduced label selectors in the mutating webhook configuration, so customers can control the webhook in a much more flexible manner. Previously, users could set which pods should be scheduled on the on-demand nodes using regular expression values (namespaces). Now, they can use label selectors to force (or ignore) some pods to run on spot nodes (ability to force some pods to run on on-demand nodes based on a namespace remains).

    • Added another scrapable cluster metric – the hourly compute cost per pricing type (spot, on-demand, fallback). Check the documentation for more details.

    • Introduced a separate screen for the dashboard of the disconnected cluster.

    • Uplifted the design of console menu items.
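
Two details from this release, the base64-encoded ssh public key and the network tag label format, can be prepared with a few lines of Python. Only the encoding and the label key format come from the notes above; the helper names, the sample key, and the tag name are hypothetical, and the API request itself is not shown.

```python
import base64


def encode_public_key(public_key: str) -> str:
    """Base64-encode an ssh public key string for use in an API payload."""
    return base64.b64encode(public_key.strip().encode("utf-8")).decode("ascii")


def network_tag_label_key(tag_name: str) -> str:
    """Build the label key 'network-tag.gcp.cast.ai/{network-tag-name}'."""
    return f"network-tag.gcp.cast.ai/{tag_name}"


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    sample_key = "ssh-ed25519 AAAAEXAMPLE user@example\n"
    print(encode_public_key(sample_key))
    print(network_tag_label_key("allow-internal"))
```
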

  • Node list filtering, more Prometheus metrics and additional details about connected clusters

    Cost monitoring
    Autoscaler
    • Released a node list filtering and search capability that allows users to filter large node lists conveniently based on specific search criteria.

    • Cluster dashboard and a more detailed node list are now available as soon as a cluster is connected to CAST AI. It is no longer necessary to connect a cluster into the ‘managed’ mode in order to access these features.

    • In the Available Savings report, compute resources can now be viewed in a more detailed mode where they are broken into categories based on instance lifecycle type: spot, on-demand, fall-back (a temporary on-demand instance while spot is not available).

    • Introduced universal autoscaling.cast.ai/removal-disabled label and annotation that will be respected during Rebalancing or Evictor operations. Nodes or workloads marked this way will not be subject to migration. This label also replaces previously used beta.evictor.cast.ai/eviction-disabled which will be deprecated shortly. More information about Evictor overrides can be found in the documentation.

    • The Autoscaler now supports the topology.gke.io/zone label.

    • More Prometheus metrics. We have exposed for scraping all metrics visible in the cluster dashboard. A full list of currently available metrics and the setup guide can be found here.

    • Released a new version of our Terraform module for connecting EKS clusters to CAST AI. The module now supports cluster-wide tagging as well as the ability to configure Autoscaler policies.

    • To ensure that kOps nodes always have resources to run the OS and kubelet, we have implemented support for the system overhead settings.

    • Available savings report now has PDF export functionality.
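
As an illustration of the autoscaling.cast.ai/removal-disabled marker from this release, the sketch below adds it to Kubernetes manifests represented as plain Python dictionaries: as a label on nodes and as an annotation on pods. The value `"true"` and the helper name `protect_from_removal` are assumptions made for this example; the documentation on Evictor overrides is the authoritative source for accepted values.

```python
import copy

REMOVAL_DISABLED_KEY = "autoscaling.cast.ai/removal-disabled"


def protect_from_removal(manifest, value="true"):
    """Return a copy of a Pod or Node manifest marked as removal-disabled.

    Nodes receive the key as a label, everything else as an annotation.
    The value "true" is an assumption for illustration.
    """
    marked = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    metadata = marked.setdefault("metadata", {})
    field = "labels" if marked.get("kind") == "Node" else "annotations"
    metadata.setdefault(field, {})[REMOVAL_DISABLED_KEY] = value
    return marked


# Hypothetical objects used only to exercise the helper.
pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "billing-worker"}}
node = {
    "apiVersion": "v1",
    "kind": "Node",
    "metadata": {"name": "node-a", "labels": {"team": "data"}},
}

protected_pod = protect_from_removal(pod)
protected_node = protect_from_removal(node)
```

The same effect is usually achieved by editing the workload spec or labelling the node directly; the helper only shows where the key goes on each object type.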

  • Further enhancements of the Rebalancing feature

    Rebalancer
    • The Rebalancing feature received the following improvements:

      • Temporary on-demand nodes (aka Spot fallback nodes) will be considered during rebalancing plan generation if the ‘Spot fallback’ feature is turned on in the Autoscaler.

      • Applied various improvements to reduce the amount of time taken to create and execute Rebalancing plans. This performance enhancement is especially noticeable on large clusters.

      • Introduced a way to protect specific workloads and nodes from migration activity during rebalancing. Users can annotate pods or label nodes with 'autoscaling.cast.ai/removal-disabled' to ensure that they are not considered for migration.

      • Users can now generate new rebalancing plans even if the current plan is still relevant. Generating a new plan moves the previously active plan into the obsolete state.

    • Latest version of the CAST AI agent is v0.22.8. To update the agent in your cluster, follow these steps.

    • Uplifted our signup and login pages for a better user experience.

    • Bug fixes and other performance improvements.

  • Scoped autoscaler, improved Available savings report, and more

    Autoscaler
    Rebalancer
    • Released Scoped autoscaler, a mode of operations where the CAST AI autoscaler co-exists with another autoscaler on the same cluster and manages only specific workloads. To restrict the scope of the autoscaler, workloads have to be modified as described in the documentation.

    • Improved the Available Savings report by adding additional interactive settings that enable customers to simulate further optimization of the cluster by using spot instances more aggressively or operating the cluster on a schedule. Automated capability to stop external clusters on schedule when they’re not in use is in development and coming soon.

    • The Available savings report can now be exported to the Excel format.

    • Released the following Rebalancer improvements:

      • Issues preventing workloads from being migrated to new nodes can now be seen in detail from the workloads screen.

      • Each rebalancing plan now has a visible generation date.

      • Rebalancing plans now become obsolete after 1 hour and move to the archive with the status ‘Obsolete’.

      • In case the rebalancing plan execution failed, a technical error message is now visible in the logs.

      • In case the Rebalancer fails during plan generation, an error will be displayed to the customer on a separate screen. Rebalancing operations will not progress further.

      • Added automatic handling for insufficient capacity error, i.e. when the originally planned node type is no longer available, CAST AI will choose the next best alternative and proceed.

      • Updated the Rebalancer documentation.

    • We have released a new version of our Terraform provider (v0.10.0). The provider now supports cluster-wide configuration changes (e.g., the addition of subnet, security group). Documentation on Terraform registry was updated as well.

    • Evictor now has an aggressive mode where it can evict pods even if they have just a single replica. Check the documentation for more details.

    • Nodes can now be manually deleted from the Node list using the ‘Delete node’ button. During this operation, nodes are drained and then deleted from the cluster.

    • Released the new version v0.22.6 of the CAST AI agent, where we have improved how spot instances are identified on GKE. To update the agent in your cluster, follow these steps.
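
The one-hour lifetime of rebalancing plans described in this release can be pictured with a small sketch. It assumes a plan carries a generation timestamp and that a plan counts as obsolete once a full hour has elapsed; the function name, the "Active" status, and the exact boundary handling are assumptions for illustration, not the Rebalancer's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Plans become obsolete after 1 hour, per the release note above.
PLAN_LIFETIME = timedelta(hours=1)


def plan_status(generated_at, now):
    """Classify a plan by age. "Active" is a hypothetical status name."""
    if now - generated_at >= PLAN_LIFETIME:
        return "Obsolete"
    return "Active"


generated_at = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = plan_status(generated_at, generated_at + timedelta(minutes=30))
stale = plan_status(generated_at, generated_at + timedelta(minutes=90))
```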

  • Terraform provider update and improved nodelist

    Rebalancer
    Terraform
    Autoscaler
    • We have released an updated version of the Terraform provider (v0.8.1), which now supports EKS clusters. Release and example projects can be found on GitHub.

    • The “Policies” page in our UI menu is now called “Autoscaler”. We have started work on improving the experience of setting up and controlling the autoscaler; more changes will come.

    • Released cluster dashboard that displays key metrics related to each cluster.

    • Implemented the following Node list improvements:

      • The list is sorted in descending order by date and can now be sorted by most of the columns.

      • Ability to view labels attached to each node.

      • Spot fall-back nodes are now identified with an icon.

    • Improved error handling in Rebalancer, providing screens with more details about the encountered error and possible remediation.

    • A new version of the CAST AI agent, v.0.22.5, is now available. To update the agent in your cluster, follow these steps.

    • Fixed a bug in GCP custom spot instances pricing.

    • Fixed a bug in the Available savings report where sometimes workloads that are already running on spot instances would be suggested to be run on on-demand nodes.

    • Added records of spot fallback events to the audit log.

    • Evictor now has a setting to run in more “aggressive” mode, where it would also evict pods with a single replica. Check the documentation for more details.

    • Improved performance of our console UI and fixed various small bugs.

  • Spot fallback, enhanced cluster node list & private cluster support

    Spot Instance
    Autoscaler
    • Have you ever experienced Spot instance drought, when instances you need are temporarily not available and so your workloads become unschedulable? The Spot fallback feature guarantees capacity by temporarily moving impacted workloads onto on-demand nodes. After a period of time, CAST AI will check for Spot availability and move the workloads back to spot instances. This feature is available on the Policies page under the Spot instance section and supports EKS, kOps, and GKE clusters.

    • Added support for private kOps clusters that do not have the K8s API IP exposed to the internet. The CAST AI agent now supports a “call-home mechanism” for private IP K8s clusters.

    • Node list went through a major upgrade and now contains much more detailed information about individual nodes in the cluster.

    • Autoscaler can now be instructed to scale the cluster with instances that have locally attached SSD disks, when the storage-optimized label is used in a workload spec. For details, please refer to the documentation.

    • Minor improvements to UI and bug fixes.

  • Release of Rebalancer & cluster cost graph

    Rebalancer
    • We have launched a new feature that we call Rebalancer. It allows users to automatically migrate clusters from the current state to the most optimal configuration. The migration is performed via three distinct phases: 1) during the preparation, the user can inspect all impacted workloads; 2) later the user gets a migration plan so they understand what nodes will be added & removed and what cost impact can be expected; 3) lastly – the migration plan is executed by adding new nodes, migrating workloads and deleting obsolete nodes.

    • The Available savings report is now enhanced with a graph that displays point-in-time actual and optimal cluster costs as well as other dimensions (e.g., CPU, memory, node count).

    • For kOps clusters, we no longer consider master nodes in our available savings report recommendations.
    • Added support for kOps version 1.20.

    • For AWS/kOps clusters, we previously deployed a Lambda function per cluster. This is no longer the case: from now on, a single Lambda function is deployed per account.

    • Implemented handling of cases where a customer has removed some permissions (or the cluster itself) in their cloud provider account. In such a scenario, the cluster is displayed with the status “Failed” in our console, and the user has two options: remove the cluster from the console or fix the error in their cloud provider’s account.

    • Fixed various reported bugs and implemented other UI improvements.

  • AKS support is now available

    Autoscaler
    • Microsoft Azure users can now connect their AKS clusters to CAST AI and see how much they could save by using the CAST AI optimization engine. It’s completely free and safe as our agent operates in read-only mode. Try it out now.
    • Cluster onboarding flow is now fully automated and no longer requires manual entry of credentials.
    • Users can now generate read-only API access keys.
    • Cluster headroom policy based on instance lifecycle type. Until now, users could configure one set of headroom values for the cluster. Now they can set headroom values for on-demand and spot nodes separately.
    • Added support for the following AWS regions: ap-northeast-3 Asia Pacific (Osaka), ap-east-1 Asia Pacific (Hong Kong), af-south-1 Africa (Cape Town), and me-south-1 Middle East (Bahrain). We now support all AWS (and GCP) regions.
  • Introduction of roles and improved cluster onboarding flow

    Organization
    • Organizational roles have been released. Every organization now has Owner, Member, or Viewer (read-only) roles that can be managed in our console.

    • Cluster headroom and Node constraints policies are now independent and can be set separately.

    • Improved cluster onboarding flow. Customers are no longer required to enter the access key and secret details; the onboarding script now takes care of them.

    • Customers can now set annotation on the pod level that would prevent Evictor from removing the node that hosts the pod. More details about annotations and labels used by Evictor can be found in our documentation.

    • The node deletion policy now removes nodes that are marked by Evictor immediately, ignoring the time delay set for empty nodes in the “Node deletion” policy. That way, customers can avoid paying for nodes that were marked as unschedulable.

    • Customers using AWS GovCloud regions (AWS GovCloud (US-East) and AWS GovCloud (US-West)) are now able to connect their clusters and check possible savings.

    • CPU hrs report is now available in the console. The report presents the total amount of CPU hours accumulated across all of the nodes in the organization.

    • GKE clusters running shielded nodes are now also fully supported in our platform.

    • Improved our inventory to support a wider range of instance types.

    • Delivered multiple Autoscaler improvements.

    • Minor UI improvements and bug fixes.

  • External GKE cluster optimization, Cluster metrics, and enhanced optimization policies

    Autoscaler
    • GKE cluster optimization. Customers running unshielded GKE clusters can now onboard them into CAST AI and benefit from all cost optimization policies.

    • Cluster metrics endpoint – we have released the first version of the metrics endpoint that provides visibility into the CAST AI-captured metrics. The initial description of metrics and setup guide can be found on GitHub. We will continue expanding the list of exposed metrics, so stay tuned.

    • Implemented the Node Root Volume policy that allows the configuration of root volume size based on the CPU count. This way nodes with a high CPU count can have a larger root disk allocated upon node creation.

    • We have enhanced the Spot policy for EKS and kOps, so customers can instruct CAST AI to provision the least interrupted spot instances, most cost-effective ones, or simply leave the default – balanced approach. We also support an ability to override this cluster-wide policy on the deployment level.

    • CAST AI agent v.0.20.0 was released – the agent now supports auto-discovery of GKE clusters, so users are no longer required to enter any cluster details manually.

    • Cluster headroom and Node constraints policies are now separated and can be used simultaneously.

    • We made it easier for users to set correct node CPU and Memory constraints that adhere to supported ratios.

    • Bug fixes and small interface improvements.
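
To make the Node Root Volume policy from this release more concrete, here is a minimal sketch of sizing a root disk from the CPU count. The linear rule, the parameter names, and the default numbers (`gib_per_cpu=5`, `minimum_gib=100`) are assumptions chosen for illustration; the release note only states that the size is based on the CPU count.

```python
def root_volume_size_gib(cpu_count, gib_per_cpu=5, minimum_gib=100):
    """Return a root volume size that grows with the node's CPU count.

    The size never drops below minimum_gib, so small nodes still get a
    usable disk. All defaults are illustrative assumptions.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return max(minimum_gib, cpu_count * gib_per_cpu)


small_node_disk = root_volume_size_gib(4)    # falls back to the minimum
large_node_disk = root_volume_size_gib(64)   # scales with the CPU count
```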

  • Empty node time to live and new CAST agent version

    Autoscaler
    • Implemented a new feature that allows users to set the time for how long an empty node should be kept alive before deletion. This “empty node time-to-live” setting makes node deletion policy less aggressive in case users do not want to delete empty nodes right away. Read more about this feature in our docs.

    • CAST AI agent v0.19.2 was released – we removed managed fields and sensitive environment variables from objects as well as introduced compression of delta content sent by the agent. Ensure that you always update to the latest version of our agent. Check GitHub for more details.

    • Quality of life improvements:

      • GKE connect cluster improved UX

      • Savings estimator now displays totals of nodes in current and optimized configurations

      • Savings estimator now displays the status of all Cost optimization policies

      • Spot instance recommendations for workloads from now on can be exported to .csv

      • Users can now inspect the contents of the YAML file on the “Connect your cluster” screen before deploying it to the cluster

      • Improved UX for scenarios when Add-ons are not installed or can’t be found

    • Enhancement of our Audit log has continued, making it more detailed and useful.

    • Rolled out various bug fixes and small improvements.
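
The “empty node time-to-live” setting from this release boils down to a simple check: a node is only eligible for deletion once it has been empty for at least the configured time. The sketch below illustrates that rule under stated assumptions; the function and parameter names are hypothetical and the real policy may take more signals into account.

```python
from datetime import datetime, timedelta, timezone


def should_delete_empty_node(pod_count, empty_since, now, ttl):
    """Return True when a node has been empty for at least the TTL.

    pod_count counts the workload pods still on the node; empty_since is
    when the node was first seen empty (None if it never was).
    """
    if pod_count > 0 or empty_since is None:
        return False
    return now - empty_since >= ttl


now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(minutes=10)

recently_emptied = should_delete_empty_node(0, now - timedelta(minutes=3), now, ttl)
long_empty = should_delete_empty_node(0, now - timedelta(minutes=25), now, ttl)
still_busy = should_delete_empty_node(2, now - timedelta(minutes=25), now, ttl)
```

A longer TTL keeps spare capacity around for bursts at the price of paying for idle nodes a little longer.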

  • Higher variety of SPOT instances, specification of CPU and RAM per node, Audit log improvements

    Autoscaler
    • Our Savings Estimator and Autoscaler are now able to target a wider variety of instance types when recommending SPOT instances. This improvement allows customers to unlock more savings from the use of instance families that previously would not be considered.

    • From now on users can rename the organization after the initial creation.

    • Audit log is now much more detailed and available for EKS and kOps clusters (previously this feature was available only on CAST AI created clusters).

    • We introduced an annotation and a label that protect a node from being considered for eviction and deletion. You can read more about it in our documentation.

    • During the migration onto CAST AI-selected nodes, customers might want to specify minimum and maximum values of CPU and RAM for nodes to be added to a cluster. Now users can easily set these parameters in our Unscheduled pods policy and limit the possible pool of nodes that CAST AI considers. As before, the other option is to use Cluster headroom settings.

    • We have added the support of kOps 1.11, 1.15 and 1.17.

    • Removed IAM permission to create new roles from our credentials script.

    • Implemented another quality of life improvement – clusters can now be sorted based on the name, region or status.

    • Fixed bugs and made minor improvements to UI.
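
The minimum and maximum CPU and RAM values in the Unscheduled pods policy act as a filter on the pool of candidate nodes. A minimal sketch of that filtering follows; the `InstanceType` class, the catalog entries, and the constraint values are all made up for illustration and do not correspond to real cloud instance types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: int
    ram_gib: float


def within_constraints(instance, min_cpu, max_cpu, min_ram_gib, max_ram_gib):
    """Check whether an instance type fits the configured CPU and RAM bounds."""
    return (
        min_cpu <= instance.cpu <= max_cpu
        and min_ram_gib <= instance.ram_gib <= max_ram_gib
    )


# Hypothetical catalog, not real instance types.
CATALOG = [
    InstanceType("small", cpu=2, ram_gib=4),
    InstanceType("medium", cpu=8, ram_gib=32),
    InstanceType("large", cpu=64, ram_gib=256),
]

# Only nodes with 4 to 16 CPUs and 8 to 64 GiB of RAM may be added.
eligible = [
    instance.name
    for instance in CATALOG
    if within_constraints(instance, min_cpu=4, max_cpu=16, min_ram_gib=8, max_ram_gib=64)
]
```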

  • Organizations, Cost analyzer for GKE clusters and Cost optimization functionality for kOps

    Autoscaler
    Organization
    • CAST AI now supports Organizations! Multiple team members from a company can now join CAST AI, create an organization inside our console, and collaboratively manage K8s clusters.

    • GCP customers can connect GKE clusters to CAST AI and see how much they could save by using the CAST AI optimization engine. As always, this is completely free and safe as our agent operates in read-only mode. Try it out now. Functionality to optimize GKE clusters using CAST AI is currently in development.

    • Users running kOps clusters on AWS can now fully benefit from CAST AI cost analysis and optimization functionality. Connect your kOps cluster now to see how much you can save, and realize those savings by turning on AI-driven optimization policies.

    • Connected AWS (EKS and kOps) clusters can now be paused and resumed as easily as CAST AI created clusters. Functionality to pause and resume on a pre-set schedule is coming soon as well.

    • The node list is now accessible as soon as a cluster is connected. Customers no longer need to onboard the cluster to access this functionality.

    • Additional Control plane nodes can now be added to CAST AI created clusters.

    • Clusters that were onboarded to CAST AI can now be disconnected via the UI. Customers have the option to delete or keep CAST AI created nodes.

    • We have reacted to user feedback and made minor adjustments in UI as well as fixed bugs.

  • Release of Add-ons and more agile CAST AI agent

    • We have released the Add-ons management functionality for CAST AI clusters. Now CAST AI clusters will be created faster without any add-ons pre-installed. Afterward, users will be able to choose the add-ons they wish to use. The Add-ons feature is available in the cluster dashboard, try it out! 

    • We increased the frequency of communication between the agent deployed on the client’s cluster and CAST AI and reduced the amount of data the agent sends via the network. Now CAST AI can react in as little as 15 seconds and scale the cluster as required.

    • We have applied minor improvements and fixes to increase the accuracy of our Available savings report.

    • Improved experience for selecting and managing your subscription.

    • Created a guide on how to disconnect your EKS cluster from CAST AI.

    • Last but not least, we fixed some bugs and made small improvements to the UI.

  • Release of Cost optimization functionality for EKS clusters

    Cost monitoring
  • Save a lot by pausing and resuming your clusters on schedule

    • Save costs by stopping your clusters when they’re idle! We have launched a “Cluster schedule” functionality to pause and resume clusters based on the user-defined schedule. Find this feature in your cluster dashboard or check the documentation.

    • The node autoscaler policy now supports GCP Preemptible Instances.

    • We introduced additional validations in GCP credentials onboarding.

    • As always our team took care of bug fixes, performance optimizations, and small UI improvements.

  • Release of CAST AI agent and “Savings” feature

    • Launched an agent to connect the EKS cluster (that was not created by CAST AI) to our console. Users can now connect clusters in read-only mode and use the “Savings” feature to analyze proposed optimizations and their impact on the cloud bill.

    • Revamped dashboard UI.

    • Node interruptions are now visible in the log data via the Audit log UI.

    • Canada East (Montréal) is now a supported region in our cluster creation flow.

    • Fixed minor bugs.

  • Improved GCP credentials creation & Launch of CAST CLI

    • We have simplified the user credentials creation process for GCP.

    • You can now control your clusters using our own Command Line Interface (CLI).

    • Improved handling of Kubernetes nodes and load balancers, so the status of the nodes is tracked, and load balancers are removed when appropriate.

    • Improved the Unschedulable Pods policy to peek into the future and consider nodes that are being created.

    • Now users can process subscription payments without leaving our console.
    • Improved structure of our documentation; check it at docs.cast.ai.

    • Updated UI elements in our console and, as always, our team shipped some bug-fixes.

    • Launched the status page so our customers can check the health of our platform.

  • Master Node Configuration & General Improvements

    • Now you can add or remove additional master nodes on a live cluster. Convert a single non-high availability control plane to 1, 3, or 5 nodes and vice versa.
    • The newly updated and easier to understand policy is now included as part of our Unschedulable Pods policy configuration. Read more in our documentation.
    • DigitalOcean cluster deletion is improved by handling dependencies timing better.
    • Other small and various improvements.

     

  • New Upgrades & Visible Improvements

    • We upgraded Kubernetes to version 1.19 and bumped Cilium up to version 1.9. Take it for a spin here.
    • If you’re creating a new cluster with Azure as one of the providers, it will now use non-burstable Azure instance types.
    • Get more control if you see the need: interrupt and add a Spot Instance Node right from your Node list.
    • And, as always, we’ve shipped some bug-fixes and performance improvements.
  • CAST AI welcomes the beloved Developer Cloud!

    • You asked, we’ve delivered: DigitalOcean is now part of our ever-growing list of supported cloud service providers. Starting now, you can stretch your Kubernetes clusters across DO, AWS, GCP and Azure. Sign up here!
  • Support for Spot/Preemptible Instances added

    • Spot instances, if applied correctly, can yield up to 60-80% cloud savings and are really useful for stateless workloads (think microservices). So, starting now, if you want to, we can tell our optimization engine to start purchasing Spot (Preemptible on GCP) instances for you. And if these instances are interrupted by the cloud provider, we automatically replace them! GCP & Azure instances will follow very shortly. Read more in our documentation.
  • Support for WireGuard

    • If you want to use WireGuard as an alternative to Cloud VPN, you can now! Read more in our documentation.
  • CAST AI joined Cloud Native Computing Foundation

    • We’ve joined CNCF as full members. You’ll see more of us talking about true Multi-Cloud in CNCF events from now on!
  • Additional changes to CAST AI console

    • Create your API authentication tokens in the console
    • CAST AI API is moved to a more intuitive domain – api.cast.ai
  • New Terraform provider

  • New documentation hub

    • Access CAST AI documentation at docs.cast.ai. We’ve reworked it so you can find what you need more easily
  • Free AWS and Google Cloud credentials

    • You can claim your free credentials for AWS and Google Cloud in our Slack community. Try out our product for free for a limited time!
  • Improved Azure cloud credentials

    • Improvements in how Azure cloud credentials are created
  • A new cloud region in South America East

    • You’ve asked, we’ve delivered: choose Sao Paulo (South America East) to set up your clusters
  • Lots of additional changes in CAST AI console

    • You can now see Virtual Machine types and CPU/RAM usage in your Nodes dashboard
    • Easily copy your DNS records by accessing Global Server Load Balancer link from your cluster info widget
    • We’ve updated links to CAST AI documentation, API, and your cloud credentials
    • A new sign-up flow for easier setup
    • Initial costs are now visible when you are creating a cluster
    • Audit log tracks what actions are being performed on your cluster
  • We’ve made some changes in your cluster screen

    • CAST CSI (storage drivers) now support cloud native storage snapshots
    • We’ve increased security of your K8s clusters
    • You can now scale your apps easier with KEDA add-on installed with pod autoscaler policies
    • Now, when autoscaler scales down cluster nodes, RAM is considered more
    • CPU policy acts as a hand-rail, limiting minimum & maximum CPU cores per your cluster
    • Prometheus in your cluster was moved to Control-Plane (Master) node
