Amazon EC2 R8i and R8i-flex instances are now available in additional regions

Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R8i and R8i-flex instances are available in the Asia Pacific (Malaysia, Singapore) and Europe (Frankfurt) regions. These instances are powered by custom Intel Xeon 6 processors, available only on AWS, delivering the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. The R8i and R8i-flex instances offer up to 15% better price-performance and 2.5x more memory bandwidth than previous-generation Intel-based instances. They deliver 20% better performance than R7i instances, with even higher gains for specific workloads: compared to R7i, they are up to 30% faster for PostgreSQL databases, up to 60% faster for NGINX web applications, and up to 40% faster for AI deep learning recommendation models.

R8i-flex, our first memory-optimized Flex instances, are the easiest way to get price-performance benefits for the majority of memory-intensive workloads. They offer the most common sizes, from large to 16xlarge, and are a great first choice for applications that don’t fully utilize all compute resources.

R8i instances are a great choice for all memory-intensive workloads, especially for workloads that need the largest instance sizes or continuous high CPU usage. R8i instances offer 13 sizes including 2 bare metal and the new 96xlarge size for the largest applications. R8i instances are SAP-certified and deliver 142,100 aSAPS, the highest among all comparable machines in on-premises and cloud environments, delivering exceptional performance for mission-critical SAP workloads.

To get started, sign in to the AWS Management Console. Customers can purchase these instances via Savings Plans, On-Demand instances, and Spot instances. For more information about the new R8i and R8i-flex instances, visit the AWS News blog.

 


Applicability vs. job displacement: further notes on our recent research on AI and occupations

September 16, 2025

By: Kiran Tomlinson, Principal Researcher; Sonia Jaffe, Principal Researcher; Will Wang; Scott Counts, Senior Principal Research Manager; Siddharth Suri, Senior Principal Researcher.

We recently published a paper (Working with AI: Measuring the Occupational Implications of Generative AI) that studied which occupations might find AI chatbots useful, and to what degree. The paper sparked significant discussion, which is no surprise: people care deeply about the future of AI and jobs, and that is part of why we believe it is essential to study these topics.

Unfortunately, not all of the discussion accurately described the scope or conclusions of the study. Specifically, our study draws no conclusions about job elimination; in the paper, we explicitly caution against using our findings to reach that conclusion.

Given the importance of this topic, we want to clear up any misunderstandings and provide a more digestible summary of the paper, our methodology, and its limitations.

What did our research find?

We set out to better understand how people use AI and to highlight where AI might be useful across different occupations. To do this, we analyzed how people currently use generative AI, specifically Microsoft Bing Copilot (now Microsoft Copilot), to help with tasks. We then compared these sets of tasks against the O*NET database, a widely used occupational classification system, to understand the potential applicability to various occupations.

We found that AI is most useful for tasks related to knowledge work and communication, particularly tasks such as writing, gathering information, and learning.

People in occupations with these tasks may benefit from considering how AI can be used as a tool to help improve their workflows. On the other hand, it is not surprising that physical tasks such as performing surgery or moving objects had less direct AI chatbot applicability.

So, to summarize, our paper is about identifying the occupations where AI may be most useful, by assisting with or performing subtasks. Our data do not indicate, nor do we suggest, that certain jobs will be replaced by AI.

Methodological limitations are acknowledged, and they matter

The paper is transparent about the limitations of our approach.

We analyzed anonymized Bing Copilot conversations to see which activities users seek AI help with and which activities AI can perform when mapped to the O*NET database. While O*NET provides a structured list of activities associated with various occupations, it does not capture the full spectrum of skills, context, and nuance required in the real world. A job is far more than the collection of tasks that make it up.

For example, a task may involve “writing reports,” but O*NET will not reflect the interpersonal judgment, domain expertise, or ethical considerations needed to do it well. The paper acknowledges this gap and warns against overinterpreting the AI applicability scores as measures of AI's ability to perform an occupation.

In addition, the dataset is based on Bing Copilot user queries (from January to September 2024), which may be influenced by factors such as awareness of, access to, or comfort with AI tools. Different people use different LLMs for different purposes, and it is also very difficult (or nearly impossible) to determine which conversations take place in a work context and which for leisure.

Finally, we only evaluated the use of AI chatbots, so this study does not evaluate the impact or applicability of other forms of AI.

Where do we go from here?

Given the intense interest in how AI will shape our collective future, it is important that we continue to study and better understand its societal and economic impact. As with all research on this topic, the findings are nuanced, and it is important to pay attention to that nuance.

The public interest in our research is based, in large part, on the topic of AI and job displacement. However, the current methodology for this study is unlikely to lead to firm conclusions about it. AI may prove to be a useful tool for many occupations, and we believe the right balance lies in finding how to use the technology in a way that leverages its abilities while complementing human strengths and accounting for people's preferences.

To learn more from Microsoft about the future of work and AI skills, see Microsoft's annual Work Trend Index and Microsoft Elevate.

The post Applicability vs. job displacement: further notes on our recent research on AI and occupations appeared first on Source LATAM.

 


Amazon OpenSearch Service announces Star-Tree Index

OpenSearch has introduced Star-Tree Index, a new feature that significantly improves aggregation performance for high-cardinality and multi-dimensional queries. This index pre-aggregates data across configured dimensions and metrics at ingestion time, enabling sub-second response times for frequent aggregations like terms, histogram, and range.

Star-Tree Index is designed for real-time analytics and requires no changes to query syntax; OpenSearch automatically uses the optimized path when supported queries are detected. Early benchmarks show faster aggregation performance on large datasets. This makes it ideal for use cases such as observability, personalization, and time-series dashboards. It works best with append-only data and builds during segment refresh/merge, with minimal impact on ingestion throughput.
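To make the pre-aggregation idea concrete, here is a minimal Python sketch. It is a toy illustration only, not OpenSearch's actual data structure or API: the class name `ToyStarTree`, the `"*"` wildcard convention, and the sample fields (`status`, `region`, `latency`) are all assumptions made for the example. At ingestion time it keeps a running sum and count for every combination of dimension values, including a wildcard that stands for "any value", so a later aggregation becomes a single lookup instead of a scan over documents.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # wildcard meaning "any value" for a dimension (toy convention)


class ToyStarTree:
    """Toy pre-aggregation store; not OpenSearch's real implementation."""

    def __init__(self, dimensions, metric):
        self.dimensions = list(dimensions)
        self.metric = metric
        # key: tuple of dimension values (or STAR) -> [sum, count]
        self.cells = defaultdict(lambda: [0.0, 0])

    def ingest(self, doc):
        """Update every pre-aggregated cell that this document belongs to."""
        choices = [(doc[d], STAR) for d in self.dimensions]
        for key in product(*choices):
            cell = self.cells[key]
            cell[0] += doc[self.metric]
            cell[1] += 1

    def aggregate(self, **filters):
        """Return (sum, count) for the given dimension filters via one lookup."""
        key = tuple(filters.get(d, STAR) for d in self.dimensions)
        total, count = self.cells.get(key, (0.0, 0))
        return total, count


tree = ToyStarTree(dimensions=["status", "region"], metric="latency")
for doc in [
    {"status": 200, "region": "eu", "latency": 10.0},
    {"status": 200, "region": "us", "latency": 30.0},
    {"status": 500, "region": "eu", "latency": 50.0},
]:
    tree.ingest(doc)

print(tree.aggregate(status=200))   # sum and count across all regions
print(tree.aggregate(region="eu"))  # sum and count across all statuses
print(tree.aggregate())             # sum and count across everything
```

The sketch materializes every combination of values, which grows exponentially with the number of dimensions; it ignores that trade-off and is meant only to show why queries over pre-aggregated data avoid touching individual documents.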

Star-Tree Index is available in all regions where OpenSearch 3.1 is supported. The feature is opt-in and can be enabled at index creation time using composite index settings.

Please refer to the AWS Regional Services List for more information about Amazon OpenSearch Service availability. To learn more about Star-Tree Index, see the OpenSearch documentation.

 


Amazon OpenSearch Service announces Derived Source for storage optimization

Amazon OpenSearch Service introduces support for Derived Source, a new feature that can help reduce the amount of storage required for your OpenSearch Service domains. With Derived Source, you can skip storing source fields and derive them dynamically when required.

OpenSearch stores each ingested document in the _source field and also indexes individual fields for search. The _source field can consume significant storage space. To reduce storage use, you can configure OpenSearch to skip storing the _source field and instead reconstruct it dynamically when needed, for example, during search, get, mget, reindex, or update operations.
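The trade-off described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenSearch internals or its API: `ToyIndex` and its methods are hypothetical names, and a plain dictionary per field stands in for the per-field index structures. The original document is never stored as a whole; it is rebuilt from the per-field values only when a caller asks for it.

```python
class ToyIndex:
    """Toy index that keeps per-field values only and rebuilds documents on read."""

    def __init__(self):
        # field name -> {doc_id: value}; stands in for per-field index structures
        self.columns = {}

    def index(self, doc_id, doc):
        """Store each field separately; the original document is not kept."""
        for field, value in doc.items():
            self.columns.setdefault(field, {})[doc_id] = value

    def get_source(self, doc_id):
        """Derive the document from the stored field values when requested."""
        return {
            field: values[doc_id]
            for field, values in self.columns.items()
            if doc_id in values
        }


idx = ToyIndex()
idx.index("1", {"service": "checkout", "status": 200})
idx.index("2", {"service": "search", "latency_ms": 12})

print(idx.get_source("1"))
print(idx.get_source("2"))
```

The rebuilt document has the same fields and values as the ingested one, but it costs extra work on every read, which is the price paid for the storage saving.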

Derived Source is available in all regions where OpenSearch 3.1 is supported. The feature is opt-in and can be enabled at index creation using composite index settings.

Please refer to the AWS Regional Services List for more information about Amazon OpenSearch Service availability. To learn more about Derived Source, see the OpenSearch documentation.

 


Amazon S3 Batch Operations now supports managing buckets or prefixes in a single step in AWS Management Console

Amazon S3 Batch Operations now supports managing objects within an S3 bucket, prefix, suffix, or more, in a single step in the AWS Management Console. Until now, customers creating an S3 Batch Operations job specified the individual objects on which to perform the operation. With this feature, you can instead specify an entire bucket, prefix, suffix, creation date, or storage class. Amazon S3 Batch Operations then applies the operation to all matching objects and notifies you when the job completes.

S3 Batch Operations lets you easily perform one-time or recurring batch workloads at any scale, such as copying objects between staging and production buckets, restoring archived backups from S3 Glacier storage classes, or computing object checksums to verify the content of stored datasets. After you start your job, S3 Batch Operations automatically processes all of the objects that match your filtering criteria. You receive a detailed completion report with the status of each object once the job completes.
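As a rough picture of what "objects that match your filtering criteria" means, the following Python sketch filters an in-memory list of object records by prefix, suffix, creation date, and storage class. It does not call Amazon S3 and is not the service's API: `ObjectInfo`, `matches`, and the parameter names are hypothetical, and the rule that every supplied criterion must hold (a logical AND) is an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ObjectInfo:
    key: str
    created: datetime
    storage_class: str


def matches(obj, prefix=None, suffix=None, created_after=None, storage_class=None):
    """Return True when the object satisfies every criterion that was supplied."""
    if prefix is not None and not obj.key.startswith(prefix):
        return False
    if suffix is not None and not obj.key.endswith(suffix):
        return False
    if created_after is not None and obj.created <= created_after:
        return False
    if storage_class is not None and obj.storage_class != storage_class:
        return False
    return True


# Made-up sample objects for illustration only.
objects = [
    ObjectInfo("logs/2025/app.gz", datetime(2025, 3, 1, tzinfo=timezone.utc), "STANDARD"),
    ObjectInfo("logs/2024/app.gz", datetime(2024, 3, 1, tzinfo=timezone.utc), "GLACIER"),
    ObjectInfo("images/cat.png", datetime(2025, 5, 1, tzinfo=timezone.utc), "STANDARD"),
]

cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)
selected = [
    o.key
    for o in objects
    if matches(o, prefix="logs/", suffix=".gz", created_after=cutoff)
]
print(selected)
```

Criteria left as `None` are ignored, so calling `matches(obj)` with no filters selects every object, which mirrors targeting an entire bucket.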

This feature of S3 Batch Operations is available in all AWS Regions. You can get started through the AWS Management Console, AWS Command Line Interface (CLI), or the AWS Software Development Kit (SDK) client. For pricing information, please visit the Management & Insights tab of the Amazon S3 pricing page. To learn more about S3 Batch Operations, visit the S3 User Guide.

 


Now generally available: Amazon EC2 R8gn instances

Today, AWS announces the general availability of the new Amazon Elastic Compute Cloud (Amazon EC2) R8gn instances. These instances are powered by AWS Graviton4 processors to deliver up to 30% better compute performance than AWS Graviton3 processors. R8gn instances feature the latest 6th generation AWS Nitro Cards and offer up to 600 Gbps network bandwidth, the highest network bandwidth among network-optimized EC2 instances.

Take advantage of the enhanced networking capabilities of R8gn to scale the performance and throughput of network-intensive workloads such as SQL and NoSQL databases, and in-memory databases. For increased scalability, these instances offer instance sizes up to 48xlarge, including two metal sizes, up to 1,536 GiB of memory, and up to 60 Gbps of bandwidth to Amazon Elastic Block Store (EBS). These instances support Elastic Fabric Adapter (EFA) networking on the 16xlarge, 24xlarge, 48xlarge, metal-24xl, and metal-48xl sizes, which enables lower latency and improved cluster performance for workloads deployed on tightly coupled clusters.

The new instances are available in the following AWS Regions: US East (N. Virginia) and US West (Oregon). Metal sizes are only available in US East (N. Virginia).

To learn more, see Amazon R8gn Instances. To begin your Graviton journey, visit the Level up your compute with AWS Graviton page. To get started, see AWS Management Console, AWS Command Line Interface (AWS CLI), and AWS SDKs.

 


Amazon Managed Service for Prometheus now available in 11 additional AWS Regions

Amazon Managed Service for Prometheus is now available in Asia Pacific (Jakarta), Asia Pacific (Hyderabad), Asia Pacific (Osaka), Asia Pacific (Melbourne), Asia Pacific (Taipei), Canada West (Calgary), Europe (Spain), Israel (Tel Aviv), Mexico (Central), Middle East (Bahrain), and US West (N. California). Amazon Managed Service for Prometheus is a fully managed Prometheus-compatible monitoring service that makes it easy to monitor and alarm on operational metrics at scale.

The list of all supported regions where Amazon Managed Service for Prometheus is generally available can be found in the user guide. Customers can send up to 1 billion active metrics to a single workspace and can create multiple workspaces per account, where a workspace is a logical space dedicated to the storage and querying of Prometheus metrics.

To learn more about Amazon Managed Service for Prometheus, visit the user guide or product page.

 


Announcing on-demand deployment for custom Meta Llama models in Amazon Bedrock

Starting today, customers can use the on-demand deployment option in Amazon Bedrock for their Meta Llama 3.3 models that have been fine-tuned or distilled in Bedrock. Models customized on or after September 15, 2025 will be eligible.

This enables Bedrock customers to reduce costs by processing requests in real time without requiring pre-provisioned compute resources. Customers pay only for what they use, eliminating the need for always-on infrastructure.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies via a single API. Amazon Bedrock also provides a broad set of capabilities customers need to build generative AI applications with security, privacy, and responsible AI built in.

To get started, see the Amazon Bedrock documentation.

 


AWS Organizations now provides account state information for member accounts

AWS Organizations provides a new State field in the AWS Organizations Console and APIs (DescribeAccount, ListAccounts, and ListAccountsForParent) to enhance AWS account lifecycle visibility. With this launch, the new State field (account state) replaces the existing Status field (account status) in the AWS Organizations Console. Both the Status and State fields will remain available in the APIs until September 9, 2026.

This launch gives you more granular account state information, such as ‘SUSPENDED’ for AWS-enforced suspension, ‘PENDING_CLOSURE’ for in-process closure requests, and ‘CLOSED’ for accounts in their 90-day reinstatement window. After September 9, 2026, the Status field will be fully deprecated. Customers using account vending pipelines should update their implementations to reference the State field before that date. This feature is available in all AWS commercial and AWS GovCloud (US) Regions. To get started managing your accounts, please see the blog post and documentation.
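For pipelines that need to work during the transition, one possible approach is to read State when it is present and fall back to Status otherwise. The Python sketch below works on plain dictionaries shaped like account records rather than calling AWS: the helper names `account_state` and `is_usable`, the `"UNKNOWN"` placeholder, the sample account IDs, and the rule that only `ACTIVE` accounts count as usable are all assumptions made for illustration.

```python
def account_state(account):
    """Prefer the new State field; fall back to the legacy Status field."""
    state = account.get("State")
    if state is not None:
        return state
    # "UNKNOWN" is a placeholder used by this sketch, not an AWS value.
    return account.get("Status", "UNKNOWN")


def is_usable(account):
    """Treat only ACTIVE accounts as ready for use (an assumption of this sketch)."""
    return account_state(account) == "ACTIVE"


# Made-up account records for illustration only.
accounts = [
    {"Id": "111111111111", "Status": "ACTIVE", "State": "ACTIVE"},
    {"Id": "222222222222", "State": "CLOSED"},            # State only
    {"Id": "333333333333", "Status": "PENDING_CLOSURE"},  # legacy Status only
]

for acct in accounts:
    print(acct["Id"], account_state(acct), is_usable(acct))
```

Keeping the field lookup in one helper means the Status fallback can be deleted in a single place once the deprecation date has passed.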

 


Amazon SageMaker HyperPod announces health monitoring agent support for Slurm clusters

Today, Amazon SageMaker HyperPod announces the general availability of the health monitoring agent for Slurm clusters. SageMaker HyperPod helps you provision resilient clusters for running machine learning (ML) workloads and developing state-of-the-art models such as large language models (LLMs), diffusion models, and foundation models (FMs). The health monitoring agent performs passive, background health checks of instances to identify problems in key areas without impact on application behavior or performance, flags failures instantly, and replaces any unhealthy instances to keep your training jobs running smoothly. 

The agent runs continuously on all GPU- or Trainium-based nodes in your HyperPod cluster, watching for hardware issues such as unresponsive GPUs or NVLink error counters. When a fault is detected, it marks the node as unhealthy and automatically reboots it or replaces it with a healthy node, keeping your jobs running without manual intervention. The agent also coordinates failure handling with the job auto-resume functionality available on Slurm clusters. For example, jobs with auto-resume enabled continue from the last saved checkpoint once the agent has replaced the affected nodes. This hands-free recovery, already available on HyperPod clusters orchestrated with Amazon EKS, now gives Slurm clusters the same resilient environment, helping teams train large models for weeks without disruption and reclaim time and costs that would otherwise be lost to mid-run failures. In addition, customers can now reboot their nodes with a simple command to handle intermittent issues, such as GPU driver problems that require a reset.
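The value of resuming from the last saved checkpoint can be seen in a small simulation. This Python sketch is not HyperPod or Slurm code: `run_job` and its parameters are hypothetical, a "step" stands in for a unit of training work, and saving a checkpoint is reduced to remembering a step number.

```python
def run_job(total_steps, checkpoint_every, resume_from=0, fail_at=None):
    """Simulate a training run.

    Returns (last_checkpoint, completed). A failure at step `fail_at` stops
    the run before that step's work is done.
    """
    last_checkpoint = resume_from
    for step in range(resume_from + 1, total_steps + 1):
        if step == fail_at:
            return last_checkpoint, False  # simulated node failure
        if step % checkpoint_every == 0:
            last_checkpoint = step  # pretend the model state was saved here
    return last_checkpoint, True


# First attempt: a simulated hardware fault interrupts the job at step 57.
checkpoint, done = run_job(total_steps=100, checkpoint_every=25, fail_at=57)
print(checkpoint, done)

# After the faulty node is replaced, the job resumes from the last checkpoint.
checkpoint, done = run_job(total_steps=100, checkpoint_every=25, resume_from=checkpoint)
print(checkpoint, done)
```

In this simulation only the steps after the last checkpoint are repeated, so the checkpoint interval bounds how much work a single failure can cost.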

Health monitoring agent for Slurm is available in all regions where HyperPod is generally available. The agent is auto-enabled on all newly created Slurm clusters; to enable it on an existing cluster, simply upgrade to the latest HyperPod AMI by calling the UpdateClusterSoftware API. To learn more, visit the Amazon SageMaker HyperPod documentation.

 
