Using Application Health Checks

This topic describes how to configure health checks for your applications in Cloud Foundry.

Overview

An application health check is a monitoring process that continually checks the status of a running Cloud Foundry app.

Developers can configure a health check for an application using the Cloud Foundry Command Line Interface (cf CLI) or by specifying the health-check-http-endpoint and health-check-type fields in an application manifest.

To configure a health check using the cf CLI, follow the instructions in the Configure Health Checks section below. For more information about using an application manifest to configure a health check, see the health-check-http-endpoint and health-check-type sections of the Deploying with Application Manifest topic.
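
As a rough illustration of the manifest route, the following shell snippet writes a minimal manifest that sets both fields. The app name my-app and the /health path are hypothetical placeholders, and the Deploying with Application Manifest topic remains the authoritative reference for the attribute syntax.

```shell
# Write a minimal application manifest that configures an http health check.
# "my-app" and "/health" are hypothetical values chosen for illustration.
cat > manifest.yml <<'EOF'
---
applications:
- name: my-app
  health-check-type: http
  health-check-http-endpoint: /health
EOF
```

Running cf push from the directory that contains this manifest.yml would then pick up the health check settings, unless they are overridden on the command line.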

Application health checks function as part of the app lifecycle managed by the Diego architecture.

Configure Health Checks

To configure a health check while creating or updating an application, use the cf push command:

$ cf push YOUR-APP -u HEALTH-CHECK-TYPE -t HEALTH-CHECK-TIMEOUT

Replace the placeholders in the example command above as follows:

  • HEALTH-CHECK-TYPE: Valid health check types are port, process, and http. See the Health Check Types section below for more information.
  • HEALTH-CHECK-TIMEOUT: The timeout is the amount of time allowed to elapse between starting up an application and the first healthy response. See the Health Check Timeouts section for more information.

Note: The health check configuration you provide with cf push overrides any configuration in the application manifest.
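
The placeholder rules above can be sketched as a small shell helper that validates its arguments before producing the cf push command line. The build_push_cmd function is hypothetical and not part of the cf CLI. It only prints the command so that it can be reviewed before being run.

```shell
# build_push_cmd APP TYPE TIMEOUT
# Hypothetical helper: validates the health check settings, then prints
# the corresponding cf push command line instead of executing it.
build_push_cmd() {
  local app="$1" type="$2" timeout="$3"

  # Only the three documented health check types are accepted.
  case "$type" in
    port|process|http) ;;
    *) echo "invalid health check type: $type" >&2; return 1 ;;
  esac

  # The timeout must be a whole number of seconds (digits only).
  case "$timeout" in
    ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
  esac

  echo "cf push $app -u $type -t $timeout"
}
```

For example, build_push_cmd my-app http 120 prints cf push my-app -u http -t 120, while an unknown type such as tcp is rejected with a non-zero exit status.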

To configure a health check for an existing application or to add a custom HTTP endpoint, use the cf set-health-check command:

$ cf set-health-check YOUR-APP HEALTH-CHECK-TYPE --endpoint CUSTOM-HTTP-ENDPOINT

Replace the placeholders in the example command above as follows:

  • HEALTH-CHECK-TYPE: Valid health check types are port, process, and http. See the Health Check Types section below for more information.
  • CUSTOM-HTTP-ENDPOINT: An http health check defaults to using / as its endpoint, but you can specify a custom endpoint. See the Health Check HTTP Endpoints section below for more information.

Note: You can change the health check configuration of a deployed app with cf set-health-check, but you must restart the app for the changes to take effect.
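
Because the new configuration only takes effect after a restart, the two steps are often run together. The sketch below assumes a hypothetical wrapper function, apply_http_health_check, and a CF variable that names the CLI binary. Setting CF=echo turns the wrapper into a dry run that prints the arguments instead of calling cf.

```shell
# Name of the CLI binary; override with CF=echo for a dry run.
CF="${CF:-cf}"

# apply_http_health_check APP ENDPOINT
# Hypothetical wrapper: switches the app to an http health check on the
# given endpoint, then restarts the app so the change takes effect.
apply_http_health_check() {
  local app="$1" endpoint="$2"
  "$CF" set-health-check "$app" http --endpoint "$endpoint" &&
    "$CF" restart "$app"
}
```

The restart runs only if cf set-health-check succeeds, so a rejected configuration does not trigger an unnecessary restart.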

Understand Health Checks

Health Check Lifecycle

The following table describes how application health checks work in Cloud Foundry.

Stage | Description
1 | Application developer deploys an app to Cloud Foundry.
2 | When deploying the app, the developer specifies a health check type for the app and, optionally, a timeout. If the developer does not specify a health check type, then the monitoring process defaults to a port health check.
3 | Cloud Controller stages, starts, and runs the app.
4 | Based on the type specified for the app, Cloud Controller configures a health check that runs periodically for each app instance.
5 | When Diego starts an app instance, the application health check runs every 2 seconds until a response indicates that the app instance is healthy or until the health check timeout elapses. The 2-second health check interval is not configurable.
6 | When an app instance becomes healthy, its route is advertised, if applicable. Subsequent health checks are run every 30 seconds once the app becomes healthy. The 30-second health check interval is not configurable.
7 | If a previously healthy app instance fails a health check, Diego considers that particular instance to be unhealthy. As a result, Diego stops and deletes the app instance, then reschedules a new app instance. This stoppage and deletion of the app instance is reported back to the Cloud Controller as a crash event.
8 | When an app instance crashes, Diego immediately attempts to restart the app instance several times. After three failed restarts, Cloud Foundry waits 30 seconds before attempting another restart. The wait time doubles each restart until the ninth restart, and remains at that duration until the 200th restart. After the 200th restart, Cloud Foundry stops trying to restart the app instance.
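
The restart schedule in stage 8 can be made concrete with a little arithmetic. The sketch below encodes one reading of that description, which is an assumption rather than a statement of Diego's implementation: restarts 1 through 3 happen immediately, the 4th waits 30 seconds, the wait doubles with each restart through the 9th, and then stays constant through the 200th. The restart_wait_seconds function is hypothetical.

```shell
# restart_wait_seconds N
# Prints the wait, in seconds, before the Nth restart attempt under the
# reading described above. Returns non-zero for N > 200, since no further
# restarts are attempted after the 200th.
restart_wait_seconds() {
  local n="$1" capped
  if [ "$n" -gt 200 ]; then
    echo "no further restarts after the 200th" >&2
    return 1
  fi
  if [ "$n" -le 3 ]; then
    echo 0            # the first three restarts are immediate
    return 0
  fi
  capped=$(( n < 9 ? n : 9 ))              # doubling stops at the 9th restart
  echo $(( 30 * (1 << (capped - 4)) ))     # 30s at restart 4, doubling after
}
```

Under this reading, restart_wait_seconds 4 prints 30, restart_wait_seconds 5 prints 60, and every restart from the 9th through the 200th waits 960 seconds (16 minutes).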

Health Check Types

The following table describes the types of health checks available for applications and recommended circumstances in which to use them:

Health Check Type | Recommended Use Case | Explanation
http | The app can provide an HTTP 200 response. | The http health check performs a GET request to the configured http endpoint. When the health check receives an HTTP 200 response, the app is declared healthy. We recommend using the http health check type whenever possible. A healthy HTTP response ensures that the web app is ready to serve HTTP requests. The configured endpoint must respond within 1 second to be considered healthy.
port | The app can receive TCP connections (including HTTP web applications). | A health check makes a TCP connection to the port or ports configured for the app. For applications with multiple ports, a health check monitors each port. If you do not specify a health check type for your app, then the monitoring process defaults to a port health check. The TCP connection must be established within 1 second to be considered healthy.
process | The app does not support TCP connections (for example, a worker). | For a process health check, Diego ensures that any process declared for the app stays running. If the process exits, Diego stops and deletes the app instance.
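
To get a feel for what the http health check expects, the probe can be approximated from a workstation with curl. This is only a sketch under stated assumptions: the real check is performed by Diego against the app instance, and the http_healthy function and any URL passed to it are hypothetical. The probe issues a GET request, allows 1 second for the whole exchange, and treats only an HTTP 200 status as healthy.

```shell
# http_healthy URL
# Approximates the http health check: GET the URL with a 1-second limit
# and succeed only if the response status is exactly 200.
http_healthy() {
  local url="$1" code
  # -s: silent, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the status code,
  # --max-time 1: give up after 1 second.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 1 "$url")"
  [ "$code" = "200" ]
}
```

A call such as http_healthy http://localhost:8080/health returns zero only when the endpoint answers 200 within the time limit. Redirects, other 2xx codes, and slow responses all count as unhealthy in this sketch.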

Health Check Timeouts

The value configured for the health check timeout is the amount of time allowed to elapse between starting up an app and the first healthy response from the app. If the health check does not receive a healthy response within the configured timeout, then the app is declared unhealthy.

In Pivotal Web Services, the default timeout is 60 seconds and the maximum configurable timeout is 180 seconds.
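
The relationship between the timeout and the startup polling interval can be sketched as a simple loop. The wait_until_healthy function below is hypothetical and only imitates the behavior described in this topic: it runs a check command at a fixed interval until the check succeeds or the timeout elapses. It counts only the time spent sleeping, so it is an approximation rather than a precise timer.

```shell
# wait_until_healthy TIMEOUT INTERVAL CHECK [ARGS...]
# Runs CHECK every INTERVAL seconds until it succeeds (returns 0) or
# until roughly TIMEOUT seconds have passed (returns 1).
wait_until_healthy() {
  local timeout="$1" interval="$2" elapsed=0
  shift 2
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@"; then
      return 0                      # first healthy response: stop polling
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1                          # timeout elapsed without a healthy response
}
```

For example, wait_until_healthy 60 2 my_check mirrors a 60-second timeout with the 2-second startup interval, where my_check stands for any command that exits with zero when the app is healthy.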

Health Check HTTP Endpoints

The --endpoint flag of the cf set-health-check command applies only to the http health check type. It specifies the path portion of a URI that the app must serve and that must return HTTP 200 when the app is healthy.
