When you first install the Kubernetes integration, we deploy a default set of recommended alert conditions and dashboards to your account. These form the basis for alert conditions and dashboards on your Kubernetes cluster. Alert conditions are grouped into two policies: Kubernetes alert policy and Google Kubernetes Engine alert policy.
While we've tried to address the most common use cases in all environments, there are a number of additional alerts you can set up to extend the default policy. See Getting started with New Relic alerts to learn more about alerts.
Adding the recommended alert conditions and dashboards
To add recommended alert policies and dashboards, follow these steps:
1. Go to one.newrelic.com > Integrations & Agents.
2. In the search box, type kubernetes.
3. Select one of these options:
   - Kubernetes: To add the default set of recommended alert conditions and a dashboard.
   - Google Kubernetes Engine: To add the default set of recommended Google Kubernetes Engine alert conditions and a dashboard.
4. Click Begin installation if you need to install the Kubernetes integration, or click Skip this step if you've already set up this integration.
5. Depending on the option you selected in step 3, you'll see different resources to add:
   - The default set of recommended alert conditions and a dashboard, if you selected Kubernetes in step 3.
   - The default set of recommended Google Kubernetes Engine alert conditions and a dashboard, if you selected Google Kubernetes Engine in step 3.
6. Click See your data to see a dashboard with your Kubernetes data in New Relic.
How to see the recommended alert policies
To view the recommended alert policies you've added, do this:
1. Go to one.newrelic.com > All capabilities > Alerts.
2. Click Alert Policies in the left navigation pane.
3. You'll see Kubernetes alert policy and Google Kubernetes Engine alert policy.
How to see the Kubernetes dashboards
There is a collection of recommended pre-built dashboards to help you instantly visualize your Kubernetes data for common use cases. See Manage your recommended dashboards to learn how to see these dashboards.
Kubernetes alert policy
This is the default dashboard and set of recommended alert conditions you'll add:
Kubernetes Dashboard (dashboard)
This dashboard includes charts and visualizations that help you instantly visualize your Kubernetes data for common use cases.
Container CPU throttling is high (alert condition)
This alert condition generates an alert when a container is throttled by more than 25% for more than 5 minutes. It runs this query:
SELECT sum(containerCpuCfsThrottledPeriodsDelta) / sum(containerCpuCfsPeriodsDelta) * 100
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET containerName, podName, namespaceName, clusterName
See the GitHub configuration file  for more info.
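To make the arithmetic in this query concrete, here is a minimal Python sketch of the same ratio: throttled CFS periods divided by total CFS periods, times 100, compared against the 25% threshold. The function name and the sample numbers are hypothetical and only illustrate the calculation; they are not part of the integration.

```python
def throttled_percent(throttled_periods_delta, periods_delta):
    """Return the share of CFS periods that were throttled, as a percentage."""
    total = sum(periods_delta)
    if total == 0:
        # No scheduling periods were observed, so nothing was throttled.
        return 0.0
    return sum(throttled_periods_delta) / total * 100


# Hypothetical per-sample deltas for one container over the evaluation window.
throttled = [100, 150, 50]
periods = [200, 200, 200]

percent = throttled_percent(throttled, periods)
print(f"throttled: {percent:.1f}%, above 25% threshold: {percent > 25}")
```

Summing both deltas before dividing, as the query does, weights busy samples more heavily than averaging the per-sample ratios would.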
Container high CPU utilization (alert condition)
This alert condition generates an alert when the average container CPU usage against the limit exceeds 90% for over 5 minutes. It runs this query:
SELECT average(cpuCoresUtilization)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET containerName, podName, namespaceName, clusterName
See the GitHub configuration file  for more info.
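The queries in this policy filter on the placeholders 'YOUR_CLUSTER_NAME' and 'YOUR_NAMESPACE_NAME'. If you maintain several clusters or namespaces, a small helper like the following sketch can fill in the filter clause for you. The helper names and the example cluster and namespace names are hypothetical, and the sketch rejects values containing a single quote rather than guessing at escaping rules.

```python
FILTER_TEMPLATE = (
    "WHERE clusterName IN ('YOUR_CLUSTER_NAME') "
    "AND namespaceName IN ('YOUR_NAMESPACE_NAME')"
)


def quote_list(values):
    """Render values as a comma-separated list of single-quoted strings."""
    for value in values:
        if "'" in value:
            # Escaping is out of scope for this sketch, so refuse such values.
            raise ValueError(f"unsupported quote character in {value!r}")
    return ", ".join(f"'{value}'" for value in values)


def fill_placeholders(template, cluster_names, namespace_names):
    """Replace the quoted placeholders with the quoted names to monitor."""
    query = template.replace("'YOUR_CLUSTER_NAME'", quote_list(cluster_names))
    return query.replace("'YOUR_NAMESPACE_NAME'", quote_list(namespace_names))


# Hypothetical cluster and namespace names.
clause = fill_placeholders(FILTER_TEMPLATE, ["prod-cluster"], ["default", "payments"])
print(clause)
```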
Container high memory utilization (alert condition)
This alert condition generates an alert when the average container memory usage against the limit exceeds 90% for over 5 minutes. It runs this query:
SELECT average(memoryWorkingSetUtilization)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET containerName, podName, namespaceName, clusterName
See the GitHub configuration file  for more info.
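Many conditions in this policy fire only when a value stays above a threshold for a sustained period, such as 90% for over 5 minutes. The sketch below illustrates that idea in simplified form. It assumes one sample per minute, which is an assumption made for the example, and it is not a description of how the alerting engine evaluates conditions internally.

```python
def breaches_for_duration(samples, threshold, required_minutes):
    """Return True if consecutive samples stay above threshold long enough.

    Assumes one sample per minute, so N consecutive samples span N minutes.
    """
    streak = 0
    for value in samples:
        # Extend the streak on a breach; any sample at or below the threshold resets it.
        streak = streak + 1 if value > threshold else 0
        if streak >= required_minutes:
            return True
    return False


# Hypothetical memory utilization samples, in percent.
sustained = [88, 91, 92, 95, 93, 94]
interrupted = [91, 92, 89, 95, 96, 97]

print(breaches_for_duration(sustained, 90, 5))
print(breaches_for_duration(interrupted, 90, 5))
```

The second series never produces an alert in this sketch, because the dip to 89 resets the streak before five consecutive breaches accumulate.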
Container is restarting (alert condition)
This alert condition generates an alert when container restarts exceed 0 in a 5-minute sliding window. It runs this query:
SELECT sum(restartCountDelta)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET containerName, podName, namespaceName, clusterName
See the GitHub configuration file  for more info.
Container is waiting (alert condition)
This alert condition generates an alert when a container stays in the Waiting status for more than 5 minutes. It runs this query:
SELECT uniqueCount(podName)
WHERE status = 'Waiting' AND clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET containerName, podName, namespaceName, clusterName
See the GitHub configuration file  for more info.
Daemonset is missing pods (alert condition)
This alert condition generates an alert when a daemonset is missing any pods for a period longer than 5 minutes. It runs this query:
SELECT latest(podsMissing)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET daemonsetName, namespaceName, clusterName
See the GitHub configuration file  for more info.
Deployment is missing pods (alert condition)
This alert condition generates an alert when a deployment is missing any pods for a period longer than 5 minutes. It runs this query:
SELECT latest(podsMissing)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET deploymentName, namespaceName, clusterName
See the GitHub configuration file  for more info.
Etcd file descriptor utilization is high (alert condition)
This alert condition generates an alert when the Etcd file descriptor usage exceeds 90% for over 5 minutes. It runs this query:
SELECT max(processFdsUtilization)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
FACET displayName, clusterName
See the GitHub configuration file  for more info.
Etcd has no leader (alert condition)
This alert condition generates an alert when Etcd has no leader for over 1 minute. It runs this query:
SELECT min(etcdServerHasLeader)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
FACET displayName, clusterName
See the GitHub configuration file  for more info.
HPA current replicas < desired replicas (alert condition)
This alert condition generates an alert when the current replicas of a horizontal pod autoscaler are lower than the desired replicas for more than 5 minutes. It runs this query:
SELECT latest(desiredReplicas - currentReplicas)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET displayName, namespaceName, clusterName
See the GitHub configuration file  for more info.
HPA has reached maximum replicas (alert condition)
This alert condition generates an alert when the current replicas of a horizontal pod autoscaler reach its maximum replicas. It runs this query:
SELECT latest(maxReplicas - currentReplicas)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET displayName, namespaceName, clusterName
See the GitHub configuration file  for more info.
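The two HPA queries above each reduce to a subtraction: desiredReplicas minus currentReplicas, and maxReplicas minus currentReplicas. The following sketch shows how those two differences can be read. The function name, the returned keys, and the interpretation that a headroom of 0 means the maximum has been reached are assumptions made for illustration.

```python
def hpa_replica_gaps(current_replicas, desired_replicas, max_replicas):
    """Compute the two differences the HPA queries report."""
    missing = desired_replicas - current_replicas
    headroom = max_replicas - current_replicas
    return {
        "missing": missing,            # replicas still to be scheduled
        "headroom": headroom,          # replicas left before the maximum
        "below_desired": missing > 0,
        "at_maximum": headroom <= 0,   # assumed reading: no headroom left
    }


# Hypothetical autoscaler states.
scaling_up = hpa_replica_gaps(current_replicas=3, desired_replicas=5, max_replicas=10)
saturated = hpa_replica_gaps(current_replicas=10, desired_replicas=10, max_replicas=10)

print(scaling_up)
print(saturated)
```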
Job failed (alert condition)
This alert condition generates an alert when a job reports a failed status. It runs this query:
SELECT uniqueCount(jobName)
WHERE failed = 'true' AND clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET jobName, namespaceName, clusterName, failedPodsReason
See the GitHub configuration file  for more info.
More than 5 pods failing in namespace (alert condition)
This alert condition generates an alert when more than 5 pods in a namespace fail for more than 5 minutes. It runs this query:
SELECT uniqueCount(podName)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET namespaceName, clusterName
See the GitHub configuration file  for more info.
Node allocatable CPU utilization is high (alert condition)
This alert condition generates an alert when the average node allocatable CPU utilization exceeds 90% for more than 5 minutes. It runs this query:
SELECT average(allocatableCpuCoresUtilization)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
FACET nodeName, clusterName
See the GitHub configuration file  for more info.
Node allocatable memory utilization is high (alert condition)
This alert condition generates an alert when the average node allocatable memory utilization exceeds 90% for more than 5 minutes. It runs this query:
SELECT average(allocatableMemoryUtilization)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
FACET nodeName, clusterName
See the GitHub configuration file  for more info.
Node is not ready (alert condition)
This alert condition generates an alert when a node is unavailable for 5 minutes. It runs this query:
SELECT latest(condition.Ready)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
FACET nodeName, clusterName
See the GitHub configuration file  for more info.
Node is unschedulable (alert condition)
This alert condition generates an alert when a node is marked unschedulable. It runs this query:
SELECT latest(unschedulable)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
FACET nodeName, clusterName
See the GitHub configuration file  for more info.
Node pod count nearing capacity (alert condition)
This alert condition generates an alert when a node's running pods exceed 90% of the node's pod capacity for more than 5 minutes. It runs this query:
FROM K8sPodSample, K8sNodeSample
   ) / latest(capacityPods) * 100
WHERE nodeName != '' AND nodeName IS NOT NULL
AND clusterName IN ('YOUR_CLUSTER_NAME')
FACET nodeName, clusterName
See the GitHub configuration file  for more info.
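The calculation behind this condition is a node's running pod count divided by its pod capacity (capacityPods), times 100, compared against 90%. A minimal sketch of that arithmetic follows. The function names and the node numbers are hypothetical.

```python
def pod_capacity_percent(running_pods, capacity_pods):
    """Return running pods as a percentage of the node's pod capacity."""
    if capacity_pods <= 0:
        raise ValueError("capacity_pods must be positive")
    return running_pods / capacity_pods * 100


def nearing_capacity(running_pods, capacity_pods, threshold_percent=90):
    """Return True when pod usage exceeds the threshold percentage."""
    return pod_capacity_percent(running_pods, capacity_pods) > threshold_percent


# Hypothetical nodes, each with a capacity of 110 pods.
print(nearing_capacity(100, 110))
print(nearing_capacity(55, 110))
```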
Node root file system capacity utilization is high (alert condition)
This alert condition generates an alert when the average node root file system capacity utilization exceeds 90% for more than 5 minutes. It runs this query:
SELECT average(fsCapacityUtilization)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
FACET nodeName, clusterName
See the GitHub configuration file  for more info.
Persistent volume has errors (alert condition)
This alert condition generates an alert when a persistent volume is in a failed or pending state for more than 5 minutes. It runs this query:
FROM K8sPersistentVolumeSample
SELECT uniqueCount(volumeName)
WHERE statusPhase IN ('Failed', 'Pending')
AND clusterName IN ('YOUR_CLUSTER_NAME')
FACET volumeName, clusterName
See the GitHub configuration file  for more info.
Pod cannot be scheduled (alert condition)
This alert condition generates an alert when a pod is unable to be scheduled for more than 5 minutes. It runs this query:
SELECT latest(isScheduled)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET podName, namespaceName, clusterName
See the GitHub configuration file  for more info.
Pod is not ready (alert condition)
This alert condition generates an alert when a pod is unavailable for over 5 minutes. It runs this query:
WHERE status NOT IN ('Failed', 'Succeeded')
AND clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET podName, namespaceName, clusterName
See the GitHub configuration file  for more info.
Statefulset is missing pods (alert condition)
This alert condition generates an alert when a statefulset is missing pods for more than 5 minutes. It runs this query:
FROM K8sStatefulsetSample
SELECT latest(podsMissing)
WHERE clusterName IN ('YOUR_CLUSTER_NAME')
AND namespaceName IN ('YOUR_NAMESPACE_NAME')
FACET statefulsetName, namespaceName, clusterName
See the GitHub configuration file  for more info.
Google Kubernetes Engine alert policy
This is the default dashboard and set of recommended Google Kubernetes Engine alert conditions you'll add:
Google Kubernetes Engine (dashboard)
This dashboard includes charts and visualizations that help you instantly visualize your Google Kubernetes Engine data for common use cases.
High CPU utilization (alert condition)
This alert condition generates an alert when a node's CPU utilization exceeds 90% for at least 15 minutes. It runs this query:
SELECT max(`gcp.kubernetes.node.cpu.allocatable_utilization`) * 100
WHERE clusterName LIKE '%'
FACET gcp.kubernetes.nodeName
See the GitHub configuration file  for more info.
High memory usage (alert condition)
This alert condition generates an alert when a node's memory usage exceeds 85% of its total capacity. It runs this query:
SELECT max(gcp.kubernetes.node.memory.allocatable_utilization) * 100
WHERE clusterName LIKE '%'
FACET gcp.kubernetes.nodeName
See the GitHub configuration file  for more info.