NerdGraph tutorial: Configure service levels

With New Relic you can implement service levels for your applications, consume the results easily from the UI during your planning sessions and incident response, and progressively iterate on the configuration to adjust your objectives to the desired user experience.

Besides the UI, you can also use our NerdGraph API Explorer to create and edit of SLIs and their SLOs. Alternatively, you can automate this configuration using the Terraform Service Level resource.

Importante

In order to create service levels, a user requires a permission for modifying and deleting events-to-metrics rules.

Create an SLI with an SLO

Please refer to Create and edit SLIs and SLOs to learn the basic concepts in the SLI and SLO configuration, such as the entity that an SLI is associated with. Also, you can refer to that documentation to find examples of the most common indicators for services and applications.

The following is an example NerdGraph call that creates an SLI using the serviceLevelCreate mutation query:

mutation {
  serviceLevelCreate(
    entityGuid: "entityGuid"
    indicator: {
      name: "Latency below 0.25 seconds"
      description: "The proportion of valid requests that were served faster than 0.25s, which is considered to correspond to a good experience."
      events: {
        validEvents: { from: "Transaction", where: "entityGuid = 'entityGuid'" }
        goodEvents: {
          from: "Transaction"
          where: "entityGuid = 'entityGuid' and duration < 0.25"
        }
        accountId: accountId
      }
      objectives: {
        target: 99.5
        timeWindow: { rolling: { count: 7, unit: DAY } }
      }
    }
  ) {
    id
    description
  }
}

It contains these fields:

entityGuid: The GUID of the entity (for example, service, browser application, etc.) that you want to relate this SLI to. On the UI, you can find this GUID on the entity page, under See metadata and manage tags.
description: Use detailed descriptions, including the selected threshold that determines good events.
- For example, for an availability SLI, include something like “The proportion of valid requests that were served without errors.”
- Or, for a latency SLI, include a description such as “The proportion of valid requests that were served faster than 0.25s, which is considered to correspond to a good experience”.
accountId: The ID of the account where the service or browser application belongs to, which contains the NRDB data for the SLI/SLO calculations.
badEvents.from, badEvents.where
- The NRQL query that defines bad events, SELECT count(*) FROM badEvents.from WHERE badEvents.where, requires these FROM and WHERE clauses.
- If you defined an SLI from valid and bad events, leave the goodEvents object empty.
goodEvents.from, goodEvents.where
- The NRQL query that defines good events, SELECT count(*) FROM goodEvents.from WHERE goodEvents.where, requires these FROM and WHERE clauses.
- If you defined an SLI from valid and good events, leave the badEvents object empty.
validEvents.from, validEvents.where
- These are the FROM and WHERE clauses for the NRQL query that defines valid events, which will result in SELECT count(*) FROM validEvents.from WHERE validEvents.where.
name: A short category name for your SLI to help understand what the Service Level is about. We suggest including any specific parameters and filters involved in the SLI definition. Examples:
- Availability
- Latency below 4 seconds
- CLS for desktops below 0.1
objectives: An array of objectives (SLOs) for the SLI.
- target: The target for your SLO, up to 100.00. The field supports up to 5 decimals.
  - If your users are happy with the current experience, set the SLO percentage to match the current baseline. For instance, the percentile used to determine the SLI's good events.
- timeWindow.rolling.count: The length of the period taken into consideration to calculate the SLO. The supported values are 1, 7, 14, 28, and 30.
- timeWindow.rolling.unit: DAYis the supported value.

Using `SELECT` functions and wildcard

We have an optional SELECT attribute, set to count(*) by default. If you have a more complex scenario, you can use select to be explicit about the Metric or the Event property you want to query. For the SELECT also the SUM function is supported, as well as, wildcard (%). Let's see an example of a more complex SELECT configuration.

mutation {
  serviceLevelCreate(
    entityGuid: "entityGuid"
    indicator: {
      name: "Success request"
      description: "The proportion of success requests count is 99% that the total count"
      events: {
        validEvents: {
          select: { function: SUM, attribute: "http.request.status.%.count" }
          from: "Metric"
        }
        goodEvents: {
          select: { function: SUM, attribute: "http.request.status.2%.count" }
          from: "Metric"
        }
        accountId: accountId
      }
      objectives: {
        target: 99.5
        timeWindow: { rolling: { count: 7, unit: DAY } }
      }
    }
  ) {
    id
    description
  }
}

Notice the properties validEvents and goodEvents of events now includes a select. In the select, you can configure the function:

COUNT: default function, will count number of results;
SUM: sums all values for the selected events/metric.

Another important difference in this example is the usage of wildcard (%) to query values of all metrics with the same format. Imagine your application is reporting request count by status (for example, http.request.status.200.count, http.request.status.201.count, http.request.status.400.count, etc.), the query will sum all matching metric names using the wildcard.

Retrieve the configuration of an SLI for an APM service

Use this query to retrieve the configuration of an SLI, including its id.

{
  actor {
    entity(guid: "entityGuid") {
      guid
      name
      serviceLevel {
        indicators {
          createdAt
          createdBy {
            email
          }
          description
          entityGuid
          id
          name
          objectives {
            target
            timeWindow {
              rolling {
                count
                unit
              }
            }
          }
        }
      }
    }
  }
}

Update the SLOs of an SLI

Use the serviceLevelUpdate mutation to define one or more SLOs for each one of the SLIs. To obtain the SLI's id, use the query above.

mutation {
  serviceLevelUpdate(
    id: "indicators.id"
    indicator: {
      objectives: {
        target: 99.00
        timeWindow: { rolling: { count: 7, unit: DAY } }
      }
    }
  ) {
    id
  }
}