With New Relic you can implement service levels for your applications, consume the results easily from the UI during your planning sessions and incident response, and progressively iterate on the configuration to adjust your objectives to the desired user experience.
Besides the UI, you can also use our NerdGraph API Explorer to create and edit of SLIs and their SLOs. Alternatively, you can automate this configuration using the Terraform Service Level resource.
Importante
In order to create service levels, a user requires a permission for modifying and deleting events-to-metrics rules.
Create an SLI with an SLO
Please refer to Create and edit SLIs and SLOs to learn the basic concepts in the SLI and SLO configuration, such as the entity that an SLI is associated with. Also, you can refer to that documentation to find examples of the most common indicators for services and applications.
The following is an example NerdGraph call that creates an SLI using the serviceLevelCreate
mutation query:
mutation { serviceLevelCreate( entityGuid: "entityGuid" indicator: { name: "Latency below 0.25 seconds" description: "The proportion of valid requests that were served faster than 0.25s, which is considered to correspond to a good experience." events: { validEvents: { from: "Transaction", where: "entityGuid = 'entityGuid'" } goodEvents: { from: "Transaction" where: "entityGuid = 'entityGuid' and duration < 0.25" } accountId: accountId } objectives: { target: 99.5 timeWindow: { rolling: { count: 7, unit: DAY } } } } ) { id description }}
It contains these fields:
entityGuid
: The GUID of the entity (for example, service, browser application, etc.) that you want to relate this SLI to. On the UI, you can find this GUID on the entity page, under See metadata and manage tags.description
: Use detailed descriptions, including the selected threshold that determines good events.- For example, for an availability SLI, include something like “The proportion of valid requests that were served without errors.”
- Or, for a latency SLI, include a description such as “The proportion of valid requests that were served faster than 0.25s, which is considered to correspond to a good experience”.
accountId
: The ID of the account where the service or browser application belongs to, which contains the NRDB data for the SLI/SLO calculations.badEvents.from
,badEvents.where
- The NRQL query that defines bad events,
SELECT count(*) FROM badEvents.from WHERE badEvents.where
, requires theseFROM
andWHERE
clauses. - If you defined an SLI from valid and bad events, leave the
goodEvents
object empty.
- The NRQL query that defines bad events,
goodEvents.from
,goodEvents.where
- The NRQL query that defines good events,
SELECT count(*) FROM goodEvents.from WHERE goodEvents.where
, requires theseFROM
andWHERE
clauses. - If you defined an SLI from valid and good events, leave the
badEvents
object empty.
- The NRQL query that defines good events,
validEvents.from
,validEvents.where
- These are the
FROM
andWHERE
clauses for the NRQL query that defines valid events, which will result inSELECT count(*) FROM validEvents.from WHERE validEvents.where
.
- These are the
name
: A short category name for your SLI to help understand what the Service Level is about. We suggest including any specific parameters and filters involved in the SLI definition. Examples:- Availability
- Latency below 4 seconds
- CLS for desktops below 0.1
objectives
: An array of objectives (SLOs) for the SLI.target
: The target for your SLO, up to 100.00. The field supports up to 5 decimals.- If your users are happy with the current experience, set the SLO percentage to match the current baseline. For instance, the percentile used to determine the SLI's good events.
timeWindow.rolling.count
: The length of the period taken into consideration to calculate the SLO. The supported values are1
,7
,14
,28
, and30
.timeWindow.rolling.unit
:DAY
is the supported value.
Using SELECT
functions and wildcard
We have an optional SELECT
attribute, set to count(*)
by default. If you have a more complex scenario, you can use select
to be explicit about the Metric or the Event property you want to query. For the SELECT
also the SUM
function is supported, as well as, wildcard (%
). Let's see an example of a more complex SELECT
configuration.
mutation { serviceLevelCreate( entityGuid: "entityGuid" indicator: { name: "Success request" description: "The proportion of success requests count is 99% that the total count" events: { validEvents: { select: { function: SUM, attribute: "http.request.status.%.count" } from: "Metric" } goodEvents: { select: { function: SUM, attribute: "http.request.status.2%.count" } from: "Metric" } accountId: accountId } objectives: { target: 99.5 timeWindow: { rolling: { count: 7, unit: DAY } } } } ) { id description }}
Notice the properties validEvents
and goodEvents
of events
now includes a select
. In the select, you can configure the function:
COUNT
: default function, will count number of results;SUM
: sums all values for the selected events/metric.
Another important difference in this example is the usage of wildcard (%
) to query values of all metrics with the same format. Imagine your application is reporting request count by status (for example, http.request.status.200.count
, http.request.status.201.count
, http.request.status.400.count
, etc.), the query will sum all matching metric names using the wildcard.
Retrieve the configuration of an SLI for an APM service
Use this query to retrieve the configuration of an SLI, including its id
.
{ actor { entity(guid: "entityGuid") { guid name serviceLevel { indicators { createdAt createdBy { email } description entityGuid id name objectives { target timeWindow { rolling { count unit } } } } } } }}
Update the SLOs of an SLI
Use the serviceLevelUpdate
mutation to define one or more SLOs for each one of the SLIs. To obtain the SLI's id
, use the query above.
mutation { serviceLevelUpdate( id: "indicators.id" indicator: { objectives: { target: 99.00 timeWindow: { rolling: { count: 7, unit: DAY } } } } ) { id }}