Overview

Detect unexpected changes or irregularities in cloud spend using machine learning-powered, human-tunable anomaly detection. Ternary identifies anomalies within specific categories (e.g., compute, analytics) as well as within individual projects. This article will guide you through setting up an anomaly detection alert rule and managing and investigating anomalies once triggered.

We have also created a YouTube tutorial on creating an alert rule.

Alert Rules

Every tenant is created with two default alert rules. They cannot be modified, but they ensure that some form of Anomaly Detection is active while you develop Alert Rules tailored to your organization's needs. The two Alert Rules are defined below.

System Alert Rule:
Uses our normal Billing data source (the native billing data from your Cloud Service Providers configured in Ternary)

Filter: No filter of data (all your cloud spend)
Group By: projectId and Category - the model groups the data and develops an expected range for each combination of Category and projectId. For example, Category = Compute with projectId = department-prod-123 gets its own expected range, and Category = Compute with projectId = department-dev-123 gets a separate one
Threshold: $200 - the model generates an alert if the row is $200 above the expected range developed by the model
Lookback: 90 days

System Alert Rule BigQuery:
Uses our BQ usage data source (if you have configured Enhanced BigQuery Monitoring)

Filter: jobType not equals LOAD and reservationId is not set - essentially filtering to only jobs that have cost and are On-Demand
Group By: projectId - model groups the data by each projectId to develop an expected range for each row
Threshold: $500 - the model generates an alert if the row is $500 above the expected range developed by the model
Lookback: 90 days
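
The two default rules above can be summarized as plain data. The sketch below is illustrative only: the AlertRule class, its field names, and the exceeds_threshold helper are hypothetical and are not part of any Ternary API. It also assumes that a row exactly at the threshold counts as an alert, which the text above does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical container for the settings listed above."""
    name: str
    data_source: str
    group_by: tuple
    threshold_usd: float
    lookback_days: int
    filter_description: str = "none (all cloud spend)"


SYSTEM_ALERT_RULE = AlertRule(
    name="System Alert Rule",
    data_source="billing",
    group_by=("projectId", "category"),
    threshold_usd=200.0,
    lookback_days=90,
)

SYSTEM_ALERT_RULE_BIGQUERY = AlertRule(
    name="System Alert Rule BigQuery",
    data_source="bigquery_usage",
    group_by=("projectId",),
    threshold_usd=500.0,
    lookback_days=90,
    filter_description="jobType not equals LOAD and reservationId is not set",
)


def exceeds_threshold(rule, actual_cost, expected_upper):
    """Return True when the row is at least threshold_usd above the expected range.

    Assumption: the boundary case (exactly at the threshold) triggers an alert.
    """
    return actual_cost - expected_upper >= rule.threshold_usd


# A row that is $250 above its expected range trips the $200 rule but not the $500 rule.
print(exceeds_threshold(SYSTEM_ALERT_RULE, 1250.0, 1000.0))
print(exceeds_threshold(SYSTEM_ALERT_RULE_BIGQUERY, 1250.0, 1000.0))
```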

Create an alert rule

Ternary allows you to create custom anomaly alert rules.

Configuration fields:

Name: Name you create for the rule. We suggest using something descriptive such as "Ryland - $100 - Non-Production by Service" to describe what the rule is analyzing so others understand at a glance
Cost or Percentage: The cost deviation you want to alert on, such as $25, $100, or $1,000. This is the threshold for how far spend must deviate outside the expected range before an anomaly is generated. You can also select Percentage to use a percentage deviation instead
Granularity: You can set this to Day, Hour, or Minute
Direction: You can look for both Increases and Decreases, Increases only or Decreases only
Lookback Days: The number of days the rule looks back to generate anomalies that would have been identified had the rule already been configured. The maximum value for lookback days is 90.
- Note: There can be confusion about what Lookback represents. Lookback is how many days back the model evaluates whether an anomaly alert would have been generated. For example, with a 7-day lookback, the model develops an expected range using the last 90 days of complete data and compares each of the last 7 days to that expected range to determine whether to generate an alert. When first testing or configuring an alert rule, we recommend a 45-60 day lookback to "test" whether your thresholds would have generated a lot of noise. For example, if you set $100 as the threshold with a lookback of 45 days and the rule generates 200 alerts, the threshold is probably too low and you need to raise it. Testing this way before rolling a rule out organization-wide helps avoid a lot of noise, which can create alert fatigue.
Filters: Billing Accounts, Project IDs, Services, and SKUs act as filters that narrow the data set you want to analyze. For example, you can select a subset of projects, services, or SKUs. Some customers have created separate rules for Production and Non-Production spend because their threshold for production may be larger than for non-production.
Group By: After the data set is filtered, the include labels field defines the dimensions by which the data is grouped. You can use labels from your GCP environment, fields from the billing file, Ternary-specific dimensions, and custom labels. For example, to see deviations grouped by service, add serviceDescription to the include labels field.
Subscribers: These are the individuals who will be notified by e-mail when an alert is generated. You can click edit subscribers to add or remove users. You can also subscribe e-mail addresses that do not belong to a Ternary account, such as executives who do not have an account in the tool or a distribution list.
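
The lookback "test" described in the note above can be sketched as a simple count of how many days in the lookback window would have alerted at each candidate threshold. This is a minimal illustration, not Ternary's model: the count_alerts function is hypothetical, the expected range is taken as a given number rather than learned, the sample costs are made up, and only increases are checked.

```python
def count_alerts(daily_costs, expected_upper, threshold_usd, lookback_days):
    """Count days in the lookback window that sit at least threshold_usd above expected_upper.

    daily_costs is ordered oldest to newest. expected_upper stands in for the
    upper bound of the expected range, which the real model would derive itself.
    """
    if lookback_days <= 0:
        return 0
    recent = daily_costs[-lookback_days:]
    return sum(1 for cost in recent if cost - expected_upper >= threshold_usd)


# Made-up daily spend for one group, oldest first.
daily_costs = [1000, 1020, 1180, 990, 1350, 1010, 1600]
expected_upper = 1050

# Compare how noisy each candidate threshold would have been over a 7-day lookback.
for threshold in (100, 250, 500):
    alerts = count_alerts(daily_costs, expected_upper, threshold, lookback_days=7)
    print(f"threshold ${threshold}: {alerts} alert(s)")
```

If a low threshold produces far more alerts than anyone would act on, that is the signal to raise it before adding subscribers.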

Recent anomalies triggered

This section lists the 5 most recent anomalies in your account, from both system default rules and the custom anomaly alert rules that you configure. The list changes to reflect the date range selected in the date selection menu in the upper right. The default is the last 30 days.

Status: Each anomaly event has a changeable status to help you keep track of which anomalies need investigation and which have already been actioned. The available statuses are active, investigating, unresolved, and resolved.

NEW: See linked cases in the table.

Anomaly groups

You will see a list of each anomaly configuration you have, with the number of anomalies it has triggered during your selected time range and the most recent anomaly date. Each section can be expanded to view additional details and includes a "view" button for the specific information relating to a chosen anomaly event.

Manage anomaly alert rules

Delete anomaly alert rule
If you choose to delete an anomaly alert rule, it will be permanently removed and cannot be recovered. Deleting the rule will also delete all previously triggered anomaly events associated with that rule, and those events will no longer be accessible.

Archive anomaly alert rule
Archiving an anomaly alert rule will pause scanning for new anomalies and prevent the generation of new anomaly events. However, archived anomaly alert rules will retain any previously captured anomaly events. You can choose to unarchive the rule at any time, which will resume scanning and generating new anomaly events based on the original configuration.

Delete anomaly event
When you delete an anomaly event, it will be permanently removed and cannot be recovered. The parent anomaly alert rule will remain unchanged, and scanning for new anomalies will continue. Deleting an event only affects that individual anomaly and does not impact the overall alert rule or other events.
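
The difference between deleting a rule, archiving a rule, and deleting a single event can be modeled with a small in-memory sketch. The Rule and RuleStore classes below are hypothetical and exist only to illustrate the behavior described above. They do not reflect how Ternary stores rules or events.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    archived: bool = False
    events: list = field(default_factory=list)


class RuleStore:
    """Hypothetical in-memory model of alert rules and their anomaly events."""

    def __init__(self):
        self.rules = {}

    def add_rule(self, name):
        self.rules[name] = Rule(name)

    def record_event(self, name, event):
        """Record a new anomaly event unless the rule is archived (scanning paused)."""
        rule = self.rules[name]
        if rule.archived:
            return False
        rule.events.append(event)
        return True

    def archive_rule(self, name):
        # Pauses new events; previously captured events are retained.
        self.rules[name].archived = True

    def unarchive_rule(self, name):
        # Resumes scanning with the original configuration.
        self.rules[name].archived = False

    def delete_event(self, name, event):
        # Removes one event; the parent rule and its other events are untouched.
        self.rules[name].events.remove(event)

    def delete_rule(self, name):
        # Removes the rule together with all of its events.
        del self.rules[name]


store = RuleStore()
store.add_rule("prod-by-service")
store.record_event("prod-by-service", "event-1")
store.record_event("prod-by-service", "event-2")
store.archive_rule("prod-by-service")
print(store.record_event("prod-by-service", "event-3"))  # archived, so not recorded
print(store.rules["prod-by-service"].events)
```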

View anomalous event

When you click the "View" button you can see additional details of the alert.

Key measures:

Detected: Date and time the anomaly was detected, in Ternary.
Actual Value: Actual cost on a single day for the anomaly event
Expected Range: The lower and upper bounds of the expected range, based on the last 90 days, that the model uses to identify an anomaly
Delta: The difference between the actual cost and the upper bound of the expected range
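
As a worked example of how these measures relate, the sketch below computes Delta from the Actual Value and the Expected Range. The AnomalyEvent class and its field names are hypothetical, and the figures are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AnomalyEvent:
    """Hypothetical record holding the key measures listed above."""
    detected: datetime
    actual_value: float
    expected_lower: float
    expected_upper: float

    @property
    def delta(self):
        # Delta as defined above: actual cost minus the upper bound of the expected range.
        return self.actual_value - self.expected_upper


event = AnomalyEvent(
    detected=datetime(2024, 5, 14, 9, 30),
    actual_value=1600.0,
    expected_lower=900.0,
    expected_upper=1050.0,
)
print(event.delta)  # 1600.0 - 1050.0 = 550.0
```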

Identify root cause: Ternary Anomaly Detection simplifies investigating the root cause of an anomaly. The “Investigate” button in the alert window directs you to a detailed report in the Ternary Reporting Engine based on the anomaly’s timestamp. You can refine the report using groupings and filters to more accurately isolate the root cause.

Cases: From the anomaly detection view you can also create a case or view linked cases. This case can optionally leverage our bi-directional Jira integration. Learn more about Ternary Case Management.

Chart: The chart shows the last 90 days of spend filtered for this anomaly configuration. The upper and lower bounds of the expected range are displayed in grey, and the actual spend is shown as a blue line. The anomaly event is displayed as a red dot on the blue line.

Table: Displayed below the chart is a table showing the top 5 results based on the configured groupings in the alert. If no specific grouping is set, it defaults to showing the top 5 resources grouped by category and SKU description. This allows for a quick preview of the primary cost driver for the anomaly, including usage amount and cost for the day of the anomaly.
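
The top 5 table described above amounts to grouping the day's line items, summing cost per group, and keeping the largest five. The sketch below assumes a hypothetical list of line-item dictionaries. The top_cost_drivers function and the field names category, skuDescription, and cost are illustrative and may not match the actual data schema. Usage amount is omitted for brevity.

```python
from collections import defaultdict


def top_cost_drivers(line_items, group_by=("category", "skuDescription"), limit=5):
    """Sum cost per group and return the largest groups first.

    The default grouping mirrors the fallback described above (category and SKU description).
    """
    totals = defaultdict(float)
    for item in line_items:
        key = tuple(item[field] for field in group_by)
        totals[key] += item["cost"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:limit]


# Made-up line items for the day of the anomaly.
items = [
    {"category": "Compute", "skuDescription": "N1 Core", "cost": 400.0},
    {"category": "Compute", "skuDescription": "N1 Core", "cost": 150.0},
    {"category": "Storage", "skuDescription": "Standard Storage", "cost": 90.0},
    {"category": "Analytics", "skuDescription": "Analysis", "cost": 300.0},
]

for group, cost in top_cost_drivers(items):
    print(group, cost)
```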