Prerequisites:

Ternary is agentless and uses GCP's Cloud Monitoring to power our Kubernetes Insights page. If you do not have Cloud Monitoring enabled, the Insights page will not populate with your Kubernetes usage and cost.

Enabling Cloud Monitoring: https://cloud.google.com/stackdriver/docs/solutions/gke

If you have enabled Cloud Monitoring, you can replicate the following Metrics Explorer query in the GCP console to validate that metrics are being generated:

metric: Kubernetes Container - CPU Usage Time

filters: project_id=<your project_id>

groupingFunction: Sum

groupByLabels:
  - cluster_name
  - node_name
  - top_level_controller_name
  - top_level_controller_type
  - container_name
  - location
  - namespace_name
  - pod_name

perSeriesAligner: Rate

alignmentPeriod: 1h
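If you would rather run the same check through the Cloud Monitoring API than through the console, the settings above map roughly onto the parameters of a `projects.timeSeries.list` request. The following is a minimal sketch that only builds those parameters and does not call the API. The helper name `build_cpu_usage_query` is hypothetical, and the assumption that the node and controller labels are addressed as `metadata.system_labels.*` should be checked against what Metrics Explorer shows for your project.

```python
from typing import Dict, List

# Metric type shown in Metrics Explorer as "Kubernetes Container - CPU Usage Time".
METRIC_TYPE = "kubernetes.io/container/cpu/core_usage_time"

# Labels carried on the k8s_container monitored resource.
RESOURCE_LABELS: List[str] = [
    "cluster_name", "container_name", "location", "namespace_name", "pod_name",
]

# Assumption: these are exposed as system metadata labels, not resource labels.
METADATA_LABELS: List[str] = [
    "node_name", "top_level_controller_name", "top_level_controller_type",
]


def build_cpu_usage_query(project_id: str) -> Dict[str, object]:
    """Return request parameters mirroring the console query (hypothetical helper)."""
    query_filter = (
        f'metric.type = "{METRIC_TYPE}" '
        'AND resource.type = "k8s_container" '
        f'AND resource.labels.project_id = "{project_id}"'
    )
    group_by = [f"resource.labels.{name}" for name in RESOURCE_LABELS]
    group_by += [f"metadata.system_labels.{name}" for name in METADATA_LABELS]
    return {
        "name": f"projects/{project_id}",
        "filter": query_filter,
        "aggregation": {
            "alignmentPeriod": "3600s",          # alignmentPeriod: 1h
            "perSeriesAligner": "ALIGN_RATE",    # perSeriesAligner: Rate
            "crossSeriesReducer": "REDUCE_SUM",  # groupingFunction: Sum
            "groupByFields": group_by,
        },
    }
```

A time interval still has to be supplied when the request is actually sent. If the query returns no time series for a project that has running clusters, continue with the permissions check described next.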

If you are not seeing metrics in the GCP console, one possibility is a permissions issue that prevents metrics from being published. A common cause is the use of custom service accounts instead of the default Compute Engine service account. Check that your custom service accounts have the following roles:

roles/logging.logWriter  
roles/monitoring.metricWriter  
roles/stackdriver.resourceMetadata.writer
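
One way to check these roles is to export the project's IAM policy as JSON (for example with `gcloud projects get-iam-policy <your project_id> --format=json`) and look for the bindings that name your service account. The sketch below assumes that JSON shape. The function name `missing_roles` and the sample service account are hypothetical, and roles granted indirectly (through a group, or inherited from a folder or organization) will not show up in this simple check.

```python
from typing import Dict, Set

REQUIRED_ROLES: Set[str] = {
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/stackdriver.resourceMetadata.writer",
}


def missing_roles(policy: Dict, service_account_email: str) -> Set[str]:
    """Return the required roles not bound directly to the service account."""
    member = f"serviceAccount:{service_account_email}"
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return REQUIRED_ROLES - granted


# Made-up policy for illustration only.
sample_policy = {
    "bindings": [
        {
            "role": "roles/logging.logWriter",
            "members": ["serviceAccount:gke-nodes@example.iam.gserviceaccount.com"],
        },
        {
            "role": "roles/monitoring.metricWriter",
            "members": ["serviceAccount:gke-nodes@example.iam.gserviceaccount.com"],
        },
    ]
}

print(sorted(missing_roles(sample_policy, "gke-nodes@example.iam.gserviceaccount.com")))
# prints ['roles/stackdriver.resourceMetadata.writer']
```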

For a complete list of Kubernetes metric definitions, please see our supporting documentation.

Our sources of truth for the metrics are Stackdriver, Cloud Asset Inventory, and a pricing table. We use a combination of these to pull the metrics and to understand which projects your clusters are running in. Some of the resources we inspect are:

workloads (pods, deployments, statefulsets, etc.)
namespaces
nodes
location
machine family
requests and limits
usage
projects with clusters

Those metrics in combination with the pricing table allow us to do the required calculations.
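
To make the idea concrete, here is a minimal sketch of how usage metrics and a pricing table can be combined into a gross cost and a waste figure. It is an illustration under stated assumptions, not Ternary's actual calculation: the `ClusterUsage` and `PriceRow` types, the core-hour and GiB-hour units, and the prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterUsage:
    """Hypothetical usage totals for one cluster over a time window."""
    allocated_core_hours: float
    used_core_hours: float
    allocated_gib_hours: float
    used_gib_hours: float


@dataclass
class PriceRow:
    """Hypothetical pricing-table row for one location and machine family."""
    per_core_hour: float
    per_gib_hour: float


def gross_cost(usage: ClusterUsage, price: PriceRow) -> float:
    """Cost of everything allocated, before any discounts."""
    return (usage.allocated_core_hours * price.per_core_hour
            + usage.allocated_gib_hours * price.per_gib_hour)


def waste(usage: ClusterUsage, price: PriceRow) -> float:
    """Cost of allocated but unused cores and RAM, before any discounts."""
    unused_cores = max(usage.allocated_core_hours - usage.used_core_hours, 0.0)
    unused_gib = max(usage.allocated_gib_hours - usage.used_gib_hours, 0.0)
    return unused_cores * price.per_core_hour + unused_gib * price.per_gib_hour


# Made-up numbers for illustration only.
usage = ClusterUsage(allocated_core_hours=100.0, used_core_hours=40.0,
                     allocated_gib_hours=400.0, used_gib_hours=100.0)
price = PriceRow(per_core_hour=0.03, per_gib_hour=0.004)
print(round(gross_cost(usage, price), 2), round(waste(usage, price), 2))
```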

We recommend that each GKE cluster have cost allocation enabled. GKE Autopilot clusters have it enabled by default, but Standard and older GKE clusters may not. Google documentation on enabling GKE cost allocation: https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations#enable_breakdown

Enabling this adds details to the detailed billing export, which allows Ternary to calculate the costs for your GKE Standard clusters. If it is not enabled, you only receive cost data for GKE Autopilot clusters (which have cost allocation enabled by default).

GKE Visibility Summary

Purpose: This section gives you a high-level overview of what you're spending on GCP GKE at the Cluster and Namespace level within your GCP Projects.

Measures:

Total Cost: Total cost of all of the associated clusters. This is gross cost and does not include any GCP related discounts you receive.
Total Waste: Total waste of all the clusters (unused Cores and RAM). This is gross cost and does not include any GCP related discounts you receive.
Used / Total Cores: The first number is the number of used cores and the second is the total number of cores allocated to all the clusters.
Used / Total Bytes: The first number is the amount of used RAM and the second is the total amount of RAM allocated to all the clusters.

You can switch the timeframe between 7, 14, and 30 days using the small arrow on the right-hand side.

You can also switch between a cluster view and a namespace view.

You also have the ability to narrow the scope to a particular Project or include All Projects.

Cost and Usage Graphs

Cost Graph

Dimension Groupings:

The graph is grouped by cluster when viewing by cluster, or by namespace when viewing by namespace. The cost is grouped by cluster/namespace and the associated project.

All Clusters in All Projects

All Namespaces in All Projects

All Namespaces filtered to a single project

All Clusters in a Single Project

Usage Graph

Dimension Groupings:

CPU: Total amount of allocated cores and total amount of used cores
Memory: Total amount of allocated memory and total amount of used memory

Tables

Cluster Table

Measures:

CPU Cost: Total cost of all the cores
CPU Utilization: Average CPU utilization over the time period selected. Requested Cores / Used Cores for each day and aggregated to get an average
Total Cores: Total number of cores allocated to the cluster
Memory Cost: Total cost of all allocated memory
Memory Utilization: Average memory utilization over the time period selected. Requested memory / Used memory
Total Bytes: Total memory allocated to the cluster, in GB/TB
Total Waste: Amount of waste in unused CPU and Memory in dollars
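
The utilization measures above are described as a per-day ratio that is then aggregated into an average. A minimal sketch of that aggregation follows. It reads "Requested / Used" as a comparison of the two quantities and assumes the daily ratio is used divided by requested. The function name and the sample values are hypothetical and are not taken from Ternary.

```python
from typing import Iterable, Optional, Tuple


def average_daily_utilization(
    days: Iterable[Tuple[float, float]],
) -> Optional[float]:
    """Average the per-day used/requested ratio.

    Each item is a (requested, used) pair for one day. Days with nothing
    requested are skipped, and None is returned if no day qualifies.
    """
    ratios = [used / requested for requested, used in days if requested > 0]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


# Made-up sample: (requested cores, used cores) for three days.
sample_days = [(4.0, 2.0), (4.0, 3.0), (8.0, 2.0)]
print(average_daily_utilization(sample_days))  # prints 0.5
```

Averaging the daily ratios weights every day equally. Dividing total used by total requested over the same three days would give 7 / 16 = 0.4375 instead, so the two approaches can differ when the requested amount changes from day to day.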

Namespace Table

CPU Cost: Total cost of all the cores
CPU Utilization: Average CPU utilization over the time period selected. Requested Cores / Used Cores for each day and aggregated to get an average
Total Requested Cores: Total number of cores requested by the namespace
Memory Cost: Total cost of all allocated memory
Memory Utilization: Average memory utilization over the time period selected. Requested memory / Used memory
Total Requested Bytes: Total memory requested by the namespace, in GB/TB
Total Waste: Amount of waste in unused CPU and Memory in dollars (requested vs. used)

Additional Details by Namespace or Cluster

For example, you could see namespaces associated with a particular cluster or the cluster associated with a particular namespace.

GKE Optimization Summary

Before diving into what the recommendation insights numbers are, it is important to know the different types of recommendations that are potentially surfaced in Ternary for GCP GKE:

Over provisioned Workload: The amount of CPU and RAM being used by this workload consistently does not match the level of allocation of CPU and RAM to the workload as defined in the Pod specification.
Nodepool Utilization: The identified nodepool seems to be consistently underutilized. The amount of workload scheduled to the nodepool is lower than the total capacity of the nodepool.

Measures:

Current Recommendations: Total amount of savings available across all recommendations. Active represents recommendations that do not have a status of "Snooze". Snoozed represents recommendations that have a status of "Snooze" set by a user for any reason.
Past Recommendations: Total amount of previous (historical) recommendations across all statuses available in Ternary. Includes recommendations that were assigned a status of Implemented, Inaccurate, or As-Designed in Ternary after discussion with your engineering team.
Total Current Recommendations: Total number of current recommendations available.
Total MTD spend: Total gross GKE spend month to date. The percentage represents your amount of MTD spend compared against this time last month.

GKE Recommendation table

Measures:

ID: Unique ID of the recommendation in the Ternary platform. Each recommendation will have a unique ID for reference later.
Recommendation type: The type of recommendation (Please see above for a description of possible recommendations).
Resource (and Project): Unique resource name and associated Project it exists in.
Open: Whether the recommendation has a status of New.
Status: Status of the recommendation, as assigned in the Ternary platform. Possible values: New, Accepted, As-Designed, Implemented, and Inaccurate.
Created On: The date that the recommendation was created in the Ternary database.
Potential Savings: Potential savings if the recommendation is actioned.

When you click any of the table rows, a pop-up appears:

Details of a single recommendation. You can also snooze a recommendation for a specific period of time or set a particular status.

Other Useful Functionality

Export Visibility and Optimizations: Export the current Visibility categories and Compute optimization recommendations as CSVs

For Visibility, you can export the categories table below the graph

For Optimizations, you can export the recommendations table below the graph

Knowing where you are: In the project -> cluster -> namespace -> workload drill-down, you can see where you are and navigate back by using the breadcrumbs at the top left of the screen: