About two years ago we launched the amasol Managed Cloud, which offers Dynatrace as a Managed Service hosted in Germany. This gives our customers the comfort of a SaaS solution while still allowing them to comply with strict data protection regulations.
Since then we have been extending the platform and adding more features like our aMC Synthetic App, aMC Config Management or a central User Management Portal for all aMC services.
Because we started on a green field, we were able to pick a modern technology stack and decided to deploy everything on Kubernetes.
When looking for a Continuous Delivery solution we quickly settled on Keptn as the control plane. Keptn comes with out-of-the-box integrations for several tools we already had in place for our Continuous Integration setup, like GitLab, JMeter or Dynatrace. Thanks to its event-driven architecture, we can easily extend it to other tools we may need in the future.
Keptn also supports us on the way to an Autonomous Cloud: it offers an easy start with SLO-based Quality Gates to improve quality, and from there we can move on to other topics like different deployment strategies and auto remediation.
We use GitLab CI with a simple pipeline to build and unit-test new versions of our applications. If the build is successful, a new image is created and pushed to our container registry.
From there we trigger the deployment (currently manually) with a simple Keptn CLI call, telling it to deploy our new artifact:
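As a rough illustration of such a call, the sketch below assembles the arguments for the `keptn send event new-artifact` command as it was available in the Keptn 0.6 CLI. The project, service, image and tag values are hypothetical placeholders, not our actual setup, and the helper function is only there for illustration.

```python
def new_artifact_command(project, service, image, tag):
    """Build the argument list for a Keptn CLI new-artifact call.

    Assumes the Keptn 0.6 CLI syntax:
    keptn send event new-artifact --project=... --service=... --image=... --tag=...
    """
    return [
        "keptn", "send", "event", "new-artifact",
        f"--project={project}",
        f"--service={service}",
        f"--image={image}",
        f"--tag={tag}",
    ]


# Hypothetical values, for illustration only.
cmd = new_artifact_command(
    project="amc",
    service="config-management",
    image="registry.example.com/amc/config-management",
    tag="1.4.2",
)

# Print the command as it would be typed in a shell.
print(" ".join(cmd))
```

The same argument list could be handed to `subprocess.run(cmd, check=True)` on a machine where the Keptn CLI is installed and authenticated.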
We trigger the deployments through the CLI because we still want manual approval for production deployments. In the next Keptn version (0.7.0) it will be possible to do manual approvals between stages with Keptn’s new Delivery Assistant. This will allow us to trigger the deployment directly from our CI pipeline and still keep control over when a version is promoted to production.
Let me explain what happens behind the scenes when sending the new-artifact event.
Once the event is sent Keptn takes the rudder and deploys our new artifact to the first stage. In our case that is the development environment.
The different stages are defined in a simple YAML file called shipyard:
For every stage Keptn allows us to specify the deployment strategy (e.g. direct, blue/green or canary) and the testing strategy (e.g. functional or performance). This gives us declarative control over what should happen in each stage as an artifact gets pushed through the delivery process.
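To make the idea of stages more concrete, here is a minimal sketch that models a two-stage shipyard as plain Python data and looks up the stage an artifact would be promoted to next. The stage names and strategy values are assumptions for illustration; the field names follow the Keptn 0.6 shipyard format (`name`, `deployment_strategy`, `test_strategy`), and the `next_stage` helper is hypothetical, not part of Keptn.

```python
# A shipyard modeled as plain data. Stage names and strategies are
# illustrative assumptions, not our real configuration.
SHIPYARD = {
    "stages": [
        {
            "name": "dev",
            "deployment_strategy": "direct",
            "test_strategy": "performance",
        },
        {
            "name": "production",
            "deployment_strategy": "blue_green_service",
        },
    ]
}


def next_stage(shipyard, current):
    """Return the name of the stage that follows `current`, or None if it is the last one."""
    names = [stage["name"] for stage in shipyard["stages"]]
    idx = names.index(current)  # raises ValueError for an unknown stage
    return names[idx + 1] if idx + 1 < len(names) else None


print(next_stage(SHIPYARD, "dev"))         # production
print(next_stage(SHIPYARD, "production"))  # None
```

The order of the list is what defines the path of an artifact: it only reaches a later stage after it has passed the earlier ones.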
After Keptn has deployed an artifact and triggered the tests, it automatically enforces a quality gate. This is done through Keptn’s Lighthouse service, which evaluates metrics from a monitoring system as Service Level Indicators (SLIs) against defined Service Level Objectives (SLOs).
In our case, after the deployment is finished Keptn starts the test through Keptn’s JMeter service that is already shipped with Keptn. It does a short functional test to check if the environment is operational and then starts a load test.
The SLOs are defined in a YAML file and, following the GitOps approach, are stored along with the other files (SLIs, JMeter scripts and Helm charts) in a Git repository.
For every SLO the passing criteria can be specified as an absolute and/or relative value. The relative values are compared against a configurable number of previous builds, allowing for automatic regression detection, e.g. do not allow an increase in response time of more than 10% compared to the previous build. The absolute values allow us to enforce strict thresholds that we may have, e.g. do not allow a failure rate of more than 1%.
Every SLO has a weight which influences the calculation of the total score. The total score section allows us to specify what score we consider as a successful or failed build:
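The scoring logic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Keptn’s actual Lighthouse implementation: it only knows pass or fail per objective (no warning level), compares relative criteria against a single baseline value instead of several previous builds, and all metric names, values and thresholds are made up.

```python
def meets_criteria(value, baseline=None, max_absolute=None, max_relative_increase=None):
    """Check one SLI value against an absolute and/or a relative limit."""
    if max_absolute is not None and value > max_absolute:
        return False
    if max_relative_increase is not None and baseline is not None:
        # e.g. 0.10 means: at most 10% above the baseline build
        if value > baseline * (1 + max_relative_increase):
            return False
    return True


def total_score(objectives):
    """Weighted share of passed objectives, as a percentage."""
    total_weight = sum(o["weight"] for o in objectives)
    achieved = sum(
        o["weight"]
        for o in objectives
        if meets_criteria(
            o["value"],
            baseline=o.get("baseline"),
            max_absolute=o.get("max_absolute"),
            max_relative_increase=o.get("max_relative_increase"),
        )
    )
    return 100.0 * achieved / total_weight


# Hypothetical evaluation: response time is within 10% of the baseline,
# but the error rate exceeds the absolute limit of 1%.
objectives = [
    {"sli": "response_time_p95", "value": 220.0, "baseline": 210.0,
     "max_relative_increase": 0.10, "weight": 1},
    {"sli": "error_rate", "value": 2.5, "max_absolute": 1.0, "weight": 2},
]

PASS_MARK = 90.0  # assumed passing mark for the total score
score = total_score(objectives)
verdict = "pass" if score >= PASS_MARK else "fail"
print(f"{score:.1f} -> {verdict}")
```

Because the failing error rate carries twice the weight of the response time, the total score lands far below the assumed passing mark, which is the kind of result that stops a deployment from being promoted.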
With the bridge, Keptn has a great built-in Web UI to get an overview of all your Keptn workflows.
Keptn follows an event-based approach and each event is listed in the bridge. Starting at the configuration changed event triggered by our Keptn CLI call, all subsequent steps are listed in the bridge:
In this case the tests are executed but the thresholds defined in our SLOs are not met because the error rate is too high:
Because our Total Score is below the defined passing mark Keptn stops the workflow here and does not deploy to our production stage. That is immediate feedback delivered back to engineering so they can start fixing newly introduced issues right away.
So, once we fix the problem, we run a new deployment and Keptn validates if we are now meeting our SLOs. Looks fine now :-)
With the successful evaluation, Keptn goes ahead and deploys the new artifact automatically to production.
Keptn enabled us to quickly onboard multiple projects and services to a robust Continuous Delivery process, without the overhead of managing multiple pipelines.
The SLO-based Quality Gates help us provide a good experience to our end users and give our engineers immediate feedback on their deployed changes.
With Keptn 0.7 we will link our Continuous Integration Flow to Keptn for automatic deployments while still retaining manual approval for production with Keptn’s Delivery Assistant. After that we are looking forward to advanced use cases like auto remediation.