Yearly Archives: 2020


OpenShift Update Graph Visualizer, lessons learned

Since OpenShift 4, updates are rather trivial. You wait for the new update to appear, press the button (or use the CLI), wait a bit, and the update is done. True, in production you might want to complicate that process a bit, for good reason.

Having run an OpenShift 4 cluster myself for a while now, and developing apps on top of Kubernetes in my day job, I am curious about the next release. Is it GA already? Can I deploy it? Is there an upgrade for my current version? Is it in “candidate”, “fast”, or “stable”? Checking that turned out to be not as easy as it should be.

A bit of background

The tool behind the OpenShift update information is openshift/cincinnati:

Cincinnati is an update protocol designed to facilitate automatic updates. It describes a particular method for representing transitions between releases of a project and allowing a client to perform automatic updates between these releases.

https://github.com/openshift/cincinnati

Which means that this tool has all the information available to show which upgrades exist, and from which version I can upgrade. It also has the information about all the different channels (like “fast”, “candidate”, or “stable”). And it is written in Rust, so I love it already.

There is a little bit of information on the data format, but what about the data itself? It is available from the endpoint https://api.openshift.com/api/upgrades_info. So now we have everything that we need.

The solution

To be honest, there is nothing too difficult about it. You have all the data. There already is a nice example using graphviz in the repository of OpenShift Cincinnati. The problem with the example is that you need to run it every time an update gets posted, for every channel. It generates a static graph representation, so you have no ability to zoom or re-arrange. It would be nice to have a bit more interactive visualization of that graph.

A few hours later, and with the help of visjs, jQuery, Bootstrap, and some plain old Javascript, you have something more interactive: https://ctron.github.io/openshift-update-graph/
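Under the hood, the idea is straightforward: the data already comes as a list of nodes and a list of edges, so it only needs to be mapped into the node and edge objects that vis-network expects. The following is a minimal sketch of that idea in TypeScript; the names and options are illustrative and not the actual code of the project:

// Minimal sketch: turn the update graph's index-based edges into vis-network
// nodes/edges and render them. Names and options are illustrative only.
import { Network } from "vis-network";

interface UpdateGraph {
  nodes: string[];                        // simplified to plain version strings
  edges: { from: number; to: number }[];  // edges reference nodes by list index
}

export function renderGraph(container: HTMLElement, graph: UpdateGraph): Network {
  const nodes = graph.nodes.map((version, index) => ({ id: index, label: version }));
  const edges = graph.edges.map((edge, index) => ({ id: index, ...edge, arrows: "to" }));
  // vis-network accepts plain arrays of node and edge objects
  return new Network(container, { nodes, edges }, { physics: { stabilization: true } });
}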

Screenshot of OpenShift Update Graph Visualizer

The challenges

As always, it should be simple. But in real life, nothing is. Of course I encountered a few obstacles to work around …

Hosting a web page

Hosting a simple, static web page used to be so simple. Put a file into a directory that Apache publishes, and you are done … Today, I could also set up a Tekton pipeline, build a new image, and “run” the web page in a container, on a cluster, proxied by one or two more HTTP reverse proxies.

The truth is, there are now so many options when it comes to hosting a simple web page that it can become a tough decision. I wanted to keep things closely together. Most likely I will build it, use it, and not actively maintain it (in other words, forget about it). Git and GitHub are obvious choices for me, so why not simply host this with GitHub Pages, using the same repository I use for coding? As it turned out later, that was the right choice and helped with a few other obstacles as well.

CORS & headers

Having an HTTP-based online API should make it simple to get everything you need directly from the user’s browser session. The only thing that would be necessary is to host the web page and a bit of JavaScript.

Unfortunately, the update API is rather picky when it comes to doing requests. It’s missing CORS headers, and using a general purpose CORS proxy turned out to lack the correct HTTP headers that the API requires. I wanted to focus on the visualization and wanted to stick to the plan of simply hosting a static web page, not running a CORS proxy myself for this.

Then again, the only thing I do with the API is to perform an HTTP GET request. As there is nothing dynamic about it, I could just as well host a JSON file and fetch that. I would only need a process to update the JSON data.
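In the visualizer, that then becomes a plain GET of a static file from the same GitHub Pages site. A rough sketch, reusing the UpdateGraph shape from the earlier sketch; the file layout is made up for illustration:

// Rough sketch: load the pre-fetched graph data from the same static site instead
// of calling the update API directly. The "data/<channel>.json" path is made up
// for illustration; only the idea of fetching a hosted JSON file matters here.
async function loadGraph(channel: string): Promise<UpdateGraph> {
  const response = await fetch(`data/${channel}.json`);
  if (!response.ok) {
    throw new Error(`failed to load graph data: ${response.status}`);
  }
  return (await response.json()) as UpdateGraph;
}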

Now I was glad that I chose GitHub for all of this. Setting up a GitHub Actions workflow for updating the data is rather simple. The command was already part of the original example, with the only difference that I don’t need to run graphviz. The workflow fetches the data, and when git detects a change in the data, it commits and pushes the changes to the same repository. Great plan, but …

Data formats

… the data format is not stable. Doing multiple GET operations on the endpoint gives you back different content. True, the information is the same, but the “byte content” is different. The data format describes updates as nodes and edges, very simple and a perfect match for our purpose. However, the edges reference the nodes by their position in the list of nodes, and not by some stable identifier. Consider the following two examples:

{
  "nodes": [ "A", "B", "C" ],
  "edges": [
    { "from": 0, "to": 1 },
    { "from": 1, "to": 2 },
  ]
}

{
  "nodes": [ "C", "B", "A" ],
  "edges": [
    { "from": 2, "to": 1 },
    { "from": 1, "to": 0 },
  ]
}

Both examples contain the same information (A → B → C), however the raw bytes are different (can you spot the difference?). And git diff only works with the bytes, not with the actual information conveyed by them.

So my plan of periodically fetching the data and letting git diff check for differences wouldn’t work. Unless I created a small script that normalizes the data. Running that as part of the update job isn’t complicated at all. And now the diff can check whether the normalized data changed, and only act on that.
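Normalizing just means producing one canonical ordering: sort the nodes, rewrite the edge indices to point at the new positions, and sort the edges as well. A small sketch of that idea, again reusing the UpdateGraph shape from the first sketch (this is not the actual script used by the project):

// Sketch of the normalization idea: any deterministic ordering works, it only has
// to make identical graphs serialize to identical bytes so that git diff stays quiet.
// (Not the actual script used by the project.)
function normalize(graph: UpdateGraph): UpdateGraph {
  const sortedNodes = [...graph.nodes].sort();

  // map each node to its position in the sorted list
  const indexOf = new Map<string, number>();
  sortedNodes.forEach((version, index) => indexOf.set(version, index));

  // rewrite the edges to point at the new positions, then order them as well
  const edges = graph.edges
    .map((edge) => ({
      from: indexOf.get(graph.nodes[edge.from])!,
      to: indexOf.get(graph.nodes[edge.to])!,
    }))
    .sort((a, b) => a.from - b.from || a.to - b.to);

  return { nodes: sortedNodes, edges };
}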

Why do I keep the non-normalized data? Yes, I could let the visualizer use the normalized data. However, I would like to stick to the original data format, in the hope that some day I will be able to use the API endpoint directly.

No API for channels

I wanted to visualize all channels, with the different versions. It turns out that there is no API for that: openshift/cincinnati#171. But I also didn’t want to maintain an up-to-date list of channels myself, a thing I would forget about sooner or later. Nothing a little shell script can’t fix. In GitHub Actions, performing a checkout is simply yet another action, and you can check out multiple repositories. Of course, you don’t get triggered from those repositories. But we are running a periodic workflow anyway, so why not check out the graph data repository and look for the channels in there?
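The actual fix in the workflow is a small shell script; purely to illustrate the idea, a rough Node/TypeScript equivalent could look like the following, assuming the graph data repository is checked out next to this one and keeps one YAML file per channel in its channel directory:

// Rough illustration of the "find the channels" step, here in Node/TypeScript
// instead of the shell script the workflow actually uses. The directory layout
// (one YAML file per channel) is an assumption for this sketch.
import { readdirSync } from "fs";

function listChannels(graphDataDir: string): string[] {
  return readdirSync(`${graphDataDir}/channels`)
    .filter((name) => name.endsWith(".yaml"))
    .map((name) => name.replace(/\.yaml$/, ""))
    .sort();
}

console.log(listChannels("../cincinnati-graph-data"));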

Why didn’t I use the data contained in the channel directory of that repository? I really wanted to stick to the original API. There is all kinds of processing done with the data in there, and I simply didn’t want to replicate that. Finding all the channels there as a workaround seemed fine though.

Nervous graphs

Once the graphs grow a bit, they can get rather complex. visjs has a “physics” model built in, which helps to balance the layout when you drag nodes around. However, every now and then the layouter and the physics model seemed to produce funny, but useless, results. Depending on the model, a simple reload, re-starting the layout algorithm with a different seed, fixes the problem. But that is a bad experience.

Luckily, you can configure all kinds of settings in the physics model, and playing around a bit led to a configuration that still looks fun, but is stable enough, even for the bigger graphs.
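Those knobs live under the physics section of the vis-network options. Something along these lines, where the concrete values are only illustrative and not the exact settings the visualizer ended up with:

// Illustrative physics tuning for vis-network: pick a solver and give the
// stabilization phase enough iterations, so that larger graphs settle down.
// The concrete values are examples, not the visualizer's exact settings.
const options = {
  physics: {
    enabled: true,
    solver: "barnesHut",
    barnesHut: {
      springLength: 120,   // longer springs spread the nodes out a bit
      avoidOverlap: 0.5,   // keep nodes from piling up on each other
    },
    stabilization: {
      iterations: 200,     // let the layout settle before showing the graph
    },
    minVelocity: 0.75,     // stop the simulation once movement calms down
  },
};
// passed as the third argument: new Network(container, { nodes, edges }, options)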

What is next?

Not much really. It is a tool for me that just works, showing OpenShift updates with a click. The itch is scratched, and I learned a few things in the process. And I hope that by sharing, it becomes useful for someone other than just me.

Of course, if you think that something is missing, broken or could be done in a better way: Open Source is all about contributing ;-) → ctron/openshift-update-graph

Quarkus – Supersonic Subatomic IoT

Quarkus is advertised as a “Kubernetes Native Java stack, …”, so we put it to the test and checked what benefits we can get by replacing an existing service from the IoT components of EnMasse, the cloud-native, self-service messaging system.

The context

For quite a while, I have wanted to try out Quarkus and see what benefits it brings us in the context of EnMasse. The IoT functionality of EnMasse is provided by Eclipse Hono™, a micro-service based IoT connectivity platform. Hono is written in Java, makes heavy use of Vert.x, and application startup and configuration are orchestrated by Spring Boot.

EnMasse provides the scalable messaging back-end, based on AMQP 1.0. It also takes care of the Eclipse Hono deployment alongside EnMasse, wiring up the different services based on an infrastructure custom resource. In a nutshell, you create a snippet of YAML, and EnMasse deploys a messaging system for you, with first-class support for IoT.

Architecture diagram, explaining the tenant service.
Architectural overview – showing the Tenant Service

This system requires a service called the “tenant service”. That service is responsible for looking up an IoT tenant whenever the system needs to validate that a tenant exists, or when its configuration is required. Like all the other services in Hono, this service is implemented using the default stack, based on Java, Vert.x, and Spring Boot. Most of the implementation is based on Vert.x alone, using its reactive and asynchronous programming model. Spring Boot is only used for wiring up the application, using dependency injection and configuration management. So this isn’t a typical Spring Boot application: it uses neither Spring Web nor any of the Spring Messaging components. And the reason for choosing Vert.x over Spring in the past was performance. Vert.x provides excellent performance, which we tested a while back in our IoT scale test with Hono.

The goal

The goal was simple: make it use fewer resources while keeping the same functionality. We didn’t want to re-implement the whole service from scratch. And while the tenant service is specific to EnMasse, it still uses quite a lot of the base functionality coming from Hono, and we wanted to re-use all of that, as we did with Spring Boot. So this wasn’t one of those nice “greenfield” projects where you can start from scratch with a nice and clean “Hello World”. This code is embedded in two bigger projects, passes system tests, and has a history of its own.

So, change as little as possible and get out as much as we can. What else could it be?! And just to understand from where we started, here is a screenshot of the metrics of the tenant service instance on my test cluster:

Screenshot of original resource consumption.
Metrics for the original Spring Boot application

Around 200MiB of RAM, a little bit of CPU, and not much to do. As mentioned before, the tenant service only gets queries to verify the existence of a tenant, and the system will cache this information for a bit.

Step #1 – Migrate to Quarkus

To use Quarkus, we started tweaking our existing project to adopt the different APIs that Quarkus uses for dependency injection and configuration. And to be fair, that mostly meant saying good-bye to Spring Boot specific APIs and going for something more open. Dependency injection in Quarkus comes in the form of CDI, and Quarkus’ configuration is based on Eclipse MicroProfile Config. In a way, we didn’t migrate to Quarkus, but away from Spring Boot specific APIs.

First steps

We started by adding the Quarkus Maven plugin and some basic dependencies to our Maven build, and off we went.

And while replacing dependency injection was a rather smooth process, the configuration part was a bit more tricky. Both Hono and MicroProfile Config have a rather opinionated view on configuration, which made it problematic to enhance the Hono configuration in a way that MicroProfile was happy with. So for the first iteration, we ended up wrapping the Hono configuration classes to make them play nice with MicroProfile. However, this is something that we intend to improve in Hono in the future.

Packaging the JAR into a container was no different than with the existing version. We only had to adapt the EnMasse operator to provide application arguments in the form Quarkus expected them.

First results

From a user perspective, nothing has changed. The tenant service still works the way it is expected to work and provides all the APIs as it did before. Just running with the Quarkus runtime, and the same JVM as before:

Screenshot of resource consumption with Quarkus in JVM mode.
Metrics after the conversion to Quarkus, in JVM mode

We can directly see a drop of 50MiB, from 200MiB to 150MiB of RAM. That isn’t bad. CPU isn’t really different, though. There also is a slight improvement in startup time, from ~2.5 seconds down to ~2 seconds. But that isn’t a real game-changer, I would say, considering that a ~2.5 second startup time for a Spring Boot application is actually not too bad; other services take much longer.

Step #2 – The native image

Everyone wants to do Java “native compilation”. I guess the expectation is that native compilation makes everything go much faster. There are different tests by different people comparing native compilation and JVM mode, and the outcomes vary a lot. I don’t think that “native images” are a silver bullet for performance issues, but still, we were curious to give it a try and see what happens.

Native image with Quarkus

Enabling native image mode in Quarkus is trivial. You need to add a Maven profile and set a few properties, and you have native image generation enabled. By setting a single property in the Maven POM file, you can also instruct the Quarkus plugin to perform the native compilation step in a container. With that, you don’t need to worry about the GraalVM installation on your local machine.

Native image generation can be tricky, we knew that. However, we didn’t expect this “Step #2” to be as complex as it turned out to be. In a nutshell, creating a native image compiles your code to CPU instructions, rather than JVM bytecode. In order to do that, it traces the call graph, and it fails to do so when it comes to reflection in Java. GraalVM supports reflection, but you need to provide the information about the types, classes, and methods that want to participate in the reflection system from the outside. Luckily, Quarkus provides tooling to generate this information during the build. Quarkus knows about constructs like de-serialization in Jackson and can generate the required information for GraalVM to compile this correctly.

However, the magic only works in areas that Quarkus is aware of. So we did run into some weird issues, strange behavior that was hard to track down. Things that worked in JVM mode all of a sudden were broken in native image mode. Not all the hints are in the documentation. And we also didn’t read (or understand) all of the hints that are there. It takes a bit of time to learn, and with a lot of help from some colleagues (many thanks to Georgios, Martin, and of course Dejan for all the support), we got it running.

What is the benefit?

After all the struggle, what did it give us?

Screenshot of resource consumption with Quarkus in native image mode.
Metrics when running as native image Quarkus application

So, we are down another 50MiB of RAM. Starting from ~200MiB, down to ~100MiB. That is only half the RAM! Also, this time, we see a reduction in CPU load. While in JVM mode (both Quarkus and Spring Boot), the CPU load was around 2 millicores, now the CPU is always below that, even during application startup. Startup time is down from ~2.5 seconds with Spring Boot, to ~2 seconds with Quarkus in JVM mode, to ~0.4 seconds for Quarkus in native image mode. Definitely an improvement, but still, neither of those times is really bad.

Pros and cons of Quarkus

Switching to Quarkus was no problem at all. We found a few areas in the Hono configuration classes to improve. But in the end, we can keep the original Spring Boot setup and have Quarkus at the same time, and possibly other MicroProfile-compatible frameworks as well, though we didn’t test that. Everything worked as before, just using less memory. And except for the configuration classes, we could pretty much keep the whole application as it was.

Native image generation was more complex than expected. However, we also saw some real benefits. And while we didn’t do any performance tests on that, here is a thought: if the service has the same performance as before, the fact that it requires only half the memory and half the CPU cycles allows us to run twice the number of instances now, doubling throughput as we scale horizontally. I am really looking forward to another scale test, since we did all kinds of other optimizations as well.

You should also consider that the process of building a native image takes quite some time. For this rather simple service, it takes around 3 minutes on an above-average machine, just to build the native image. I did notice some decent improvement when trying out GraalVM 20.0 over 19.3, so I would expect more improvements in the toolchain over time. Things like hot code replacement while debugging are not possible with the native image profile, though. It is a different workflow, and that may take a bit of adapting. However, you don’t need to commit to either way. You can still have both at the same time: work with JVM mode and the Quarkus development mode, and then enable the native image profile whenever you are ready.

Taking a look at the size of the container images, I noticed that the native image isn’t smaller (~85 MiB), compared to the uber-JAR file (~45 MiB). Then again, our “java base” image alone is around ~435 MiB. And it only adds the JVM on top of the Fedora minimal image. As you don’t need the JVM when you have the native image, you can go directly with the Fedora minimal image, which is around ~165 MiB, and end up with a much smaller overall image.

Conclusion

Switching our existing Java project to Quarkus wasn’t a big deal. It required some changes, yes. But those changes also mean using more open APIs, governed by the Eclipse Foundation’s development process, instead of Spring Boot specific APIs. And while you can still use Spring Boot, changing the configuration to Eclipse MicroProfile opens up other possibilities as well, not only Quarkus.

Just taking a quick look at the numbers, comparing the figures from Spring Boot to Quarkus with native image compilation: RAM consumption was down to 50% of the original, CPU usage was also down to at least 50% of the original, and the container image shrank to ~50% of the original size. And as mentioned in the beginning, we have been using Vert.x for all the core processing. Users that make use of the other Spring components should see a more considerable improvement.

Going forward, I hope we can bring the changes we made to the next versions of EnMasse and Eclipse Hono. There is a real benefit here, and it provides you with some awesome additional choices. And in case you don’t like to choose, the EnMasse operator has some reasonable defaults for you 😉


Also see

This work is based on the work of others. Many thanks to:

Headless installation of Cargo and Rust

When you want to containerize your Rust application, you might be using a prepared Rust image. But maybe you are a bit more paranoid when it comes to trusting base layers and want to create your own Rust base image. Or maybe you are just curious and want to try it yourself.

Getting cargo, the Rust build tool, into the image is probably one of the first tasks in your Dockerfile. And it is rather easy on an interactive command line:

curl https://sh.rustup.rs -sSf | sh

Automated container build

However, running inside a container build, you will be greeted by the nice little helper script, asking you for some input:

Current installation options:


   default host triple: x86_64-unknown-linux-gnu
     default toolchain: stable
               profile: default
  modify PATH variable: yes

1) Proceed with installation (default)
2) Customize installation
3) Cancel installation
>

In a terminal window this is no problem. But in an automated build, you want the script to proceed without the need for manual input.

The solution

The solution is rather simple. If you take a look at the script, you will figure out that it actually allows you to pass an argument -y, assuming the defaults without the need to input any more details.

And you can still keep the “one liner” for installing:

curl https://sh.rustup.rs -sSf | sh -s -- -y

The -s will instruct the shell to process the script from “standard input”, rather than reading the script from a file. In the original command it already did that, but implicitly, because there was no other argument to the shell.

The double dash (--) indicates to the shell that everything which comes after it is not an argument to the shell, but to the shell script instead.

And finally -y is passed to the script, which is the cargo installer.

Thanks

I hope this comes in handy for you; it took me a bit to figure it out. Of course, this applies not only in the context of containers, but to any headless/silent installation of Rust.