Package Drone

5 posts

Parsing RPMs in Java

RPMThe core idea of Package Drone is to extract meta data from files and generated some sort of repository index. And although Package Drone’s main focus is on OSGi, we did want to implement a YUM repository adapter and for this we needed to extract metadata from RPM files.

Package Drone itself is written in Java. So I wanted some sort of Java approach. Of course it would be possible to run the rpm command in the background, parse the output somehow and gather the meta data information from that. Or make a JNI wrapper to librpm and extract the information with a native library call. However this is not only prone to error, but also a nightmare when it comes to porting.

So I really was looking for a plain Java solution, which also was compatible with the license of Eclipse (EPL). I came across jRPM and redline.

jRPM was last updated around 2005, still has an Apache 1.1 license and simply stuck in the past. redline is more up to date and sounded promising at first, but then the library is really more like a jar file with some “main” entry points and an ant task. There is no clear API for programmatically reading the RPM files. And the legal aspect was a little bit troublesome to me. 28 contributors according to GitHub, no CLA, an “MIT license” from a company simple named “FreeCompany” and the Google Tracking code backed right into the Maven POM file. So, I had to do it myself ;-)

A fresh start

So not to fall into the same pitfalls, I did start to write a parsing library first, instead of directly writing the Package Drone extractor module. This way there is now a clean library which can parse RPM files. It also is an OSGi bundle, which was necessary for Package Drone, but does not make use of any OSGi functionality. So it still can be used as a simple JAR file. Licensed under the EPL, as requirement for Eclipse projects anyway and the Eclipse CLA and IP process to take care of the legal aspects. And if you are an Eclipse project, you even don’t need a CQ to use it.

What’s the catch?

I did implement what I needed. And that was reading RPM metadata and building a YUM repository index. So writing RPM files or reading/writing signatures is not possible at the moment. However there are plans to sign YUM repositories and RPM files as well. So this limitation is only a matter of time. Also are there some fields which are not mapped to enums. RPM used numeric IDs internally, many are mapped, but not all. You can still access those, but by number not by enum in that case.

Also is this library currently not on maven central. But again, I am also working an a “deploy to Maven Central” feature in Package Drone, which will clear that blocker.

So where are we now?

So the code is right here on GitHub. In a few weeks we will have a binary download, but right now the Eclipse IP process has to clear the way first.

Looking at the source code of the test case you can see how this library works. Instead of working on a plain File, it can work on an InputStream, which can be important if your RPM file comes from a remote location.

The following example shows how to extract metadata and content from an RPM file.

[code language=”java”]
try(RpmInputStream in = new RpmInputStream(new FileInputStream("file.rpm"))) {
String name = (String)in.getPayloadHeader ().getTag (RpmTag.NAME);

CpioArchiveInputStream cpio = in.getCpioStream ();
CpioArchiveEntry entry;
while ((entry = cpio.getNextCPIOEntry ()) != null) {
process ( entry );
}
}
[/code]

There is also the older JavaDoc, which has to be updated to reflect the change to org.eclipse.packagedrone.

What’s next?

Of course a binary release at Eclipse, an updated JavaDoc and publishing the binaries to Maven Central. Beside extending the library to be able to sign and write RPM files.

If you like to help or need some help using it, just let me know!

Package Drone – what’s next?!

Every now and then there is some time for Package Drone. So let’s peek ahead what will happen in the next few weeks.

First of all, there is the Eclipse DemoCamp in Munich, at which Package Drone will he presented. So if you want to talk in person, come over and pay us a visit.

Also I have been working on version 0.8.0. The more you think about it, the more ideas you get of what could be improved. If I only got the time. But finally it is time for validation! Channels and artifacts can be validated and the outcome will be presented in red and yellow, and a lot more detail ;-). This is a first step towards more things we hope to achieve with validation, like rejecting content and proving resolution mechanisms. Quick fix your artifacts ;-)

Also there are a few enhancements to make it easier for new users to start with Package Drone. “Channel recipes” for example setup up and configure a channel for a specific purpose, just to name one.

Of course this is important since, with a little bit of luck, there will be an article in the upcoming German “Eclipse Magazin“, which might bring some new users. Helping them to have an easy start is always a good idea ;-)

The next version also brings a new way to upload artifacts. A plain simple HTTP request will do to upload a new artifact. While I would not call it “API”, it definitely is the starting point of exactly that. Planned is a command line client and already available is the Jenkins plugin for Package Drone. It allows to archive artifacts directly to a Package Drone channel, including adding some meta data of the build.

So, if you have more ideas, please raise an issue in the GitHub issue tracker.

Meanwhile @ Package Drone

Since Package Drone has its own home now, I would simple like to sum up here what progress Package Drone has made in the last few weeks.

First of all, the most recent release, as of now, is 0.4.0. The last two releases were mostly focused about the processing of zipped P2 repositories and what comes with that. These can be processed in two different ways now. Either using the Unzip adapter, which is more like a way of deep linking, but still allows one to access a P2 repository inside that ZIP artifact. The second way it the P2 repository unzipper aspect, which unzips bundles and features and create virtual child artifacts. The second approach makes these artifacts available to all other Package Drone functionality, but also modifies the original content but unzipping and creating new meta data. However both variants can be used at the same time!

There is also a setup for OpenShift, and a quickstart at the OpenShift Hub. So if you want to try out Package Drone, the most simplest ways, just create a free account at OpenShift und simply deploy a new Package Drone setup with a few clicks. Including the database setup.

If course there have been lots of things cleaned up and improved in the UI and the backend system, but this is more a topic for the actual release notes at GitHub.

So the question is, what the future will bring. One thing I would like to see Postgres again as a database. With the most recent Postgres JDBC driver and some help from my colleague, this might be feature appearing in one of the next versions. MySQL works fine, but also has a very bad behavior when it comes to BLOB support. And since all artifacts are stored in the database, this can cause some huge memory requirement. Hopefully Postgres does a better job here.

Of course there is also the idea of storing the artifacts separately in the file system. While this requires a little bit of extra processing when it comes to backup up your system, it might be right time to add a full backup and restore process to Package Drone. This would also solve the problem of how to switch between storage backends.

And of course, I you would like to help out, please report bugs and become a contributor ;-)