Monday, 17 July 2017

Instrumenting Java Web Applications without Modifying their Source Code

Most Java Web applications use standard Java interfaces when interacting with other systems. HTTP-based services like Web-pages or REST servers are implemented using the interface javax.servlet.Servlet. Database interaction is implemented using the JDBC interfaces java.sql.Statement and java.sql.Connection. These standards are almost universally in use, independent of the underlying framework (Spring or Java EE) and the Servlet container (Tomcat, Wildfly, etc.).

This article shows how to implement a Java agent that hooks into these interfaces using Bytecode manipulation and gathers metrics about the frequency and duration of HTTP and database calls. Demo code is available at https://github.com/fstab/promagent, an agent that instruments Java Web applications for the Prometheus monitoring system. However, this article is not Prometheus-specific: it focuses on the underlying technologies, namely Java agents, Bytecode manipulation, and class loaders.

1. Java Agents


Java agents are Java programs that can be attached to a JVM in order to manipulate Java Bytecode. For example, Java agents may be used to modify all implementations of the interface javax.servlet.Servlet to gain statistics on the number and duration of HTTP calls.
Java agents are shipped as JAR files. While regular Java programs have a main() method as the application’s entry point, Java agents have a premain() method that will be called before the application’s main() method:

Java Agent Outline

public class MyAgent {
    public static void premain(String agentArgs, Instrumentation inst) throws Exception {
        // ...
    }
}

While executable JAR files have a MANIFEST.MF file specifying the Main-Class, agent JARs have a MANIFEST.MF file specifying the Premain-Class. The agent can be attached during application startup using the -javaagent: command line option:

Java Agent Command Line

java -javaagent:myagent.jar -jar myapp.jar

The premain() method may then call inst.addTransformer() to register a ClassFileTransformer. The class file transformer implements a transform() method that will be called whenever a Java class is loaded. It may examine and modify the Bytecode of any Java class in order to add additional functionality.
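To make the transformer contract concrete, the following is a minimal sketch of a transformer that only observes class loading and leaves the Bytecode untouched. The class name LoggingTransformer and its counter are hypothetical and not part of Promagent. Returning null from transform() tells the JVM that the class was not modified.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not taken from Promagent.
final class LoggingTransformer implements ClassFileTransformer {

    private final AtomicLong matched = new AtomicLong();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className uses the internal form with slashes, e.g. "javax/servlet/http/HttpServlet".
        // It can be null, so guard against that before inspecting it.
        if (className != null && className.startsWith("javax/servlet/")) {
            matched.incrementAndGet();
        }
        // Returning null signals that the class bytes are left unchanged.
        return null;
    }

    // Number of classes seen so far whose name starts with "javax/servlet/".
    long matchedClasses() {
        return matched.get();
    }
}
```

Inside premain(), such a transformer would be registered with inst.addTransformer(new LoggingTransformer()). A real agent would return modified class bytes instead of null for the classes it wants to instrument.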

2. Bytecode Manipulation


Several libraries help Java developers implement Bytecode manipulation. The most low-level one is ASM. Other libraries, like cglib and javassist, provide higher-level APIs. The newest and easiest to use is Byte Buddy. It provides an easy-to-read fluent Java API to create a ClassFileTransformer and register it with the Instrumentation:

Byte Buddy Agent Example

package io.promagent;

import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.agent.builder.AgentBuilder.Transformer;
import net.bytebuddy.matcher.ElementMatchers;

import java.lang.instrument.Instrumentation;

import static net.bytebuddy.matcher.ElementMatchers.hasSuperType;
import static net.bytebuddy.matcher.ElementMatchers.named;

public class MyAgent {

    public static void premain(String agentArgs, Instrumentation inst) throws Exception {
        new AgentBuilder.Default()
                .type(hasSuperType(named("javax.servlet.Servlet")))
                .transform(new Transformer.ForAdvice()
                        .include(MyAgent.class.getClassLoader())
                        .advice(ElementMatchers.named("service"), "io.promagent.MyAdvice"))
                .installOn(inst);
    }
}

The example above shows the full code necessary to instrument the service() method of all javax.servlet.Servlet implementations. The service() method is called whenever a Servlet processes a Web request. The MyAdvice class defines the code that will be injected into the Servlet’s service() method. This code is annotated with @Advice.OnMethodEnter and @Advice.OnMethodExit:

Byte Buddy Advice Example

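import net.bytebuddy.asm.Advice;

import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class MyAdvice {

    @Advice.OnMethodEnter
    public static void before(ServletRequest request, ServletResponse response) {
        System.out.println("before serving the request...");
    }

    // onThrowable makes the exit advice run even if service() throws (finally-like behavior).
    @Advice.OnMethodExit(onThrowable = Throwable.class)
    public static void after(ServletRequest request, ServletResponse response) {
        System.out.println("after serving the request...");
    }
}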

Byte Buddy offers two ways of instrumenting methods: advices (as shown above) and interceptors. The difference is subtle. With advices, the Bytecode of the @Advice.OnMethodEnter and @Advice.OnMethodExit methods is copied to the beginning and into a finally block of the intercepted method. The effect is the same as if you had copied and pasted the code into the service() implementation you want to intercept. As a result, the class MyAdvice is no longer used after instrumentation is done. The intercepted service() method does not need access to the MyAdvice class, so it can be executed in a class loader context where the MyAdvice class is not available.

Interceptors on the other hand are regular method calls that are executed at the beginning and in a finally block of the intercepted methods. That means that the intercepted method must be executed in a context where the interceptor class is available.

3. Adding Dependencies


In order to turn the example above into something useful, we need to replace the System.out.println() messages with code maintaining metrics and providing metrics to a monitoring system. For example, the Promagent uses the Prometheus client library for maintaining and exposing Prometheus metrics.

The JVM automatically adds the JAR file specified with the -javaagent: command line parameter to the application’s system class loader. Therefore, it should theoretically be possible to create an Uber JAR containing the agent and all its dependencies, and use this in the -javaagent: command line argument.

However, making all dependencies available on the system class loader is problematic in an application server environment for two reasons:
  • Some of the agent’s dependencies might conflict with libraries used internally inside the application server or with libraries shipped in a WAR file as part of a deployed application.
  • In order to prevent conflicts, application servers restrict access to classes from the system class loader. For example, Wildfly modules cannot access classes from the system class loader unless the affected package is explicitly exposed using the jboss.modules.system.pkgs system property. It is not trivial to keep track of all dependencies and configure the module system accordingly.
A better approach is to expose only a few Java classes without external dependencies on the system class loader, and load the actual metrics implementation using a custom class loader. This minimizes the potential conflicts and the configuration needed to run the agent.

4. Loading Hooks from a Custom Class Loader


Implementing a custom class loader in Java is easy, as we can simply use the java.net.URLClassLoader and initialize it with the path to the JAR file where our classes are located. In order to make the agent easy to use, the Promagent is shipped as a JAR file containing other JAR files. The internal JAR files are copied to a temporary directory on start-up, and the custom class loader is configured with the temporary paths. That way, the user gets a single agent JAR, while internally the agent distinguishes between classes on the system class loader (these classes are contained directly in the agent JAR) and classes on the custom class loader (these classes are loaded from the JARs in the temporary directory).
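The extraction step described above can be sketched as follows. This is an illustration under assumptions, not Promagent's actual implementation: the class NestedJarLoader, its create() method, and the resource names passed to it are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the actual Promagent implementation.
final class NestedJarLoader {

    // Copies each named resource (a JAR nested inside the agent JAR) to a temporary
    // directory and returns a class loader that reads classes from the copies.
    static URLClassLoader create(ClassLoader source, ClassLoader parent, String... resourceNames)
            throws IOException {
        Path tmpDir = Files.createTempDirectory("agent-jars-");
        tmpDir.toFile().deleteOnExit();
        List<URL> urls = new ArrayList<>();
        for (String name : resourceNames) {
            try (InputStream in = source.getResourceAsStream(name)) {
                if (in == null) {
                    throw new IOException("Nested JAR not found: " + name);
                }
                // Keep only the file name, so "lib/hooks.jar" becomes "<tmpDir>/hooks.jar".
                Path target = tmpDir.resolve(name.substring(name.lastIndexOf('/') + 1));
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                target.toFile().deleteOnExit();
                urls.add(target.toUri().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```

An agent could call this with its own class loader as the source, for example NestedJarLoader.create(MyAgent.class.getClassLoader(), parent, "lib/hooks.jar"), where "lib/hooks.jar" is a made-up path inside the agent JAR.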

The actual instrumentation is implemented in a class called a hook. The hook is loaded from the custom class loader. That way, the hook may reference any dependencies it needs, as long as the custom class loader is able to provide these dependencies. As an example, the ServletHook looks like this:

Custom Hook Class Example

public class ServletHook {

    public void before(ServletRequest request, ServletResponse response) {
        // ...
    }

    public void after(ServletRequest request, ServletResponse response) {
        // ...
    }
}

The hook looks similar to the Byte Buddy advice. The difference is that the Byte Buddy advice consists of only a few lines of code with minimal dependencies: it loads the corresponding hook from the custom class loader and delegates via reflection to the hook’s before() and after() methods. The Byte Buddy advice does not have any dependencies on an instrumentation library, because the actual instrumentation library is visible only in the custom class loader.
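The reflective delegation can be sketched roughly like this. The names HookDelegate and DemoHook are hypothetical, and DemoHook uses Object parameters instead of ServletRequest and ServletResponse so that the sketch compiles without the Servlet API. It is not Promagent's actual code.

```java
import java.lang.reflect.Method;

// Hypothetical helper illustrating the delegation step; not Promagent's actual code.
final class HookDelegate {

    // Loads the hook class by name from the given class loader, creates an instance,
    // and calls the named method. The caller never references the hook type directly,
    // so the hook class only has to be visible to hookClassLoader.
    static Object invoke(ClassLoader hookClassLoader, String hookClassName, String methodName,
                         Class<?>[] parameterTypes, Object... args)
            throws ReflectiveOperationException {
        Class<?> hookClass = Class.forName(hookClassName, true, hookClassLoader);
        Object hook = hookClass.getDeclaredConstructor().newInstance();
        Method method = hookClass.getMethod(methodName, parameterTypes);
        return method.invoke(hook, args);
    }
}

// Stand-in for a real hook such as ServletHook. Object replaces the servlet types
// so that the sketch has no dependency on the Servlet API.
final class DemoHook {

    static int calls;

    public void before(Object request, Object response) {
        calls++;
    }
}
```

This version creates a new hook instance on every call to keep the sketch short. An advice that needs the same hook instance in before() and after(), for example to measure a duration, would have to keep the instance between the two calls.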

However, there’s a subtle pitfall when loading the hook: The parameters ServletRequest and ServletResponse will be passed through from the instrumented Servlet. That means the ServletRequest and ServletResponse classes in the hook must be loaded with the same class loader as the intercepted Servlet, otherwise we cannot pass the Servlet’s parameters into the hook’s before() and after() methods.

The solution is to use Thread.currentThread().getContextClassLoader() as the custom class loader’s parent. That way, all classes that can be loaded from the context class loader will be loaded from the context class loader. This includes the ServletRequest and ServletResponse. Only classes that are not available in the current context, like the hook itself and its dependencies, will be loaded from the custom JAR files. That means we need one custom class loader per context, because each custom class loader delegates to a different context class loader as its parent.
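One way to keep one custom class loader per context is a small cache keyed by the context class loader. The sketch below is an assumption about how this could look. The class PerContextClassLoaders is hypothetical, and cleanup after an application is undeployed is deliberately left out.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache; class and method names are illustrative only.
final class PerContextClassLoaders {

    // Maps each context class loader to the custom class loader created for it.
    // Entries are never removed here; a real agent would need to handle undeployment.
    private final Map<ClassLoader, URLClassLoader> cache = new ConcurrentHashMap<>();
    private final URL[] hookJars;

    PerContextClassLoaders(URL... hookJars) {
        this.hookJars = hookJars.clone();
    }

    // Returns the custom class loader for the calling thread's context, creating it on
    // first use. The context class loader becomes the parent, so classes like
    // ServletRequest resolve to the same Class objects the instrumented Servlet uses.
    URLClassLoader forCurrentContext() {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context == null) {
            context = ClassLoader.getSystemClassLoader();
        }
        return cache.computeIfAbsent(context, parent -> new URLClassLoader(hookJars, parent));
    }
}
```

Because URLClassLoader asks its parent first, only classes the context class loader cannot find are loaded from the hook JARs.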

5. Implementing a Global Metric Registry


Using the implementation described so far, it is possible to instrument a single Web application. However, if there are multiple deployments on the application server, each instrumentation will have its own class loader. When a metrics library is loaded from different class loaders, the deployments cannot share global static variables defined in that metrics library. For example, it is not possible to use the global metrics registry that comes with the Prometheus client library across multiple deployments. Lacking a global registry, each deployment needs to maintain and expose its metrics independently.

One way to tackle this is to extend the custom class loader and make it delegate loading the shared metric library to another shared custom class loader. However, the JVM also comes with a built-in global registry that we can use as a VM-wide metrics store: The JMX platform MBean server. Registering metrics as MBeans has the following benefits:
  • Global registry: The JMX platform MBean server provides a VM-wide registry allowing us to maintain a global set of metrics for instrumenting all deployments on an application server.
  • Single exporter to the monitoring system: It is easy to implement a small Web application that reads all metrics from the MBean server and makes them available to a monitoring system. For example, Promagent includes a WAR deployment for exporting metrics to the Prometheus server.
  • JMX tooling: As all metrics are available as MBeans, any JMX client can be used to learn the state of the metrics.
The JMX platform MBean server is part of Java SE, and can be accessed via the static method ManagementFactory.getPlatformMBeanServer(). Java objects registered with the MBean server are called MBeans. MBeans must define their publicly accessible API in an interface that is, by convention, named like the Java class with the suffix MBean appended. For example, to register a Counter class as an MBean, the class must implement an interface named CounterMBean. Each MBean is addressable through a unique ObjectName. The methods defined in the MBean interface can be called using MBeanServer.invoke().
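As a minimal sketch of these conventions, the following registers a counter as an MBean and accesses it only through the MBean server. The wrapper class JmxCounterExample, the method registerAndIncrement(), and the ObjectName used here are made up for illustration and do not reflect how Promagent names its metrics.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example; the names and the ObjectName are assumptions for illustration.
final class JmxCounterExample {

    // The MBean interface must be public and named like the class plus "MBean".
    public interface CounterMBean {
        void inc();
        long getValue();
    }

    public static class Counter implements CounterMBean {
        private final AtomicLong value = new AtomicLong();

        @Override
        public void inc() {
            value.incrementAndGet();
        }

        @Override
        public long getValue() {
            return value.get();
        }
    }

    // Registers the counter on first use, increments it through the MBean server,
    // and returns the current value read back as the "Value" attribute.
    static long registerAndIncrement() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.metrics:type=Counter,name=http_requests");
        if (!server.isRegistered(name)) {
            server.registerMBean(new Counter(), name);
        }
        // Operations such as inc() are called by name, without a reference to the Counter class.
        server.invoke(name, "inc", new Object[0], new String[0]);
        // Getters such as getValue() are exposed as attributes rather than operations.
        return (Long) server.getAttribute(name, "Value");
    }
}
```

Because callers only need the ObjectName and the operation name, code loaded by different class loaders can update the same counter without sharing the Counter class itself.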