Leak Detector
Introduction
Let's set the scene a bit. You get ready to start working on something when your phone gets pinged: there's been an alert for production. Your heart sinks a bit, hoping it's a misfire, but you check the alert and see it fired because JVM heap memory is too high. You inspect your metrics in Grafana and see that JVM heap memory has been steadily increasing over the past 4 hours and objects aren't being collected correctly. You wonder if it was because of something you deployed into production just a couple of days ago, and start trying to diagnose the issue by grabbing some heap dumps to investigate which objects are still retained on the heap.
Hopefully this isn't a common occurrence but, no matter how much testing you perform, there will always be edge cases which cause issues in production. One that gave us a lot of trouble was histogram timers. In this post, I'm going to walk through how we've used reflection to help us diagnose histogram timers not ending correctly, and how using reflection just a tiny bit might be a good thing in high performance applications.
Background on reflection
What is it?
Reflection allows us to inspect and modify classes, interfaces, fields and methods at runtime. We can also instantiate new objects, invoke methods and get or set field values regardless of their access modifiers.
When should it be used?
Reflection is most useful when you need to reach fields, methods or classes that are private or protected, or to change a field that is declared final, and the owning code offers no public API for it. In most other scenarios it should probably be avoided, as reflective access is generally slower than a direct call. Members are looked up by name at runtime from the class metadata instead of being resolved at compile time to precompiled addresses and constants.
Where is reflection commonly used?
One common scenario where reflection is used is in JUnit, which finds the methods in a class annotated with "@Test" and then invokes them.
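To make the idea concrete, here is a minimal sketch of annotation-driven discovery. It is not how JUnit is actually implemented; the `MiniTest` annotation, `SampleTests` class and `MiniRunner` class are hypothetical names invented for this illustration, and only standard `java.lang.reflect` calls are used.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JUnit's @Test. RUNTIME retention is required,
// otherwise the annotation is not visible through reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MiniTest {}

// Hypothetical class under "test": two annotated methods and one plain helper.
class SampleTests {
    @MiniTest
    void passes() {}

    @MiniTest
    void alsoPasses() {}

    void helperThatIsNotATest() {}
}

class MiniRunner {
    // Creates an instance of testClass, invokes every method annotated with
    // @MiniTest and returns the names of the methods it invoked.
    static List<String> runAnnotated(Class<?> testClass) {
        List<String> executed = new ArrayList<>();
        try {
            Constructor<?> constructor = testClass.getDeclaredConstructor();
            constructor.setAccessible(true);
            Object instance = constructor.newInstance();
            for (Method method : testClass.getDeclaredMethods()) {
                if (method.isAnnotationPresent(MiniTest.class)) {
                    method.setAccessible(true); // allow non-public test methods
                    method.invoke(instance);
                    executed.add(method.getName());
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not run " + testClass.getName(), e);
        }
        return executed;
    }

    public static void main(String[] args) {
        // The order of getDeclaredMethods() is not specified, so the printed order may vary.
        System.out.println(runAnnotated(SampleTests.class));
    }
}
```

The runner never names `passes` or `alsoPasses` in its own code. It discovers them from the class metadata, which is the property that makes reflection attractive for test frameworks.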
Simple reflection example
public class Car {
    private String make;
    private String model;
    private String engineType;
}
This is a simple example class with private fields for the make, model and engine type of a car. As the access modifier for these fields is private, code outside the class can't access them unless the class provides a public getter.
@Test
void shouldGetFieldsFromCarObject() {
    final var car = new Car();
    // getDeclaredFields() returns every field declared on the class, including private ones
    Field[] fields = car.getClass().getDeclaredFields();
    // getFieldNames is a small helper (not shown) that maps each Field to its name
    List<String> actualFieldNames = getFieldNames(fields);
    assertTrue(Arrays.asList("make", "model", "engineType")
            .containsAll(actualFieldNames));
}
This example test shows that we can list every field declared on a class and read the field names to validate against, even though the fields are private and normal code shouldn't be able to see them.
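Listing field names is only half of the story, since reflection can also read and write the values held in private fields. The following is a small sketch of that, assuming a hypothetical `DemoCar` class with a default value and two hypothetical helper methods in `PrivateFieldAccess`; none of these names come from the code above.

```java
import java.lang.reflect.Field;

// Hypothetical class with a private field and no getter or setter.
class DemoCar {
    private String make = "Toyota"; // illustrative default value
}

class PrivateFieldAccess {
    // Reads the named field from target, ignoring its access modifier.
    static Object readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass the private modifier
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    // Writes value into the named field on target, ignoring its access modifier.
    static void writePrivateField(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot write field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        DemoCar car = new DemoCar();
        System.out.println(readPrivateField(car, "make"));
        writePrivateField(car, "make", "Honda");
        System.out.println(readPrivateField(car, "make"));
    }
}
```

Note that `getDeclaredField` only looks at the class itself, not its superclasses, so a field inherited from a parent has to be looked up on the parent class.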
How are we using reflection?
Why do we use reflection?
Sometimes, we would see increased memory utilization where, for some reason, objects weren't being collected by the garbage collector but were instead surviving into the tenured generation. Memory would steadily increase until an alert fired telling us we were reaching the memory limit on a specific node. After grabbing a heap dump and inspecting it in Eclipse Memory Analyzer, we discovered that the leak was being caused by a histogram timing event not being ended correctly.
We use Prometheus in our production servers to collect metrics and aggregate our alerts in one location. The majority of our application code runs asynchronously, using futures to run logic without blocking the main threads of our application. Sometimes the logic running on a future, or the handling of its result, throws an exception. Any histogram timers that haven't been ended at that point are never ended and so are never collected, causing a memory leak in our production servers.
Like many production systems, ours has many histograms collecting metrics, so diagnosing which one is leaking memory becomes quite the ordeal: the flow for each histogram has to be followed and checked to ensure that its timing events end correctly.
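The failure mode described above can be sketched without Prometheus at all. In the example below, `DemoTimer` is a hypothetical stand-in for a histogram timer, and its `OPEN_TIMERS` counter stands in for whatever structure keeps a started timer reachable. The `leaky` method ends the timer only on the success path, while `safe` uses `CompletableFuture.whenComplete`, which runs on both success and failure. This is one possible way to avoid the problem, not necessarily the fix we applied.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a histogram timer. OPEN_TIMERS counts timers
// that were started but never ended.
class DemoTimer {
    static final AtomicInteger OPEN_TIMERS = new AtomicInteger();

    DemoTimer() {
        OPEN_TIMERS.incrementAndGet();
    }

    void observeDuration() {
        OPEN_TIMERS.decrementAndGet();
    }
}

class TimedFutures {
    // Leaky: thenApply is skipped when the previous stage fails,
    // so the timer is never ended on the exceptional path.
    static CompletableFuture<String> leaky(boolean fail) {
        DemoTimer timer = new DemoTimer();
        return CompletableFuture.supplyAsync(() -> work(fail))
                .thenApply(result -> {
                    timer.observeDuration();
                    return result;
                });
    }

    // Safe: whenComplete runs whether the previous stage succeeded or failed.
    static CompletableFuture<String> safe(boolean fail) {
        DemoTimer timer = new DemoTimer();
        return CompletableFuture.supplyAsync(() -> work(fail))
                .whenComplete((result, error) -> timer.observeDuration());
    }

    // Simulated business logic that optionally throws.
    static String work(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
        return "ok";
    }

    public static void main(String[] args) {
        safe(true).handle((result, error) -> null).join();
        System.out.println("open timers after safe failure: " + DemoTimer.OPEN_TIMERS.get());
        leaky(true).handle((result, error) -> null).join();
        System.out.println("open timers after leaky failure: " + DemoTimer.OPEN_TIMERS.get());
    }
}
```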
We use reflection to retrieve every histogram we are collecting metrics with, along with the counters on each histogram. This process runs every couple of seconds. If a counter on a histogram has lived longer than a given amount of time, we log that its parent histogram is leaking memory.
This approach allows us to narrow our search to the specific histogram that is leaking, instead of inspecting the flow of every histogram to check that its timing events are ended correctly. It does rely on knowing roughly how long a timing event should live for, as without that we couldn't pick a sensible threshold. It also makes this specific issue easier to diagnose: if the log is firing, we know a histogram is leaking memory, and if it isn't, we know we have to grab a heap dump or a JFR recording to diagnose the production issue further.
Example of the reflection we are using
REFLECTED_HISTOGRAM_CHILDREN = SimpleCollector.class.getDeclaredField("children");
REFLECTED_HISTOGRAM_CHILDREN.setAccessible(true);
REFLECTED_START_TIME = Histogram.Timer.class.getDeclaredField("start");
REFLECTED_START_TIME.setAccessible(true);
REFLECTED_CHILD = Histogram.Timer.class.getDeclaredField("child");
REFLECTED_CHILD.setAccessible(true);
REFLECTED_HISTOGRAM_FULL_NAME = SimpleCollector.class.getDeclaredField("fullname");
REFLECTED_HISTOGRAM_FULL_NAME.setAccessible(true);
REFLECTED_HISTOGRAM_LABEL_NAMES = SimpleCollector.class.getDeclaredField("labelNames");
REFLECTED_HISTOGRAM_LABEL_NAMES.setAccessible(true);
REFLECTED_HISTOGRAM_TIMERS_MAP = AbstractMetricsReporter.class.getDeclaredField("histogramTimersMap");
REFLECTED_HISTOGRAM_TIMERS_MAP.setAccessible(true);
This code snippet retrieves the children, fullname and labelNames fields on the SimpleCollector class, the start and child fields on the Histogram.Timer class and the histogramTimersMap field on our custom AbstractMetricsReporter class.
We use this data to determine whether a specific histogram has been leaking for longer than a given time period. When it has, we log the name of the histogram, its labels and the amount of time it has been leaking for, which helps narrow down the investigation into a leaking histogram timers map.
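As a rough illustration of the check itself, the sketch below reads a private start time from each open timer and reports the ones that have been open for longer than a threshold. It does not use the Prometheus classes: `OpaqueTimer`, `TimerLeakScanner` and the map of open timers keyed by name are all hypothetical stand-ins, and the assumption that the start time is stored in nanoseconds is ours for the purpose of the example.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a library timer whose start time is private.
class OpaqueTimer {
    private final long start; // assumed to be nanoseconds for this sketch

    OpaqueTimer(long startNanos) {
        this.start = startNanos;
    }
}

class TimerLeakScanner {
    // Look the field up once and cache it, so each scan only pays for the read.
    private static final Field START_FIELD;

    static {
        try {
            START_FIELD = OpaqueTimer.class.getDeclaredField("start");
            START_FIELD.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns one report line per timer that has been open longer than maxAgeNanos.
    static List<String> findLeaks(Map<String, OpaqueTimer> openTimers,
                                  long nowNanos, long maxAgeNanos) {
        List<String> reports = new ArrayList<>();
        for (Map.Entry<String, OpaqueTimer> entry : openTimers.entrySet()) {
            long ageNanos;
            try {
                ageNanos = nowNanos - START_FIELD.getLong(entry.getValue());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read timer start", e);
            }
            if (ageNanos > maxAgeNanos) {
                reports.add(entry.getKey() + " open for " + ageNanos / 1_000_000 + " ms");
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        Map<String, OpaqueTimer> timers = new LinkedHashMap<>();
        timers.put("request_latency", new OpaqueTimer(System.nanoTime()));
        // Nothing is reported here because the timer was only just started.
        System.out.println(findLeaks(timers, System.nanoTime(), 5_000_000_000L));
    }
}
```

Passing the current time in as a parameter keeps the check deterministic, which makes the threshold logic easy to unit test.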
Thoughts on reflection
Benefits
Reflection allows us to retrieve values on classes, modify elements and redirect method calls if we choose to. This gives us more control over how a specific application or library functions.
Drawbacks
The main drawback of reflection is that it can be inefficient in some use cases. Members are looked up by name, so each lookup has to match a string against the fields or methods of the class, and the values that come back usually have to be cast before your own logic can use them. All of this is more expensive than a direct field access or method call.
When is it appropriate to use reflection?
In most scenarios, I would recommend against using reflection. If the code being modified lives alongside your own business logic or application code, it makes more sense to add getters or setters for modifying the state of that object. Reflection breaks encapsulation, one of the core principles of object-oriented design, and should be avoided if the functionality you are trying to achieve can be done without it.
It makes more sense to use reflection for external libraries such as Prometheus, where the functionality you are trying to achieve isn't possible using the API of the underlying library. We have opted to run this logic periodically on a separate thread, so that the performance of the rest of the application isn't adversely affected by performing the reflection continuously.
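A minimal sketch of that scheduling, assuming a `ScheduledExecutorService` with a single daemon thread, looks like the following. The `LeakCheckScheduler` class, the thread name and the shape of the `leakCheck` task are hypothetical and chosen for illustration, and the real detector may well be wired up differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class LeakCheckScheduler {
    // Runs leakCheck on its own daemon thread, waiting `period` between runs.
    static ScheduledExecutorService start(Runnable leakCheck, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread thread = new Thread(runnable, "histogram-leak-detector");
                    thread.setDaemon(true); // never keep the JVM alive just for diagnostics
                    return thread;
                });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                leakCheck.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and carry on.
                System.err.println("Leak check failed: " + e);
            }
        }, period, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler =
                start(() -> System.out.println("scanning histograms"), 1, TimeUnit.SECONDS);
        Thread.sleep(3_500);
        scheduler.shutdownNow();
    }
}
```

Using `scheduleWithFixedDelay` rather than a fixed rate means a slow scan simply pushes the next one back instead of letting runs pile up behind each other.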