11 April 2016

AWS Lambda inside insight

Microservices are currently gaining a lot of popularity, but innovation will continue. So what will be the next big thing after microservices? With microservices you still carry the burden of managing application deployments and the infrastructure to run them. At Trivento we are constantly on the lookout for innovative technology that allows us to create great software. At the moment we are researching Lambda functions to investigate where they can and cannot be used. In this blog post we will look at the runtime characteristics of Amazon AWS Lambda.

With the launch of AWS Lambda, a revolution has started: serverless architecture. Serverless architecture allows developers to focus on their code, because the servers are abstracted away. This means developers no longer have to deal with scaling, fault tolerance, patches, deployments and over- or under-capacity. With AWS Lambda you just upload your 'Lambda function' (a zip or jar) and you're good to go. The Lambda function can be written in a JVM language, Node.js or Python.

AWS Lambda functions react to events from other AWS services, such as Simple Email Service (SES), Simple Notification Service (SNS), DynamoDB, Simple Storage Service (S3), Cognito, Kinesis, API Gateway and Mobile Backend. All code uploaded to AWS Lambda has to be stateless, which allows AWS Lambda to launch as many instances of the function as needed to scale to the rate of incoming events. So what happens when there aren't any events? AWS Lambda is priced on a pay-per-use basis, so when there aren't any Lambda invocations, you won't be charged.

So the cloud provider takes over the responsibility of starting and running your code when needed, but what does this mean for your application? When will my code be started? What is the overhead of starting the JVM (or the Node.js or Python runtime)? There is little documentation about the internals of Amazon AWS Lambda: most of it describes how Lambda can be used, not what its runtime characteristics are. Let's see if we can run some tests to get more insight.

Is my code hot?

From a conceptual perspective, AWS needs to start a JVM containing your code before it can call the function. Starting the JVM takes time and thus incurs overhead. If this were done for every invocation, you would pay this overhead on every request. This is why Amazon keeps your code "hot" for some time. But nowhere is it mentioned how long the code stays hot or when Amazon launches a new JVM to scale. Maybe this is done on purpose, to prevent clients from relying on this behaviour. As a developer you should program as if your code were cold every time it is called, and thus keep it stateless. Even so, we want to know the runtime characteristics to be able to decide where, when and how to use AWS Lambda.
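To make the hot/cold distinction concrete, here is a minimal sketch (our own hypothetical example, not the test function from this post) of why statelessness matters: anything kept in a static field survives "hot" invocations on the same JVM, but disappears when a cold start brings up a fresh JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WarmStateDemo {
  // Reused while the container stays hot; reset to 0 on every cold start.
  private static final AtomicInteger invocations = new AtomicInteger();

  public static int handle() {
    return invocations.incrementAndGet();
  }

  public static void main(String[] args) {
    // Two calls on the same JVM: the counter carries over ("hot").
    System.out.println(handle()); // 1
    System.out.println(handle()); // 2
    // After a cold start (a new JVM) the counter would start at 1 again,
    // which is exactly why you must not rely on such state.
  }
}
```

This also hints at why our test function below uses both an instance UUID and a static UUID: together they reveal whether a call reused an existing JVM.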

The Test

We created a very small Java function that outputs some information about its runtime (note: the Amazon dependencies are 'provided' and should not be packaged in the jar). We used the cheapest configuration, with 128MB, for our test. Below you see the code that we deployed (also available on github:


package nl.trivento.lambda;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import com.amazonaws.services.lambda.runtime.Context;

public class JavaTest {
  private String id = java.util.UUID.randomUUID().toString();
  private static String jvmId = java.util.UUID.randomUUID().toString();

  // Stream-based handler; the Context gives access to the AWS request id.
  public void greeting(InputStream input, OutputStream output, Context context) throws IOException {
    Long runtime = (System.nanoTime() - Long.parseLong(System.getenv().get("LAMBDA_RUNTIME_LOAD_TIME"))) / 1000000;
    String result = String.format(
        "Greetings from %s Current time: %sns, Running since: %sns, Running for: %sms, classId: %s, jvmId: %s, requestId: %s",
        InetAddress.getLocalHost().getHostAddress(), System.nanoTime(),
        System.getenv().get("LAMBDA_RUNTIME_LOAD_TIME"), runtime, id, jvmId, context.getAwsRequestId());
    output.write(result.getBytes());
  }
}

This function gives us some information with which we can analyze what is going on inside AWS Lambda. It shows the IP address of the host the function is running on, a UUID that identifies this instance of the class, and a UUID that identifies the JVM instance. We also print the LAMBDA_RUNTIME_LOAD_TIME environment variable, which shows at what time the container was started and, derived from that, how long the container has been running.

We created a Gatling test that fires one request per second over 5 minutes, doing HTTP requests via API Gateway, which in turn calls the Lambda function and replies with its result. The Gatling test can also be found on github:

At first we ran this test from a laptop, connecting to the endpoint defined in API Gateway. We configured Gatling to fire requests for a duration of 5 minutes. The output showed some spikes (see below).

Run from Laptop

The strange thing is that we see spikes in both the number of users and the response times. We figured this might be due to network latency: Gatling starts a new request every second, so if a request takes too long, two users end up active within the same second. To rule out the network latency from Amersfoort to Dublin, we decided to run the same tests on an EC2 node in the same Amazon region. We used an 'm4.xlarge' to have a node with good network performance. This turned out to give us the expected result.
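The overlap can be quantified: with one new user injected every second, any request that takes longer than a second keeps more than one user active at once. A small sketch (our own helper, not part of the Gatling test) of that arithmetic:

```java
public class ConcurrentUsers {
  // With one new user injected every intervalMs, a request that takes
  // responseMs keeps ceil(responseMs / intervalMs) users active at once.
  static int activeUsers(long responseMs, long intervalMs) {
    return (int) ((responseMs + intervalMs - 1) / intervalMs);
  }

  public static void main(String[] args) {
    System.out.println(activeUsers(800, 1000));  // fast warm response: 1 user
    System.out.println(activeUsers(2181, 1000)); // cold start: 3 overlapping users
  }
}
```

So a single cold start of around two seconds is enough to produce the user-count spikes we saw in the graphs.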

When we analysed the output log lines, we saw that in both situations 2 JVMs were used, but the IP address stayed the same. This means that Amazon starts multiple JVMs on one machine. Running on EC2, we only saw the second JVM at the beginning of the test, when no running JVM was available yet. This is probably because the cold start takes some time: when the second call came in, there was no free JVM, because the first one was still starting. We suspect that AWS Lambda creates a new JVM when a request comes in and no idle JVM is available. So Amazon started 2 instances for us. In the EC2 test only one call went to the second instance, while in the laptop test 2/3 of the calls went to one instance and 1/3 to the other, with no calls interleaved. So at some point Amazon decided to switch to the other instance.
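The analysis itself is straightforward string work. A sketch of how the response lines can be grouped by jvmId to count distinct JVMs (the sample lines are made up, but follow the format of the function above):

```java
import java.util.*;
import java.util.regex.*;

public class JvmCounter {
  // Extract every distinct jvmId (a lowercase hex UUID) from the log lines.
  static Set<String> distinctJvms(List<String> lines) {
    Pattern p = Pattern.compile("jvmId: ([0-9a-f-]+)");
    Set<String> jvms = new LinkedHashSet<>();
    for (String line : lines) {
      Matcher m = p.matcher(line);
      if (m.find()) jvms.add(m.group(1));
    }
    return jvms;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList(
        "Greetings ... jvmId: aaaa-1111, requestId: r1",
        "Greetings ... jvmId: aaaa-1111, requestId: r2",
        "Greetings ... jvmId: bbbb-2222, requestId: r3");
    System.out.println(distinctJvms(sample).size()); // 3 lines, 2 JVMs -> prints 2
  }
}
```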

Run from EC2

Metrics from laptop
Metrics from EC2

The left picture shows the run from the laptop, the right one the run from EC2. Running from EC2, the maximum response time is 1915ms; from the laptop it is 2181ms. In both situations we had one cold start. The mean and the 99th percentile reflect the network latency.

We really enjoyed looking into the details of Amazon AWS Lambda, and we will continue to investigate it further. We already ran a test with 50 concurrent users, and we are planning to find out how long Amazon keeps a container hot. So stay tuned for the next blog, or feel free to contact us via mail: Martijn van de Grift ( or Jeroen Gordijn (

This blog is a co-creation of Martijn van de Grift and Jeroen Gordijn.
