Trivento Summercamp 2016 was all about Fast Data. In a morning masterclass for a small group of participants, Stavros Kontopoulos, R&D Polyglot Software Engineer at Lightbend, reviewed the design problem for big data applications that the Lambda Architecture tries to solve. He also answered some questions for those who were interested but couldn’t attend the masterclass.
1. What is the difference between Lambda Architecture and Lambda Functions?
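“AWS Lambda is a compute service where you can upload your code to AWS Lambda and the service can run the code on your behalf using AWS infrastructure.”
The Lambda Architecture, by contrast, is a design pattern for big data systems that combines a batch layer with a real-time (speed) layer. Apart from the name, the two are unrelated.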
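To make the "upload your code" part concrete, here is a minimal sketch of what such a function can look like in Python. The `(event, context)` signature is what the Lambda Python runtime passes to a handler; the handler name `lambda_handler`, the `name` field in the event and the `statusCode`/`body` response shape are assumptions chosen for illustration, not something discussed in the masterclass.

```python
import json


def lambda_handler(event, context):
    """Entry point the Lambda runtime would call once per invocation.

    `event` carries the request payload as a dict; `context` carries runtime
    metadata and is unused in this sketch.
    """
    # "name" is a hypothetical field of the incoming event.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test: no AWS account is needed to call the function directly.
response = lambda_handler({"name": "Summercamp"}, None)
print(response["body"])
```

The function holds no state between calls, which is what lets the service decide where and how many times to run it.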
2. How do you build, maintain and scale micro-services with lambda functions?
Since this is outside the scope of the masterclass, I will point you to some resources:
Our colleague Martijn van de Grift shared insights about serverless computing and AWS Lambda Functions during his breakout session and co-wrote some blog posts about it with Jeroen Gordijn:
- AWS Lambda inside insight;
- AWS Lambda part 2: API Gateway playing tricks?
- AWS Lambda part 3: Call Lambda via SDK.
3. What are the latest developments in tools and best practices for predictive analytics, and how do you get your organization involved in these developments? Also, what is the future of data?
To get your organization involved, first decide which business goals you want to achieve with predictive analytics and how you will measure them. Second, hire the right people for the different roles: data engineers, data scientists, domain experts, ops and so on. They can then pick the right tool for the job, which could also be a custom algorithm.
A few best practices:
- Define your business goal. Check whether it will pay off and is worth the cost of developing the model.
- Add a standardized process for doing predictive analytics so it can drive value, and measure that value.
- Put strategies in place for data governance, data integration, operations etc. before you start building your model.
- Educate people in different areas: in technology, so they get the state-of-the-art knowledge they need, and in business, so they understand the business potential of predictive analytics.
As for the future of data, I think we have only just started. There are massive amounts of data waiting to be explored efficiently, so there is room for new tools, techniques and algorithms.
4. Are Mesos jobs the same as the Lambda Functions of AWS?
These are two different concepts. Mesos jobs are processes that run on top of Mesos, while Lambda functions are an implementation of a serverless architecture. I think it would be possible to use Mesos jobs to implement (lambda) functions running in a container on Mesos, since functions can be seen as services under the serverless architecture. But again, Mesos jobs and Lambda functions refer to different things.
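The idea that "functions can be seen as services" can be sketched as a small long-running process that keeps a registry of functions and invokes them by name. Such a process could in principle be packaged in a container and launched as a Mesos job. Everything below (the `FUNCTIONS` registry, `register`, `invoke`, the `word_count` function and the JSON request shape) is hypothetical and uses no Mesos or AWS API.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry of deployed functions, keyed by name.
FUNCTIONS: Dict[str, Callable[[dict], Any]] = {}


def register(name: str):
    """Decorator that makes a plain function invocable by name."""
    def decorator(fn):
        FUNCTIONS[name] = fn
        return fn
    return decorator


@register("word_count")
def word_count(event: dict) -> dict:
    # Counts whitespace-separated words in the event's "text" field.
    return {"count": len(event.get("text", "").split())}


def invoke(request_json: str) -> str:
    """Dispatch one request of the form {"function": ..., "event": {...}}."""
    request = json.loads(request_json)
    fn = FUNCTIONS.get(request["function"])
    if fn is None:
        return json.dumps({"error": "unknown function"})
    return json.dumps({"result": fn(request.get("event", {}))})


print(invoke('{"function": "word_count", "event": {"text": "fast data rocks"}}'))
```

The sketch only shows the dispatch step. Scheduling, scaling and isolating the process is the part Mesos would handle, which is why the two concepts sit at different levels.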
5. How do the SMACK components work together and how mature are these components? Also, how, where and when can or can’t I deploy them?
SMACK = Spark + Mesos + Akka + Cassandra + Kafka
Data is ingested into Kafka from an application built on Akka.
The data is then processed by either Spark Streaming or Akka Streams and can be written to Cassandra to support real-time views, queries or storage. For long-term persistence you may of course also need a distributed file system like HDFS.
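The data flow described above (ingest, process in small batches, write to a store) can be sketched with in-memory stand-ins. This is only an illustration of the flow: `Topic` stands in for a Kafka topic, `ViewStore` for a Cassandra table, and `process` for a streaming job; none of these classes or the `sensor` field come from the real libraries.

```python
from collections import Counter


class Topic:
    """In-memory stand-in for a Kafka topic: an append-only log read by offset."""

    def __init__(self):
        self._log = []

    def publish(self, record):
        self._log.append(record)

    def read(self, offset, max_records):
        # Records stay in the log; the consumer keeps track of its own offset.
        return self._log[offset:offset + max_records]


class ViewStore:
    """In-memory stand-in for a table holding a real-time view (count per sensor)."""

    def __init__(self):
        self.rows = Counter()

    def upsert_counts(self, counts):
        self.rows.update(counts)


def ingest(topic, events):
    """Plays the role of the application that publishes events to the topic."""
    for event in events:
        topic.publish(event)


def process(topic, store, batch_size=2):
    """Micro-batch loop: read a batch, aggregate it, write the result to the store."""
    offset = 0
    while True:
        batch = topic.read(offset, batch_size)
        if not batch:
            break
        store.upsert_counts(Counter(event["sensor"] for event in batch))
        offset += len(batch)


topic, store = Topic(), ViewStore()
ingest(topic, [{"sensor": "a"}, {"sensor": "b"}, {"sensor": "a"}])
process(topic, store)
print(dict(store.rows))  # {'a': 2, 'b': 1}
```

In the real stack each stand-in is a distributed system in its own right, and the loop never ends because new events keep arriving.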
All these components run on Mesos. You can deploy them easily through DC/OS, which offers packages for Kafka, Cassandra, Spark etc: https://github.com/mesosphere/universe/tree/version-3.x/repo/packages
Breakout sessions about Fast Data
Did you miss Trivento Summercamp? Sign up for a notification about our next edition and download the presentations for free, including:
- The keynote ‘Challenges and opportunities around elastic data pipelines’ by Jörg Schad (Mesosphere, Inc.);
- Breakout session ‘What is FastData & how the SMACK (Spark, Mesos, Akka, Cassandra, Kafka) stack plays a major role by implementing a Fast Data strategy’ by Stavros Kontopoulos (Lightbend);
- Breakout session ‘Implicits Inspected and Explained’ by Tim Soethout (ING);
- Breakout session ‘Fostering Agility and Digital Acceleration with a 2 speed data streaming platform architecture’ by Mohamed Himi (Trivento);
- Breakout session ‘A closer look at Elastic Stack 5.0’ by Loek van Gool (Elastic);
- Breakout session ‘Serverless: AWS Lambda Functions’ by Martijn van de Grift (Trivento).