Today we’d like to tell you about Corral, a serverless MapReduce framework! In a nutshell, Corral is a framework for writing arbitrary MapReduce applications that run on AWS Lambda. It uses Lambda as its execution environment, much as Hadoop MapReduce uses YARN.
A short note for those who aren’t familiar with it: AWS Lambda is an event-driven, serverless computing platform provided by Amazon as part of Amazon Web Services. It runs code in response to events and automatically manages the computing resources that code requires. P.S. In one of our previous articles we compared AWS with Google Cloud Platform; check it out to learn more about cloud platforms.
Back to Corral: it is best suited for data-intensive but computationally inexpensive tasks, such as ETL jobs.
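To make the MapReduce model concrete, here is a minimal word count in plain Go. It is a local sketch, not Corral’s API: `Mapper`, `Reducer`, `Emitter`, `KeyValue`, and `runJob` are hypothetical names chosen to show the usual shape of a MapReduce job (map, shuffle by key, reduce). Check the Corral README for its actual interfaces.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// KeyValue is one intermediate or final record (hypothetical type).
type KeyValue struct {
	Key, Value string
}

// Emitter collects the records produced by map or reduce calls.
type Emitter struct {
	out []KeyValue
}

// Emit appends a key/value pair to the emitter's output.
func (e *Emitter) Emit(key, value string) {
	e.out = append(e.out, KeyValue{key, value})
}

// Mapper and Reducer are hypothetical interfaces modeled on a typical MapReduce API.
type Mapper interface {
	Map(key, value string, e *Emitter)
}

type Reducer interface {
	Reduce(key string, values []string, e *Emitter)
}

// wordCount emits (word, "1") in Map and sums those counts in Reduce.
type wordCount struct{}

func (wordCount) Map(key, value string, e *Emitter) {
	for _, w := range strings.Fields(value) {
		e.Emit(strings.ToLower(w), "1")
	}
}

func (wordCount) Reduce(key string, values []string, e *Emitter) {
	total := 0
	for _, v := range values {
		if n, err := strconv.Atoi(v); err == nil {
			total += n
		}
	}
	e.Emit(key, strconv.Itoa(total))
}

// runJob maps every input, groups intermediate pairs by key (the shuffle),
// then reduces each group. Everything runs sequentially in one process.
func runJob(m Mapper, r Reducer, inputs map[string]string) map[string]string {
	var mapped Emitter
	for k, v := range inputs {
		m.Map(k, v, &mapped)
	}
	groups := make(map[string][]string)
	for _, kv := range mapped.out {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	var reduced Emitter
	for k, vs := range groups {
		r.Reduce(k, vs, &reduced)
	}
	result := make(map[string]string, len(reduced.out))
	for _, kv := range reduced.out {
		result[kv.Key] = kv.Value
	}
	return result
}

func main() {
	inputs := map[string]string{
		"a.txt": "the quick brown fox",
		"b.txt": "the lazy dog and the fox",
	}
	counts := runJob(wordCount{}, wordCount{}, inputs)
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s\t%s\n", k, counts[k])
	}
}
```

Each map call here does little computation per record, which is the kind of workload the paragraph above describes. In a framework such as Corral the map and reduce tasks would be spread across Lambda invocations rather than run in a single process.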
To deploy to Lambda, Corral uses the following process:
- The user compiles the Corral application targeting their platform of choice.
- Upon execution, the Corral app recompiles itself for GOOS=linux and compresses the resulting binary into a zip file.
- Corral then uploads that zip file to Lambda, creating a Lambda function.
- Corral invokes this Lambda function as an executor for map/reduce tasks.
Corral tries to be agnostic to the filesystem it runs on. This lets it switch transparently between local and Lambda execution, and it leaves room for extensions (for example, if GCP begins to support Go in Cloud Functions). Corral’s performance is fairly respectable, largely thanks to the nearly unlimited parallelism that Lambda offers.
Corral’s creator says that connectors for GCP’s Cloud Functions and Datastore could be added in the future.
Find out more about this project on GitHub!