AWS recently released the Data Solutions Framework (DSF), an opinionated open source framework designed to accelerate building data solutions on AWS. Built on the AWS CDK, the framework exposes abstractions and patterns as building blocks and is available in TypeScript (npm) and Python (PyPI).
DSF provides building blocks packaged as standard L3 AWS CDK constructs for building data solutions on AWS. These building blocks can be customized and combined with other CDK constructs, including those available through Construct Hub, a registry of open source CDK libraries. Lotfi Mouhib, Principal Solutions Architect at AWS, Dzenan Softić, Senior Solutions Architect at AWS, and Vincent Gromakowski, Principal Solutions Architect at AWS, wrote:
DSF allows data (platform) engineers to focus on use cases and business logic and to assemble data platforms from building blocks that represent common abstractions in data solutions, such as a data lake. (…) Although DSF is an opinionated framework, it provides deep customization capabilities that allow developers to adapt what they build to their specific needs.
AWS CDK is an open source software development framework for defining cloud infrastructure in code and provisioning it through CloudFormation. L1 constructs, known as CFN resources, are the lowest-level constructs and provide no abstraction, while L2 constructs, known as curated constructs, map directly to a single CloudFormation resource and add sensible defaults and helper methods. L3 constructs, known as patterns, provide the highest level of abstraction and contain multiple resources configured to work together to accomplish a specific task or service.
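The three construct levels can be illustrated with a minimal TypeScript sketch. It assumes the `aws-cdk-lib` and `constructs` packages are installed, and it does not use DSF itself: `EncryptedLandingZone` is a hypothetical pattern written only for this illustration, as are the stack and construct IDs.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import * as kms from 'aws-cdk-lib/aws-kms';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

// Hypothetical L3 "pattern" (not part of DSF): several resources
// wired together to accomplish one task, here an encrypted bucket.
class EncryptedLandingZone extends Construct {
  public readonly key: kms.Key;
  public readonly bucket: s3.Bucket;

  constructor(scope: Construct, id: string) {
    super(scope, id);
    this.key = new kms.Key(this, 'Key', { enableKeyRotation: true });
    this.bucket = new s3.Bucket(this, 'Bucket', {
      encryption: s3.BucketEncryption.KMS,
      encryptionKey: this.key,
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
    });
  }
}

const app = new App();
const stack = new Stack(app, 'ConstructLevelsDemo');

// L1: a raw CloudFormation resource; properties mirror the
// CloudFormation schema and nothing is defaulted for you.
new s3.CfnBucket(stack, 'L1Bucket', {
  versioningConfiguration: { status: 'Enabled' },
});

// L2: a curated construct for the same single resource, with a
// friendlier API (a boolean flag instead of a nested property).
new s3.Bucket(stack, 'L2Bucket', { versioned: true });

// L3: one line of user code that expands into multiple resources.
new EncryptedLandingZone(stack, 'LandingZone');

// Synthesize in memory and count what each level produced.
const template = Template.fromStack(stack);
const bucketCount = Object.keys(template.findResources('AWS::S3::Bucket')).length;
const keyCount = Object.keys(template.findResources('AWS::KMS::Key')).length;
console.log(`buckets: ${bucketCount}, KMS keys: ${keyCount}`);
```

The L1 and L2 declarations each yield one bucket, while the pattern contributes both a bucket and a key from a single instantiation, which is the kind of packaging DSF applies to larger building blocks.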
According to the authors, DSF is ready for production workloads and follows the data analytics best practices described in the Data Analytics Lens of the AWS Well-Architected Framework. DSF uses cdk-nag to enforce security and compliance, validating that the constructs it creates comply with a specified set of rules. Mouhib, Softić and Gromakowski added:
DSF exposes all the resources it creates so you can use them directly in your AWS CDK applications, customize them using AWS CDK escape hatches, or override AWS CloudFormation resources.
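As a rough sketch of what an escape hatch looks like in practice, the TypeScript below applies one to a plain `s3.Bucket` from `aws-cdk-lib`, standing in for a bucket that a higher-level construct might expose. The stack name, construct ID, and the specific properties overridden are assumptions chosen for illustration and are not taken from DSF.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import * as s3 from 'aws-cdk-lib/aws-s3';

const app = new App();
const stack = new Stack(app, 'EscapeHatchDemo');

// Stand-in for a bucket exposed by a higher-level construct.
const bucket = new s3.Bucket(stack, 'AnalyticsBucket');

// Escape hatch: reach the underlying L1 (CloudFormation) resource.
const cfnBucket = bucket.node.defaultChild as s3.CfnBucket;

// Option 1: set a typed L1 property directly.
cfnBucket.accelerateConfiguration = { accelerationStatus: 'Enabled' };

// Option 2: override a raw CloudFormation property by its path.
cfnBucket.addPropertyOverride('VersioningConfiguration.Status', 'Enabled');

// Synthesize in memory and read back the bucket's properties.
const template = Template.fromStack(stack);
const [bucketProps] = Object.values(template.findResources('AWS::S3::Bucket'))
  .map((resource: any) => resource.Properties);
console.log(JSON.stringify(bucketProps, null, 2));
```

Both changes land in the synthesized CloudFormation template even though the L2 `Bucket` was created without them, which is why exposing the underlying resources matters for customization.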
The Spark Data Lake example builds a data lake, processes data with Apache Spark, and provides a multi-environment CI/CD pipeline that supports integration testing. Sebastian Gebski, Principal Solutions Architect at AWS, commented:
It is strictly dedicated to building configurable data analysis solutions. I don’t think solutions in this category have gotten as much love as they deserve so far (…) Initial releases are heavily biased towards data lakes (a good thing!). Although we are talking about an open source project here, the direction of future development will largely depend on what is interesting to the community.
Source: AWS Blog
DSF isn’t the only framework that extends AWS CDK. The Open Constructs Foundation recently announced its community-driven CDK construct library initiative.
DSF is open source under the Apache 2.0 license and provides a public roadmap.