Climate technology leader BlocPower wanted to build a powerful and cost-effective data processing pipeline so it could process more than 100 million building energy profiles and better understand how to optimize energy efficiency in the United States. The company obtains its energy profiles using EnergyPlus, the US Department of Energy’s open-source whole-building modeling engine. BlocPower needed to adopt a set of high-performance computing (HPC) solutions compatible with its C++ software development kit.

BlocPower turned to Amazon Web Services (AWS) and adopted several highly efficient cloud-based data processing solutions that balance performance and cost. Within 3 months, the company finished building its entire cloud data pipeline, enabling it to deploy BlocMaps, a software-as-a-service (SaaS) solution that provides actionable insights for building decarbonization to homeowners, utility companies, municipalities, states, and other groups. Working on AWS, BlocPower has processed over 30 TB of data at speeds 16,000 times faster than it could before, helping it use data-driven insights to promote environmental justice and fair housing in underserved communities.

Opportunity | Searching for a Cost-Effective HPC Solution

BlocPower aims to make America’s buildings smarter, greener, and healthier. The company is committed to diversity, equity, and inclusion, and its workforce is 60% minority and 30% female. BlocPower has helped thousands of low- and middle-income building owners, tenants, and managers in 24 cities across New York, California, Wisconsin, and Massachusetts understand energy efficiency opportunities and retrofit their buildings with renewable energy. Additionally, as of 2022, BlocPower has implemented electrification, solar power, and other energy efficiency measures in over 1,200 buildings.

BlocPower believes that the United States must electrify to reduce the risks associated with climate change. To accelerate electrification projects and design practical energy solutions, BlocPower collects data from more than 100 million buildings from external sources, such as the Department of Energy’s National Laboratories. These labs store their data using Intermediate Data Format files, requiring BlocPower to use EnergyPlus to process and render simulations of individual buildings. “There are more than 130 million buildings in the United States that account for about 30% of our carbon emissions,” says Ankur Garg, director of architecture and data analytics at BlocPower. “However, most of the data from these buildings is not compiled cleanly so that we can run analytics on it.”

BlocPower sought to create a data processing pipeline that uses HPC to run Intermediate Data Format files through EnergyPlus, extract the necessary data, and scale to support massively parallel processing. Because the company has been cloud native on AWS since 2016, it turned to AWS to find scalable compute and data processing solutions that would work with its C++ SDK. BlocPower discovered AWS Batch, which provides fully managed batch processing at virtually any scale. “To process the data onsite, it would have cost us potentially millions of dollars,” says Garg. “We can scale to process our data using AWS Batch for a few hundred dollars each month.”
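
The source does not describe how BlocPower divided its files among workers. As a minimal sketch of one common approach to this kind of fan-out, the Python snippet below splits a list of input files into fixed-size chunks so that each parallel worker can take one chunk. The function name `chunk_keys`, the chunk size, and the `buildings/....idf` file names are all hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Sequence


def chunk_keys(keys: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `keys`, each holding at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(keys), chunk_size):
        yield list(keys[start:start + chunk_size])


# Hypothetical input: ten simulation input files (names invented for this sketch).
example_keys = [f"buildings/{i:04d}.idf" for i in range(10)]

# Each chunk could be handed to one worker; the last chunk may be smaller.
example_chunks = list(chunk_keys(example_keys, chunk_size=4))

for index, chunk in enumerate(example_chunks):
    print(index, len(chunk))
```

Fixed-size chunks keep the per-worker workload predictable, which matters when workers may be interrupted and a chunk has to be retried.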

“Our data processing would have taken thousands of hours on site. With AWS Batch, we can process this data in less than an hour.”

Ankur Garg
Director of Architecture and Data Analytics, BlocPower

Solutions | Building a Scalable Data Processing Pipeline on AWS

BlocPower containerizes its workloads using Amazon Elastic Container Service (Amazon ECS), which makes it easy to run highly secure, reliable, and scalable containers. Using this service, the company quickly created 500 containers, each with 32 vCPUs, all orchestrated by AWS Batch. With this setup, BlocPower accelerated its data processing by 16,000 times and processed over 30 TB of data. “Our data processing would have taken thousands of hours on site,” says Garg. “With AWS Batch, we can process this data in less than an hour.”
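
The case study does not say how the 16,000x figure was derived. One reading that is consistent with the numbers above is simple arithmetic: 500 containers with 32 vCPUs each give 16,000 vCPUs working in parallel. The Python sketch below works through that arithmetic under the assumption of ideal linear scaling, which real workloads rarely achieve exactly. The 16,000-hour serial workload is a hypothetical value picked to match the "thousands of hours" in the quote, not a figure from BlocPower.

```python
CONTAINERS = 500           # from the case study
VCPUS_PER_CONTAINER = 32   # from the case study


def total_vcpus(containers: int, vcpus_per_container: int) -> int:
    """Total vCPUs available when every container runs at once."""
    return containers * vcpus_per_container


def ideal_wall_clock_hours(serial_hours: float, workers: int) -> float:
    """Wall-clock time if the work divides perfectly across `workers` (an idealization)."""
    if workers <= 0:
        raise ValueError("workers must be a positive integer")
    return serial_hours / workers


fleet_vcpus = total_vcpus(CONTAINERS, VCPUS_PER_CONTAINER)
print(fleet_vcpus)  # 16000

# Hypothetical serial workload of 16,000 hours on a single vCPU.
print(ideal_wall_clock_hours(16_000, fleet_vcpus))  # 1.0
```

Under these assumptions, a job that would occupy a single vCPU for 16,000 hours finishes in about one hour, which is in line with the "less than an hour" that Garg describes.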

BlocPower’s HPC compute environment uses Amazon Elastic Compute Cloud (Amazon EC2), which provides secure, scalable compute capacity for virtually any workload. To optimize its compute costs, BlocPower has adopted a diverse set of Amazon EC2 Spot Instances, which let enterprises run fault-tolerant workloads at discounts of up to 90% compared with On-Demand prices. “Using Spot Instances has made our data processing very cost effective,” says Garg. BlocPower also runs its workloads using Amazon EC2 C6g instances, which offer better price performance for compute-intensive workloads.
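
To make the combination of AWS Batch, Spot Instances, and C6g instances more concrete, the Python sketch below builds the kind of request that a managed Spot compute environment in AWS Batch takes. It is not BlocPower's configuration, which the case study does not publish. The environment name, subnet, security group, instance role, the `c6g.8xlarge` instance type (a 32-vCPU C6g size), and the 16,000-vCPU ceiling are all assumptions made for illustration. The helper `build_spot_compute_environment` is hypothetical and only assembles a dictionary; it makes no AWS calls.

```python
from typing import Any, Dict, Sequence


def build_spot_compute_environment(
    name: str,
    subnets: Sequence[str],
    security_group_ids: Sequence[str],
    instance_role: str,
    max_vcpus: int = 16_000,                              # assumed: 500 x 32 vCPUs
    instance_types: Sequence[str] = ("c6g.8xlarge",),     # assumed instance size
) -> Dict[str, Any]:
    """Assemble parameters for a managed AWS Batch compute environment on Spot."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,            # scale to zero when no jobs are queued
            "maxvCpus": max_vcpus,
            "instanceTypes": list(instance_types),
            "subnets": list(subnets),
            "securityGroupIds": list(security_group_ids),
            "instanceRole": instance_role,
        },
    }


# Placeholder identifiers; replace with real resources before use.
request = build_spot_compute_environment(
    name="example-energy-sim-spot",
    subnets=["subnet-EXAMPLE"],
    security_group_ids=["sg-EXAMPLE"],
    instance_role="ecsInstanceRole",
)
print(request["computeResources"]["maxvCpus"])
```

With valid identifiers and credentials, a dictionary like this could be passed to `boto3.client("batch").create_compute_environment(**request)`. Setting `minvCpus` to 0 lets the environment scale down completely between runs, which suits a pipeline that runs in bursts.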

The company hosts its data lake of raw and refined data on Amazon Simple Storage Service (Amazon S3), an object storage service designed to retrieve virtually any amount of data from anywhere. Since adopting this solution, BlocPower has scaled to import over 100 million files into Amazon S3 buckets. BlocPower also relies on Amazon Redshift, which uses SQL to analyze structured and semi-structured data in data warehouses, operational databases, and data lakes. To maximize its cost savings, BlocPower runs its clusters in bursts using Amazon Redshift Serverless, which makes it easy to run and scale analytics without having to manage data warehouse infrastructure. With these solutions, the company has streamlined its data management, improved its query performance, and can run advanced analytics that help it better understand building energy efficiency improvements. To visualize its data, the company uses Amazon QuickSight, a cloud-native, serverless business intelligence service.

After completing its data processing pipeline, BlocPower quickly deployed BlocMaps, a SaaS solution that provides users with climate justice data and the tools needed to create a sustainable electrification program and address inequalities in their communities. The company received DevOps training from the AWS team, which helped it complete the development of this SaaS solution in 3 months. “AWS provided additional training,” says Garg. “The AWS team is very supportive and takes the time to help us.” On the backend of its SaaS offering, BlocPower uses machine learning to deliver relevant insights to users. To run its models, the company uses Amazon SageMaker, which helps users build, train, and deploy machine learning models for virtually any use case with fully managed infrastructure, tools, and workflows.

Reminder: Read the full case study to learn more. You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Short YouTube channel and following the AWS HPC Blog channel.