My role
I was the sole engineer on this project, as well as the liason between the graphics desk and the Interactive News Engineering team. I was responsible for the designing the architecture of the project, as well as building what I had designed. Furthermore, I was responsible for maintaining the project over the years, ensuring uptime and adding new features as needed.
Problem
The Washington Post publishes dozens of stories related to hurricanes and wildfires each year. Due to the evolving nature of these events, it is critical that the Washington Post's readers have access to the latest information as it breaks. Government agencies, such as NASA and NIFC, publish data that is constantly evolving, and the graphics desk wanted a solution that could grab updates from these sources, apply transformations that are specific to the Washington Post's needs, and store them for fast retrieval by our front-end graphics as well as for archival purposes.
Solution
I built a cloud-based framework, written in Node.js, that uses AWS’ Step Functions to orchestrate the processing of data from the source to the final product. The framework uses a series of AWS Lambda functions to run the code, and S3 to store the data. I built the framework with the following goals in mind:
- Flexibility: The framework is designed to be highly configurable, allowing the graphics desk to re-use the same framework for different trackers, since it is powered by a configuration file that is agnostic to the data source or the data type.
- Scalability: The framework is highly scalable, facilitating the processing of large amounts of data, and to do so in a timely manner. We use Step Functions to run transformations in parallel, whenever possible.
- Reliability: The framework allows the graphics desk to specify error handling, ensuring that the framework can continue to run even if some transformations fail which can be useful for valuable but erratic data sources.
- Enables autonomy: The most imortant part of the framework is the Transformation step in the ETL process, which is code that is largely written by the graphics desk. The graphics desk wanted a high degree of autonomy in this step, so I built the framework in a way that allows the graphics desk to write their own code, and to have it run in the same environment as the rest of the framework. This also allowed me to build the framework in a way that is easy to maintain and frees up bandwidth for my team to focus on other projects.
Results
The framework has been a huge success, and has been used to power the Washington Post's hurricane and wildfire trackers for several years with minimal maintenance and no downtime. The pieces produced by the framework have been used in dozens of stories, in a wide variety of formats, including print, online, and social media. With each hurricane and wildfire season, the journalism powered by the trackers brings millions of pageviews to the Washington Post and helps the Post's readers stay safe and informed.