Hey everyone, it's Irene 👋
As a product manager, you’ve likely heard the terms data analytics and business analytics, and ELT and ETL thrown around but don’t have a solid grasp of what they mean or their significance. It’s important to have a holistic mental map of the world of data analytics and data processing for a few reasons:
- To measure the impact of your product and product changes
- Extract business insights that are used to make future product decisions
- Understand how engineering work around data affects your product timeline and roadmap
- Be able to communicate, understand, and contribute during engineering discussions about setting up or migrating data processing systems
In this newsletter, you’ll get the lay of the land of popular terms and concepts in data processing, especially as it relates to popular Cloud technology today. This will go a long way in helping you better understand, collaborate and communicate with your data and engineering teams about the work needed.
Got a topic you want me to cover in an upcoming newsletter?
You know the drill: fill out this super short 2 question form and tell us what you want to see in future newsletter issues. Could be a tech term, tool, or concept, or PM tips, or just anything! We'll look through your suggestions and add the most popular topics into upcoming issues. Thank you in advance 🥰
Data Processing for Product Managers
What is ETL and ELT?
ETL and ELT are core concepts in data processing that you should know as a product manager. They describe two different approaches to processing data from various sources to prepare for data analytics.
The purpose of ETL and ELT are the same: to prepare and store data for business analytics. The core difference between ETL and ELT lies in the order of operations.
ETL and ELT are nearly identical acronyms that both describe a data integration process. These acronyms stand for the following 3 processes:
- Extract: This initial step pulls raw data from different data sources for central storage in a data repository. The raw data can either be structured (i.e. text, numbers, JSON) or unstructured (i.e. images, videos, emails).
- Transform: The transform step takes raw data that can be messy, incomplete, inaccurate, and/or unusable/unreliable and cleans and processes it. Other transformations may be business related like converting currencies into USD.
- Load: Once the raw data is transformed, it’s loaded into a central data repository like a data lake or data warehouse.
ETL transforms your data before loading it into a data repository, while ELT transforms data only after loading it into a data repository.
What is ETL?
ETL (extract, transform, load) has been around for decades and has been the go-to approach for gathering and reforming data into a standardized format. In ETL, we extract data from various sources first before transforming the data and finally loading it into a data warehouse.
Real-world example of the ETL process
For example, imagine the ETL process for the sales data of a large multi-national brick-and-mortar retail company. The data from various sales sources including in-store POS systems, online website transactions, salesforce data, legacy systems, and other sources. At a pre-determined time like every Wednesday at 9am, all of the raw data from various sources would stream into a staging area, basically a separate database, and calculations, translations, and data analytics is performed on all the data. Currencies may be transformed into a standard USD currency, measurements may be standardized to US standards, and sensitive data might be encrypted, removed, or hidden.
The data is then moved and finally stored in an organized and formatted way in the data warehouse. The data in the staging area is cleared out in order to receive the next run, or the next batch of raw data that comes in and goes through the same process.
In this example, the transforming and the loading of the data must occur in the same run, even if transforming the data might takes days based on how much data there is and the complexity of the data. This makes ETL a much slower process compared to ELT.
What is ELT?
ELT (extract, load, transform) loads raw data into a data lake before transforming it. A core reason for organizations to move from ETL to ELT is when the complexity and amount of data increases to a point where it makes the transformation process very lengthy and intensive on infrastructure. Therefore, it’s more optimal to load the data into a data lake and transform the data as needed.
Real-world example of the ETL process
Let’s take the same example of sales data of a multi-national retail company and apply the ELT process to it. At a pre-determined time all the sales data comes in from their various sources and immediately gets stored into the data lake. When this data needs to be analyzed for business reasons, the data already stored in the data lake will go through the last transformation step. For example, business analysts may want to run business analytics reports once a month, at which point they will run transformations on the data already stored in the data lake.
Streaming ELT Transformation workflow with dbt
A popular tool that’s emerged in the ELT transformation workflow worth mentioning is dbt so let’s quickly cover what it is.
We need your help! How can we improve this newsletter?
Our team puts in a lot of time and effort providing you with the best technical literacy content, and we need your help. Tell us how we can improve the Skiplevel newsletter and what you want us to write about.
Want to feel more confident in your technical skills?
Missed the mid-month PM Roles and Tech Tips Newsletter?
Looking for a new PM role? My team and I decided to create a shorter newsletter issued twice a month with a list product job listings from senior to entry-level roles, along with a selection of PM & tech tips in every issue.