Data Engineering Podcast
Latest Episodes
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable
Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infra
Using Data To Illuminate The Intentionally Opaque Insurance Industry
The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this epi
Building ETL Pipelines With Generative AI
Artificial intelligence applications require substantial high quality data, which is provided through ETL pipelines. Now that AI has reached the level of sophistication seen in the various generative
Powering Vector Search With Real Time And Incremental Vector Indexes
The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applicati
Building Linked Data Products With JSON-LD
A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling me
An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem
Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operatio
Eliminate The Overhead In Your Data Integration With The Open Source dlt Library
Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open sourc
Building An Internal Database As A Service Platform At Cloudflare
Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud, most developers rely on hosted services to manage their databases, but what if you are a cloud serv
Harnessing Generative AI For Creating Educational Content With Illumidesk
Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to
Unpacking The Seven Principles Of Modern Data Pipelines
Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful, you will end up spending all of your time on maintenance and fire-fighting. The