Data Engineering Podcast

Latest Episodes

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable
October 15, 2023

Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infra

Using Data To Illuminate The Intentionally Opaque Insurance Industry
October 08, 2023

The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this epi

Building ETL Pipelines With Generative AI
October 01, 2023

Artificial intelligence applications require substantial high quality data, which is provided through ETL pipelines. Now that AI has reached the level of sophistication seen in the various generative

Powering Vector Search With Real Time And Incremental Vector Indexes
September 24, 2023

The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applicati

Building Linked Data Products With JSON-LD
September 17, 2023

A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling me

An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem
September 10, 2023

Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operatio

Eliminate The Overhead In Your Data Integration With The Open Source dlt Library
September 03, 2023

Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open sourc

Building An Internal Database As A Service Platform At Cloudflare
August 27, 2023

Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud serv

Harnessing Generative AI For Creating Educational Content With Illumidesk
August 20, 2023

Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to

Unpacking The Seven Principles Of Modern Data Pipelines
August 13, 2023

Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful, you will end up spending all of your time on maintenance and fire-fighting. The