This presentation was recorded at YOW! 2019. #GOTOcon #YOW
Tommy Hall - Theatre fan, occasional mountaineer, part time runner, thoroughly nice chap, available in fine bookstores everywhere
ABSTRACT
In every business there is some kind of data pipeline, even if it’s powered by humans working off a shared drive somewhere. Many places do better than this: they have workflow systems, ETL pipelines, analytics teams, data scientists, and so on. But can they say, months later, which version of which code, running on which data, generated a given insight?
Can they be reproduced?
What if the algorithms change, do you go back and re-run everything?
Science itself has a reproducibility problem, but it’s worse in most companies, and mistakes can be expensive.
There is a useful subset of data pipelines, let’s call them “pure”, whose results depend only on the data flowing through them. For pure pipelines, we can borrow techniques from distributed build systems to know what code was used for each step, to keep every previous result as we improve our algorithms, and to avoid repeating work that has already been done.
This talk contains interesting theory but is resolutely practical, with concrete examples in several languages and distributed computation frameworks. [...]
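The idea of treating a pure pipeline like a distributed build can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the `Store` class, the `digest` helper, the explicit `version` strings and the toy `clean`/`count` steps are all hypothetical names, and an in-memory dict stands in for a shared result store. Each result is keyed by a hash of the step name, the code version and the input data, so an unchanged step is never re-run and results from an older algorithm are kept alongside newer ones.

```python
import hashlib
import json


def digest(obj):
    """Stable SHA-256 hex digest of any JSON-serialisable value."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class Store:
    """In-memory stand-in (an assumption) for a shared, append-only result store."""

    def __init__(self):
        self.results = {}     # cache key -> output value
        self.provenance = {}  # cache key -> which step and version produced it
        self.runs = 0         # how many times a step actually executed

    def run(self, step, version, fn, *inputs):
        # The key covers the step's name, its code version and its input data.
        # This is only sound for "pure" steps with no hidden inputs.
        key = digest({
            "step": step,
            "version": version,
            "inputs": [digest(i) for i in inputs],
        })
        if key not in self.results:
            self.results[key] = fn(*inputs)
            self.provenance[key] = {"step": step, "version": version}
            self.runs += 1
        return self.results[key]


# Hypothetical pipeline steps, used only for illustration.
def clean(rows):
    return [r.strip().lower() for r in rows if r.strip()]


def count_v1(rows):
    return len(rows)


def count_v2(rows):
    # "Improved" algorithm: count distinct values instead of all rows.
    return len(set(rows))


store = Store()
raw = [" Apple", "apple ", "", "Pear"]

cleaned = store.run("clean", "1", clean, raw)
total_v1 = store.run("count", "1", count_v1, cleaned)

# Re-running the same code on the same data is answered from the store.
store.run("clean", "1", clean, raw)
store.run("count", "1", count_v1, cleaned)

# Changing the algorithm adds a new result; the old one is still available.
total_v2 = store.run("count", "2", count_v2, cleaned)
```

Here the version is an explicit string to keep the sketch simple; a real system would more plausibly derive it from a commit hash or a container image digest, so that the provenance record answers "which code ran on which data" without relying on anyone remembering to bump a number.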