SF Data Weekly - Most Clicked Posts of 2018

January 2 · Issue #98
2018 has been an amazing year, and our newsletter saw a 10x increase in subscribers!
As a thank you to all our dedicated readers, we’ve prepared a roundup of the most clicked posts of the year. Enjoy reading!

A Beginner’s Guide to Data Engineering — Part I – Robert Chang
Building The Analytics Team At Wish – Wish Engineering And Data Science – Medium
Scaling Decision-Making Across Teams within LinkedIn Engineering | LinkedIn Engineering
Rules for Crash Course Data Engineering – Data Syndrome
Laying the Foundation for a Data Team | Monzo
Software Architecture: Architect Your Application with AWS
ETL vs ELT or Data Warehouse vs Data Lake – Xplenty Blog – Medium
The Surprising Link between a Company’s Growth and its Data Analytics Strategy
Batch and Streaming in the World of Data Science and Data Engineering
The State of Data Engineering
Data Pipelines
Functional Data Engineering — a modern paradigm for batch data processing
Thorough Introduction to Apache Kafka™ – Hacker Noon
Understanding Apache Airflow’s key concepts – Dustin Stansbury
Getting started with Elasticsearch in Python – Towards Data Science
Stream Processing Made Easy With Confluent Cloud and KSQL
Streaming in the Clouds: Where to Start | Confluent
How Pinterest runs Kafka at scale – Pinterest Engineering – Medium
Building data infrastructure in Coursera – Zhaojun Zhang
How Disney Built a Pipeline for Streaming Analytics
Processing streams of data with Apache Kafka and Spark: ingestion, processing, reaction, examples
Is Batch ETL Dead, and is Apache Kafka the Future of Data Processing?
How we built a data pipeline with Lambda Architecture using Spark/Spark Streaming
Overview of Mozilla's Data Pipeline - Firefox Data Documentation
Hands on: Building a Streaming Application with KSQL | Confluent
Democratizing Stream Processing with Apache Kafka® and KSQL - Part 2
Airflow 101: Start automating your batch workflows with ease
Building a Big Data Pipeline With Airflow, Spark and Zeppelin
Data pipelines, Luigi, Airflow: everything you need to know
Data Storage
Hadoop 3: Comparison with Hadoop 2 and Spark – ActiveWizards: machine learning company – Medium
Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift | Amazon Web Services
Doing Without Databases in the 21st Century – codeburst
Migrating Messenger storage to optimize performance - Facebook Code
Optimizing BigQuery: Cluster your tables – Google Cloud Platform - Community – Medium
Why SQL is beating NoSQL, and what this means for the future of data
Amazon Redshift Spectrum: Diving into the Data Lake!
Snowflake’s Cloud Data Warehouse — What I Learned and Why I’m Rethinking the Data Warehouse
How to Get Started With AWS Spectrum in Minutes
Streaming Messages from Kafka into Redshift in near Real-Time
How to architect the perfect Data Warehouse – Lewis Gavin – Medium
Top 14 Performance Tuning Techniques for Amazon Redshift
Time Series Data and MongoDB: Part 2 – Schema Design Best Practices | MongoDB
How We Reduced Our Amazon Redshift Cost by 28%
The SQL vs NoSQL Difference: MySQL vs MongoDB – Xplenty Blog – Medium
Improving Amazon Redshift Performance: Our Data Warehouse Story [Udemy]
Data Analysis
Give meaning to 100 billion analytics events a day – Teads Engineering – Medium
Get Started with PySpark and Jupyter Notebook in 3 Minutes
Google’s AutoML will change how businesses use Machine Learning
Twitter meets TensorFlow
What is TensorFlow? An Intro to The Most Popular Machine Learning Framework
Productionizing ML with workflows at Twitter
Reinforcement Learning: A Deep Dive | Toptal
Learn ML Algorithms by coding: Decision Trees – Lethal Brains
Using Apache Spark for large-scale language model training - Facebook Code
Tutorial: Connecting Tableau to your data warehouse for analytics
Using Deep Learning at Scale in Twitter’s Timelines
Machine Learning @ Teads (part 2) – Teads Engineering
10 Myths of Enterprise Python | PayPal Engineering Blog
Get Started with Deep Learning Using the AWS Deep Learning AMI | AWS Machine Learning Blog
A/B Testing: The Definitive Guide to Improving Your Product
Apache Spark Introduction for Beginners
Gradient Boosting Libraries — A Comparison
Deploying a Keras Deep Learning Model as a Web Application in Python
Data Visualization
Visualizations on Apache Kafka® Made Easy with KSQL
Creating a Data Visualization GraphQL Server with a Loosely Coupled Schema
Interactive Data Visualization with D3.js – Towards Data Science
An overview of every Data Visualization course on the internet
4 More Quick and Easy Data Visualizations in Python with Code
The Gender Balance of The New York Times Best Seller List
Exploring Movie Data with Interactive Visualizations
Data-driven Products
Food Discovery with Uber Eats: Building a Query Understanding Engine | Uber Engineering Blog
Post-Retirement Calculator: Will My Money Survive Early Retirement? Visualizing Longevity Risk - Engaging Data
City Health Dashboard
3 Phases of Building a Data Product — Juice Analytics
Exclusive: Fitbit's 150 billion hours of heart data reveal secrets about health
650 California St., San Francisco, CA 94108