January 29 · Issue #52

The Death of Microservice Madness in 2018
Microservices have become a very popular topic over the last couple of years, but many don’t understand their costs and benefits. This article describes in detail what microservices are, their pros and cons, and the key challenges they bring.

A video sharing platform example.

Top 8 Best Practices for High-Performance ETL with Redshift
Amazon Redshift is a fast, petabyte-scale data warehouse that enables you to easily make data-driven decisions. This post guides you through best practices for ensuring optimal, consistent runtimes for your ETL processes: workload management, ad hoc ETL, copying data from multiple files, monitoring…

A four-step daily ETL workflow.
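One practice the post covers is loading data from multiple files with a single COPY. Here is a minimal Python sketch of that pattern, assuming the files are already staged in S3. It builds a COPY manifest (the `{"entries": [{"url": ..., "mandatory": ...}]}` JSON format Redshift accepts) and the matching COPY statement. The bucket, table, and IAM role names are hypothetical placeholders. Uploading the manifest and executing the SQL are left out.

```python
import json


def build_copy_manifest(s3_urls, mandatory=True):
    """Build a Redshift COPY manifest listing every file to load.

    With mandatory=True, COPY fails if any listed file is missing,
    instead of silently loading a partial batch.
    """
    return {"entries": [{"url": url, "mandatory": mandatory} for url in s3_urls]}


def build_copy_sql(table, manifest_url, iam_role):
    """Return a COPY statement that loads all files named in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "GZIP\n"
        "MANIFEST;"
    )


# Hypothetical input: one day's export split into several similarly sized gzip parts,
# so that COPY can load them in parallel across the cluster's slices.
parts = [f"s3://example-bucket/events/2018-01-29/part-{i:04d}.gz" for i in range(4)]
manifest = build_copy_manifest(parts)
manifest_json = json.dumps(manifest, indent=2)  # upload this to manifest_url before running COPY

copy_sql = build_copy_sql(
    table="analytics.events",
    manifest_url="s3://example-bucket/manifests/events-2018-01-29.manifest",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(copy_sql)
```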
Insight Data Engineering Ecosystem Map
A tool produced a while ago, but still very useful for understanding the different stages of a data pipeline and the tools used at each one, from data ingestion to processing to storage. A great place to pick the tools you need.

The interactive map of a typical data pipeline.

This One Weird Trick Will Simplify Your ETL Workflow
Extract, transform, load (ETL) processing probably takes up the largest share of the time spent building machine learning models. This post shows how to make your ETL code more maintainable and easier to write using Jinja2, a Python templating engine.
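To make the idea concrete, here is a minimal sketch, assuming the goal is templated SQL: one Jinja2 template renders the extract query for any table, column list, and optional run date, instead of maintaining near-duplicate SQL files. It uses the `jinja2` package's `Template` class. The schema, table, and column names are hypothetical, and this is not necessarily the exact approach the post takes.

```python
from jinja2 import Template

# One hypothetical template reused for every table the ETL job extracts.
# The optional WHERE clause is only rendered when a run date is passed.
EXTRACT_SQL = Template(
    "SELECT {{ columns | join(', ') }}\n"
    "FROM {{ schema }}.{{ table }}"
    "{% if run_date %}\nWHERE event_date = '{{ run_date }}'{% endif %}"
)


def render_extract_query(schema, table, columns, run_date=None):
    """Render the extract step for one table (full load if run_date is None)."""
    return EXTRACT_SQL.render(schema=schema, table=table, columns=columns, run_date=run_date)


query = render_extract_query("analytics", "page_views", ["user_id", "url", "ts"], "2018-01-29")
print(query)
```

Jinja2 renders plain text and does not escape SQL. That makes this approach suitable for trusted values such as table names and dates from your own job config. Values that come from users should go through the database driver's bind parameters instead.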
LinkedIn's Gobblin: An Open Source Framework for Gobbling Big Data with Ease
LinkedIn’s Gobblin has evolved from a data ingestion framework for offline big data into a distributed data integration framework that manages the complete data lifecycle across both streaming and batch environments, most recently by adding support for Helix clusters and Kafka streams. It’s actively used by Apple, PayPal, CERN, and others.

Gobblin-as-a-Service

Want a Job in Data? Learn This.
Yes, mastering a 50-year-old programming language is the key to getting a data science job. Here are the reasons why SQL is everywhere, and why it doesn’t seem to be going away any time soon.

Most-mentioned data tools across 25,000 job ads on Indeed.

Doing Data Science at Twitter
Data science at Twitter has changed a lot in the last few years, and the role turns out to cover a large share of data-related responsibilities, such as product insights, data pipelines, statistical experimentation, and machine learning. The author describes each of these aspects through his own experience.

How to Datalab: Running Notebooks Against Large Datasets
If you can’t bring your data to your compute, bring your compute to your data! Google Cloud Datalab is a tool built on top of Jupyter notebooks that integrates easily with your BigQuery datasets and Google Cloud Storage. Here’s how to get started with it.

A Google Cloud Datalab notebook.

Crafting Data-Driven Maps | Uber Design
This is how Uber crafts maps to visualize millions of data points, monitor road conditions, and advocate for policy change. The core challenge is building a unified, consistent set of maps across many different use cases.

Scatter plots and hex bins showing the concentration of Uber trip activity.

Marriage? Maybe later. | Uncharted
Data visualization of the week: a perfectly executed interactive chart, this time on historical marriage data across the United States.

Percentage of married people across different age groups over time.

I’ve Simulated the Bitcoin Price for the Whole 2018. You Won’t Believe the Result!
The author runs a Monte Carlo simulation on the daily returns of the bitcoin price in USD to estimate its most likely value at the end of 2018.

The distribution of the bitcoin price, as predicted for 2018.
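The method can be sketched in a few lines of Python. This is a minimal bootstrap-style Monte Carlo sketch. It assumes each simulated day's return is drawn at random, with replacement, from a list of historical daily returns. The post's own sampling choices may differ. The return values and starting price below are made-up placeholders, not real bitcoin data, so the output says nothing about the actual price.

```python
import random
import statistics


def simulate_final_prices(start_price, daily_returns, n_days, n_sims, seed=0):
    """Simulate n_sims price paths and return the final price of each.

    Each path compounds n_days returns resampled with replacement
    from the historical daily returns (simple returns, e.g. 0.02 = +2%).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    finals = []
    for _ in range(n_sims):
        price = start_price
        for r in rng.choices(daily_returns, k=n_days):
            price *= 1 + r
        finals.append(price)
    return finals


# Placeholder inputs for illustration only (not real market data).
historical_returns = [0.021, -0.034, 0.012, 0.005, -0.018, 0.047, -0.009, 0.003]
finals = simulate_final_prices(10_000.0, historical_returns, n_days=365, n_sims=2_000)

# Summarize the simulated distribution: the median as a central estimate,
# and the 5th to 95th percentile range to show the spread.
cuts = statistics.quantiles(finals, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"median: {statistics.median(finals):,.2f}")
print(f"5th-95th percentile: {cuts[0]:,.2f} to {cuts[-1]:,.2f}")
```

Resampling preserves the empirical shape of the historical returns, fat tails included. However, it treats days as independent, which ignores effects such as volatility clustering.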
If you don't want these updates anymore, please unsubscribe here
If you were forwarded this newsletter and you like it, you can subscribe here

650 California St., San Francisco, CA 94108