|
May 13 · Issue #115
|
|
Collaboration Between Data Engineers, Data Analysts and Data Scientists | Dailymotion
Dailymotion looks back on its data journey, focusing on how data engineers work with data scientists and data analysts to improve production releases. The ongoing challenge is to find the right balance between catering to each specific need and maintaining a generic customer/supplier relationship.
|
Project workflow for data engineers and analysts.
|
|
Running Amazon Payments analytics on Amazon Redshift with 750TB of data | Amazon Web Services
The Amazon Payments Data Engineering team is responsible for data ingestion, transformation, and the computation and storage of data. They recently switched to Amazon Redshift as a payments data warehouse. This post shows their selection process and how they use Redshift.
|
Amazon Payments data architecture.
|
Amazon Redshift Maturity Survey [Sponsored Content]
We’re hearing more and more about Snowflake and BigQuery. Meanwhile, Amazon Redshift is still the #1 cloud warehouse by market share. To help assess where our customers are on their journey to being data-driven, we created the Amazon Redshift Maturity Survey, and today we’re opening it up for you to take.
|
Each time somebody takes the survey, we'll hit a big gong in our office.
|
Journey to Event Driven – Part 4: Four Pillars of Event Streaming Microservices | Confluent
The most challenging goal of any application architecture is simplicity, but it is possible to achieve. Neil Avery explores four pillars for enabling scalable development that works across the event-driven enterprise. These pillars minimize complexity and provide foundational rules for building systems using composition.
|
The four pillars of event streaming architecture applied to Confluent's system.
|
Let’s Build a Streaming Data Pipeline
When Daniel started his job at a UK government institution, he noticed a huge amount of unused data stored in text files. Eager to learn more about data engineering, he created a streaming data pipeline and also found ways of making data more accessible. This is his work using Apache Beam and Dataflow.
|
The GCP model followed in this example.
|
5 Tips for Selecting the Right Data Warehouse
In this article, learn exactly what a data warehouse is and, most importantly, how to select the right data warehouse for your company. It also shares a few thoughts on modern data warehousing, different architectures, use cases and vendors.
|
How Tilting Point Does Streaming Ingestion into Delta Lake | The Databricks Blog
Tilting Point is a new-generation games partner that provides top development studios with expert resources, services, and operational support. Their data engineering team runs daily and hourly batch jobs for reporting on game analytics. This post shows how they reached near real-time reporting at 5-10 minute intervals.
|
Architecture showing continuous data ingest into Delta Lake Tables.
|
|
Intro to AWS SageMaker
SageMaker is a tool on the Amazon Cloud for developing and deploying ML models. It promises to ease the process of training and deploying models to production at scale, providing three very useful functionalities.
|
Build - train - deploy cycle, simplified with AWS SageMaker.
|
Building Recommender Systems with Azure Machine Learning service
Microsoft recently simplified the way machine learning is done in the Azure cloud. This is an intro to recommender systems, an area where Microsoft’s contribution is among the most important open-source ones.
|
An excerpt from the wide range of recommender systems supported in Azure.
|
|
5 Common Advanced Analytics Scenarios & Resources for Tableau
Tableau invests in advanced analytics (techniques and tools typically beyond those of traditional business intelligence) so that you can get to the root of your questions, no matter how complex they become. This article will focus on Tableau’s approach to predictive analytics, segmentation, what-if analysis, and more.
|
Tableau allows automatic clustering for data segmentation.
|
|
Scaling Product Design with Blueprint | Palantir
Blueprint is a newly open-sourced design system by Palantir, implemented as a collection of composable React components and optimized for desktop applications. It has changed the way Palantir scales design and frontend engineering by providing a systematic way to apply unified and consistent design across a diverse set of teams.
|