January 15 · Issue #50
Building Data Infrastructure in Coursera
At Coursera, data plays a central role: the platform serves over 27M learners. This post recounts the 3.5-year process of building its data infrastructure from scratch.
Data infrastructure orchestration.
Stream Data into an Aurora PostgreSQL Database Using AWS DMS and Amazon Kinesis Data Firehose
When your business needs a pipeline that both migrates a database and streams its ongoing changes, AWS Database Migration Service (AWS DMS) and Amazon Kinesis Data Firehose are a natural fit.
High-level view of the implemented data pipeline.
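The delivery half of such a pipeline hinges on Firehose's buffering model: records accumulate until a size or time threshold is hit, then the batch is flushed to the destination. As a rough, hypothetical illustration of that idea (the real service buffers by megabytes and seconds and handles this for you; the class and thresholds below are invented for the sketch):

```python
import time

class MicroBatchBuffer:
    """Toy model of Firehose-style buffering: flush when the buffer
    holds `max_records` records or `max_seconds` have elapsed."""

    def __init__(self, sink, max_records=500, max_seconds=60.0):
        self.sink = sink              # callable that receives one batch
        self.max_records = max_records
        self.max_seconds = max_seconds
        self.buffer = []
        self.last_flush = time.monotonic()

    def put_record(self, record):
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_records or
                time.monotonic() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)    # deliver the accumulated batch
            self.buffer = []
        self.last_flush = time.monotonic()

# Usage: deliver change events in batches of 3
batches = []
buf = MicroBatchBuffer(batches.append, max_records=3)
for i in range(7):
    buf.put_record({"id": i})
buf.flush()                           # drain the remainder
print([len(b) for b in batches])      # → [3, 3, 1]
```

In the AWS setup described by the article, DMS plays the producer role and S3 (or Redshift, Elasticsearch, etc.) is the sink; the buffering trade-off — larger batches are cheaper, smaller ones fresher — is the same.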
Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena
This article, featured among AWS's best Big Open Data Stories of 2017, explains how to fetch the raw data provided by openFDA, load it using several AWS services, and derive meaning from it.
The data pipeline broken down in five steps.
Postgres Internals: Building a Description Tool
To understand what efficiency means in Postgres, it helps to learn how Postgres works under the hood. This post looks at how Postgres stores its own internal metadata, and how you can use it for describing, debugging, and identifying bottlenecks in a system.
Amazon Redshift Spectrum: Diving into the Data Lake!
With Amazon Redshift Spectrum you can query data in Amazon S3 without first loading it into Amazon Redshift. Spectrum adds one more tool to your Redshift-based data warehouse investment: you can now probe and analyze your data lake on an as-needed basis for a very low per-query price.
Implementing a Large AWS Data Lake for Analysis of Heterogeneous Data
C4ADS partnered with ClearScale to develop and implement a Data Lake solution for analysis of heterogeneous data on AWS. This post gives a high-level view of the built system, its challenges, benefits, and future plans.
System architecture diagram.
Understanding Feature Engineering (Part 1)
Feature engineering is an essential part of building any intelligent system, so much so that some would say applied machine learning is basically feature engineering. This post focuses on engineering continuous numeric data.
Machine learning pipeline.
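Typical transforms for continuous numeric features include log transforms for heavy-tailed values, rescaling to a common range, and binning into discrete buckets. A minimal stdlib-only sketch of these standard techniques (function names are ours, not from the post, which uses pandas/scikit-learn):

```python
import math

def log_transform(values):
    """log(1 + x) compresses heavy-tailed positive features."""
    return [math.log1p(v) for v in values]

def min_max_scale(values):
    """Rescale a feature to [0, 1] so features share a common range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def fixed_width_bins(values, width):
    """Discretize a continuous feature into integer bin indices."""
    return [int(v // width) for v in values]

incomes = [10_000, 25_000, 40_000, 120_000]
print(fixed_width_bins(incomes, 50_000))   # → [0, 0, 0, 2]
print(min_max_scale(incomes)[0])           # → 0.0
```

Fixed-width binning like this leaves empty bins when the data is skewed; quantile-based binning (e.g. pandas `qcut`) is the usual alternative.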
Complex Event Processing with Flink on realtime Twitter data
Complex Event Processing (CEP) is a technique for detecting patterns in streaming data and sending alerts or notifications when they occur. Apache Flink's CEP library simplifies this considerably.
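Flink's CEP library lets you declare such patterns over a DataStream in Java or Scala. The core idea — matching a sequence of event types within a time window — can be sketched in plain Python (a hypothetical helper for illustration, not Flink's API):

```python
from collections import deque

def detect_pattern(events, pattern, window=None):
    """Scan a stream of (timestamp, event_type) pairs for a consecutive
    run matching `pattern`, optionally within `window` seconds.
    Returns the timestamps at which the pattern completed."""
    recent = deque(maxlen=len(pattern))
    alerts = []
    for ts, etype in events:
        recent.append((ts, etype))
        if [e for _, e in recent] == pattern:
            if window is None or recent[-1][0] - recent[0][0] <= window:
                alerts.append(ts)
    return alerts

stream = [(0, "fail"), (1, "fail"), (2, "fail"),
          (3, "ok"), (10, "fail"), (11, "fail"), (30, "fail")]
# Alert on three consecutive failures within 5 seconds
print(detect_pattern(stream, ["fail", "fail", "fail"], window=5))  # → [2]
```

The three failures at timestamps 10, 11, and 30 match the sequence but span 20 seconds, so the time window suppresses that alert — the same role Flink's `within()` clause plays.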
Intermediate Tableau Guide for Data Science, BI Pros
Tableau is one of the most popular visualization tools that requires no coding. In this post, explore core Tableau functionality such as joins, groups, sets, calculated fields, and parameters, and use it to build informative dashboards.
An example visualization using Tableau.
Five Steps to Take Before Kicking Off A Clickstream Data Initiative
Collecting data about the customers of your web-based business? This is a quick read on the five questions you should answer for yourself before putting that data to work.