
SF Data Weekly - Redshift Spectrum & Data Lakes, Spark ETLs, Alibaba Cloud, Kafka for Developers

June 4 · Issue #70
Our Pick
Amazon Redshift Spectrum: Diving into the Data Lake
The Dark Data Problem. Source: Amazon AWS.
Data Pipelines
Orchestrate Apache Spark Applications Using AWS Step Functions and Apache Livy
High level system diagram: Livy is used to connect to Spark in Amazon EMR.
AWS Data Pipeline: An Introduction to a Platform for Solving Complex Data Pipeline Headaches
Down with Pipeline Debt / Introducing Great Expectations
How do you know if the data in your data lake is correct?
Data Storage
Migrate to the MySQL-compatible Edition of Amazon Aurora While Protecting Personal Data Using Encryption
Introducing Self-Service Apache Kafka for Developers
Confluent's solution for bringing streaming data to apps.
Redis vs. Memcached: In-Memory Data Storage Systems
Data Analysis
Practical Apache Spark in 10 Minutes. Part 2 — RDD
Google Cloud Dataprep - Data Handling Made Easier | Google Cloud Platform
Job Results is a convenient tool for reporting on the success of Dataprep's work.
Data Visualization
Creating a Data Visualization GraphQL Server with a Loosely Coupled Schema
A screenshot of the Ibex project.
Data-driven Products
Data is beautiful: Traffic accidents in the UK
The final product, showing the number of vehicles and casualties in London, UK.
Data Engineering Jobs
Data Engineer - Microsoft
Intern - Data Engineering - Huawei Technologies
At the end of each SF Data Weekly issue you can find job postings that are relevant to all members of our community. 🎉
If you want to post a job for your company, you can do it here.