SF Data Weekly - Circus Train, High Performance ETL, Spark on YARN, Gradient Boosting Libraries

July 30 · Issue #78
SF Data Weekly
Our Pick
Doing Without Databases in the 21st Century
Data Pipelines
Big Data for Data Engineers | Coursera
How to Build a Secure by Default Kubernetes Cluster with a Basic CI/CD Pipeline on AWS
The end-state Kubernetes cluster.
Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift | AWS Big Data Blog
An example four-step daily ETL workflow.
Data Storage
Replicating Big Datasets in the Cloud – The Technology Blog
Circus Train: high-level sequence diagram.
Get Sub-second Query Response Times with Amazon Redshift Result Caching | AWS Big Data Blog
Understanding Apache Spark on YARN
Spark Cluster overview: Spark applications being coordinated by the SparkContext.
Data Analysis
Gradient Boosting Libraries — A Comparison
Best-first/leaf-wise expansion of a binary tree used in boosting algorithms.
Reinforcement Learning: A Deep Dive | Toptal
An agent's interactions with the environment.
Data Visualization
Interactive Data Visualization with D3.js
Screenshot of an interactive plot on US trade deficit over the years.
Data-driven Products
Talk the Walk: Teaching AI systems to navigate New York through language | Facebook Code
A view of the Talk the Walk interface by FAIR.
Data Engineering Jobs
Data Engineer - Taulia
Senior Data Engineer - Glu Mobile
If you want to post a job for your company, you can do it here.
650 California St., San Francisco, CA 94108