|
April 29 · Issue #113
|
|
The Case for Database-First Pipelines
Quite a few data architectures involve both a database and Apache Kafka. There is always the question of the order of integration: DB then Kafka, Kafka then DB, or both in parallel. Hint: the third tends to be a terrible decision.
|
|
Apache Kafka vs RabbitMQ - Introduction to Message Brokers
Message brokers are especially important for data analytics and business intelligence. This post looks at two big data tools: Apache Kafka and RabbitMQ. Kafka is a powerful event streaming platform capable of handling trillions of messages a day, while RabbitMQ is well suited to smaller-scale, quick and easy data streaming apps.
|
Messaging process: queuing.
|
How Google’s Anthos Is Different from AWS and Azure Hybrid Clouds
There’s a new trend in town: creating distributed applications that work across different platforms, both on-premises and in the cloud. Google takes a different approach from Azure and AWS with its Anthos product, which was recently announced as generally available.
|
José Roca: Taming the Logs Jungle
José Roca is a Director of Product at Prezi, where he is responsible for productizing the areas of data, security and infrastructure so that Prezi users and internal developers can get the most out of those platforms. This is a video of his presentation on working with log data in games, given at Impact, a product management conference.
|
|
Want to Use BigQuery? Read This
BigQuery is the public implementation of Dremel, which Google has launched to general availability. It exposes Dremel's capabilities through a REST API, a command line interface and a web UI. This is a brief explanation of the technology behind BigQuery, such as its columnar storage and tree architecture.
|
Open Sourcing Delta Lake | The Databricks Blog
Databricks is open-sourcing Delta Lake, which the company says is its biggest innovation to date, bigger even than its creation of the Apache Spark engine. Delta Lake is a storage layer that sits on top of data lakes to ensure reliable data sources for machine learning.
|
|
Inside the Machine Learning Powering LinkedIn Recruiter Recommendation Systems
LinkedIn Recruiter is the product that helps recruiters build and manage a talent pool that optimizes the chances of a successful hire. The initial search and recommendation experience in LinkedIn Recruiter was based on linear regression models; it was later improved with Gradient Boosted Decision Trees (GBDTs).
|
The architecture of LinkedIn Recruiter Recommendations system.
|
What Makes Apache Druid Great for Realtime Analytics?
Apache Druid is one of the most popular open-source solutions for Online Analytical Processing (OLAP). It’s used by many tech companies, such as Airbnb and Netflix. This post explains how it enables exploration of realtime and historical data while providing low latency and high availability.
|
A schematic view of Druid's philosophy.
|
|
Here’s the Visual Proof of Why Vaccines do More Good than Harm
Here’s an interesting interactive visualization of how vaccines beat back nine dangerous infectious diseases over the years.
|
A snapshot of the visualization on 9 infectious diseases over the years.
|
|
Will it Scale? Let’s Load Test Geohashing on DynamoDB
Geospatial data is a fancy way of describing items containing a latitude and longitude, but handling queries for the nearest or furthest items, or for items within a certain distance, adds some complexity. DynamoDB makes this easy using a small geocoding npm package. This example takes the approach to its limits.
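
For readers new to geohashing, here is a minimal Python sketch of the standard geohash encoding that this kind of approach builds on. It is not the npm package's API or the article's code; the function name `geohash_encode` and the idea of using a short prefix as a partition key are assumptions made for illustration.

```python
# Standard geohash alphabet (base32 without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash string.

    Hypothetical helper for illustration: it repeatedly halves the
    longitude and latitude ranges, emitting one bit per halving, and
    packs every 5 bits into one base32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash interleaves bits, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


# A longer hash refines a shorter one, so a short prefix identifies
# the larger cell containing the point (assumed here to be usable as
# a partition key, with the full hash as a sort key).
full_hash = geohash_encode(57.64911, 10.40744, precision=9)
cell_prefix = full_hash[:3]
print(cell_prefix, full_hash)
```

Because a shorter hash is always a prefix of a longer hash for the same point, items in the same cell can be fetched with a single prefix lookup. Points that sit close together but on opposite sides of a cell boundary can have different prefixes, which is why radius queries typically need to check neighbouring cells as well.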
|
|
650 California St., San Francisco, CA 94108
|