|
April 8 · Issue #234
|
This week’s pick is a visualization of 50 years of the Oscars - don’t worry, no slaps involved. We also have test data creation with the Faker Python library, and ten recommended processes to find data cavities at scale. Stay healthy!
|
|
|
50 Years of Oscars: Acting Success and Collaboration | Nightingale
Collecting data and diagramming network relationships for 50 years of Academy Award winners.
|
|
Build data lineage for data lakes using AWS Glue, Amazon Neptune, and Spline | Amazon Web Services
Data lineage helps ensure that accurate, complete and trustworthy data is being used to drive business decisions. This piece shows how to use Spline to harvest lineage automatically from Spark ETL jobs.
|
The Ultimate Guide to E-commerce Integrations | Integrate.io
Want to learn the top 5 benefits of e-commerce data integration? Read this article. [Sponsored]
|
|
Anomaly Detection in SQL. How to implement fast, powerful… | by Avi Chad-Friedman | Mar, 2022 | Towards Data Science
Z-scoring is a powerful, minimalist anomaly detection model. This piece shows how to implement it in SQL.
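
As a rough illustration of the idea (not the linked article's own query), the sketch below flags values whose z-score exceeds a threshold, with the z-score computed inside a SQL query. Everything here is an assumption made for the example: the `metrics` table, its `day` and `value` columns, the `py_sqrt` helper, the threshold of 2.0, and the use of the population standard deviation. It runs against an in-memory SQLite database from Python's standard library.

```python
import math
import sqlite3


def _py_sqrt(v):
    # Hypothetical helper registered with SQLite; clamps tiny negative
    # variances caused by floating-point rounding and passes NULL through.
    return None if v is None else math.sqrt(max(v, 0.0))


def find_anomalies(rows, threshold=2.0):
    """Return (day, value, z) for rows whose |z-score| exceeds threshold."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("py_sqrt", 1, _py_sqrt)
    conn.execute("CREATE TABLE metrics (day TEXT, value REAL)")
    conn.executemany("INSERT INTO metrics VALUES (?, ?)", rows)
    query = """
        WITH stats AS (
            SELECT AVG(value) AS mean,
                   -- population standard deviation: sqrt(E[x^2] - E[x]^2)
                   py_sqrt(AVG(value * value) - AVG(value) * AVG(value)) AS stddev
            FROM metrics
        )
        SELECT m.day, m.value, (m.value - s.mean) / s.stddev AS z
        FROM metrics AS m CROSS JOIN stats AS s
        WHERE s.stddev > 0
          AND ABS((m.value - s.mean) / s.stddev) > ?
        ORDER BY m.day
    """
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result


sample = [
    ("2022-04-01", 10), ("2022-04-02", 11), ("2022-04-03", 9),
    ("2022-04-04", 10), ("2022-04-05", 12), ("2022-04-06", 10),
    ("2022-04-07", 50),
]
anomalies = find_anomalies(sample)
```

With this sample, only the 2022-04-07 row is far enough from the mean to be flagged. A warehouse with a built-in standard deviation function would not need the registered helper.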
|
|
You Don’t Need Sample Data, You Need Python Faker | by Christopher Tao | Mar, 2022 | Towards Data Science
How to use Python Faker to generate fake names, dummy addresses and other sample data for data science and machine learning projects.
|
No magical toothpaste for data quality cavities | by Sandeep Uttamchandani | Apr, 2022 | Towards Data Science
Just as there’s no toothpaste or other magic potion which will stop dental cavities, there’s no magic bullet for data cavities. This piece describes ten data hygiene processes to identify data quality issues.
|
|
A Simple Guide to Machine Learning Visualisations | by Rebecca Vickery | Mar, 2022 | Towards Data Science
An introduction to creating ML visualizations using Yellowbrick, a Python library designed for the Scikit-learn ML library.
|
|
Detecting silent errors in the wild: Combining two novel approaches to quickly detect silent data corruptions at scale - Engineering at Meta
Facebook’s experience with silent error detection testing at scale, including results from two different approaches.
|
How LyftLearn Democratizes Distributed Compute through Kubernetes Spark and Fugue | by Han Wang | Apr, 2022 | Lyft Engineering
How LyftLearn solves some of the issues with the compute layer of its homegrown machine learning platform.
|
|
Traverse | Data Science Roadmap
A Traverse template for learning data science, by a data scientist from a London fintech startup.
|
Here are this week’s job picks:
|
|
650 California St., San Francisco, CA 94108
|