|
January 7 · Issue #221
|
Our pick this week is a Wired piece that looks at “anonymized” data and what can be gleaned from it. We also have a brief introduction to augmented database management and a short piece on the importance of 100 in data visualization. Stay healthy!
|
Big Data May Not Know Your Name. But It Knows Everything Else | WIRED
Data brokers claim that deidentified data on millions of Americans is risk-free. This piece argues that anonymized data is not really anonymous, which makes it important background for anyone planning to share anonymized customer data.
|
Stream Apache HBase edits for real-time analytics | Amazon Web Services
Apache HBase is a non-relational database. To use its data, applications must query the database to pull data and changes from its tables. This piece introduces a mechanism for streaming Apache HBase edits into services such as Apache Kafka or Amazon Kinesis Data Streams.
|
Integrate.io Launch Event | Jan 27th, 2022 | Xplenty
A world-changing product is on the way, changing how e-commerce companies power their business with data integration. [Sponsored]
|
What’s new in Amazon Redshift – 2021, a year in review | Amazon Web Services
A good overview of the major changes and updates that Amazon made to Redshift in 2021.
|
Augmented Database Management: A Brief Overview | by Mayuresh Joshi | Jan, 2022 | Medium
This piece is a basic introduction to augmented data management, which uses ML and AI techniques to optimize and improve operations.
|
8 Guidelines to Create Professional Data Science Notebooks | by Ricardo Carvalho | Dec, 2021 | Towards Data Science
Eight techniques for making your Jupyter or RMarkdown notebooks easier to maintain and reproduce.
|
Why 100 is important in data visualization. | Upskilling
How the author’s not-quite-full pie chart led her to understand the importance of 100.
|
Creating a better dashboard with Python, Dash, and Plotly | by Brad Bartram | Dec, 2021 | Towards Data Science
How the author created dashboards as part of his hobby of following the commodities and futures markets.
|
When Data Fails To Tell a Story: Data Visualization Mistakes | *instinctools
How to learn from misleading data visualization examples to minimize the risk of bad data presentation that may lead to poor business decisions.
|
How Belcorp decreased cost and improved reliability in its big data processing framework using Amazon EMR managed scaling | Amazon Web Services
Belcorp is one of the main consumer packaged goods (CPG) companies, providing cosmetics products in 13 countries across North, Central, and South America. This post describes their approach to building a data lake on an auto-scaling EMR cluster.
|
Advancing Jupyter Notebooks at Twitter - Part 1
Twitter Notebook is Twitter’s internal notebook solution for data scientists and machine learning practitioners. This piece describes how Twitter implemented a number of features to improve the notebook experience.
|
650 California St., San Francisco, CA 94108
|