The New AWS Glue DataBrew: A Visual Data Preparation Tool

Nov 27, 2020 | Article

Amazon Web Services (AWS) recently launched DataBrew, a no-code visual data preparation tool that, according to AWS, helps users clean and normalize data up to 80% faster. As an extension of AWS Glue, DataBrew aims to make data prep easier and more accessible through its interactive visual interface, so that users can focus on business value.

Data scientists and analysts spend a significant amount of their time cleaning, transforming, and preparing data for analysis or for training machine learning models. Several vendors have attempted to automate this process to reduce the time spent on data prep, and AWS Glue DataBrew is the newest addition to that list. Besides providing a no-code visual interface, it lets users choose from over 250 built-in functions to explore, combine, pivot, and transpose data. It also provides data transformations that rely on advanced machine learning techniques such as natural language processing.

DataBrew supports data in CSV, JSON, Parquet, and .XLSX formats stored in Amazon Simple Storage Service (S3), Amazon Redshift, Amazon Relational Database Service (RDS), or any other JDBC-accessible data store. It can also connect to data indexed by the AWS Glue Data Catalog. Users begin by creating a project in the DataBrew console, where they can visually explore the data, look for patterns, and apply functions to manipulate it. Once the data is ready, they can immediately start gaining insights from it using AWS or third-party services, including Amazon SageMaker and Tableau.
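Although DataBrew is built around the console, the same dataset registration can also be scripted. The following is a minimal sketch in Python that builds the arguments for registering an S3 file as a DataBrew dataset through the AWS SDK (boto3). The helper name, the dataset name, the bucket, and the key are all hypothetical, and the request shape reflects our understanding of the boto3 `create_dataset` call, so it should be checked against the current SDK documentation before use.

```python
# Maps the file formats named in the article to the Format values that,
# to our understanding, the DataBrew create_dataset call expects.
FORMAT_BY_EXTENSION = {
    ".csv": "CSV",
    ".json": "JSON",
    ".parquet": "PARQUET",
    ".xlsx": "EXCEL",
}


def build_dataset_request(name, bucket, key):
    """Build keyword arguments for registering an S3 object as a dataset.

    Hypothetical helper: it only assembles a dictionary and makes no AWS call.
    """
    extension = "." + key.rsplit(".", 1)[-1].lower()
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError(f"Unsupported file type: {key}")
    return {
        "Name": name,
        "Format": FORMAT_BY_EXTENSION[extension],
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


# Placeholder names, not real resources.
request = build_dataset_request("sales-raw", "example-bucket", "incoming/sales.csv")

# With AWS credentials configured, the request could then be sent like this:
# import boto3
# boto3.client("databrew").create_dataset(**request)
```

Keeping the request construction separate from the network call makes it possible to check the payload without an AWS account.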

Functionalities of AWS Glue DataBrew

  • Clean and Normalize:
    Its interactive, point-and-click visual interface provides over 250 built-in transformations to easily visualize, clean, and normalize data.
  • Profile Data:
    DataBrew enables its users to profile their data by generating more than 40 statistics about the datasets. It makes understanding the data patterns and detecting anomalies much easier.
  • Map Data Lineage:
    It helps track the various data sources and transformation steps that the data has been through by providing a visual map of the data’s journey.
  • Automate:
    Users can save their transformations or recipes for later use on the incoming data, thereby automating these tasks. They can also share these recipes with others.
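The recipe idea described above can be illustrated with a short, purely conceptual Python sketch: a recipe is an ordered list of steps stored as plain data, so it can be saved, shared, and re-applied to each new batch. The operation names and the `apply_recipe` helper are assumptions made for illustration; they are not DataBrew's built-in transformations or its recipe file format.

```python
import json

# Hypothetical operations for illustration only; these are not the names
# of DataBrew's built-in transformations.
OPERATIONS = {
    "strip_whitespace": lambda value: value.strip(),
    "lowercase": lambda value: value.lower(),
    "fill_missing": lambda value, default="unknown": value if value else default,
}


def apply_recipe(rows, recipe):
    """Apply each recipe step, in order, to the named column of every row."""
    result = [dict(row) for row in rows]  # copy so the input is left untouched
    for step in recipe:
        operation = OPERATIONS[step["operation"]]
        params = step.get("parameters", {})
        for row in result:
            row[step["column"]] = operation(row[step["column"]], **params)
    return result


recipe = [
    {"operation": "strip_whitespace", "column": "city"},
    {"operation": "lowercase", "column": "city"},
    {"operation": "fill_missing", "column": "city", "parameters": {"default": "n/a"}},
]

# Because the recipe is plain data, it can be serialized, shared, and reloaded.
saved = json.dumps(recipe)
reloaded = json.loads(saved)

# The reloaded recipe is applied to a new incoming batch.
batch = [{"city": "  Seattle "}, {"city": ""}]
cleaned = apply_recipe(batch, reloaded)
```

Storing the steps as data rather than as code is what makes the same cleanup repeatable on every new batch without redoing the work by hand.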

By providing a visual environment for data prep that is easy to use and accessible to many users, DataBrew aims to let them focus on getting the right insights from data instead of writing code. It is generally available today in Northern Virginia (US), Ohio (US), Oregon (US), Ireland (EU), Frankfurt (EU), Tokyo (Asia Pacific), and Sydney (Asia Pacific).
