Who am I?

Hello, my name is Bhawesh Mehta. I am a data engineer.

Prior to my current role, I worked as a business analyst, where I developed a strong foundation in data analysis and automation.

I am passionate about data engineering and enjoy staying up to date with the latest developments in the field.

Introduction To Data Engineering

Difference Between Various Roles in Data Field

Difference between Data Engineer, Data Analyst, Data Scientist, Business Analyst, and Business Intelligence Analyst

Data Engineers Ensure That Data Is:

Highly Available
Consistent
Secure
Recoverable

Data scientists and analysts make use of the data that data engineers provide

Data Engineers work with other data professionals to ensure data matches their needs

Responsibilities of Data Engineers

Technical Skills Required

Role of Data Engineering in Customer Sentiment Analysis

Extract data from various sources, such as social media, e-commerce portals, and blogs, through APIs or web scraping.
Store that data in temporary storage.
Clean and manipulate the data with tools like Python.
Store the processed data in databases.
The cleaned data is then used by data analysts, business analysts, and end users.

The above steps are not a one-time activity.

They should be set up as automated pipelines.
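
The steps above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the extract() function returns made-up in-memory reviews instead of calling a real API or scraper, the temporary storage is a JSON Lines file in a temporary directory, and the database is an in-memory SQLite database. All function, file, and table names are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def extract():
    # Stand-in for an API call or web scrape; these reviews are made up.
    return [
        {"source": "blog", "text": "  Great product!  "},
        {"source": "social", "text": "Too EXPENSIVE"},
        {"source": "social", "text": ""},
    ]


def stage(records, staging_dir):
    # Temporary storage: write one JSON document per line.
    path = Path(staging_dir) / "raw_reviews.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def transform(path):
    # Manipulation step: trim whitespace, lowercase, drop empty reviews.
    cleaned = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            text = record["text"].strip().lower()
            if text:
                cleaned.append((record["source"], text))
    return cleaned


def load(rows, conn):
    # Store the processed rows in a database table.
    conn.execute("CREATE TABLE IF NOT EXISTS reviews (source TEXT, text TEXT)")
    conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    # One full run: extract, stage, transform, load.
    with tempfile.TemporaryDirectory() as staging_dir:
        staged = stage(extract(), staging_dir)
        load(transform(staged), conn)


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
print(conn.execute("SELECT source, text FROM reviews").fetchall())
```

Because the whole run is wrapped in run_pipeline(), a scheduler or orchestration tool could call it repeatedly rather than someone running the steps by hand.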

Data Types

Data Repositories

Data Pipeline

Languages

Reporting Tools

Structured Data

Semi-Structured Data

Unstructured Data

Standard File Formats

Delimited Text

XML File Format

PDF

JSON

Sources of Data

Popular APIs

Web Scraping

Data Streams and Feeds

Metadata and Metadata Management

Objectives

After completing this reading, you will be able to:
Define what metadata is
Describe what metadata management is
Explain the importance of metadata management
List popular tools for metadata management

What is metadata?

Metadata is data that provides information about other data.

This is a very broad definition. Here we will consider the concept of metadata within the context of databases, data warehousing, business intelligence systems, and all kinds of data repositories and platforms.

We’ll consider the following three main types of metadata:

Technical metadata
Process metadata, and
Business metadata

Technical metadata

Technical metadata is metadata which defines the data structures in data repositories or platforms, primarily from a technical perspective.

For example, technical metadata in a data warehouse includes assets such as:

1. Tables that record information about the tables stored in a database, like:

each table’s name
the number of columns and rows each table has

2. A data catalog, which is an inventory of tables that contain information, like:

the name of each database in the enterprise data warehouse
the name of each column present in each database
the names of every table that each column is contained in
the type of data that each column contains

The technical metadata for relational databases is typically stored in specialized tables in the database called the System Catalog.
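
As a small illustration of a system catalog, the sketch below reads technical metadata from SQLite, which keeps its catalog in a table named sqlite_master and exposes column details through PRAGMA table_info. The customers and orders tables are made-up examples, and other database systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables so the catalog has something to describe.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)


def describe_tables(conn):
    # sqlite_master lists every table; PRAGMA table_info lists its columns
    # as rows of (cid, name, type, notnull, dflt_value, pk).
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    catalog = {}
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


catalog = describe_tables(conn)
for table, columns in catalog.items():
    print(table, columns)
```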

Process metadata

Process metadata describes the processes that operate behind business systems such as data warehouses, accounting systems, or customer relationship management tools.

Many important enterprise systems are responsible for collecting and processing data from various sources. Such critical systems need to be monitored for failures and any performance anomalies that arise. Process metadata for such systems includes tracking things like:

process start and end times
disk usage
where data was moved from and to, and
how many users access the system at any given time

This sort of data is invaluable for troubleshooting and optimizing workflows and ad hoc queries.
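
One possible way to capture process metadata such as start and end times and where data was moved from and to is a small wrapper around each job. The sketch below is an assumption-laden example, not a standard tool: tracked_run, process_log, and the dataset names are all hypothetical, and a real system would write these records to durable storage rather than a Python list.

```python
from contextlib import contextmanager
from datetime import datetime, timezone

# In-memory stand-in for a process metadata store.
process_log = []


@contextmanager
def tracked_run(name, source, target):
    # Record when the process started and where data moves from and to.
    entry = {
        "process": name,
        "moved_from": source,
        "moved_to": target,
        "started_at": datetime.now(timezone.utc),
        "status": "running",
    }
    try:
        yield entry
        entry["status"] = "succeeded"
    except Exception:
        entry["status"] = "failed"
        raise
    finally:
        # The end time is recorded whether the process succeeded or failed.
        entry["ended_at"] = datetime.now(timezone.utc)
        process_log.append(entry)


with tracked_run("nightly_load", "staging.reviews", "warehouse.reviews") as run:
    run["rows_moved"] = 2  # the job itself would set this

for entry in process_log:
    print(entry["process"], entry["status"], entry["ended_at"] - entry["started_at"])
```

Failed runs are logged as well, which is what makes records like these useful for troubleshooting.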

Business metadata

Users who want to explore and analyze data within and outside the enterprise are typically interested in data discovery. They need to be able to find data which is meaningful and valuable to them and know where that data can be accessed from. These business-minded users are thus interested in business metadata, which is information about the data described in readily interpretable ways, such as:

how the data is acquired
what the data is measuring or describing
the connection between the data and other data sources

Business metadata also serves as documentation for the entire data warehouse system.

Managing metadata

Managing metadata includes developing and administering policies and processes to ensure information can be accessed and integrated from various sources and appropriately shared across the entire enterprise.

Creation of a reliable, user-friendly data catalog is a primary objective of a metadata management model.

The data catalog is a core component of a modern metadata management system, serving as the main asset around which metadata management is administered.

It serves as the basis by which companies can inventory and efficiently organize their data systems. A modern metadata management model will include a web-based user interface that enables engineers and business users to easily search for and find information on key attributes such as CustomerName or ProductType. This kind of model is central to any Data Governance initiative.
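
To make the idea of searching a data catalog concrete, here is a minimal sketch of the lookup that such a user interface might perform behind the scenes. The CatalogEntry fields, the sample entries, and the search() function are hypothetical; real metadata management tools have their own data models and search features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    database: str
    table: str
    column: str
    data_type: str
    description: str  # business metadata in plain language


# Made-up entries for illustration only.
CATALOG = [
    CatalogEntry("sales", "customers", "CustomerName", "TEXT",
                 "Full name as entered at sign-up"),
    CatalogEntry("sales", "orders", "CustomerName", "TEXT",
                 "Name copied from the customer record at order time"),
    CatalogEntry("inventory", "products", "ProductType", "TEXT",
                 "Category the product belongs to"),
]


def search(catalog, term):
    # Case-insensitive match on the column name or its description.
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry.column.lower() or term in entry.description.lower()
    ]


for hit in search(CATALOG, "CustomerName"):
    print(f"{hit.database}.{hit.table}.{hit.column}: {hit.description}")
```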

Why is metadata management important?

Good metadata management has many valuable benefits. Having access to a well-implemented data catalog greatly enhances data discovery, repeatability, and governance, and can also facilitate access to data.

Well-managed metadata helps you to understand both the business context associated with the enterprise data and the data lineage, which helps to improve data governance. Data lineage provides information about the origin of the data and how it gets transformed and moved, and thus it facilitates tracing of data errors back to their root cause. Data governance is a data management concept concerning the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data and that data controls supporting business objectives are implemented.

The key focus areas of data governance include availability, usability, consistency, data integrity, and data security. Data governance also includes establishing processes to ensure effective data management throughout the enterprise, such as accountability for the adverse effects of poor data quality, and ensuring that the data an enterprise holds can be used by the entire organization.
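
The data lineage idea described above, tracing a data error back to its root cause, can be illustrated with a small sketch. It assumes lineage is recorded as a mapping from each dataset to the datasets it was built from; the LINEAGE mapping, the dataset names, and both functions are hypothetical.

```python
# Each dataset maps to the upstream datasets it was derived from.
LINEAGE = {
    "sales_dashboard": ["daily_sales"],
    "daily_sales": ["clean_orders", "clean_customers"],
    "clean_orders": ["raw_orders"],
    "clean_customers": ["raw_customers"],
}


def upstream(dataset, lineage):
    # Walk the lineage graph and collect every dataset that feeds this one.
    seen = []
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


def root_sources(dataset, lineage):
    # Root sources are upstream datasets with no recorded parents of their own.
    return sorted(d for d in upstream(dataset, lineage) if d not in lineage)


print(root_sources("sales_dashboard", LINEAGE))
```

If a number on the dashboard looks wrong, the list of upstream datasets shows which tables to check, starting from the original sources.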