Common Characteristics in a Big Data Project – The Data Engineer

Availability, fragmentation, and heterogeneous data

The first step in any big data project is to review the data sources: whether they are internal or come from external providers, it is important to know the nature, granularity, and volume of the data. At this point in the process, the Data Engineer is the one who should ask the questions that consolidate the hypothesis underpinning the project, in order to identify what data we need and in what form we need it.
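A first pass over a source's nature, granularity, and volume can be automated with a simple profiling routine. The sketch below is illustrative: the sample records and field names are assumptions, not data from any real project.

```python
from collections import Counter

# Hypothetical sample pulled from an internal source; fields are illustrative.
records = [
    {"customer_id": 1, "country": "US", "amount": 120.0},
    {"customer_id": 2, "country": "ES", "amount": None},
    {"customer_id": 3, "country": "US", "amount": 75.5},
]

def profile(records):
    """Summarize volume (row count) and, per field, which value types appear."""
    summary = {"rows": len(records), "fields": {}}
    for rec in records:
        for field, value in rec.items():
            stats = summary["fields"].setdefault(field, Counter())
            stats[type(value).__name__] += 1
    return summary

report = profile(records)
```

A report like this immediately surfaces missing values (`NoneType` counts) and mixed types, the kind of findings that feed the questions the Data Engineer should be asking at this stage.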

Unified Data Model

A datum is a minimal unit of information, and through its analysis we can extract knowledge relevant to decision-making. To model the data in a structured way, we should consider two aspects:

·       Errors introduced by human interaction, whether during data entry or validation. These produce a data source with incorrect values, or values unrelated to its own nature, that hinder subsequent processing.

·       A lack of universal criteria for aligning the granularity of the data: the same information can be represented in many ways, and without a single, shared criterion the information becomes dispersed.

At this point we again turn to the Data Engineer, who is responsible for bringing order to the chaos of data: unifying, categorizing, and preparing it so that artificial intelligence algorithms can handle it.
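This unification step can be sketched as mapping raw rows onto a single agreed-upon model. The example below is a minimal sketch under assumed inputs: the mixed date formats and country aliases are hypothetical, chosen only to show how divergent representations are aligned to one criterion.

```python
from datetime import datetime

# Illustrative raw rows mixing date formats and free-text categories.
raw = [
    {"date": "2024-03-05", "country": "United States"},
    {"date": "05/03/2024", "country": "US"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")          # formats we agree to accept
COUNTRY_ALIASES = {"united states": "US", "us": "US"}  # canonical codes

def normalize(row):
    """Map a raw row onto the unified model: ISO dates and canonical codes."""
    for fmt in DATE_FORMATS:
        try:
            date = datetime.strptime(row["date"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        date = None  # flag unparseable dates for review instead of guessing
    country = COUNTRY_ALIASES.get(row["country"].strip().lower())
    return {"date": date, "country": country}

unified = [normalize(r) for r in raw]
```

Both raw rows, despite their different surface forms, collapse to the same unified record, which is exactly the property downstream algorithms depend on.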

Functions of the Data Engineer

The capture of large volumes of data, both internal and external, and their processing to unify and cleanse them, is the backbone of any big data project. This process takes up a large share of the time dedicated to the project and is fundamental to guaranteeing its success. We highlight the following functions of a Data Engineer:

·       Guaranteeing the quality of the conclusions extracted, given the mutability of the data at the source.

·       Constant data provisioning to and from the Data Lake through the development of production processes.

·       Design and development of data-processing software, as well as its evolutionary and/or corrective maintenance.

·       Design and implementation of APIs that make the most of the insights obtained after data processing.
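The functions above fit together as a small extract-transform-load flow whose results are then exposed through an API. The sketch below is a minimal, assumed illustration: the sample data, the in-memory "data lake", and the endpoint name are all hypothetical stand-ins for real production infrastructure.

```python
import json

def extract():
    """Stand-in for capturing rows from internal or external sources."""
    return [
        {"region": "EU", "sales": 100},
        {"region": "EU", "sales": 50},
        {"region": "US", "sales": 80},
    ]

def transform(rows):
    """Unify the rows into an aggregated insight: total sales per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

DATA_LAKE = {}  # in-memory stand-in for a real Data Lake

def load(totals):
    """Provision the processed result into the lake under a metric name."""
    DATA_LAKE["sales_by_region"] = totals

def insights_endpoint(metric):
    """Stand-in for an API route returning a processed insight as JSON."""
    return json.dumps(DATA_LAKE.get(metric, {}), sort_keys=True)

load(transform(extract()))
```

Separating extraction, transformation, loading, and serving in this way is what lets each function on the list above be maintained (and evolved or corrected) independently.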
