Technical and massive data lineage

{openAudit} combines the analysis of data lineage with the analysis of information uses, to map an information system and transform it: IT debt reduction, Cloud migrations.


“{openAudit}, thanks to its incomparable capacity for technical introspection, allows us to achieve this objective: to understand the uses of data and simplify our legacy, to accelerate our Cloud migration.”

“{openAudit} offers impact analysis and data lineage capabilities with a PL/SQL parser (our flagship technology for our ETL flows), a parser we have never encountered in other tools. It also allows us to do the cleansing required in our code, with excellent results.”

“{openAudit} is an easy-to-use tool that provides a quick and clear view of our SAP/Microsoft environment, and also a very useful data lineage for impact analysis, which is essential for our compliance topics.”

Methodology: dynamic analysis of 5 stacks

Data inventory:

A "data catalog" with files, flows, data sets: all physical data persisted or in memory, views, reports...

Probes and log parsing:

For consumption and injection of data.

Introspection of the dataviz layer:

> Know the link between technical and business information,
> Gather intelligence (business rules),
> Propagate business terms to the underlying layers to keep a “business reading” of the processes.

Reverse engineering on the code:

For end-to-end technical granular data lineage, synchronized with the IS.


> All analyses are carried out daily, in delta mode, so that openAudit® is permanently synchronized with the IS.
> openAudit® also provides open databases and web interfaces, on premise or in SaaS.
> We provide APIs or Web Components as needed, to be used at the customer's convenience.

Scheduler parsing:

Understand scheduling to link it to data lineage and data uses.

Our partner "Dawizz" also allows us to address the "data catalog" part with a fully automated solution.


3 major use cases

Use case #1:

Map an information system

Teams change, technologies pile up, volumes explode.
{openAudit} is software that democratizes data governance: it performs an exhaustive reverse engineering of all internal processes to share with everyone a fine-grained, objective reading of the information system.

Use case #2:

Erase IT debt

To lower maintenance, reduce licensing costs, facilitate technical migrations or move towards green IT, {openAudit} allows massive and iterative simplifications of information systems by identifying unused, replicated or inoperative elements, on premise or in the Cloud.

Use case #3:

Migrate to the Cloud

Migration projects are recurrent in companies, whether for tool-to-tool migrations, or to bring complete systems to the cloud.
{openAudit} allows these migrations to be carried out quickly and precisely by automating processes, while limiting regressions.

Use case #1 - Map an information system

1-1 Data lineage and the uses of information

{openAudit} makes it possible to understand end-to-end multi-technology data flows, on premise and in the Cloud, including in the dataviz layer which aggregates countless management rules.
{openAudit} allows you to go into the details of the flows and to have an overview of the code, the scheduling, and above all the uses of the information (ad hoc queries, dataviz tools).
{openAudit} offers different graphical display modes for its data lineage, from an "end-to-end" view without transformations up to an ultra-granular, field-level data lineage that details all the transformations.

Use cases: share a detailed understanding of the construction of feeding chains within data teams, identify breaks and fix them, BCBS 239 (Basel III), GDPR…

Data lineage and the uses of information

A true end-to-end, exhaustive and fully automated data lineage, which presents multiple views according to needs.

Resolving data lineage breaks

> Views: if they are stored, {openAudit} will read them, even if they are stacked (views of views of views...).

> Dynamic SQL: If {openAudit} fails to resolve it directly, the dynamic SQL is resolved with runtime parameters or runtime logs.

> Others: when information is transferred by FTP, or when the database schema is not specified (as is the case in many ETLs), {openAudit} resolves these breaks by structural recognition, reading the Batch or Shell scripts.
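To make the "views of views" case concrete, here is a minimal Python sketch that unfolds stacked views down to physical tables. It assumes the view definitions have already been parsed into a mapping from each view to the objects it selects from; `VIEW_SOURCES`, `resolve_physical` and the table names are hypothetical illustrations, not openAudit's actual implementation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, already-parsed view definitions: view -> objects it selects from.
VIEW_SOURCES: Dict[str, List[str]] = {
    "v_sales_report": ["v_sales_enriched"],
    "v_sales_enriched": ["v_sales_base", "dim_customer"],
    "v_sales_base": ["fact_sales"],
}


def resolve_physical(
    obj: str, views: Dict[str, List[str]], seen: Optional[Set[str]] = None
) -> Set[str]:
    """Return the physical tables behind `obj`, unfolding views of views."""
    if seen is None:
        seen = set()
    if obj in seen:  # already visited on this walk (also stops cycles)
        return set()
    seen.add(obj)
    if obj not in views:  # no view definition: treat it as a physical table
        return {obj}
    tables: Set[str] = set()
    for source in views[obj]:
        tables |= resolve_physical(source, views, seen)
    return tables


print(sorted(resolve_physical("v_sales_report", VIEW_SOURCES)))
# ['dim_customer', 'fact_sales']
```

Any object without a view definition is treated as a physical table, and the shared `seen` set keeps the walk finite even if definitions are circular.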

Dynamically combine different data transformation technologies:

> Data lineage in the dataviz layer: {openAudit} will present all the transformations of the dashboard, from the feeding layers to the dashboard cell, and will allow you to review all the management rules implemented.

> Data lineage in the feeding layers: {openAudit} analyzes all processing technologies (object/procedural language, ELT/ETL), on premise or Cloud, and combines them in a single data flow, at the finest level of granularity. The drill through gives access to the code.

> The process is dynamic, operated in delta mode, daily, and therefore synchronized with the information system.

Different levels of analysis for the feeding layers:

> Cloud of points: this view instantly shows the uses of a datapoint, regardless of transformations. Conversely, starting from a use (a dashboard, or a data item in a dashboard), it instantly identifies its operational sources.

> Mapping: from any datapoint (field, table), this view displays a complete mapping of the upstream or downstream flow, i.e. from the operational sources to the exposure of the data (dataviz, query). The information used is highlighted, and the uses of the information are shown on mouse-over (who consults the data, when, how).

> Granular data lineage: from a datapoint, this view makes it possible to follow, click by click, how the data propagates through the information system, or on the contrary to go back to the operational sources. Each transformation (ELT/ETL job, procedural/object code) can be analyzed with the “drill through”. The precise details of the uses of the data (who consults it, when, how, etc.) are available in a single click.
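The upstream/downstream navigation described above amounts to a graph traversal. The sketch below is a minimal illustration, assuming field-level lineage has already been extracted as (source field, target field) edges; the field names and the `traverse` helper are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source field, target field)

# Hypothetical field-level lineage: operational source -> ... -> dashboard cell.
EDGES: List[Edge] = [
    ("erp.orders.amount", "stg.orders.amount"),
    ("stg.orders.amount", "dwh.fact_sales.revenue"),
    ("dwh.fact_sales.revenue", "dash.kpi.total_revenue"),
    ("crm.clients.id", "dwh.fact_sales.client_id"),
]


def traverse(start: str, edges: Iterable[Edge], direction: str = "downstream") -> List[str]:
    """Breadth-first walk from `start`, following edges downstream or upstream."""
    if direction not in ("downstream", "upstream"):
        raise ValueError("direction must be 'downstream' or 'upstream'")
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        if direction == "downstream":
            graph[src].append(dst)
        else:
            graph[dst].append(src)  # reverse the edge to walk towards sources
    visited: Set[str] = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(traverse("stg.orders.amount", EDGES))
# ['dwh.fact_sales.revenue', 'dash.kpi.total_revenue']
print(traverse("dash.kpi.total_revenue", EDGES, "upstream"))
# ['dwh.fact_sales.revenue', 'stg.orders.amount', 'erp.orders.amount']
```

Walking the reversed edges gives the "go back to operational sources" direction with the same code.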

Use case #1 - Map an information system

1-2 A multi-technology impact analysis

{openAudit} brings all of the company's dataviz technologies together in a single impact analysis interface: a filtered grid for quick analyses of the interactions between each element used in a dashboard and the source physical field (or view). This goes from the dashboard cell, to the query sent to the database, to the semantic layer if there is one, etc.
Data lineage in the dashboard or in the feeding layers can be triggered from this interface.

Use cases:
Instant identification of the impacts of a data item across all dataviz technologies, instant sourcing of data from a dashboard, etc.


1-2 A multi-technology impact analysis

Some dataviz technologies use semantic layers to create intelligibility for the business, and thus give it autonomy. These semantic layers create abstraction: the underlying physical fields are difficult to identify, which makes sourcing complex.

Furthermore, dataviz technologies often query views, views of views… which again complicates sourcing.

As dataviz technologies multiply, genuine multi-technology impact analyses (or sourcing) become complex to carry out.

Technical answers:

> {openAudit} performs data lineage inside the dataviz layer (in expressions, variables, etc.) to identify the fields that are, directly or indirectly, the source of a datapoint in the dataviz layer.

> {openAudit} analyzes the content of views to identify the physical fields that are the source of data for a dataviz layer, even if the views are stacked.

> {openAudit} combines the analyses of the different dataviz technologies in the same grid, which allows business and IT to carry out impact analyses between all the feeding layers and all the dataviz tools, simply.

> In addition, {openAudit} provides advanced administration features for SAP BO, PowerBI and Qlik Sense: detection of useless or replicated dashboards and unused objects, in order to restore clarity for users, simplify a platform, or even decommission it.
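The multi-technology grid can be pictured as one flat table whose rows link a dashboard element to a physical field, whatever the dataviz tool. This is a minimal sketch under that assumption; `ImpactRow`, `impacts_of`, `sources_of` and the sample rows are hypothetical, and resolving semantic objects and views into physical fields is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImpactRow:
    tool: str             # dataviz technology
    dashboard: str
    cell: str             # dashboard element
    semantic_object: str  # "" when the tool has no semantic layer
    physical_field: str   # already resolved through semantic layer and views


GRID: List[ImpactRow] = [
    ImpactRow("SAP BO", "Monthly sales", "Revenue", "Sales.Revenue", "dwh.fact_sales.revenue"),
    ImpactRow("Power BI", "Sales cockpit", "Total revenue", "", "dwh.fact_sales.revenue"),
    ImpactRow("Qlik Sense", "Clients", "Client count", "", "dwh.dim_client.client_id"),
]


def impacts_of(field: str, grid: List[ImpactRow]) -> List[ImpactRow]:
    """Impact analysis: every dashboard element, in any tool, fed by `field`."""
    return [row for row in grid if row.physical_field == field]


def sources_of(tool: str, dashboard: str, grid: List[ImpactRow]) -> List[str]:
    """Sourcing: the physical fields behind one dashboard."""
    return sorted({row.physical_field for row in grid
                   if row.tool == tool and row.dashboard == dashboard})


for row in impacts_of("dwh.fact_sales.revenue", GRID):
    print(row.tool, "|", row.dashboard, "|", row.cell)
```

Keeping one row shape for every tool is what lets a single filter answer both questions, impact and sourcing, across technologies.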

Use case #2 - Erase IT debt

2-1 Optimize/fix feeding chains

{openAudit} analyzes scheduler logs and dynamically identifies which jobs are operational, which are not, which are slow, which are decelerating, etc.
{openAudit} offers direct access to the content of all triggered ELT/ETL jobs or procedures, to go into the details of the flow and zoom in on the offending code.

Use cases:
Dynamic monitoring of the quality of the run, fixing of breaks in the feeding chains, massive simplification of the production plan for an efficient and less energy-consuming run.


Optimize/fix feeding chains

IT teams plan countless procedures or tasks that overload the run. There is an ever-increasing risk of partial run execution, or even run failure, and it is exceedingly difficult to map the production plan to the feeding chains themselves.

Technical answers

> {openAudit}, thanks to the dynamic analysis of audit logs and the dataviz layer, makes it possible to know what users are really consuming.

> {openAudit} highlights the myriad of scheduled ETL/ELT tasks and procedures whose output is never consumed at the end of the chain. The run can thus be simplified.

> This monitoring continuously shows which scheduler jobs are dysfunctional (slow or ineffective), with an analysis of the content of each ETL/ELT procedure or job, a granular impact analysis of that procedure or job, and a drill through into its code.
It becomes possible to intervene quickly at the source of the problem in the faulty chains.
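A simplified version of this monitoring can be sketched as rules over parsed scheduler runs. The sketch assumes each log record has already been reduced to a job name, a status and a duration, listed in chronological order; the `Run` type, the `"OK"` status code and the thresholds (`slow_s`, `decel_ratio`, `window`) are hypothetical choices, not openAudit's actual rules.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Run:
    job: str
    status: str        # hypothetical status codes: "OK" or anything else
    duration_s: float  # run duration in seconds


def classify_jobs(
    runs: List[Run], slow_s: float = 3600.0, decel_ratio: float = 1.5, window: int = 3
) -> Dict[str, List[str]]:
    """Flag each job as failing, slow and/or decelerating from its run history."""
    history: Dict[str, List[Run]] = {}
    for run in runs:  # runs are assumed to be in chronological order
        history.setdefault(run.job, []).append(run)

    report: Dict[str, List[str]] = {}
    for job, job_runs in history.items():
        flags: List[str] = []
        if job_runs[-1].status != "OK":
            flags.append("failing")
        durations = [r.duration_s for r in job_runs if r.status == "OK"]
        # Slow: the last `window` successful runs take longer than `slow_s` on average.
        if durations and mean(durations[-window:]) > slow_s:
            flags.append("slow")
        # Decelerating: recent runs are markedly slower than the previous window.
        if len(durations) >= 2 * window:
            earlier = mean(durations[-2 * window:-window])
            recent = mean(durations[-window:])
            if recent > decel_ratio * earlier:
                flags.append("decelerating")
        report[job] = flags
    return report


runs = [Run("load_orders", "OK", d) for d in (100, 100, 100, 200, 200, 200)]
runs += [Run("load_clients", "OK", 4000), Run("load_clients", "FAILED", 10)]
runs += [Run("export_kpi", "OK", 50)]
print(classify_jobs(runs))
```

Comparing two consecutive windows, rather than a single absolute threshold, is what separates a job that is slow by nature from one that is getting slower.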

Use case #2 - Erase IT debt

2-2 Detect the dead matter of the information system

Thanks to a permanent analysis of the audit logs, combined with the data lineage, {openAudit} identifies all the unused elements in the feeding layers (tables, files, procedures, ETL/ELT jobs).
On average, 50% of the information in the information system has no added value (replicated, obsolete), which has considerable impacts: system inertia, unnecessary costs, etc.

Use cases:
Decommissioning to prepare for a migration, streamlining a system to limit maintenance, reducing licensing, green IT projects.


Detect the dead matter of the information system

A large part of the content of information systems has no added value (replicated, obsolete), with significant impacts: maintenance, licenses, technical migrations made impossible.

Technical answers:

> {openAudit} analyzes database audit logs and the dataviz layer to find out what data is actually being used.

> Starting from the database fields that feed dataviz tools, ad hoc queries (ODBC, JDBC) or specific ETL/ELT flows, {openAudit} traces the information flows and tables that supply them, i.e. the "living branches" of the information system. Conversely, {openAudit} identifies the "dead branches": tables, procedures and ETL/ELT jobs that have no productive function.

> {openAudit} runs these analyses dynamically and, by building up a significant depth of history, can formally identify the branches that remain continuously unused, together with everything they contain (tables, files, procedures, ELT/ETL jobs).
Mass decommissioning can take place in record time.
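The living/dead branch split can be expressed as a reverse reachability query on the lineage graph: everything upstream of an object that was actually consumed is alive, everything else is a candidate for decommissioning. In the minimal sketch below, `consumed` is assumed to be the set of objects seen in audit logs over the chosen history depth; the edges and the `split_branches` helper are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def split_branches(
    edges: Iterable[Tuple[str, str]], consumed: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (living, dead) objects given lineage edges (upstream, downstream)."""
    parents: DefaultDict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set(consumed)
    for upstream, downstream in edges:
        parents[downstream].add(upstream)
        nodes.update((upstream, downstream))

    # Walk upstream from every consumed object: all its ancestors feed a real use.
    living: Set[str] = set()
    stack = list(consumed)
    while stack:
        node = stack.pop()
        if node in living:
            continue
        living.add(node)
        stack.extend(parents[node])
    return living, nodes - living


EDGES = [
    ("src.orders", "job_load_orders"),
    ("job_load_orders", "dwh.orders"),
    ("dwh.orders", "dash.sales"),
    ("src.legacy", "job_legacy"),
    ("job_legacy", "dwh.legacy_copy"),
]
living, dead = split_branches(EDGES, consumed={"dash.sales"})
print(sorted(dead))  # ['dwh.legacy_copy', 'job_legacy', 'src.legacy']
```

The longer the history behind `consumed`, the safer the "dead" verdict, which is why depth of history matters before decommissioning.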

Use case #3 - Migrate to the Cloud

3-1 Translate procedural/object languages

Technological migrations of object/procedural languages are often so complex that companies prefer to stack technologies rather than decommission them.
However, they struggle to maintain these languages due to a lack of experts capable of reverse engineering.
Nowadays, the craze for the Cloud is changing this paradigm, and more and more companies are looking to get rid of these legacy languages quickly, with no other solution than to start hazardous and costly migrations.

Use cases:
DBMS change, Cloud migration, maintainability of a legacy system, etc.


Translate procedural/object languages

Large companies have always accumulated processing technologies. There is a continuous piling up, because the removal of a technology often presents too many risks. But the skills associated with it are becoming scarce, and retro-documentation is rarely in place.
At some point, companies have to get started! Such projects are often long, expensive and risky consultancy engagements. We think it's better to automate the process.

Technical answers

> {openAudit} will "parse" the source code and break down all its complexity using a grammar that allows exhaustive, ultra-granular analyses. All subtleties will be taken into consideration,

> {openAudit} deduces the overall kinematics and intelligence, which will be reconstructed in an algorithmic, agnostic tree. On this basis, {openAudit} will produce "standard SQL",

> Then the intelligence will be reconstructed at least in the specific SQL of the target database (e.g. BigQuery for Google, Redshift for Amazon, Azure SQL for Microsoft, etc.),

> All complex processing that cannot be reproduced in simple SQL will be driven by a NodeJS executable. Typically "For Loop" cursors, variables, "If Else" conditional code, "Switches", procedure calls, etc.,

> {openAudit} produces YAML files (an intuitive, human-readable format). Thus, the understanding of complexity is shared with as many people as possible.

> Optionally, new orchestration mechanisms can be implemented to deconstruct cursors of cursors (loops of loops) and optimize the transformation chains.
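The step from an agnostic tree to target-specific SQL can be illustrated with a toy tree. In this sketch, assumed for illustration only, an Oracle `NVL(amount, 0)` has already been parsed into a neutral `Coalesce` node, and the targets differ only in how they quote identifiers; the node classes and the `render` function are hypothetical, and a real translator handles far more constructs.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: int


@dataclass
class Coalesce:
    """Technology-agnostic node; an Oracle NVL(a, b) would be parsed into this."""
    args: List["Expr"]


Expr = Union[Column, Literal, Coalesce]


@dataclass
class Select:
    exprs: List[Expr]
    table: str


# Identifier quoting per dialect: standard SQL, BigQuery, Redshift, T-SQL (Azure SQL).
QUOTES = {
    "ansi": ('"', '"'),
    "bigquery": ("`", "`"),
    "redshift": ('"', '"'),
    "tsql": ("[", "]"),
}


def quote(name: str, dialect: str) -> str:
    left, right = QUOTES[dialect]
    return f"{left}{name}{right}"


def render_expr(expr: Expr, dialect: str) -> str:
    if isinstance(expr, Column):
        return quote(expr.name, dialect)
    if isinstance(expr, Literal):
        return str(expr.value)
    if isinstance(expr, Coalesce):
        # COALESCE is standard SQL and is accepted by all four targets above.
        return "COALESCE(" + ", ".join(render_expr(a, dialect) for a in expr.args) + ")"
    raise TypeError(f"unsupported node: {expr!r}")


def render(stmt: Select, dialect: str) -> str:
    cols = ", ".join(render_expr(e, dialect) for e in stmt.exprs)
    return f"SELECT {cols} FROM {quote(stmt.table, dialect)}"


# Tree a parser might build from Oracle: SELECT NVL(amount, 0) FROM orders
tree = Select([Coalesce([Column("amount"), Literal(0)])], "orders")
for dialect in ("ansi", "bigquery", "tsql"):
    print(dialect, "->", render(tree, dialect))
```

Because the source-specific function is normalized once into the tree, adding a target means adding a renderer, not re-parsing the legacy code.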

Use case #3 - Migrate to the Cloud

3-2 Translate dataviz technologies

How can you decommission a dataviz technology that has become outdated because it is too static, too expensive, or incompatible with the target architecture, especially in the Cloud? How can you move serenely towards the tools of tomorrow, which business lines are also calling for?

{openAudit} enables almost automated migrations between different dataviz technologies, to save considerable time, avoid damaging regressions, and quite simply make these projects possible!

Use cases:
Migrate SAP BO to Looker, Data Studio or PowerBI, migrate Qlik Sense to Power BI, etc.: many scenarios are possible!


Translate dataviz technologies

Most dataviz tools have two things in common: a semantic layer that interfaces between IT and the business, and a dashboard editor.
We rely on the automated reverse engineering of {openAudit} to deconstruct the complexity of the source tool, allowing us to re-address it in the target technology.


> {openAudit} will be able to feed the target technology from the single semantic layer, a kind of pivot model. This model will have been generated automatically from the dataviz tools to be decommissioned,

> The structure of the initial dashboard will also have been analyzed by {openAudit}, and it can also be transcribed into the target technology,

> Thus sprawling migration projects, previously difficult or even impossible to implement, can be delivered in record time.
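The pivot model idea can be sketched as a neutral list of business objects that a source reader fills and a target writer consumes. Everything below is hypothetical: the input dictionaries stand in for a parsed source semantic layer (not any real export format), and `to_target` emits a neutral structure that a real connector for the target tool would still have to write out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PivotObject:
    name: str              # business name shown to users
    kind: str              # "dimension" or "measure"
    expression: str        # SQL expression on physical fields
    aggregation: str = ""  # e.g. "SUM" for measures


def from_source(objects: List[Dict[str, str]]) -> List[PivotObject]:
    """Map objects read from the source tool (hypothetical format) into the pivot model."""
    pivot = []
    for obj in objects:
        kind = "measure" if obj.get("aggregation") else "dimension"
        pivot.append(PivotObject(obj["name"], kind, obj["select"], obj.get("aggregation", "")))
    return pivot


def to_target(pivot: List[PivotObject]) -> Dict[str, List[Dict[str, str]]]:
    """Emit a neutral description that a target-specific writer would consume."""
    model: Dict[str, List[Dict[str, str]]] = {"dimensions": [], "measures": []}
    for obj in pivot:
        if obj.kind == "measure":
            model["measures"].append({"name": obj.name, "sql": f"{obj.aggregation}({obj.expression})"})
        else:
            model["dimensions"].append({"name": obj.name, "sql": obj.expression})
    return model


source_objects = [
    {"name": "Customer", "select": "dim_client.name"},
    {"name": "Revenue", "select": "fact_sales.revenue", "aggregation": "SUM"},
]
print(to_target(from_source(source_objects)))
```

With one reader per source tool and one writer per target tool, N sources and M targets need N + M connectors instead of N × M direct translations.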


About Ellipsys:

Ellipsys was founded by Samuel Morin in 2013. The idea is that Information Systems get bigger, more complex and more heterogeneous as technologies accumulate and users multiply. Ellipsys’s promise was to automate the analysis of these IS, empower teams to improve them, make them simpler, easier to migrate... This remains our ambition, and our know-how has strongly developed around data lineage in particular, so that we can now tackle many architectures!
The team is made up of several high-level engineers, all keen on research and development ... and customer impact!


Conference: how ADEO / Leroy Merlin transforms its IS thanks to technical Data Lineage!

The ADEO/Leroy Merlin group led a workshop in front of 150 people at the Big Data Paris 2022 fair to explain how it led the transformation of its Information System (simplification / GCP migration) based on data lineage, and why openAudit is essential within a Data Mesh architecture.


The strengths and limitations of SQL?

You find SQL absolutely everywhere, including in the business layers, encapsulated, or “attached” to databases, or dataviz tools. But for teams in charge of Information Systems governance, reverse engineering to deconstruct information flows is often a challenge.


Data lineage in dataviz layer, PowerBI, QlikSense…

A 2020 Basel Committee report indicates that despite some progress, banks do not always manage to implement the 14 principles of BCBS 239. Overall, it is precisely data lineage, i.e. tracking data through the systems, that is often the source of these difficulties.


