Capturing Expert Knowledge to Guide Data Flow and Structure Analysis of Large Corporate Databases

Gergő Balogh, Tamás Gergely, Árpád Beszédes, Attila Szarka and Zoltán Fábián
Maintaining and improving existing large-scale systems that are based on relational databases has proven to be a challenging task. Among many other aspects, it is crucial to develop actionable methods for estimating costs and durations when assessing new feature requirements, which is a very frequent activity during the evolution of large database systems and data warehouses. Such estimation requires the simultaneous analysis of program code, data structures and business-level objectives, which is a daunting task if performed manually by experts. Our industrial partner started to develop a static database analysis software package that automates and eases this process in order to produce more accurate estimates. The goal of this work was to create a quality assessment model that can effectively help developers assess the data flow (lineage) quality and the database structure quality of data warehouse (DWH) and online transaction processing (OLTP) database systems. Based on the relevant literature, we created separate models for these two interconnected topics, which were then evaluated by independent developers. The evaluation showed that the models are suitable for implementation, and they are now included in a commercial product developed by our industrial partner, Clarity.

Keywords: database systems; data warehouses; cost estimation; software quality models; data flow; database structure; data lineage.