COVID-19 Data Warehouse: A Systematic Literature Review

Ahmed Khaled AbdelLatif, Ahmed Naji Abdullah, Ahmed Munther Abboud, Zahraa Abdulkareem Mohammed, Hisham Noori Hussain, Alaa Khalaf Hamoud


The coronavirus disease (COVID-19) has affected the whole world and has led clinicians to use the available knowledge to diagnose or predict infection. The data warehouse (DW) is one of the most crucial tools that may enhance decision-making. In this paper, three main questions are investigated regarding the use of DW during the COVID-19 pandemic. First, the effect of using DW in diagnosis and prediction is investigated. Second, the most used DW architecture is explored. Third, the sectors that have received the most attention from researchers, such as diagnosis, prediction, and finding correlations among features, are examined. The selected studies are papers published between 2019 and 2022 in digital libraries (ACM, IEEE, Springer, Science Direct, and Elsevier) that address DW for COVID-19. Several limitations were identified during the research, and some directions for future work are presented. Enterprise DW is the most used architecture for COVID-19 DW, while finding correlations among features and prediction are the sectors that have received the most attention from researchers.


COVID-19 Data Warehouse, Data Warehouse, SARS-CoV-2, COVID-19 Data Mart, COVID-19 Infection.








Copyright (c) 2022 Ahmed Khaled AbdelLatif, Ahmed Naji Abdullah, Ahmed Munther Abboud, Zahraa Abdulkareem Mohammed, Hisham Noori Hussain, Alaa Khalaf Hamoud

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

This work is licensed under a Creative Commons Attribution 4.0 International License