A data silo is a data collection within a data ecosystem that is not shared with the rest of the system. Sounds like a data engineer wrote that.
Let me translate. A data silo is a collection of data thoughts, concepts, metrics, reports, and capabilities used mostly by a distinct group of people within a company. This can happen because the data lives on someone's local computer or in a personal database. It can also happen because a data set has not made its way through formal governance or promotion to the data warehouse layer, so it sits off in a corner, used by only a handful of people.
There is an accepted sentiment that data silos are bad and should be broken down so that everyone in the company has access to all the information. We disagree. Honestly? Who in the company could consume all of that information without being overwhelmed by its sheer magnitude? Does it need to be organized better? Absolutely! Do we need to get rid of data that is not serving anyone? 100%.
First, why do data silos occur? Grouping is a natural, protective response to an increase in the volume of something. Think of raindrops, tribes, galaxies, or pet hair on the floor. As data explodes in volume, natural organization and "siloing" happen out of necessity.
Data consumption is accelerating faster than our classically trained engineers and developers can enhance data capabilities. Business decisions have also sped up, and the need for real, actionable metrics to stay competitive leads business-line associates to process data and build reporting capabilities outside of standard practices. They create small pockets of knowledge that are not always tested against the company's and the industry's wider conceptual understanding of the data.
2. Lack of Trust – Settling on the Lesser of Two Evils
Data engineers and consumers need consistent data, but they do not trust the company-wide accessible data to be fit for purpose. Experimentation is one such purpose, and legacy practices do not always handle it well. The cost of bringing a data set into an Enterprise Data Warehouse is often higher than the possible return on investment. When a data set is brought in, normalized, and squeezed into existing data structures, much of its context, shape, and capacity for exploration is cleansed out. Data silos are born from local ingestion into personal machines or sandboxes, which sometimes leads to production processes that source analytics from spreadsheets and connect to shadow data sets.
3. Lack of Collaboration – Defending Understanding from Different Perspectives
As each domain within the company starts to report its own narrative from data pulled together from sandboxes, spreadsheets, warehouses, and elsewhere, it needs to stay consistent with the baseline metrics that have already been formulated and reported. In companies where siloing is very prevalent, we see a culture of political and departmental disagreement whenever conflicting numbers are reported for shared metrics. The longer the misalignment continues, the deeper the divisions and the lack of trust become.
Time and again, I watch the big consulting companies arrive with a proposal stating that an entire overhaul of the client's data governance program is necessary to eliminate data silos. They lay out a 1–3-year roadmap to becoming a state-of-the-art data governance shop. Millions of dollars later, if the engagement has not already disintegrated, there are many more processes, but the underlying issue of data explosion has not been addressed. The data industry is stuck using governance processes from 20 years ago to tackle today's enormous volumes of data.
Change the focus from governing everything to applying just enough governance to reduce risk and confusion. Build bridges and pathways between silos where they are needed to optimize revenue, and let the self-organization of the data environment become an organic advantage. If this is intriguing, consider the journey toward a data mesh!