• Frank Shepherd

Vocabulary Lesson: "Data Catalog"

Fact: If you can’t find data, you can’t analyze it.

Corollary: Time spent searching for data is money wasted.


In most conversations we have with organizations, one theme stands out: the data library (i.e., the collection of all data sets available to the organization as a whole) is reminiscent of the final scene of “Raiders of the Lost Ark.” You know, the one where the guy wheels an unidentifiable wooden crate into a vast warehouse filled with unidentifiable wooden crates.


“Where is the spreadsheet that we use for our quarterly report?”

“Try over near the Ark of the Covenant.”

“Where’s that?”

“¯\_(ツ)_/¯”


There is a better way.


First, let’s describe a data catalog and look at a quick sample:


Think of what happens when you search Amazon for “bacon flavored floss.” A data catalog is like that, but for your data. Your data is cataloged with metadata (data about the data: descriptions, authors, creation dates, keywords, etc.), and those attributes, or even column headings, can be used to find what you’re looking for. Type in “finance” and everything related to finance comes up: Excel docs, CSV files, SaaS applications, cloud-based databases, everything. Unlike Amazon, however, your catalog has filters and permissions. Some things are completely hidden from unauthorized users, some are partially hidden, and others are fully available. Users can request access right from the catalog without having to know who owns the data source, and access to those data sets can be logged for auditing and reporting down the road.
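To make the idea concrete, here is a minimal sketch of a metadata search with permission filtering and an access-request log. Everything in it is an assumption made for illustration: the `CatalogEntry` fields, the role names, the sample entries, and the `search` and `request_access` helpers are hypothetical and do not describe any particular catalog product.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class CatalogEntry:
    """One data set plus the metadata used to find it (hypothetical schema)."""
    name: str
    description: str
    keywords: Set[str]
    columns: List[str]
    visible_to: Set[str]  # roles allowed to see this entry at all


def search(catalog: List[CatalogEntry], term: str, role: str) -> List[str]:
    """Return names of entries that match `term` and that `role` may see."""
    needle = term.lower()
    hits = []
    for entry in catalog:
        if role not in entry.visible_to:
            continue  # completely hidden from unauthorized users
        searchable = [entry.name, entry.description, *entry.keywords, *entry.columns]
        if any(needle in text.lower() for text in searchable):
            hits.append(entry.name)
    return hits


# Access requests are recorded so they can be audited later.
ACCESS_LOG: List[Tuple[str, str]] = []


def request_access(user: str, entry_name: str) -> None:
    """Log a request; the requester never needs to know who owns the data."""
    ACCESS_LOG.append((user, entry_name))


# Illustrative entries only; none of these come from a real catalog.
CATALOG = [
    CatalogEntry("quarterly_report.xlsx", "Finance quarterly report",
                 {"finance", "reporting"}, ["quarter", "revenue"],
                 {"analyst", "finance"}),
    CatalogEntry("payroll_db", "Payroll database",
                 {"finance", "hr"}, ["employee_id", "salary"],
                 {"finance"}),
    CatalogEntry("web_traffic.csv", "Website visits by day",
                 {"marketing"}, ["date", "visits"],
                 {"analyst", "marketing"}),
]

print(search(CATALOG, "finance", role="analyst"))
print(search(CATALOG, "finance", role="finance"))
```

In this sketch, an analyst searching for “finance” sees only the quarterly report, while someone in the finance role also sees the payroll database. A real catalog would likely add partial visibility and an approval workflow on top of the request log.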


For those of you who use an application single-sign-on platform like Okta—think “Okta for Data.” It’s a marketplace where you can find data sets without knowing anything about where those data sets are physically located.


"How costly is this lack of organization, really?"


Scenario 1 – Junior Business Analyst with No Data Catalog

Assume that the average annual cost of a junior business analyst is about $125,000, including benefits.


Without any sort of centralized repository or data catalog, it takes them a minimum of 6 months to get oriented, find some data owners, discover a handful of useful data sources, and get permission to access them.

It takes them a minimum of 6 more months of work with those data sources to even begin producing useful insights.

Chances are pretty good that 6-12 months later, after they produce some preliminary insight for your company, they’ll drop the “Junior” from their online resume and leave for a 30% pay bump, and you’ll never even have an opportunity to counter the offer.


To sum that up:

· $250,000 in net new costs over 2 years for what amounts to a junior analyst’s graduate thesis.

· The data discovery and awareness resets with their replacement, costing another 6 months.

· After 2½ years, all you have is a single kinda-reliable insight from a junior analyst, and you’ve spent $312,500.


Scenario 2 – Non-Junior Business Analyst with a Data Catalog

Assume that the average annual cost of a non-junior business analyst is about $175,000, including benefits.


You enlist the help of a data catalog managed service provider. Cost: $50,000 per year, including 25 user licenses (in this case, 1 for the analyst and 24 for current staff, i.e. subject matter experts (SMEs) from other departments).

With a centralized repository or data catalog, it takes a week for all 25 users to get oriented, discover a variety of useful data sources, and get permission to access them.

Because you were able to hire a more senior analyst, they’re finding insights within 3 months. Because the data catalog makes discovery easier for those additional 2 dozen subject matter experts with decades of combined experience, you’re now driving even more insights.


The bottom line:

· $225,000 ($175,000 + $50,000) in net new costs for 1 year, with multiple insights from your analyst and 2 dozen SMEs.

· The data discovery process does NOT reset if the analyst leaves because your data catalog is durable and persistent.


So, for about $87,500 less ($312,500 versus $225,000) and 18 months sooner, you’re getting 25 people access to data sets, asking questions, and finding new insights to help you drive business outcomes.
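The arithmetic behind both scenarios can be checked with a short script. This is only a sketch using the figures stated above; the constant and function names are made up for illustration, and it assumes costs accrue evenly by month.

```python
# Annual figures taken from the two scenarios above.
JUNIOR_ANNUAL_COST = 125_000   # Scenario 1 analyst
SENIOR_ANNUAL_COST = 175_000   # Scenario 2 analyst
CATALOG_ANNUAL_COST = 50_000   # managed data catalog, 25 licenses


def total_cost(annual_cost: int, months: int) -> int:
    """Cost over `months`, assuming it accrues evenly across the year."""
    return annual_cost * months // 12


# Scenario 1: 2 years of the first analyst plus a 6-month reset = 30 months.
without_catalog = total_cost(JUNIOR_ANNUAL_COST, 30)

# Scenario 2: 1 year of analyst plus catalog subscription.
with_catalog = total_cost(SENIOR_ANNUAL_COST + CATALOG_ANNUAL_COST, 12)

savings = without_catalog - with_catalog
months_saved = 30 - 12

print(f"Without catalog: ${without_catalog:,}")
print(f"With catalog:    ${with_catalog:,}")
print(f"Difference:      ${savings:,} over {months_saved} fewer months")
```

Changing the constants lets you rerun the comparison with your own salary and licensing numbers.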


Want to learn more about Data Catalogs? Contact us at info@aperotechsolutions.com