Why the pig?


Someone noticed the logo change on my blog and asked why there is a pig in it now. Well, because originally I was gonna name this blog SQL Oinks. Why SQL Oinks? Here is the story: I had the pleasure of being in the room when my brilliant colleague Mike (amazing human and solution architect, who also writes SQL of the North) said to the SVP who owns Azure Data all up that “Power BI is just the lipstick on the pig.” Well, Mike’s at Databricks now 😀 and it puts a smile on my face every time I think of that meeting. For one reason or another I didn’t name the blog SQL Oinks, but that remark stayed with me, which is why you see the leaping pig in the logo. But really, why do we need the pig?

Why the pig?

Because you cannot put lipstick on thin air. You need a data platform with clean data so that Power BI/Tableau can put makeup on it. Why the lipstick? Because the pig is ugly: round-robin tables, clustered columnstore index tables, heap tables, statistics, indexing, partitioning, and so on. Try selling any of these to your business users, and good luck with that. You need Power BI/Tableau to do the front-end storytelling and hide the complexity and ugliness of the pig from your end users.

Does the pig have to be ugly?

No, not today. Some years ago, it might have been necessary to have a BI serving layer, commonly known as a data warehouse, between the data lake and the visualization layer. The data lake is only storage, and the compute power of visualization tools such as Power BI and Tableau cannot deliver good query performance, because they were built for visualization, not data processing. The data warehouse came to the rescue by providing fast query performance, but it brought all of these complexities along with it.

A few years ago, Databricks came out with Databricks SQL and the concept of the Lakehouse, where all of your data sits in the lake and all of your workloads (Business Intelligence, Data Engineering, Data Science and Streaming) run on the same data within the lakehouse. Databricks SQL provides the compute layer on top of the lake, so you do not have to move or duplicate your data anywhere just to serve the visualization layer. Capabilities like auto optimize, Predictive I/O, and Unity Catalog are making the pig cleaner and more attractive over time. It’s not just Databricks though. The Lakehouse has been embraced by all the major cloud providers: Google came out with BigLake, AWS has Redshift Spectrum and Athena, and you can build a Lakehouse with Azure Synapse Analytics as well, as outlined by the inspiration for this blog himself here.

Can you still have a data warehouse? Of course you can. But just because you can, doesn’t mean you should. You CAN fly from London to New York connecting through Dubai, but why would you want to do that? It is expensive and inefficient.

Where is the pig going?

I don’t know. With everything happening in the Data & AI world over the past few months, I honestly have a hard time keeping up, let alone trying to predict the future. I do know it is gonna be better than today, and something amazing. That’s why the pig in the logo is leaping 😀

