In a world that’s paranoid about data privacy, Databricks CEO Ali Ghodsi firmly believes that everyone should own their data. Speaking at Databricks’ Data + AI Summit, Ghodsi warned companies against handing their data over to any vendor, Databricks included.
“The idea is to stop giving your data to vendors. They’ll just lock you in. It doesn’t matter if it’s a proprietary data warehouse in the cloud, or if it’s Snowflake or even Databricks. Don’t give your data to us; don’t trust vendors,” he reiterated.
This is surprising in and of itself: with data being deemed the new oil, companies have been rushing to sell their own to reap the benefits.
Ghodsi, however, has a point. With so many companies trying to get the best use out of their data, many of them fall into the trap of overcomplicating things.
Ghodsi said that, in his experience, executives at several companies have admitted to not knowing or understanding the technology behind how their data is being used, due to the sheer number of software tools involved, including multiple data warehouses, data science and machine learning platforms, and data lakes.
Ultimately, this leads to the company’s data being locked into a silo, removing easy access and increasing costs for the company overall.
What Do You Do With Your Data Then?
Elaborating further, Databricks’ VP of field engineering for APJ, Nick Eayrs, told AIM that it had always been the company’s strategy to democratise data and AI. “That’s kind of our mission and purpose for being. It starts with ensuring you have control over your data,” Eayrs said.
This is the goal Databricks has been moving towards: getting company data to a point where vendors can simply plug their “USB sticks” into it. This gives the company power over its data and how a vendor uses it, and ensures that the data is being used in the most optimised way.
“They should just plug their USB stick into that data that you have in the cloud and then let the best engine win. Let’s see who’s best,” Ghodsi challenged.
This also makes sense given how rapidly the industry is changing. When comparing models, Ghodsi admitted that Databricks’ DBRX model was the best open-source LLM in the market for a whole two weeks, before being outperformed by Llama 3, which was released shortly afterwards.
Giving customers the freedom to let vendors use their data in a controlled environment means that the companies themselves are better able to oversee how their data is used.
“That puts them in control of their data, which is ultimately their secret sauce. That’s what’s going to differentiate their products and services. We want them to own and control their data, and we want their data to be in an open format in a cloud of their choice. Even if they choose to take it back on-prem, so be it,” Eayrs asserted.
Next Steps for Databricks?
Obviously, as an AI company, Databricks has also begun working on ensuring its customers come back to it when they need a vendor. However, while several companies are already working on this, Databricks stands out with its focus on democratisation.
The company’s recent acquisition of Tabular is a testament to that. With the acquisition, Databricks aims to ensure that companies are not confined to silos yet again, only this time by lakehouse format.
“You don’t have to pick which of the two silos I have to go through, and which of the USB formats I must store this in? We don’t want it to be that way,” Ghodsi said.
While they’re currently focusing on democratising data for their clients, Eayrs said that the next steps are to ensure that customers can get the most out of their data. “Once they have their data there and they have it governed and secured, how do we help them accelerate the time to insight and value? That’s where we want to lean in and break some of the magic,” he told AIM.