Databricks lakehouse

7/23/2023

To find out more about Databricks’ strategy in the age of AI, I spoke with Clemens Mewald, the company’s director of product management, data science and machine learning. Mewald has an especially interesting background when it comes to AI data, having worked for four years on the Google Brain team building ML infrastructure for Google.

I started by asking Mewald how Databricks relates to modern database systems, such as Apache Cassandra and MongoDB. He replied that Databricks is “database agnostic.” The company specializes in large-scale data processing, he said, but the real key to its approach is the data lake theory.

A data lake is a repository of raw data stored in a variety of formats - anything from unstructured data like emails and PDFs, to structured data from a relational database. The term was coined in 2011, as a modern variation of the late-1980s concept of a data warehouse. A key difference: data lakes were designed to deal with the internet and its masses of unstructured data.

In a blog post from January, Databricks extended the data lake idea by coining a new term: the lakehouse. It was described as “a new paradigm that combines the best elements of data lakes and data warehouses.” It should be noted that, unlike data warehouses, the data lake concept has not been universally accepted in the industry.

Richard is senior editor at The New Stack and writes a weekly column about what's next on the cloud native internet. Previously he founded ReadWriteWeb in 2003 and built it into one of the world’s most influential technology news and analysis sites.
