Dremio
- Dremio is a data virtualization platform that allows you to connect to and query data from various sources, including databases, cloud storage, and APIs.
- Dremio is a good alternative to Databricks for querying data.
- Dremio's community edition is open source and available on GitHub (dremio/dremio-oss).
- Dremio builds on open source libraries for several components: Apache Arrow as its in-memory columnar data format, and Apache Parquet as a columnar file format for storage.
- It started as a query engine and has evolved to support data lake capabilities.
Architecture

Local Setup
cd sandbox/data/dremio
git clone https://github.com/dremio/dremio-cloud-tools.git --depth=1