Does this continuously feed DuckDB data from transactional workloads, akin to what SAP HANA does? If so, that would be huge: people spend lots of time trying to stitch transactional data into warehouses using Kafka/Debezium.
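For reference, the stitching being replaced usually looks something like a Debezium MySQL connector config (a minimal sketch; hostnames, topic names, and credentials here are made up):

```json
{
  "name": "orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-primary.example.com",
    "database.port": "3306",
    "database.user": "cdc_user",
    "database.password": "********",
    "database.server.id": "184054",
    "topic.prefix": "shop",
    "table.include.list": "shop.orders",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.shop"
  }
}
```

And that's before writing the consumer that loads the topics into the warehouse and reconciles updates and deletes.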
BTW, would be great to hear apavlo’s opinion on this.
HTAP is here! It seems like these hybrid databases are slowly gaining adoption which is really cool to see.
The most interesting part of this is the improvements to transaction handling they seem to have made in https://github.com/alibaba/AliSQL/blob/master/wiki/duckdb/du... (it's also a good high-level breakdown of MySQL internals). Ensuring that the sync between the primary tables and the analytical ones is fast and, most importantly, transactional is awesome to see.
I don't think this is meaningfully HTAP; it's gluing together two completely different databases under a single interface. As far as I can tell, it doesn't provide transactional or consistency guarantees beyond what you'd get with something like Materialize.
This isn't new either, people have been building OLAP storage engines into MySQL/Postgres for years, e.g., pg_ducklake and timescale.
At a drive-by glance, it looks like a more tightly integrated version of Postgres FDW for DuckDB plus vector storage, meets Vespa. I find it interesting they went with extending MySQL instead of the FDW route on Postgres.
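For comparison, the FDW route on Postgres looks roughly like this (a sketch using the stock postgres_fdw; server names and credentials are made up, and a DuckDB-backed FDW would follow the same shape):

```sql
CREATE EXTENSION postgres_fdw;

CREATE SERVER analytics_srv
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'analytics.example.com', dbname 'warehouse');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER analytics_srv
  OPTIONS (user 'reporter', password 'secret');

-- Expose a remote table locally; filters and joins get pushed down
IMPORT FOREIGN SCHEMA public LIMIT TO (orders)
  FROM SERVER analytics_srv INTO public;
```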
Curious how it stacks up to pg_duckdb. (pg_duckdb seems pretty clean, thanks to Postgres' powerful extension mechanisms.)
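If I remember the pg_duckdb README right, the usage is roughly this (setting name from memory, so double-check the docs):

```sql
CREATE EXTENSION pg_duckdb;

-- Route eligible queries through DuckDB's vectorized engine
SET duckdb.force_execution = true;

-- Plain Postgres SQL, executed by DuckDB where possible
SELECT category, count(*), avg(price)
FROM products
GROUP BY category;
```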
having an embedded column database for analytics in your traditional DB is a massive win for productivity + operational simplicity.
at the moment I use PG + Tiger Data - couldn't find a MySQL equivalent, so it's nice to have this as one.
MariaDB already has a columnar engine (though I have not used it myself) https://mariadb.com/docs/analytics/mariadb-columnstore/colum... and is mostly MySQL compatible.
For about a year, releases have also included a vector storage type, so it will be interesting to see it compared in performance with what Alibaba did.
Just wanted to plug that. Given how often Postgres is plugged on HN, I think people overlook how versatile MariaDB is.
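For anyone curious, picking the columnar engine is just a table option, and the vector type landed as a native column type (syntax from memory of MariaDB 11.7, so verify against the docs):

```sql
-- Columnar table via the ColumnStore engine
CREATE TABLE events (
  ts      DATETIME,
  user_id BIGINT,
  value   DOUBLE
) ENGINE=ColumnStore;

-- Native vector column with an index, plus a nearest-neighbor search
CREATE TABLE docs (
  id  BIGINT PRIMARY KEY,
  emb VECTOR(4) NOT NULL,
  VECTOR INDEX (emb)
);

SELECT id
FROM docs
ORDER BY VEC_DISTANCE_EUCLIDEAN(emb, VEC_FromText('[0.1,0.2,0.3,0.4]'))
LIMIT 5;
```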
Can Tiger Data be used just as a simple column store?
All I want is effectively what ClickHouse does, in PG. I have a single table that I need fast counts on, and ClickHouse can do the counts fast, but I have to go through the entire sync/replication pipeline to get that.
From a quick scan, it always seemed best set up for time series specifically, and using it another way looked like a bit of a struggle.
ClickHouse supports the MySQL protocol natively, and can also wrap/import MySQL tables. Granted, you need two connections, but it works pretty well.
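Concretely, the wrapping side of that can be done with ClickHouse's mysql() table function (a sketch; host and credentials are made up):

```sql
-- Query a live MySQL table from ClickHouse, pulling rows on demand
SELECT count()
FROM mysql('mysql-host:3306', 'shop', 'orders', 'reader', 'secret')
WHERE status = 'paid';
```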
One option is TiDB. It supports columnar data alongside row-based data. However, it is MySQL compatible but not based on the MySQL codebase, so not quite what you asked for.
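In TiDB that split is per table: you ask for a TiFlash (columnar) replica and the optimizer picks the engine per query. A sketch from memory:

```sql
-- Ask TiDB to maintain one columnar (TiFlash) replica of the table
ALTER TABLE orders SET TIFLASH REPLICA 1;

-- Check replication progress before relying on it
SELECT TABLE_NAME, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'orders';
```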
MariaDB has supported columnar tables for a bit https://mariadb.com/resources/blog/see-columnar-storage-for-...