Show HN: Open-source Kibana alternative for logs and traces in ClickHouse

(github.com)

123 points | by mikeshi42 3 days ago ago

40 comments

nh2 3 days ago

Can you clarify: Does the full-text search for logs linearly search all logs like Loki does, or can it speed it up with an index?

The docs at https://www.hyperdx.io/docs/search don't seem to talk about this key design decision.

I have a couple 100 GB to few TB logs (all from `journald` or JSON lines), just want to store them forever, and find results fast when searching for arbitrary substrings.

Loki does not use an index, so it's pretty slow at finding results in TB-sized logs (does not return results within a few seconds, so it's not interactive).

https://quickwit.io is one thing I'm looking at integrating, that can solve much of the index-based log search.

(Note I'm not super familar with the capabilities of ClickHouse itself regarding indexed full-text search.)

[-]

mikeshi42 3 days ago

You'd generally add an index to your logs in Clickhouse to do searching (via ngram or token bloom filters typically: https://clickhouse.com/docs/en/optimize/skipping-indexes#blo...). There's other ways of indexing as well but that's generally the best for full text search. We use token bloom filter indexes today and find them quite effective (it can skip whole chunks of logs due to the bloom filter being able to say that a word did not appear in the chunk of logs).

Indeed Loki is incredibly slow - Clickhouse is deployed for logging at scale (ex. trip.com is running a 50pb logging solution that allowed them to 4x their old ES cluster volume while also running queries 4-30x faster)

[-]

nh2 3 days ago

Thanks! When using full open-source HyperDX (beyond the Kibana part), inclusive of your choices of ingestion and controlling Clickhouse, does it set up the recommended indexes automatically?

That is, is it a full drop-in for a typical Grafana + Loki deployment?

For context, I'm currently following the approach described in https://xeiaso.net/blog/prometheus-grafana-loki-nixos-2020-1... where with ~40 lines of NixOS config it pushes my entire server cluster's systemd journald logs into Grafana.

Roughly how much effort would one have to put in to achieve the same with HyperDX? If it's not too hard, I might get around to package it as a NixOS service.

[-]

mikeshi42 3 days ago

yes! the full stack includes our recommended schema which has the indexes set up - it's a drop in replacement for anything that would ingest Otel-based telemetry! If you already have Promtail setup - you might want to set up a collector or tweak the existing collector to take in Promtail via the Otel Loki Receiver: https://github.com/open-telemetry/opentelemetry-collector-co...

Overall it doesn't sound very hard to me!

valyala 2 days ago

If you want fast searching for some unique word or phrase across terabytes of logs (aka "needle in the haystack" type of search), then take a look at VictoriaLogs [1] (I'm the core developer). It uses bloom filters for quick skipping of data blocks, which do not contain the given word or phrase. Contrary to other open-source solutions for log storage and analysis, VictoriaLogs works efficiently with any types of logs containing any sets of fields, without the need in any configuration and tuning.

[1] https://docs.victoriametrics.com/victorialogs/

est 3 days ago

Hmm, this is not the "Kibana" alternative I imagined.

Kibana was supposed to be an easy UI. You go to Discover, and the data automatically shows in chronological order, I can explore it with different options.

Kibana is very suitable for non-tech or less-tech people. I hope your product find a clear target audience. With too much ES query JSON or SQL it would scare people off.

[-]

mikeshi42 3 days ago

Hrm while we aren't a 1:1 Kibana replacement today (we're not apples to apples since Kibana is locked into Elastic, whereas we're on Clickhouse) - I don't think we're too far off with our UI-based filters, Lucene filter language, and timestamp filtering/sorting/live tail.

There is a setup modal (which is intended to be set up once by the data owner, similar to maybe how you'd set up some indexes in Elastic) - and afterwards the experience is similar IMO. If you're open to sharing more I'd love to learn more mike@hyperdx.io or if you want to open an issue/join our discord.

lunarcave 3 days ago

A happy HyperDX customer here. Can't recommend it enough.

We wanted something good for tracing and logs, without the price tag we were used to from datadog. We've been pleasantly surprised by how easy it was to set up and start pumping telemetry.

The UI is super intuitive and the OOTB dashboards are great as well.

[-]

mikeshi42 3 days ago

Thank you, really appreciate feedback like that! :D

zX41ZdbW 3 days ago

It is actually really great! Works out of the box, does it with a single-page UI, and it is not slow. It's very close to a log viewer I always dreamed of. The UI is much better than Grafana.

I connected it to the system.text_log table, and it took zero time with no problems.

[-]

mikeshi42 3 days ago

thank you! means a lot coming from you ;)

Speaking of the system tables - it's awesome how much telemetry is saved in there that helps us build a really powerful preset clickhouse monitoring dashboard (heavily inspired from the built-in clickhouse one of course). We figured that alone is quite useful for teams running any Clickhouse instance and want better insights into what's going on.

rekoros 2 days ago

I've had browser and Linux VM logging to HyperDX with great success, but have been struggling to get OTel logging working with Azure Functions. Turns out, new (currently in Preview) "Flex Consumption" functions [0] natively support OTel and work with HyperDX.

[0] https://azure.microsoft.com/en-us/updates/public-preview-azu...

lpammant 2 days ago

Neat! I was looking to replace DataDog with an open source alternative. I'm collecting the logs and batch sending them to DataDog using their batch http-intake API. I'm looking for the quickest way to switch over - is there anything similar on HyperDX?

Also, I'd like to improve my observability using OTel in Cloudflare Workers but it looks like the example is out of date using an deprecated library which points to a new one to use instead. Might be worth updating the docs on that when you get a chance.

Deprecated: https://github.com/RichiCoder1/opentelemetry-sdk-workers New: https://github.com/evanderkoogh/otel-cf-workers

[-]

mikeshi42 2 days ago

Yup! Since we're built on Otel, we suggest you can use the HTTP OTLP endpoint, which is yourdomain:4318/v1/logs, we have a quick blog post on that on how to curl a payload over: https://www.hyperdx.io/blog/testing-sending-opentelemetry-ev...

But you can also look at example payloads in the otel repo: https://github.com/open-telemetry/opentelemetry-proto/blob/m...

Great call out on CF workers, will make sure we get that updated :)

valyala 2 days ago

HyperDX looks great! Are you going to add support for other backends (aka data sources) in order to become "Grafana for logs" solution? For example, I'd be glad to see the support for VictoriaLogs in HyperDX. It provides rich set of HTTP querying APIs, which could be used for building efficient UX at HyperDX - https://docs.victoriametrics.com/victorialogs/querying/#http...

mathfailure 3 days ago

I think your project needs a 'Comparison to Kibana' section. Sell your project to me: I am currently using Kibana, why should I switch?

[-]

mikeshi42 3 days ago

Totally fair! Here's a few on the top of my head, they're mostly about Elastic really but of course Kibana is only really useful on ES:

1. At my last job we were running some of the largest elastic clusters of our hyperscaler's cloud - elastic is slow, expensive and finicky to operate at decent scale. We've found the exact opposite with Clickhouse, it's fast, easy to operate, and supports things like S3-backed storage directly in their open source product. As an example, Uber switched from Elastic -> Clickhouse and halved their infra footprint while increasing volume.

2. Elastic is tricky to manage, field type conflicts come up super common at scale and are annoying to deconflict. Clickhouse is a lot more flexible in its schema to avoid those problems (and give you knobs to fine tune performance at a more granular level with their indexes/schemas)

3. We allow for both SQL and Lucene, both are relatively "standard" languages that engineers are likely already familiar in one way or another. Compared to elastic moving to ES|QL, another vendor-specific language that will be difficult to onboard to. The last thing you want during an incident is trying to recall vendor-specific languages for querying that critical data!

tl;dr - we try to make it easy to "do observability" on what we think is the best DB for observability today (Clickhouse), analogous to what Kibana did for ES.

dengolius 2 days ago

This world needs a new Kibana that is lightweight and not written in java/typescript.

maxthegeek1 3 days ago

We use HyperDX for our observability! We had been using google's observability suite before, because we're using GKE anyways, but HyperDX's search over traces is just waaaay better and I can't go back.

[-]

mikeshi42 3 days ago

Thank you Max! It's awesome to hear that :)

DAlperin 3 days ago

Super neat! Does the v2 branding mean that the more "fully featured" observability product is going away? Or is it all going to be rebuilt on top of clickhouse?

[-]

mikeshi42 3 days ago

Our v1 is completely built on Clickhouse! So v2 is making it more widely compatible to Clickhouse installations that aren't tied to our ingestion pipeline and schema. So if you're already on Clickhouse for obseravbility today, or have a preferred way of shipping data in, you can use us on top of Clickhouse now without throwing away your existing work.

We're essentially making our existing product a lot easier to deploy into more complex observability setups based on Clickhouse - while also shipping a few new capabilities like SQL and event deltas while we're at it!

[-]

ayewo 3 days ago

Timeplus Proton [1] is an OSS fork of Clickhouse that adds support for streaming queries. Timeplus Proton is wire-compatible with Clickhouse and its streaming support makes the log tailing use case you mentioned above easy to setup:

> - You can live tail events, I don't think Grafana for Clickhouse has that (I'm a big fan of tail -f)

So it sounds like your v2 will work with any DB that is wire-compatible with Clickhouse, correct?

[1] https://github.com/timeplus-io/proton

[-]

mikeshi42 2 days ago

Yup it'd work if the Clickhouse HTTP interface is preserved, along with the CH-specific functions/settings we leverage. (I'm not sure how deviated your fork is from CH and which version it's based on)

Proton looks like a neat queue/db combo - I'm going to have to dive in deeper some time

[-]

ayewo 2 days ago

I should gently point out that Proton is maintained by Timeplus [1] making it COSS (Commercial OSS) so it is definitely not my fork of ClickHouse :)

[1] https://www.timeplus.com/

pradeepchhetri 2 days ago

Streaming queries is coming in ClickHouse too: https://github.com/ClickHouse/ClickHouse/pull/63312

akdor1154 3 days ago

Very interested - I'm currently toying with Grafana set up in the same way, i wonder how this compares?

[-]

mikeshi42 3 days ago

If you're using the Grafana Clickhouse plugin - a number of things we do differently to them today:

- We support Lucene-based search, which means it's a lot easier to find the events you're looking for without needing to break into verbose SQL search. (Column/property/full text search are all super easy).

- We're optimized exclusively for Clickhouse, which means we do a number of things to optimize the queries we run and that returns you a nice performance boost (we see a 2x perf boost, but this will vary a ton of data and queries). For example we allow you to do a search on a subset of columns (so the search is performant), and then click in and expand an entire row of interest on-demand (so we only do a SELECT * for a single row). This is also a much nicer DX than needing to specify every column you might need. We have a few other optimizations as well.

- We have (what we think is) a nice chart builder - so you don't need to mess with template variables and macros to build a chart, but still lets you escape hatch out into SQL for the important bits.

- We think our event deltas feature for analyzing traces is pretty neat - afaik this isn't something you get in Grafana. - You can live tail events, I don't think Grafana for Clickhouse has that (I'm a big fan of tail -f)

Overall we focus on trying to bring an easy-to-use high-cardinality observability experience to Clickhouse, whereas Grafana seems to focus more on a highly SQL-dependent dashboard building experience (which has its own advantages of course).

edit: fixed line breaks!

[-]

ChocolateGod 2 days ago

Do you support when using S3 object storage as a backing for Clickhouse?

One of Grafana's advantages is its very low cost of running because you can send everything to object storage with very little configuration.

[-]

mikeshi42 2 days ago

Yup! We're agnostic to the storage medium underneath as Clickhouse's interface doesn't change based on the disk layout backing the table. In our hosted version we use S3 as well.

They have their docs on that setup here: https://clickhouse.com/docs/en/guides/separation-storage-com...

jillesvangurp 3 days ago

If you want an opensource / non AGPL licensed alternative for Kibana, Opensearch also includes a fork of Kibana in the form of Opensearch Dashboards.

Clickhouse not being Elastic/Opensearch based means they would need to reinvent that wheel in any case because Kibana cannot use Clickhouse for storage. So this isn't so much an alternative but an essential component to make Clickhouse useful. Since you can't use Kibana for that. From various accounts here; they seem to have done a decent job.

Of course the key strength of Kibana is that it builds on features that Elasticsearch has; like aggregations that are probably more limited in Clickhouse. Same with Opensearch Dashboards. It depends on your use case whether you actually need that of course.

One point of concern with Clickhouse is that, like Elastic, they require contributors to sign contributor agreements. This basically allows them to re-license the code base if they want to at some point. Which is of course what Elastic did several times now (they changed it back to AGPL a few weeks back). Like Elastic are well funded by VC money but still pre-IPO. Just saying that if you moved to Clickhouse because of the Elastic licensing debacle, you might just have moved that problem instead of solving it.

miah_ 3 days ago

Whats hilarious is that Kibana started out as a Open Source.

Hard to trust anything released as OSS these days that hits this site run by a for profit company.. Its all destined to have a rug pull after some VC funding. Considering HyperDX is a for profit company, I'm sure we won't have to wait long!

[-]

bboreham 3 days ago

Kibana is once again Open Source, as of 2 months ago. https://github.com/elastic/kibana/blob/main/LICENSE.txt

mathfailure 3 days ago

What do we say in such cases? It was good while it lasted!

Once that happens - eventually some new kids would appear on the block.

Such is the life.

hinkley 2 days ago

Seems particularly true for tools that have operational implications. It's very easy to justify why something should be for-pay when it's indispensible day in and day out.

mikeshi42 3 days ago

Ahh yeah fwiw it wasn't _intended_ to be a dig at the open source status of Kibana - but rather we're open source and building on top of Clickhouse.

On the commercial OSS side of things - I suspect the trend there is more nuanced than all OSS companies being suspect to the same problem, but rather companies that generally solve a "behind the API" problem are more susceptible to problems of cloud vendors taking their code and competing with them commercially (ex. if you're a DB like Redis, Mongo, Elastic - or a CLI like Terraform). We're building an end-user experience (more like Gitlab) - where experience differentiation matters a lot more than simple infrastructure hosting, something AWS is not particularly well suited at competing on!

It's been 3 years of Gitlab post-IPO and they're still MIT, and that's the boat we're on as well :)

[-]

ayewo 3 days ago

Congrats on the Show HN launch!

> It's been 3 years of Gitlab post-IPO and they're still MIT, and that's the boat we're on as well :)

I totally understand your optimism but keep in mind that GitLab was exploring a sale as at 4 months ago https://www.reuters.com/markets/deals/google-backed-software...

If they get bought or have a change of management, they might revisit the MIT license.

[-]

mdaniel 2 days ago

> they might revisit the MIT license.

IANAL, but AIUI they for sure can do that to the content of the "ee" directory, but all other contributions are just DCO, not CLA https://about.gitlab.com/community/contribute/dco-cla/#which...

jakozaur 3 days ago

You can also use Quesma and real Kibana with ClickHouse too.

Disclaimer: Co-founder of Quesma.

[-]

mdaniel 3 days ago

I didn't downvote you, but people who shill their wares without providing a link are just wasting everyone's time