Show HN: OCR Arena – A playground for OCR models

(ocrarena.ai)

72 points | by kbyatnal 3 days ago

26 comments

  • hakunin 8 minutes ago

    Would be great to compare these against Apple’s LiveText. This project now supports it: https://github.com/mkyt/OCRmyPDF-AppleOCR

    I’ve had great results locally, though you need macOS 13 or later for this.

  • ArcaneMoose 3 hours ago

    I've been really impressed with this model specifically because of how insanely cheap it is: https://replicate.com/ibm-granite/granite-vision-3.3-2b

    I didn't expect IBM to be making relevant AI models, but this thing is priced at $1 per 4,000,000 output tokens... I'm using it to transcribe handwritten input text, and it works very well and is super fast.

    • irjustin 23 minutes ago

      Thanks for this! Will test this model out, because we do a lot of in-between steps to get around the output token limits.

      Would be super nice if it worked for our use case and we could simply get the full output.

  • daemonologist an hour ago

    A limitation of this leaderboard approach that I want to point out is that while the large general-purpose LLMs can make greater leaps of inference (on handwriting and poor-quality scans), and almost always produce better layouts and more coherent output, they can also sometimes be less correct. In my experience they're more prone than the purpose-trained models to skipping or transposing sections of text, or even hallucinating completely incorrect output. (A similar comparison can be made in turn to character- or word-based OCR approaches like Tesseract, which are even less "intelligent" but also even less prone to those misbehaviors.)

    Also, some of the models are prone to infinite loops, and I suspect this is not being punished appropriately; the frontend seems to get into a bad state after around 50k characters, which prevents the user from selecting a winner. It would probably be beneficial to make sure every model has an output length limit.

    Still, a really cool resource - I'm looking forward to more models being added.

    • rubikscubeguy an hour ago

      Totally agree w/ your first point! For the looping, we just added a stop condition for now in battle mode, and you can still vote on the other model afterwards. A bit of a hard problem to solve. We will add more models!

  • cdrini an hour ago

    So many OCR tools have popped up over the past ~year, and we're sorely in need of some benchmarks to compare them. Would love to see support for traditional OCR tools like Tesseract, EasyOCR, Microsoft Azure, etc. I'm using these for some projects, and my experiments with VLMs for OCR have produced too much hallucination for me to switch. Benchmarks comparing across this aisle would be incredibly useful.

  • zzleeper 4 hours ago

    Love this! Would have liked to see something like Textract as a pre-LLM benchmark (but of course that's expensive), and also a distinction between handwritten and printed text.

    But still, this is incredibly useful!

  • wener 2 hours ago

    Really hope there will be a layout mode or an OCR-with-bounding-boxes mode; I want to see the model restore the whole page.

    • rubikscubeguy 2 hours ago

      yeah, that would be a cool long term goal

  • fzysingularity 4 hours ago

    FYI, one of the models in the battle was pretty slow to load. Are these also being rated on latency, or just quality?

    • kbyatnal 2 hours ago

      Ultimately, there’s some intersection of accuracy x cost x speed that’s ideal, which can be different per use case. We’ll surface all of those metrics shortly so that you can pick the best model for the job along those axes.

    • andrewlu0 2 hours ago

      ideally we want people to rate based on quality, but i imagine some of the results are biased rn by loading time

  • krashidov 4 hours ago

    I would be curious to see how Sonnet does. Anthropic's models are pretty solid when it comes to PDFs.

    • kbyatnal 2 hours ago

      Sonnet/Opus is being added shortly!

    • rubikscubeguy 2 hours ago

      sonnet and opus are live now :)

  • ianhawes 4 hours ago

    Please add Chandra by Datalab

  • codeddesign 3 hours ago

    Most of these are general LLMs, not specifically OCR models. Where are Google Vision, Mistral, Paddle, Nanonets, or Chandra??

    • kbyatnal 2 hours ago

      We wanted to keep the focus on (1) foundation VLMs and (2) open source OCR models.

      We had Mistral previously but had to remove it because, unfortunately, their hosted OCR API was super unstable and returned a lot of garbage results.

      Paddle, Nanonets, and Chandra being added shortly!

    • rubikscubeguy 2 hours ago

      nanonets is live now!

  • arathis 4 hours ago

    Claude would be good!

    • kbyatnal 2 hours ago

      Claude coming shortly (in the next ~1 hour)

    • rubikscubeguy 2 hours ago

      claude is live now!

  • dang 5 hours ago

    [under-the-rug stub]

    [see https://news.ycombinator.com/item?id=45988611 for explanation]

    • ylhert 3 days ago

      We've got like 10 LLM arenas but nothing for OCR yet, really hope this takes off!

    • athoscouto 3 days ago

      Nice! Would love to see Azure Document Intelligence on this

    • profburial 3 days ago

      This is a killer idea!