What does 93% accuracy mean here? My understanding is that tech like LLMs are not able to give a confidence score unlike traditional OCR methods. How was 93% calculated? How are the parsing errors surfaced?
Great question.
The 93% refers to field-level accuracy on a labeled test set; a field is counted as correct only if it matches the ground truth (after basic normalization for dates, currency, etc.). It’s not document-level accuracy.
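As a rough sketch of what that metric looks like (the normalization rules and field names below are illustrative, not the actual pipeline):

```python
from datetime import datetime

def normalize(field_name, value):
    """Normalize a field value before comparison (dates, currency).
    These rules are examples, not the product's actual normalizers."""
    v = value.strip()
    if field_name == "date":
        # Accept a few common formats; emit ISO 8601 for comparison.
        for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"):
            try:
                return datetime.strptime(v, fmt).date().isoformat()
            except ValueError:
                pass
        return v
    if field_name == "amount":
        # Strip currency symbols/commas so "$1,200.00" matches "1200.00".
        return v.replace("$", "").replace(",", "")
    return v.lower()

def field_level_accuracy(predictions, ground_truth):
    """Fraction of ground-truth fields whose predicted value matches
    after normalization. Each field counts once; missing fields count
    as wrong. This is field-level, not document-level, accuracy."""
    total = correct = 0
    for doc_id, true_fields in ground_truth.items():
        pred_fields = predictions.get(doc_id, {})
        for name, true_val in true_fields.items():
            total += 1
            pred_val = pred_fields.get(name)
            if pred_val is not None and normalize(name, pred_val) == normalize(name, true_val):
                correct += 1
    return correct / total if total else 0.0

preds = {"doc1": {"date": "03/15/2024", "amount": "$1,200.00", "vendor": "Acme"}}
truth = {"doc1": {"date": "2024-03-15", "amount": "1200.00", "vendor": "ACME"}}
print(field_level_accuracy(preds, truth))  # 1.0
```

The key point is that the denominator is fields, not documents: a document with nine correct fields and one wrong one contributes 90%, not 0%.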
We don’t use raw LLM probabilities for confidence. The score is based on additional validation checks (cross-field consistency, format rules, reconciliation). Each field is returned with its confidence and any validation flags so errors are visible and reviewable.
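To make the validation-based scoring concrete, here is a minimal sketch; the specific check names, field names, and penalty weights are hypothetical stand-ins for whatever rules the real pipeline runs:

```python
def score_field(name, value, doc):
    """Attach validation flags and a rule-based confidence to one field.
    Confidence comes from deterministic checks, not LLM token probabilities.
    The checks and the 0.4 penalty step are illustrative assumptions."""
    flags = []
    if name == "total":
        # Reconciliation check: subtotal + tax should equal total (to the cent).
        try:
            if abs(float(doc["subtotal"]) + float(doc["tax"]) - float(value)) > 0.01:
                flags.append("reconciliation_mismatch")
        except (KeyError, ValueError):
            flags.append("unparseable_amount")
    if not value.strip():
        flags.append("missing_value")
    # Each failed check lowers confidence by a fixed step, floored at 0.
    confidence = max(1.0 - 0.4 * len(flags), 0.0)
    return {"value": value, "confidence": confidence, "flags": flags}

doc = {"subtotal": "100.00", "tax": "8.00", "total": "120.00"}
print(score_field("total", doc["total"], doc))
# flags ['reconciliation_mismatch'], confidence 0.6
```

Because every field carries its flags, a reviewer can filter for low-confidence or flagged fields instead of re-checking whole documents.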