17 comments

  • bambax 3 hours ago

    With all due respect and while wishing you best of luck, it's always a bit worrisome when generative AI is used in the real world with real consequences...

    In my experience, what LLMs, even some of the most advanced ones (o1, Gemini 1.5), are really good at is rationalization after the fact: explaining why they were right, even when presented with direct evidence to the contrary.

    I just ran an experiment trying to get various models to put footnote references in the OCR of a text, based on the content of the footnotes. I tested 120+ different models via OpenRouter; they all failed, but the "best" ones failed in a very bizarre and, I think, dangerous way: they made up some text to better fit the footnote references! And then they lied about it, saying in a "summary" paragraph that no text had been changed, and/or that they had indeed been able to place all references.

    So I guess my question is: how do you detect and flag hallucinations?
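
    One mechanical check for the failure mode described above: strip the inserted reference markers back out of the model's output and diff it against the original OCR text; any surviving difference means the model silently rewrote the source, regardless of what its "summary" claims. A minimal sketch (the `[^1]`-style marker pattern and the sample sentences are illustrative assumptions, not from the actual experiment):

```python
import difflib
import re

def non_marker_changes(original: str, annotated: str, marker=r"\[\^\d+\]") -> list[str]:
    """Return word-level changes NOT explained by inserted footnote markers.

    Strips the (hypothetical) marker pattern from the model output and
    diffs the result against the original; any surviving diff entry means
    the model altered the source text.
    """
    stripped = re.sub(marker, "", annotated)
    diff = difflib.unified_diff(
        original.split(), stripped.split(), lineterm="", n=0
    )
    # Keep only real word additions/removals, not diff headers.
    return [d for d in diff if d[:1] in "+-" and d[:3] not in ("+++", "---")]

original = "The treaty was signed in 1648 after long negotiations."
faithful = "The treaty was signed in 1648[^1] after long negotiations."
tampered = "The treaty was signed in 1649[^1] after brief negotiations."

print(non_marker_changes(original, faithful))  # [] -> no silent edits
print(non_marker_changes(original, tampered))  # flags the altered words
```

    A check like this catches the fabricated text without trusting the model's own report of what it changed.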

    • erispoe 2 minutes ago

      The process you described is very far from how companies that productize LLMs use them.

    • arvindveluvali 3 hours ago

      This is a really good point, but we don't think hallucinations pose a significant risk to us. You can think of Fresco like a really good scribe; we're not generating new information, just consolidating the information that the superintendent has already verbally flagged as important.

      • mayank 2 hours ago

        This seems odd. If your scribe can lie in complex and sometimes hard to detect ways, how do you not see some form of risk? What happens when (not if) your scribe misses something and real world damages ensue as a result? Are you expecting your users to cross check every report? And if so, what’s the benefit of your product?

        • arvindveluvali 2 hours ago

          We rely on multimodal input: the voiceover from the superintendent, as well as the video input. The two essentially cross-check one another, so we think the likelihood of lies or hallucinations is incredibly low.

          Superintendents usually still check and, if needed, edit/enrich Fresco’s notes. Editing is way faster/easier than generating notes net new, so even in the extreme scenario where a supe needs to edit every single note, they’re still saving ~90% of the time it’d otherwise have taken to generate those notes and compile them into the right format.

    • hehehheh 3 hours ago

      It has to be the same as all AI: you need someone thorough to check what it did.

      LLM generated code needs to be read line by line. It is still useful to do that with code because reading is faster than googling then typing.

      You can't detect hallucinations in general.

      • bambax 2 hours ago

        A (costly) way is to compare responses from different models, as they don't hallucinate in exactly the same way.
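
        A sketch of that idea: fan the same question out to several models, normalize the answers, and flag any model that disagrees with the majority. The model names, the string-equality normalization, and the 50% quorum below are all illustrative assumptions; a real pipeline might compare answers with embeddings or an LLM judge instead:

```python
from collections import Counter

def normalize(answer: str) -> str:
    # Crude normalization; semantic comparison would be more robust.
    return " ".join(answer.lower().split())

def cross_check(answers: dict[str, str], quorum: float = 0.5):
    """Majority-vote across models: return (consensus, suspect_models).

    If no answer clears the quorum, every model is flagged for review.
    """
    counts = Counter(normalize(a) for a in answers.values())
    top, n = counts.most_common(1)[0]
    if n / len(answers) <= quorum:
        return None, list(answers)          # no consensus: flag all
    suspects = [m for m, a in answers.items() if normalize(a) != top]
    return top, suspects

answers = {
    "model-a": "Paris",
    "model-b": "paris",
    "model-c": "Lyon",
}
consensus, suspects = cross_check(answers)
print(consensus, suspects)  # paris ['model-c']
```

        When there is no majority at all, everything gets flagged for human review, which is the honest default when the models disagree.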

    • fakedang 37 minutes ago

      Honestly, this is a very nitpicky argument. The issue for site contractors is not manually checking each entry to verify it's correct. It's writing the stuff down in the first place.

      I'm exploring a similar but unrelated use case for generative AI, and in discovery interviews, what I learnt was that site contractors and engineers do not request or expect 100% accuracy, and leave adequate room for doubt. For them, the pain is the hours and hours of manually writing down a TON of paperwork, which in some industries amounts to months and months of work written by some of the poorest communicators on the planet. Because these tasks consume so much time, they forgo the correct methodology, and some even fill reports with random bullshit just so that the project moves forward. In most cases, this writing work is done for liability concerns as mentioned above, rather than for the purposes of someone actually going through it. If the writing part is cleared for many of these guys, most wouldn't have a problem with the reading and correcting part.

  • neither_color 3 hours ago

    This is a good idea, but I hope you've got some secret training data that isn't available on the open web. I've been able to stump ChatGPT with simple "gotcha" National Electrical Code questions that a foreman wouldn't have a problem answering (e.g. sizing a breaker for a heater depending on different situations). There are far fewer subreddits and forums dedicated to trade specialists, and as a community they're more hostile to DIY-ers and will tell you "get someone licensed." They're also not the types to write detailed reports and case studies on what they did.

    It's not that trades are super complicated in comparison to other fields like web development; it's that there's no GitHub, no source shared among all pros like "here's what I did and how I got it to work." Without a good Stack Overflow, how does the AI judge the quality of workmanship in photos?

    You are absolutely right, btw, about Google Drives and OneDrives and hundreds of photos and all that. My experience is in dealing with general contractors on smaller jobs, not supers on mega projects, but they have similar issues. Lots of sloppy back-and-forths and poor tracking of change orders, etc.

    What I'm trying to say, since I sort of rambled there, is that while processing and sorting and making punchlists is a good idea, I have doubts about AI's current ability to accurately spot code issues (as in building code, which, unlike JavaScript, varies by zip code). Does the AI know that you don't have enough clearance at X, or does that have to go into the recording?

    • arvindveluvali 2 hours ago

      Great point! We're really relying on the superintendent's expertise, transcribing/compiling what they're saying rather than flagging code violations or other notables ourselves. We think analysis should be (for now, at least) the job of the highly trained and experienced superintendent, and our job is to take care of the transcription and admin that isn't really a good use of their time.

  • rm_-rf_slash 5 hours ago

    Looks neat! I don’t work in construction but I know folks in civil engineering. Are there applications of Fresco you could see in that domain?

    • arvindveluvali 5 hours ago

      Absolutely. There are a ton of industries where people conduct physical site inspections and turn those into structured documents; as in construction, those take a long time to make! We've actually had some inbound from civil engineers, and if we can be useful to folks in your network, we'd love to connect with them.

  • justinzhou13 5 hours ago

    This is super cool and there’s a ton of other industries where this is sorely needed!

  • Closi 3 hours ago

    FYI - this could be really useful in logistics operations and production too! (Which is my background, although I suspect the price point is unfortunately much too high for that application).

    • arvindveluvali 2 hours ago

      Thanks for the flag! Absolutely, there are many verticals where we think Fresco can be useful. Would love to hear your thoughts on price point.

  • kyleli626 4 hours ago

    really cool application of LLMs to a big problem - nice work.

  • StephenSmith 4 hours ago

    We make an AI camera for residential home builders. I'd love to chat to see if there's any synergy here.

    bedrockwireless.com

    Ping me, stephen [at] bedrockwireless.com