Ichigo: Local real-time voice AI

(github.com)

175 points | by egnehots 3 days ago

33 comments

  • emreckartal 14 hours ago

    Emre here from Homebrew Research. It's great to see Ichigo on HN!

    A quick intro: We're a Local AI company building local AI tools and training open-source models.

    Ichigo is our training method that enables LLMs to understand human speech and talk back with low latency, thanks to FishSpeech integration. It is open data and open weights, with weights initialized from Llama 3.1, extending its reasoning ability.

    Plus, we are the creators and lead maintainers of: https://jan.ai/, Local AI Assistant - an alternative to ChatGPT & https://cortex.so/, Local AI Toolkit (soft launch coming soon)

    Everything we build and train is done out in the open - we share our progress on:

    https://x.com/homebrewltd https://discord.gg/hTmEwgyrEg

    You can check out all our products on our simple website: https://homebrew.ltd/

    • gnuly 8 hours ago

      any plans to share progress on open channels like matrix.org or even irc?

  • thruflo 8 hours ago

    Great stuff. Voice AI is great to run locally not just for privacy / access to personal data but also because of the low latency requirement. If there's a delay in conversation caused by a network call, it just feels weird, like an old satellite phone call.

  • cassepipe 3 days ago

    Finally I can use one of the random facts that have entered my brain for decades now even though I can't remember where my keys are.

    If I remember correctly, "ichigo" means strawberry in Japanese. You are welcome.

    • SapporoChris 8 hours ago

      Sorry, you're wrong. It means 1 5. Just kidding, it is strawberry but it can also be read as one and five. However, it is not fifteen.

      • TheCraiggers 3 hours ago

        > it can also be read as one and five. However, it is not fifteen.

        Can you help me wrap my brain around this? Does it mean six? I'm struggling to understand how a word can mean two numbers and how this would actually be used in a conversation.

        Thanks. I'm curious and trying to search for this to understand just returns anime.

        • BugsJustFindMe 3 hours ago

          > I'm struggling to understand how a word can mean two numbers

          Ichi is the word for 1. Go is the word for 5.

          • TheCraiggers 2 hours ago

            /smacks forehead.

            Can't believe I fell for that.

    • adammarples 2 hours ago

      From the book tomorrow and tomorrow and tomorrow?

    • d3w3y 20 hours ago

      There are strawberries all over the readme so I reckon you're right.

      • mmastrac 15 hours ago

        Is this a continuation of the meme that GPT can't identify the number of "R"s in "strawberry"?

        • TheDong 11 hours ago

          > How many 'r's are in the word 'ichigo'?

          GPT 4o: The word "ichigo," which is the Romanized spelling (romaji) of いちご, contains one "r." It appears in the letter "r" in "chi," as the "ch" sound in romaji represents a combination of the "r" sound from "r" and "t" sound from "i."

          Thank you chatgpt. I'm glad we've burned down a bunch of forests for this.

          You can consistently get the right answer with a prompt of:

          > Write python code, and run it, to count the number of 'r' characters in いちご.

          though. For numeric stuff, telling the thing to just write python code makes it significantly better at getting right answers.
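
          A minimal sketch of what such generated code might look like. The helper name `count_char` is hypothetical and is not output from any model; it only shows why running code is more reliable than asking the model to count by eye:

```python
def count_char(text: str, ch: str) -> int:
    """Count case-insensitive occurrences of one character in text."""
    return text.lower().count(ch.lower())


# Compare the kana spelling, the romaji spelling, and the English word.
for word in ("いちご", "ichigo", "strawberry"):
    print(word, count_char(word, "r"))
```

          The counting is done by `str.count` on the actual characters, so it does not depend on how the model tokenizes the word.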

          • BugsJustFindMe 3 hours ago

            Without any special prompt change, I get

            There are no “r”s in the word “ichigo.”

            Maybe your instructions are bad.

        • dev-jayson 14 hours ago

          I think you might be on to something

    • AtlasBarfed 19 hours ago

      Getsuga tenshou!!

      • dumb1224 3 hours ago

        haha was looking for that!

        Ban-kai 卍解

    • beretguy 19 hours ago

      Tatakae!

    • zarmin 17 hours ago

      Your keys are in the fridge with the remote control.

  • tmshapland 16 hours ago

    This is a really cool project! What have people built with it? I'd love to learn about what local apps people are building on this.

    • emreckartal 14 hours ago

      Thanks! We've received feedback on use cases like live translation, safe and untrackable educational tools for kids, and language-learning apps. There are so many possibilities, and we hope to see people building amazing products on top of Ichigo.

      • itake 12 hours ago

        I just tried to use the demo website for live translation. The AI always responded in English, either ignoring my request to only respond in French or Lao, or prefacing the translation with English ("I can translate that to French. the translation is: ...").

        I'm trying to use ChatGPT for AI translation, but the other big problem I run into is TTS and STT for languages outside the top 40 (e.g. Lao). Facebook has a TTS library, but it isn't open for commercial use unfortunately.

        • emreckartal 10 hours ago

          Oh, I see. We've limited it to English for simplicity for the demo. More languages are planned for future releases.

  • mentalgear 8 hours ago

    Kudos to the team, this is truly impressive work! It's exciting to see how AI connects with the local-first movement, which is also really exploding in popularity. (The idea of local-first, where data processing and functionality are prioritized on users' own devices, aligns perfectly with emerging privacy concerns and the push for decentralization.)

    Bringing AI into this space enhances user experience while respecting their autonomy over data. It feels like a promising step toward a future where we can leverage the power of AI without compromising on privacy or control. Really looking forward to seeing how this evolves!

  • famahar 14 hours ago

    Looks impressive. I'm guessing the demo isn't representative of the full possibilities of this? Tried to have a basic conversation in Japanese and it kept on sticking with English. When it did eventually speak Japanese the pronunciation was completely off. I'm really excited about the possibility of local language learning with near realtime conversation practice. Will keep an eye on this.

  • cchance 17 hours ago

    It's amazing to see cool projects like this really, REALLY based in open source and open training like this. Wow.

    • emreckartal 14 hours ago

      Thanks! It's all open research, source code, data, and weights.

  • frankensteins 12 hours ago

    Great initiative! Before adding more comments, I'm trying to set it up on my local Mac M3 machine. I'm having a hard time installing the dependencies. Anyone here have the same issue?

    • emreckartal 10 hours ago

      Thanks! You can't run Ichigo on a Mac M3 just yet. It'll be possible to run it locally on Mac once we integrate it with Jan.ai

  • p0larboy 14 hours ago

    Tried demo but all I got was "I'm sorry, I can't quite catch that".

  • lostmsu 17 hours ago

    Very cool, but a bit less practical than some alternatives because it does not seem to do request transcription.

    • emreckartal 14 hours ago

      Actually, it does. You can turn on the transcription feature from the bottom right corner and even type to Ichigo if you want. We didn’t show it in the launch video since we were focusing on the verbal interaction side of things.

      • emreckartal 10 hours ago

        Ah, I see now.

        To clarify, while you can enable transcription to see what Ichigo says, Ichigo's design skips directly from audio to speech representations without creating a text transcription of the user’s input. This makes interactions faster but does mean that the user's spoken input isn't transcribed to text.

        The flow we use is Speech → Encoder → Speech Representations → LLM → Text → TTS. By skipping the transcription of the user's input, we're able to speed things up and focus on the verbal experience.

        Hope this clears things up!
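
        The flow described above can be sketched roughly as follows. Every function here is a hypothetical stand-in written for illustration, not Ichigo's actual API; the point is only that the LLM receives speech tokens directly and text appears only on the reply side:

```python
from typing import List, Tuple

# All stages below are toy stand-ins (assumptions), not Ichigo's real code.


def encode_speech(audio: List[float]) -> List[int]:
    """Stand-in encoder: map each audio sample to a discrete token id."""
    return [int(abs(sample) * 10) % 8 for sample in audio]


def llm_respond(speech_tokens: List[int]) -> str:
    """Stand-in LLM: consumes speech tokens and returns reply text."""
    return f"reply to {len(speech_tokens)} speech tokens"


def synthesize(text: str) -> bytes:
    """Stand-in TTS: return fake audio bytes for the reply text."""
    return text.encode("utf-8")


def voice_turn(audio: List[float]) -> Tuple[str, bytes]:
    tokens = encode_speech(audio)      # Speech -> Encoder -> Speech Representations
    reply_text = llm_respond(tokens)   # LLM -> Text (only the reply is text)
    reply_audio = synthesize(reply_text)  # Text -> TTS
    return reply_text, reply_audio


reply_text, reply_audio = voice_turn([0.1, 0.5, -0.3])
print(reply_text)
```

        Note that no variable in `voice_turn` ever holds a transcript of the user's input, which matches the point that the spoken request is not transcribed.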