NotebookLlama: An open source version of NotebookLM

(github.com)

170 points | by bibinmohan 9 hours ago

28 comments

  • ttul 4 hours ago

    The more I listen to NotebookLM “episodes”, the more I am convinced that Google has trained a two-speaker “podcast discussion” model that directly generates the podcast off the back of an existing multimodal backbone. The two speakers interrupt and speak over each other in an uncannily humanlike manner. I wonder whether they basically fine tuned against a huge library of actual podcasts along with the podcast transcripts and perhaps generated synthetic “input material” from the transcripts to feed in as training samples.

    In other words, take an episode of The Daily and have one language model write a hypothetical article that would summarize what the podcast was about. And then pass that article into the two-speaker model, transcribe the output, and see how well that transcript aligns with the article fed in as input.
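
    A rough sketch of that idea in code (purely illustrative: the SDK, model name, and the later steps are stand-ins for the speculation above, not Google's actual pipeline):

        from openai import OpenAI  # assumes the OpenAI Python SDK; any LLM API would do

        client = OpenAI()

        def article_from_transcript(transcript: str) -> str:
            # Step 1: have an LLM write the hypothetical "source article"
            # that a real podcast episode could have been generated from.
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{
                    "role": "user",
                    "content": "Write the article this podcast episode could be "
                               "summarizing:\n\n" + transcript,
                }],
            )
            return resp.choices[0].message.content

        # Step 2 (hypothetical): pair each synthetic article with the real episode
        # audio as (input, target) training samples for a two-speaker audio model,
        # then transcribe the generated audio and check how well it aligns with
        # the article that was fed in.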

    I am sure I’m missing essential details, but the natural sound of these podcasts cannot possibly be coming from a text transcript.

    • swyx 4 hours ago

      > the more I am convinced that Google has trained a two-speaker “podcast discussion” model that directly generates the podcast off the back of an existing multimodal backbone.

      I have good and bad news for you - they did not! We were the first podcast to interview the audio engineer who led the audio model:

      https://www.latent.space/p/notebooklm

      TL;DR: they did confirm that the transcript and the audio are generated separately, but yes, the TTS model is trained far beyond anything we have in OSS or commercially available.

    • rmorey 4 hours ago

      I feel similarly about NotebookLM, but have noticed one odd thing - occasionally Host A will be speaking, and suddenly Host B will complete their sentence. And usually when this happens, it's in a way that doesn't make sense, because Host A was just explaining something to or answering a question of Host B.

      I'm actually not sure what to make of that, but it's interesting to note

      • dleeftink 4 hours ago

        It's speaker diarisation: the quality of the resulting labelling and of the speaker end-marker tokens is what influences the rhythm of the conversation. (Or the input data just has many podcast hosts completing each other's... sandwiches?)

      • behnamoh 4 hours ago

        That's the annoying part about NLM. It ruins the illusion of having one person explaining it to the other person.

    • og_kalu 4 hours ago

      Following up on swyx, the TTS is probably Google finally releasing Soundstorm from the basement.

      https://google-research.github.io/seanet/soundstorm/examples...

  • jrm4 3 hours ago

    Great to see this: Fellow tech-geeks, ignore the NotebookLM thing at your peril.

    NotebookLM, far and away, has been the "AI Killer App" for the VAST MAJORITY of bright-but-not-particularly-techy people I know. My 70ish parents and my 8-year-old kid are both just blown away by this thing and can't stop playing with it.

    Edit: As someone pointed out below, I absolutely mean just the "podcast" thing.

    • wodenokoto 18 minutes ago

      As someone who doesn't listen to podcasts, what perils will I suffer from not making podcasts in NotebookLM?

    • jeffbee 3 hours ago

      Are we talking about NotebookLM generally or specifically the podcast stunt?

      • jrm4 2 hours ago

        Good question: I absolutely mean the podcast stunt.

        • dartos an hour ago

          Idk if I’d call it a killer app.

          The podcasts are grating to listen to and usually only contain very surface information I could gain from a paper’s abstract.

          It’s a wildly impressive technical achievement though.

  • lelag 5 hours ago

    Pretty weird choice of TTS engines. None of them are anywhere near state of the art as far as open TTS systems go. XTTSv2 or the new F5-TTS would have been much better choices.
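
    For reference, swapping in XTTSv2 via the coqui-ai/TTS package is roughly this (a sketch; the reference clip and output file names are placeholders):

        from TTS.api import TTS  # coqui-ai/TTS package

        # Load the multilingual XTTS v2 voice-cloning model
        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

        tts.tts_to_file(
            text="Welcome back to the show!",
            speaker_wav="host_a_reference.wav",  # short clip defining the host's voice
            language="en",
            file_path="line_001.wav",
        )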

    • segmondy 4 hours ago

      You can always update the code to use those. When Meta releases stuff on GitHub, they're not trying to ship the best possible version but to give a proof of concept. The licenses of those TTS systems matter; it's not enough for them to be open. If this were a product for their users, it would definitely have better TTS.

  • rmorey 4 hours ago

    The sample output is very poor. Cool demo, but really just emphasizes how much of a hit product the NotebookLM team has managed to come up with, ostensibly with more or less the same foundation models already available.

  • sajid-aipm an hour ago

    I wonder how soon they will release this in other languages and with different accents, especially Southeast Asian accents.

  • danpalmer 6 hours ago

    I'm not so sure this is an open source NotebookLM so much as a few experiments in an IPython notebook. What NotebookLM does at an LLM level is not particularly novel; it's the packaging as a product in a different way than what others are doing that I think is interesting. Also, the "podcast" bit is really just an intro/overview of a large corpus; far more useful is being able to discuss that corpus with the bot and get cited references.

    What this does demonstrate, however, is that prototyping with LLMs is very fast. I'd encourage anyone who hasn't had a play around with the APIs to give it a go.
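
    For instance, a first prototype of the "give me an overview of this document" step can be only a few lines (a sketch assuming the OpenAI Python SDK; any hosted or local LLM API works much the same way):

        from openai import OpenAI

        client = OpenAI()

        with open("paper.txt") as f:  # any source document
            source = f.read()

        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "Summarize documents and answer questions, citing the text."},
                {"role": "user",
                 "content": "Give me an overview of this document:\n\n" + source},
            ],
        )
        print(resp.choices[0].message.content)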

    • behnamoh 4 hours ago

      > What NotebookLM does at an LLM level is not particularly novel, it's the packaging as a product...

      Disagreed. NLM is novel in how the two hosts interrupt and overlap each other. No other OSS solution does that; they all just take turns talking.

      • danpalmer 3 hours ago

        Fair point, although to me the "audio overviews" are a minor feature of the product.

  • zmmmmm 5 hours ago

    It only creates the podcasts, right?

    I am more interested in the other features of NotebookLM. The podcasts are fun but gimmicky.

  • alanzhuly 5 hours ago

    If we could have this running locally on a mobile phone, that would be pretty cool. Imagine receiving a work document (for example, a product requirements document) and having this turn it into a podcast that plays for me while I'm driving. I think my productivity would be through the roof, and I wouldn't need to worry about compliance issues.

    • SubiculumCode 5 hours ago

      I wish ChatGPT or Claude would make an Android Auto app that I can use while driving.

  • jklein11 5 hours ago

    Man.. the sample is pretty rough

  • mmaunder 5 hours ago

    I’d love to hear the output if anyone has used this.

    • herval 5 hours ago

      There’s an example output linked on the github page

  • gnabgib 6 hours ago

    Page title: NotebookLlama: An Open Source version of NotebookLM