3 comments

  • malvads 2 months ago

    After running into only paid tools or overly complicated setups for turning web pages into structured data for LLMs, I got tired of it. I wanted a free, open-source solution to convert websites to Markdown, so I built Mojo (for NotebookLM or any RAG-like solution).

    Mojo is extremely fast, supports proxy rotation, and is MIT licensed -> https://github.com/malvads/mojo

  • firefoxd 2 months ago

    It should start by looking at robots.txt.

    • malvads 2 months ago

      Hi, thanks for your comment (it's in the plan). Since Mojo is early-stage software, there are still things that need to be integrated. However, Mojo is not a mass crawler (you have to specify directly what to crawl), so even if I add robots.txt support (which is in the plan), evil users can still just bypass it. I expect Mojo to be used by technical (non-evil) folks.

      But thanks for your suggestion :)
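
      For reference, a rough sketch of what a robots.txt check like the one suggested above could look like, using Python's standard `urllib.robotparser`. This is not Mojo's code: the `is_allowed` helper, the `example-bot` user agent, the sample rules, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. A real crawler would download this from
# the target host's /robots.txt before fetching any page on that host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URLs only (assumptions, not taken from the thread).
allowed = is_allowed(ROBOTS_TXT, "example-bot", "https://example.com/docs/intro")
blocked = is_allowed(ROBOTS_TXT, "example-bot", "https://example.com/private/notes")
```

      As noted in the reply, a check like this is advisory: it only helps when the person running the crawler leaves it enabled.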