> (I'm looking into using Helia so that users are also contributing nodes in the network)
I had to look that term up <https://github.com/ipfs/helia#readme> but while sniffing around in their <https://github.com/ipfs/helia/wiki/Projects-using-Helia> I was reminded of https://github.com/orbitdb/orbitdb#readme which seems like it might mean rolling far fewer of your own parts
Thanks, I hadn't heard of it before. I wonder if it's possible to use OrbitDB to download a file from IPFS though? Based on its CID, I mean, because that was my intention with Helia. I thought that one of the nice things about reading books is that it takes time, and if users could be seeding their files from IndexedDB to the network, they could automatically and effortlessly become "contributing citizens".
Another interesting use case would be to see if this can replace (or be an addition to) SQLite as the database in which the queries are run.
The Pear P2P framework might be worth a look if you want to get off of GitHub and turn it into a truly distributed system. If the index must be on GitHub then what good is it to have the files on IPFS?
https://docs.pears.com/
> The databases used in TeaTime are GitHub repositories tagged with the teatime-database topic, which are published on GitHub Pages.
Couldn't this be a security issue, if bad actors use this tag?
That's a fair point - I guess it could be abused. Databases are sorted by their number of GitHub stars, so I was hoping that with the power of the crowd it will be possible to minimize the bad effect such an actor might have, by simply not voting them up.
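As a rough sketch of the ranking described above, the snippet below builds a GitHub repository search URL for the `teatime-database` topic and sorts a list of results by star count. The search endpoint and its `q`, `sort`, and `order` parameters are part of GitHub's public REST API; the sample repositories and the `rank_databases` helper are made up for illustration, and this is not necessarily how TeaTime itself queries GitHub.

```python
from urllib.parse import urlencode

GITHUB_SEARCH = "https://api.github.com/search/repositories"


def search_url(topic: str) -> str:
    """Build a search URL for repositories tagged with `topic`, most-starred first."""
    query = urlencode({"q": f"topic:{topic}", "sort": "stars", "order": "desc"})
    return f"{GITHUB_SEARCH}?{query}"


def rank_databases(repos: list[dict]) -> list[dict]:
    """Sort repository records by star count, highest first."""
    return sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)


# Hypothetical search results; real responses carry many more fields.
sample = [
    {"full_name": "alice/books-db", "stargazers_count": 12},
    {"full_name": "mallory/fake-db", "stargazers_count": 0},
    {"full_name": "bob/papers-db", "stargazers_count": 40},
]

print(search_url("teatime-database"))
for repo in rank_databases(sample):
    print(repo["stargazers_count"], repo["full_name"])
```

Note that a freshly tagged repository with zero stars still appears in the list; star ordering only pushes it to the bottom.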
There have been several attacks recently where a bad actor takes over a repo whose original maintainer wants to step back, then launches a supply chain attack. In recent cases, the attack came from obfuscated binary files in the repo rather than from code. Given that we are dealing with documents here (books), it would be easy to hide malicious code in a file. PDFs have all sorts of execution vulnerabilities, for example.
Interesting - I'm kinda counting on PDF.js, which is used for PDF rendering, to do it safely, but of course that doesn't always have to be the case. Do you have any thoughts on how to make this safer?
Some other method of collection where you can have known trust in the files contributed, or some method of 'registering' a submission to create trust.
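One possible reading of 'registering' a submission is a registry of file digests that someone has vouched for. The sketch below is only an illustration of that idea: the `TRUSTED` registry, the `vouched_by` helper, and the curator name are all hypothetical, and nothing here is part of TeaTime.

```python
import hashlib
from typing import Optional

# Hypothetical registry: SHA-256 digest -> who vouched for the file.
TRUSTED = {
    hashlib.sha256(b"known good book contents").hexdigest(): "curator-a",
}


def vouched_by(data: bytes, registry: dict[str, str] = TRUSTED) -> Optional[str]:
    """Return the registered submitter for `data`, or None if it is unregistered."""
    return registry.get(hashlib.sha256(data).hexdigest())


print(vouched_by(b"known good book contents"))
print(vouched_by(b"tampered book contents"))
```

A matching digest only shows that the bytes are the ones that were registered; it says nothing about whether the registered file was safe in the first place, so the trust still rests on whoever maintains the registry.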
PDFs are not executables.
May I recommend the old 27C3 talk "OMG WTF PDF"?
You’d be surprised what’s executable with the right attitude.
Sorry if I missed this, but is there an example instance we can play with?
Yes, https://bjesus.github.io/teatime/
"Everything is done in the browser - no users, no cookies, no tracking. LocalStorage and IndexedDB are used for saving your last readings, your position in each file."
I love this! Thanks for making this!
Is this like an open source distributed libgen?
Libgen is one of the databases that are currently available (and was probably a good fit because they already had their files on IPFS), but I think that this architecture, in which the UI is decoupled from the database, the database doesn't hold any copyrighted materials, and the files are downloaded from IPFS, is quite resilient and could be used for serving all sorts of content.
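To illustrate the decoupling described above, here is a minimal sketch in which a database row holds only metadata plus a CID, and the file itself is resolved through an IPFS HTTP gateway. The `BookRecord` fields, the placeholder CID, and the `gateway.example` host are assumptions for illustration, not TeaTime's actual schema; the `/ipfs/<cid>` path layout is the usual path-style gateway convention.

```python
from dataclasses import dataclass

# Public path-style gateway; any other IPFS HTTP gateway could be substituted.
DEFAULT_GATEWAY = "https://ipfs.io"


@dataclass(frozen=True)
class BookRecord:
    """Hypothetical metadata row: the database stores a pointer, not the file."""
    title: str
    author: str
    cid: str


def gateway_url(record: BookRecord, gateway: str = DEFAULT_GATEWAY) -> str:
    """Return the HTTP URL at which a gateway would serve the record's content."""
    return f"{gateway.rstrip('/')}/ipfs/{record.cid}"


# Placeholder CID, not a real content identifier.
book = BookRecord(title="Example Title", author="Example Author", cid="bafy-example-cid")

print(gateway_url(book))
print(gateway_url(book, "https://gateway.example/"))
```

Because the record carries only the CID, swapping the gateway (or fetching from a peer instead) does not require touching the database.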
Do static sites built with sphinx-build or jupyter-book or hugo or other jamstack static site generators work with TeaTime?
sphinx-build: https://www.sphinx-doc.org/en/master/man/sphinx-build.html
There may need to be a different Builder or an extension of sphinxcontrib.serializinghtml.JSONHTMLBuilder which serializes a doctree (basically a DOM, i.e. a Document Object Model) to the output representation: https://www.sphinx-doc.org/en/master/usage/builders/#sphinxc...
datasette and datasette-lite can load CSV, JSON, Parquet, and SQLite databases; they support Full-Text Search and search Faceting. datasette-lite is a WASM build of datasette with the pyodide python distribution.
datasette-lite > Loading SQLite databases: https://github.com/simonw/datasette-lite#loading-sqlite-data...
jupyter-lite is a WASM build of jupyter which also supports sqlite in notebooks in the browser with `import sqlite3` with the python kernel and also with a sqlite kernel: https://jupyter.org/try-jupyter/lab/
jupyterlite/xeus-sqlite-kernel: https://github.com/jupyterlite/xeus-sqlite-kernel
(edit)
xeus-sqlite-kernel > "Loading SQLite databases from a remote URL" https://github.com/jupyterlite/xeus-sqlite-kernel/issues/6#i...
%FETCH <url> <filename> https://github.com/jupyter-xeus/xeus-sqlite/blob/ce5a598bdab...
xlite.cpp > void fetch(const std::string url, const std::string filename) https://github.com/jupyter-xeus/xeus-sqlite/blob/main/src/xl...
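For anyone unfamiliar with the `import sqlite3` route mentioned above, here is a minimal sketch using Python's standard-library module with an in-memory database. The `books` table, its columns, and the rows are invented for illustration, and the `GROUP BY` query is a plain-SQL approximation of what a facet count shows, not datasette's own implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, language TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("A Sample Book", "en", 1999),
        ("Another Sample", "en", 2005),
        ("Ein Beispiel", "de", 2005),
    ],
)

# Keyword search with a parameterized LIKE (case-insensitive for ASCII text).
matches = conn.execute(
    "SELECT title FROM books WHERE title LIKE ? ORDER BY title", ("%sample%",)
).fetchall()

# Facet-style counts: how many rows fall under each language.
facets = conn.execute(
    "SELECT language, COUNT(*) FROM books GROUP BY language ORDER BY language"
).fetchall()

print(matches)
print(facets)
```

The same statements should work unchanged against a database file by passing its path to `sqlite3.connect` instead of `":memory:"`.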
> Do static sites built with sphinx-build or jupyter-book or hugo or other jamstack static site generators work with TeaTime?
I guess it depends on what you mean by "work with TeaTime". TeaTime itself is a static site, generated using Nuxt. Nothing it does is impossible with another stack - in the end it's just HTML, CSS and JS. I haven't tried sphinx-build or jupyter-book, but there isn't a technical reason why Hugo wouldn't be able to build a TeaTime-like website, using the same databases.
> datasette-lite > Loading SQLite databases: https://github.com/simonw/datasette-lite#loading-sqlite-data...
I haven't seen datasette before. What are the biggest benefits you think it has over sql.js-httpvfs (which I'm using now)? Is it about the ability to also use other formats, in addition to SQLite? I got the impression that sql.js-httpvfs was a bit more of a POC, and later some possibly better solutions came out, but I haven't really gone down that rabbit hole to figure out which one would be best.
Edit: looking a little more into datasette-lite, it seems like one of the nice benefits of sql.js-httpvfs is that it doesn't download the whole SQLite database in order to query it. This makes it possible to have a 2GB database but still read it in chunks, skipping around efficiently until you find your data.
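To make the chunked-read idea concrete, here is a small sketch of how a reader can map a byte range onto fixed-size chunks and fetch only those chunks. The in-memory `fake_file` stands in for a remote database, and `fetch_chunk` stands in for an HTTP request carrying a `Range: bytes=start-end` header. The class name, the 4096-byte chunk size, and the caching policy are assumptions for illustration and are not taken from sql.js-httpvfs.

```python
class ChunkedReader:
    """Read byte ranges from a large source while fetching only the chunks needed."""

    def __init__(self, fetch_chunk, size: int, chunk_size: int = 4096):
        self.fetch_chunk = fetch_chunk  # callable(start, end_inclusive) -> bytes
        self.size = size
        self.chunk_size = chunk_size
        self.cache: dict[int, bytes] = {}

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)  # exclusive end, clamped to the source
        if offset >= end:
            return b""
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        parts = []
        for index in range(first, last + 1):
            if index not in self.cache:
                start = index * self.chunk_size
                stop = min(start + self.chunk_size, self.size) - 1
                self.cache[index] = self.fetch_chunk(start, stop)
            parts.append(self.cache[index])
        data = b"".join(parts)
        skip = offset - first * self.chunk_size
        return data[skip : skip + (end - offset)]


fake_file = bytes(range(256)) * 4096  # 1 MiB of predictable bytes
requests = []  # records every simulated range request


def fetch_chunk(start: int, stop: int) -> bytes:
    requests.append((start, stop))
    return fake_file[start : stop + 1]


reader = ChunkedReader(fetch_chunk, size=len(fake_file))
data = reader.read(offset=500_000, length=100)
print(len(data), requests)
```

Reading 100 bytes from the middle of the 1 MiB source triggers a single simulated request for one 4096-byte chunk, and a later read that lands in the same chunk is served from the cache.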
super cool
Any particular reason for choosing IPFS instead of bittorrent or other p2p protocols? It feels like every time I try an IPFS tool it just crashes, whereas I rarely have issues with torrents.
Yeah - the desire to have TeaTime run as a normal website. BitTorrent doesn't run over HTTP, unless you use WebTorrent, which most BitTorrent clients aren't 100% compatible with. This means you can basically only download from other WebTorrent nodes - and there aren't many.
I do think this might change with the introduction of things like the Direct Sockets API, but for now they are too restricted and not widely supported.
It's my first time working with IPFS and I agree it hasn't always been 100% reliable, but I do hope that if I manage to get TeaTime users to also be contributing nodes, this might actually improve the reliability of the whole network. Once it's possible to use BitTorrent in the browser, I do think it would be a great addition (https://github.com/bjesus/teatime/issues/3).