12 comments

  • pdyc 4 days ago

Crashed with a big file: https://www.papaparse.com/resources/files/big.csv. I evaluated SQLite for my tool and wrote about it here: https://newbeelearn.com/blog/million-rows-csv-debug-story/

BTW, I noticed your sqlite.wasm is quite compact, just 340KB. Are you doing a custom compilation?

    • freakynit 4 days ago

Hey, thanks for trying it out.

I checked with the CSV you linked, and it worked for me: https://ibb.co/cb3SFbR. Before making it live, I had already tested it with 100MB+ CSVs with 10+ columns.

I also checked my Sentry integration, but couldn't find any errors matching your issue. It seems to me like an OS/browser-level RAM restriction. If you have anything that could help me debug this, please share.

I'm using regular sqlite-wasm. No custom stuff.

      Thanks..

      • pdyc 3 days ago

I tested on a machine with 4GB RAM. I also modified the query to remove the limit, i.e. `select * from data`.

Can you let me know where you are getting the SQLite WASM from? I checked https://sqlite.org/download.html and the WASM build shared there is approx. 830KB.

Edit: never mind, I checked https://github.com/sql-js/sql.js/releases/tag/v1.12.0 and it's indeed around 330KB.

        • kfajdsl 3 days ago

          Loading that CSV and running `select * from data` left me with 879MB of memory usage on the tab.

      • itsbrex 3 days ago

        Works on my iPhone with the ‘big.csv’ file as well

        • freakynit 3 days ago

          Thanks a lot for trying it out and verifying :)

  • victor106 3 days ago

    This is cool. Any pointers on how you built this?

    • freakynit 3 days ago

      Hey, thanks...

Almost all the major work is done by SQLite (packaged as a WASM binary), with the second most by the PapaParse library. I just joined these two and put a decent web UI on top. The only parts I did manually are the schema parsing, which is based on the first 1000 rows, and the download functionality.
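The join between the two libraries is roughly: parse the CSV into row objects, then feed them into a SQLite table. A minimal stand-alone sketch of that flow (with a hand-rolled splitter standing in for PapaParse so it runs without dependencies, and the generated SQL only printed rather than executed through sql.js; all names here are illustrative, not the app's actual code):

```javascript
// Tiny stand-in for PapaParse's header-mode parsing (assumption:
// the real app uses Papa.parse with { header: true } instead).
const csv = "id,name,score\n1,alice,9.5\n2,bob,7.25";
const [headerLine, ...rows] = csv.split("\n");
const fields = headerLine.split(",");
const records = rows.map(line => {
  const values = line.split(",");
  return Object.fromEntries(fields.map((f, i) => [f, values[i]]));
});

// Build one parameterized INSERT; the app would run it per row
// (or in batches) via db.run(insertSQL, values) on a sql.js Database.
const placeholders = fields.map(() => "?").join(", ");
const insertSQL = `INSERT INTO data (${fields.map(f => `'${f}'`).join(", ")}) VALUES (${placeholders});`;
console.log(insertSQL);
// -> INSERT INTO data ('id', 'name', 'score') VALUES (?, ?, ?);
```

Using parameterized placeholders rather than interpolating values keeps the SQL safe regardless of what the CSV cells contain.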

      • nbbaier 2 days ago

        How are you doing the schema parsing?

        • freakynit 2 days ago

(1.) Since this is a JavaScript environment, the types are pretty minimal, with just one of each top-level type (unlike languages like Java or C, which, for example, could have int, float, double, etc. for the single `number` top-level type). In JavaScript these are: number, string, boolean, object, and null/undefined.

(2.) Then I use the `typeof` operator on each field to determine the appropriate JavaScript type.

(3.) Then I map these JavaScript types to SQLite types using this mapping:

          const typeMapping = { 'number': 'REAL', 'string': 'TEXT', 'boolean': 'INTEGER', 'object': 'TEXT', 'undefined': 'TEXT' };

(4.) Then I generate the field-declaration part of the CREATE TABLE SQL. This also handles an edge case: if the first 1000 rows (used for schema generation) of a particular field are all null/undefined, I default to SQLite's TEXT type:

          const columns = fields.map(field => `'${field}' ${typeMapping[columnTypes[field]] || 'TEXT'}`);

(5.) Finally, I create the SQLite table:

          const createTableSQL = `CREATE TABLE data (${columns.join(', ')});`;

          db = new SQL.Database();

          db.run(createTableSQL);
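Putting steps (1.) through (5.) together, a self-contained sketch of the inference path (variable names and sample data are illustrative; the app's real code may differ, e.g. in how it picks the representative sample per column):

```javascript
const typeMapping = { number: 'REAL', string: 'TEXT', boolean: 'INTEGER', object: 'TEXT', undefined: 'TEXT' };

function inferSchema(rows, fields) {
  const columnTypes = {};
  for (const field of fields) {
    // First non-null sample decides the type; an all-null column
    // yields `undefined`, which falls through to TEXT below.
    const sample = rows.map(r => r[field]).find(v => v !== null && v !== undefined);
    columnTypes[field] = typeof sample;
  }
  const columns = fields.map(field => `'${field}' ${typeMapping[columnTypes[field]] || 'TEXT'}`);
  return `CREATE TABLE data (${columns.join(', ')});`;
}

const rows = [
  { id: 1, name: 'alice', active: true, notes: null },
  { id: 2, name: 'bob', active: false, notes: null },
];
const sql = inferSchema(rows, ['id', 'name', 'active', 'notes']);
console.log(sql);
// -> CREATE TABLE data ('id' REAL, 'name' TEXT, 'active' INTEGER, 'notes' TEXT);
```

Note that since SQLite uses dynamic typing with type affinity, mapping every ambiguous case to TEXT is a safe fallback: values still round-trip even if the declared type turns out to be wrong.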

          Thanks..