I did check out with the csv you linked. It worked for me. https://ibb.co/cb3SFbR. Before making it live, I did already test it with 100MB+ CSV's having like 10+ columns.
I also checked my sentry integration, but couldn't find any errors to your issue. Seems to me like OS/browser level ram restriction. In case you have anything to help me with debug this, do share..
i tested on machine with 4gb ram, also i modified the query to remove the limit i.e. "select * from data"
can you let me know from where you are picking sqlite wasm? i checked here https://sqlite.org/download.html and sqlite wasm shared here is approx 830kb
Almost all the major work is done by sqlite (packaged as wasm binary), with second most by Papaparse library. I just joined these two and put up a decent web ui on top. The only manual thing done by me is schema parsing, which is based on first 1000 rows, and downloading functionality.
(1.) Since this is a javascript environment, the types are pretty minimal and just one of each top level type (unlike languages like java or C, which, for example, clould have int, float, double, etc. for just one `number` top-level type). These, here, in javascript, are: number, string, boolean, object, null/undefined.
(2.) Then I use `typeof` operator for each field to determine appropriate javascript type.
(3.) Then, I map these javascript types to sqlite types using this mapping:::
(4.) Then, I generate create table sql's fields declaration part. This also handles an edge case where if first 1000 rows (used for schema generation) of a particular field are all null/undefined, I default to using sqlite's TEXT type using this:::
crashed with big file https://www.papaparse.com/resources/files/big.csv i evaluated sqlite for my tool and wrote about it here https://newbeelearn.com/blog/million-rows-csv-debug-story/
BTW i checked your sqlite.wasm to be quite compact, just 340kb are you doing custom compilation.
Hey, thanks for trying out.
I did check out with the csv you linked. It worked for me. https://ibb.co/cb3SFbR. Before making it live, I did already test it with 100MB+ CSV's having like 10+ columns.
I also checked my sentry integration, but couldn't find any errors to your issue. Seems to me like OS/browser level ram restriction. In case you have anything to help me with debug this, do share..
Im using regular sqlite-wasm. No custom stuff.
Thanks..
i tested on machine with 4gb ram, also i modified the query to remove the limit i.e. "select * from data"
can you let me know from where you are picking sqlite wasm? i checked here https://sqlite.org/download.html and sqlite wasm shared here is approx 830kb
Edit: never mind checked here https://github.com/sql-js/sql.js/releases/tag/v1.12.0 and its indeed around 330kb.
Loading that CSV and running `select * from data` left me with 879MB of memory usage on the tab.
Works on my iPhone with the ‘big.csv’ file as well
Thanks a lot for trying it out and verifying :)
This is cool. Any pointers on how you built this?
Hey, thanks...
Almost all the major work is done by sqlite (packaged as wasm binary), with second most by Papaparse library. I just joined these two and put up a decent web ui on top. The only manual thing done by me is schema parsing, which is based on first 1000 rows, and downloading functionality.
How are you doing the schema parsing?
(1.) Since this is a javascript environment, the types are pretty minimal and just one of each top level type (unlike languages like java or C, which, for example, clould have int, float, double, etc. for just one `number` top-level type). These, here, in javascript, are: number, string, boolean, object, null/undefined.
(2.) Then I use `typeof` operator for each field to determine appropriate javascript type.
(3.) Then, I map these javascript types to sqlite types using this mapping:::
const typeMapping = { 'number': 'REAL', 'string': 'TEXT', 'boolean': 'INTEGER', 'object': 'TEXT', 'undefined': 'TEXT' };
(4.) Then, I generate create table sql's fields declaration part. This also handles an edge case where if first 1000 rows (used for schema generation) of a particular field are all null/undefined, I default to using sqlite's TEXT type using this:::
const columns = fields.map(field => `'${field}' ${typeMapping[columnTypes[field]] || 'TEXT'}`);
(5.) Finally, I create sqlite table:::
const createTableSQL = `CREATE TABLE data (${columns.join(', ')});`;
db = new SQL.Database();
db.run(createTableSQL);
Thanks..
Nice explainer, thanks!
Welcome :)