Small thread at the time https://news.ycombinator.com/item?id=34607738
I'm trying to get out of spreadsheet hell, not embrace it
But the sell here is that your boss goes to spreadsheet hell and you go to the unemployment line.
This just means your boss needs a spreadsheet model guy. "Bosses" would rather die than enter a formula in a spreadsheet.
This may have been true in 1985, but it is definitely not true today. "Bosses" love making massively complex spreadsheets.
Many of them love consuming and faffing about with complex spreadsheets, but I just watched my CTO cut and paste cells with menu items and the mouse, in a roundabout A->C, B->A, C->B way. And the font is Comic Sans, and everything is sorted by color. Any of the more sophisticated tools, or even a macro, is way beyond this one.
Given the pervasive nature of spreadsheets in today's world, we have no choice but to embrace them, but we need to manage spreadsheet-based data properly.
The good news is that the Mathematics of Big Data book, published by MIT Press, proposes an associative array algebra capable of properly processing spreadsheets, graph networks, matrices, and databases [1], [2]. According to the book, this is the first time such a system has been proposed. They even provide open-source D4M software in Python and MATLAB to demonstrate the idea, probably worth checking out [3].
[1] Mathematics of Big Data:
https://mitpress.mit.edu/9780262038393/mathematics-of-big-da...
[2] Associative Arrays: Unified Mathematics for Spreadsheets, Databases, Matrices, and Graphs:
https://arxiv.org/abs/1501.05709
[3] A D4M module for Python:
https://github.com/Accla/D4M.py
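For a rough feel of the idea, here's a minimal plain-Python sketch of an associative array keyed by (row, column) labels. This is not the actual D4M API, just an illustration that spreadsheet cells and graph edges fit the same structure and the same algebra:

    # Sketch only: an associative array as a dict keyed by (row, column) labels.
    # The same structure holds a spreadsheet range and a graph adjacency "matrix".
    from collections import defaultdict

    def assoc(triples):
        """Build an associative array from (row, col, value) triples."""
        return {(r, c): v for r, c, v in triples}

    def matmul(a, b):
        """Associative-array multiply: sum products over shared column/row keys."""
        out = defaultdict(float)
        for (r, k1), v1 in a.items():
            for (k2, c), v2 in b.items():
                if k1 == k2:
                    out[(r, c)] += v1 * v2
        return dict(out)

    # Spreadsheet-style data and a graph edge list look identical to the algebra.
    sheet = assoc([("row1", "price", 9.5), ("row1", "qty", 3)])
    graph = assoc([("alice", "bob", 1.0), ("bob", "carol", 1.0)])
    print(matmul(graph, graph))   # two-hop paths: {('alice', 'carol'): 1.0}

The same multiply gives two-hop paths on the graph and an ordinary matrix product on numeric tables, which is the unification the book is after.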
I wish Google had spent their efforts building stuff like this into Google Sheets instead of launching a Gemini tab that follows you everywhere and says "sorry I can't help with that" if you try to use it
When you look at the data structures and organization underneath Google Sheets, and if you have any experience with LLMs and structured outputs, Google shit in their sheets: the internal representation of a spreadsheet is overly complex. Their data representation needs to be translated into something far easier for machine learning models to work with, without a massive amount of filtering just to reach the spreadsheet data itself.
> When you look into the data structure and organization that is underneath Google Sheets
I don't use Google Sheets very often, so maybe this is obvious to someone who does. But how do you look at the data structure and organization underneath Google Sheets? Are you an employee, or is this available to the general public? Through an API?
Look at their API and the data structures it uses, and the relationships between them. It's more complex than necessary, ridiculously so; it feels like a student's learning project they abandoned out of boredom. API consistency, what's that?
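To make that concrete: a values.get call in the Sheets v4 API hands back a ValueRange wrapper, and everything a model actually needs is the raw grid inside it. A hedged sketch follows; the response dict here is made up, but its shape matches the documented ValueRange:

    # Hedged sketch: strip a Sheets v4 ValueRange down to plain CSV text that an
    # LLM can consume directly, instead of the nested API representation.
    import csv, io

    value_range = {                      # example response shape, values made up
        "range": "Sheet1!A1:C3",
        "majorDimension": "ROWS",
        "values": [
            ["item", "qty", "price"],
            ["apples", "3", "1.20"],
            ["pears", "5", "0.90"],
        ],
    }

    def value_range_to_csv(vr):
        """Flatten a ValueRange dict into CSV text for a model prompt."""
        rows = vr.get("values", [])
        width = max((len(r) for r in rows), default=0)
        buf = io.StringIO()
        writer = csv.writer(buf)
        for r in rows:
            writer.writerow(r + [""] * (width - len(r)))  # pad ragged short rows
        return buf.getvalue()

    print(value_range_to_csv(value_range))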
Or we can make models that are used to accepting spreadsheet data as input. It's probably just a different kind of positional embedding, even.
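A minimal sketch of that idea, assuming a learned row embedding and a column embedding added to each cell's token embedding; the sizes here are arbitrary placeholders, not from any real model:

    # Sketch of a "2-D positional embedding": each cell gets its content
    # embedding plus a learned row embedding plus a learned column embedding.
    import torch
    import torch.nn as nn

    class CellEmbedding(nn.Module):
        def __init__(self, vocab_size=1000, max_rows=256, max_cols=64, dim=128):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, dim)
            self.row = nn.Embedding(max_rows, dim)
            self.col = nn.Embedding(max_cols, dim)

        def forward(self, token_ids, row_ids, col_ids):
            # All inputs: (batch, n_cells) integer tensors.
            return self.tok(token_ids) + self.row(row_ids) + self.col(col_ids)

    emb = CellEmbedding()
    tokens = torch.randint(0, 1000, (1, 6))        # 6 cells from a 2x3 range
    rows = torch.tensor([[0, 0, 0, 1, 1, 1]])      # row index of each cell
    cols = torch.tensor([[0, 1, 2, 0, 1, 2]])      # column index of each cell
    print(emb(tokens, rows, cols).shape)           # torch.Size([1, 6, 128])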
Believe it or not, we already do; it's just not obvious, and so far I've only exchanged info with a very few people who realize this and know how to access that capability.
I think if you trained an MLM to replace certain tokens in a CSV with formulas, you could accomplish that without worrying about the internal representation of a spreadsheet.
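For the mechanics only, here's what masked-cell prediction over CSV text looks like with a generic off-the-shelf fill-mask model. A real version would need the fine-tuning on formula-bearing spreadsheets described above, which this stock model has never seen:

    # Mechanics only: mask one "cell" in CSV text and ask a generic fill-mask
    # model for candidates. A trained-for-formulas model would go here instead.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="distilroberta-base")
    row = "item,qty,price,total\napples,3,1.20,<mask>"
    for candidate in fill(row, top_k=3):
        print(candidate["token_str"], round(candidate["score"], 3))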
My point is that no training is necessary, none at all. The major LLMs already know the internal data structures and APIs of open source spreadsheets, because that information is in their training data; all open source software has its source code in major LLM training data. The trick is accessing that knowledge.
Is it a prompt trick?
Not so much a "trick" as it is understanding how to create a context that retrieves/generates from the most accurate training data.
In the case of open source software (a specific open source app, library, or API), often all it takes is to reference the library or API by name and use a few terms specific to it, and you'll get an LLM context that retrieves/generates from that area of knowledge. That context knows information about the library/API that is useful for integration and will tell you all about it just from your explaining what you're trying to do. Note this is essentially conversational R&D, which is then backed up by verifying that the data structures, library calls, and/or APIs are indeed as they are being discussed. I include verification checks during the conversation, rather than building a strategy and only then checking whether what was discussed is even possible.
If an integration is possible, that is, if the above conversation yields useful data and function/API calls (which it tends to do), then it's just a matter of a well-structured prompt, probably one generating structured output. The end result is a process where a library/API call retrieves essential data of some sort, the user can ask questions about it or modify it, and if it's modified, the prompt generates the data that is then fed back into the library/API.
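As a hedged sketch of that loop, assuming openpyxl as the open source library named in the prompt, with a hypothetical call_llm() standing in for whatever chat-completion client you use; the JSON shape requested is my assumption, not a standard:

    # Hedged sketch of the loop described above. openpyxl is named explicitly in
    # the prompt to anchor the model's context in that library's real API.
    # call_llm() is a placeholder, not a real client.
    import json
    from openpyxl import load_workbook

    def call_llm(prompt: str) -> str:
        """Placeholder: send `prompt` to whatever LLM API you use, return text."""
        raise NotImplementedError

    def apply_user_request(xlsx_path: str, request: str) -> None:
        wb = load_workbook(xlsx_path)
        ws = wb.active
        grid = [[c.value for c in row] for row in ws.iter_rows(max_row=20)]
        prompt = (
            "You are editing an openpyxl worksheet. Current cell values:\n"
            f"{json.dumps(grid)}\n"
            f"User request: {request}\n"
            'Reply with JSON only: {"cell": "<A1-style ref>", "formula": "<formula>"}'
        )
        edit = json.loads(call_llm(prompt))
        ws[edit["cell"]] = edit["formula"]     # feed the structured output back in
        wb.save(xlsx_path)

    # Usage (once call_llm is wired up):
    # apply_user_request("budget.xlsx", "total the qty column into B11")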
That would have been the future had OpenAI and the other company not ventured to offer AI search power in a chat window. Small, incremental AI implementations would have been the norm, like auto-reply in email, or an email summary at the end of the day.