AI engineers claim new algorithm reduces AI power consumption by 95%

(tomshardware.com)

370 points | by ferriswil 9 months ago ago

173 comments

djoldman 9 months ago

ABSTRACT

Large neural networks spend most computation on floating point tensor multiplications. In this work, we find that a floating point multiplier can be approximated by one integer adder with high precision. We propose the linear-complexity multiplication (L-Mul) algorithm that approximates floating point number multiplication with integer addition operations. The new algorithm costs significantly less computation resource than 8-bit floating point multiplication but achieves higher precision. Compared to 8-bit floating point multiplications, the proposed method achieves higher precision but consumes significantly less bit-level computation. Since multiplying floating point numbers requires substantially higher energy compared to integer addition operations, applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by elementwise floating point tensor multiplications and 80% energy cost of dot products. We calculated the theoretical error expectation of L-Mul, and evaluated the algorithm on a wide range of textual, visual, and symbolic tasks, including natural language understanding, structural reasoning, mathematics, and commonsense question answering. Our numerical analysis experiments agree with the theoretical error estimation, which indicates that L-Mul with 4-bit mantissa achieves comparable precision as float8 e4m3 multiplications, and L-Mul with 3-bit mantissa outperforms float8 e5m2. Evaluation results on popular benchmarks show that directly applying L-Mul to the attention mechanism is almost lossless. We further show that replacing all floating point multiplications with 3-bit mantissa L-Mul in a transformer model achieves equivalent precision as using float8 e4m3 as accumulation precision in both fine-tuning and inference.

[-]

onlyrealcuzzo 9 months ago

Does this mean you can train efficiently without GPUs?

Presumably there will be a lot of interest.

[-]

crazygringo 9 months ago

No. But it does potentially mean that either current or future-tweaked GPUs could run a lot more efficiently -- meaning much faster or with much less energy consumption.

You still need the GPU parallelism though.

[-]

3abiton 9 months ago

This is still amazing work, imagine running chungus models on a single 3090.

[-]

Sohcahtoa82 9 months ago

The bottleneck on a consumer-grade GPU like a 3090 isn't the processing power, it's the lack of RAM. The PCI-Express bus ends up being your bottleneck from having to swap in parts of the model.

Even with PCIe 5.0 and 16 lanes, you only get 64 GB/s of bandwidth. If you're trying to run a model too big for your GPU, then for every token, it has to reload the entire model. With a 70B parameter model, 8 bit quantization, you're looking at just under 1 token/sec just from having to transfer parts of the model in constantly. Making the actual computation faster won't make it any faster.

[-]

dragonwriter 9 months ago

OTOH, doesn't it also mean that (given appropriate software framework support) iGPUs with less processing capacity and slower-but-more RAM available (because system RAM is comparatively cheap and plentiful compared to VRAM) without swapping anything are more competitive against consumer dGPUs with fast-but-small RAM for both inference and training with larger models?

[-]

Sohcahtoa82 9 months ago

System memory isn't that fast, either. Even with DDR5-8400, the fastest memory you can get right now, you're only looking at a memory transfer speed of 67.2 GB/s, barely faster than the PCI-E bus. So even if you could store that entire 70B model in RAM, you're still getting just under 1 token/sec, and that's assuming your CPU doesn't become a bottleneck.

Your best bet would likely be a laptop that has integrated system RAM with VRAM, but I don't think any of those offer enough RAM to store an entire 70B model. A 7B parameter model would work fine, but you could do those on a consumer-grade GPU anyways.

[-]

DeveloperErrata 9 months ago

Macbook Pros with M3 & integrated RAM & VRAM can do 70B models :)

fuzzfactor 9 months ago

I had a feeling it had to be something like massive waste due to a misguided feature of the algorithms that shouldn't have been there in the first place.

Once the "math is done" quite likely it would have paid off better than most investments for the top people to have spent a few short years working with grossly underpowered hardware until they could come up with amazing results there before scaling up. Rather than grossly overpowered hardware before there was even deep understanding of the underlying processes.

When you think about it, what we have seen from the latest ultra-high-powered "thinking" machines is truly so impressive. But if you are trying to fool somebody into believing that it's a real person it's still not "quite" there.

Maybe a good benchmark would be to take a regular PC, and without reliance on AI just pull out all the stops and put all the effort into fakery itself. No holds barred, any trick you can think of. See what the electronics is capable of this way. There are some smart engineers, this would only take a few years but looks like it would have been a lot more affordable.

Then with the same hardware if an AI alternative is not as convincing, something has got to be wrong.

It's good to find out this type of thing before you go overboard.

Regardless of speed or power, I never could have gotten an 8-bit computer to match the output of a 32-bit floating-point algorithm by using floating-point myself. Integers all the way and place the decimal where it's supposed to be when you're done.

Once it's really figured out, how do you think it would feel being the one paying the electric bills up until now?

[-]

jimmaswell 9 months ago

Faster progress was absolutely worth it. Spending years agonizing over theory to save a bit of electric would have been a massive disservice to the world.

[-]

rossjudson 9 months ago

You're sort of presuming that LLMs are going to be a massive service to the world there, aren't you? I think the jury is still out on that one.

[-]

jimmaswell 9 months ago

They already have been. Even just in programming, even just Copilot has been a life changing productivity booster.

[-]

recursive 9 months ago

I've been using copilot for several months. If I could figure out a way to measure its impact on my productivity, I'd probably see a single digit percentage boost in "productivity". This is not life-changing for me. And for some tasks, it's actually worse than nothing. As in, I spend time feeding it a task, and it just completely fails to do anything useful.

[-]

jimmaswell 9 months ago

I've been using it for over a year I think. I don't often feed it tasks with comments so much as go about things the same as usual and let it autocomplete. The time and cognitive load saved adds up massively. I've had to go without it for a bit while my workplace gets its license in order for the corporate version and the personal version has an issue with the proxy, and it's been agonizing going without it again. I almost forgot how much it sucks having to jump to google every other minute, and it was easy to start to take for granted how much context copilot was letting me not have to hold onto in my head. It really lets me work on the problem as opposed to being mired in immaterial details. It feels like I'm at least 2x slower overall without it.

[-]

atq2119 9 months ago

> I almost forgot how much it sucks having to jump to google every other minute

Even allowing for some hyperbole, your programming experience is extremely different from mine. Looking anything up outside the IDE, let alone via Google, is by far the exception for me rather than the rule.

I've long suspected that this kind of difference explains a lot of the difference in how Copilot is perceived.

[-]

namaria 9 months ago

Claiming LLMs are a massive boost for coding productivity is becoming a red flag that the claimant has a tenuous grasp on the skills necessary. Yeah if you have to look up everything all the time and you can't tell the AI slop isn't very good, you can put out code quite fast.

[-]

jimmaswell 9 months ago

At the risk of sounding like an inflated ego: I'm very good at what I do, the rest of my team frequently looks to me for guidance, my boss and boss's boss etc. have repeatedly said I'm among the most valuable people around, and I'm the one turned to in emergencies, for difficult architectural decisions, and to lead projects. I conceptually understand the ecosystem I work in very well at every layer.

What I'm not good at is memorizing API's and libraries that all use different verbs and nouns for the same thing, and other such things that are immaterial to the actual work. How do you use a mutation observer again? Hell if I remember the syntax but I know the concept, and copilot will probably spit out what I want, and I'll easily verify the output. Or how do you copy an array in JS? Or print a stack trace? Or do a node walk? You can either wade through google and stackoverflow, or copilot can tell you instantly. And I can very quickly tell if the code copilot gave me is sensible or not.

williamcotton 9 months ago

I know plenty of fantastic engineers that use LLM tools as code assistants.

I’m not sure when and why reading documentation and man pages became a sign of a lack of skill. Watch a presentation by someone like Brian Kernighan and you’ll see him joke about looking up certain compiler flags for the thousandth time!

Personally I work in C, C#, F#, Java, Kotlin, Swift, R, Ruby, Python, Postgres SQL, MySQL SQL, TypeScript, node, and whatever hundreds of libraries and DSLs are built on top. Yes, I have to look up documentation and with regularity.

[-]

fragmede 9 months ago

Add Golang and rust and JavaScript and next.js and react to the list for me. ;) If you live and work and breathe in the same kernel, operating system, and user space, and don't end up memorizing the various bits of minutiae, I'd judge you (and me) too, but it's not the 2000's, or the 90's or even the 80's anymore, and some of us don't have the luxury, or have chosen not to, live in one small niche for our entire career. At the end of the day, the client doesn't care what language you use, or the framework, or even the code quality, as long as it works. What they don't want to pay for is overage, and taking the previous developer's work and refactoring it and rewriting it in your preferred language isn't high value work, so you pick up whatever they used and run with it. Yeah that makes me less fluent in that one particular thing, not having done the same thing for 20+ years, but that's not where I deliver value. Some people do, and that's great for them and their employers, but my expertise lies elsewhere. I got real good at MFC, back in the day, and then WX and Qt and I'm working on getting good at react and such.

FpUser 9 months ago

Same opinion here. I work with way too many things to keep everything in my head. I'd rather use my head for design than to remember every function and parameter of say STL

specialist 9 months ago

For me, thus far, LLMs help me forage docs. I know what I want and it helps me narrow my search faster. Watching adepts like Simon Willison wield LLMs is on my to do list.

datavirtue 9 months ago

Nope, just want it to write tests and other low value work so I can get shit done. Some of it depends on the stakes of your job. Are you floating along day by day in big corp or are you grinding it out at a startup? Those working at the startup have to use coding assistants, period.

h_tbob 9 months ago

Hey, we were all beginners once!

On another note, even if you are experienced it helps when doing new stuff and you don’t know the proper syntax for what you want. For example let’s say your using flutter, you can just type

// bold

And it will help put the proper bold stuff in there.

soulofmischief 9 months ago

Comments like this are a great example of the Dunning-Kruger effect. Your comment is actually an indication that you don't have the mastery required to get useful, productive output from a high quality LLM.

Maybe you don't push your boundaries as an engineer and thus rarely need to know new things or at least learn new API surfaces. Maybe you don't know how to effectively prompt an LLM. Maybe you lack the mastery to analyze and refine the results. Maybe you just like doing things the slow way. I too remember a time as an early programmer where I eschewed even Intellisense and basic auto complete...

I'd recommend learning a bit more and practicing some humility and curiosity before condemning an entire class of engineers just because you don't understand their workflow. Just because you've had subpar experiences with a new tool doesn't mean it's not a useful tool in another engineer's toolkit.

[-]

namaria 9 months ago

Funny you should make claims about my skills when you have exactly zero data about my abilities or performance.

Evaluating my skills based on how I evaluated someone else's skills when they tell me about their abilities with and without a crutch, and throwing big academic sounding expressions with 'effect' in them might be intimidating to some but to me it just transparently sounds pretentious and way off mark, since, like I said, you have zero data about my abilities or output.

> I'd recommend learning a bit more and practicing some humility and curiosity before condemning an entire class of engineers

You're clearly coming from an emotional place because you feel slighted. There is no 'class of engineers' in my evaluation. I recommend reading comments more closely, thinking about their content, and not getting offended when someone points out signs of lacking skills, because you might just be advertising your own limitations.

[-]

soulofmischief 9 months ago

> Funny you should make claims about my skills when you have exactly zero data about my abilities or performance.

Didn't you just do that to an entire class of engineers:

> Claiming LLMs are a massive boost for coding productivity is becoming a red flag that the claimant has a tenuous grasp on the skills necessary

Anyway,

> Evaluating my skills based on how I evaluated someone else's skills when they tell me about their abilities with and without a crutch

Your argument rests on the assumption that LLMs are a "crutch", and you're going to have to prove that before the rest of your argument holds any water.

It sucks getting generalized, doesn't it? Feels ostracizing? That's the exact experience someone who productively and effectively uses LLMs will have upon encountering your premature judgement.

> You're clearly coming from an emotional place because you feel slighted.

You start off your post upset that I'm "making claims" about your skills (I used the word "maybe" intentionally, multiple times), and then turn around and make a pretty intense claim about me. I'm not "clearly" coming from an emotional place, you did not "trigger" me, I took a moment to educate you about being overly judgemental before fully understanding something, and pointed out the inherent hypocrisy.

> you might just be advertising your own limitations

But apparently my approach was ineffective, and you are still perceiving a world where people who approach their work differently than you are inferior. Your toxic attitude is unproductive, and while you're busy imagining yourself as some masterful engineer, people are out there getting massive productivity boosts with careful application of cutting-edge generative technologies. LLMs have been nothing short of transcendental to a curious but skilled mind.

[-]

Ygg2 9 months ago

> Didn't you just do that to an entire class of engineers

Not really. He said "if you claim LLM's are next thing since sliced butter I am doubting your abilities". Which is fair. It's not really a class as much as a group.

I've never been wowed over by LLMs. At best they are boilerplate enhancers. At worst they write plausibly looking bullshit that compiles but breaks everything. Give it something truly novel and/or fringe and it will fold like a deck of cards.

Even latest research called LLM's benefits into question: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566

That said. They are fine at generating commit messages and docs than me.

[-]

soulofmischief 9 months ago

> Not really. He said "if you claim LLM's are next thing since sliced butter I am doubting your abilities". Which is fair.

No, OP said:

> Claiming LLMs are a massive boost for coding productivity is becoming a red flag that the claimant has a tenuous grasp on the skills necessary

Quotation marks are usually reserved for direct quotes, not paraphrases or straw mans.

> I've never been wowed over by LLMs.

Cool. I have, and many others have. I'm unsure why your experience justifies invalidating the experiences of others or supporting prejudice against people who have made good use of them.

> Give it something truly novel and/or fringe and it will fold like a deck of cards.

Few thoughts are truly novel, most are derivative or synergistic. Cutting edge LLMs, when paired with a capable human, are absolutely capable of productive work. I have long, highly technical and cross-cutting discussions with GPT 4o which I simply could not have with any human that I know. Humans like that exist, but I don't know them and so I'm making due with a very good approximation.

Your and OP's lack of imagination at the capabilities of LLMs are more telling than you realize to those intimate with them, which is what makes this all quite ironic given that it started from OP making claims about how people who say LLMs massively boost productivity are giving tells that they're not skilled enough.

[-]

Ygg2 9 months ago

> Quotation marks are usually reserved for direct quotes,

Not on HN. Customary is to use > paragraph quotes like you did. However I will keep that in mind.

> Cool. I have, and many others have. I'm unsure why your experience justifies invalidating the experiences of others

If we're both grading a single student (LLM) in same field (programming), and you find it great and I find it disappointing, it means one of us is scoring it wrong.

I gave papers that demonstrate its failings, where is your counter-proof?

> Your and OP's lack of imagination at the capabilities of LLMs

It's not lack of imagination. It's terribleness of results. It can't consistently write good doc comments. I does not understand the code nor it's purpose, but roughly guesses the shape. Which is fine for writing something that's not as formal as code.

It can't read and understand specifications, and even generate something as simple as useful API for it. The novel part doesn't have to be that novel just something out of its learned corpus.

Like Yaml parser in Rust. Maybe Zig or something beyond it's gobbled data repo.

> Few thoughts are truly novel, most are derivative or synergistic.

Sure but you still need A mind to derive/synergize the noise of everyday environment into something novel.

It can't even do that but remix data into plausibly looking forms. A stochastic parrot. Great for DnD campaign. Shit for code.

[-]

soulofmischief 9 months ago

> Not on HN. Customary is to use > paragraph quotes like you did. However I will keep that in mind.

Hacker News is not some strange place where the normal rules of discourse don't apply. I assume you are familiar with the function of quotation marks.

> If we're both grading a single student (LLM) in same field (programming), and you find it great and I find it disappointing, it means one of us is scoring it wrong.

No, it means we have different criteria and general capability for evaluating the LLM. There are plenty of standard criteria which LLMs are pitted against, and we have seen continued improvement since their inception.

> It can't consistently write good doc comments. I does not understand the code nor it's purpose, but roughly guesses the shape.

Writing good documentation is certainly a challenging task. Experience has led me to understand where current LLMs typically do and don't succeed with writing tests and documentation. Generally, the more organized and straightforward the code, the better. The smaller each module is, the higher the likelihood of a good first pass. And then you can fix deficiencies in a second, manual pass. If done right, it's generally faster than not making use of LLMs for typical workflows. Accuracy also goes down for more niche subject material. All tools have limitations, and understanding them is crucial to using them effectively.

> It can't read and understand specifications, and even generate something as simple as useful API for it.

Actually, I do this all the time and it works great. Keep practicing!

In general, the stochastic parrot argument is oft-repeated but fails to recognize the general capabilities of machine learning. We're not talking about basic Markov chains, here. There are literally academic benchmarks against which transformers have blown away all initial expectations, and they continue to incrementally improve. Getting caught up criticizing the crudeness of a new, revolutionary tool is definitely my idea of unimaginative.

[-]

Ygg2 9 months ago

> Hacker News is not some strange place where the normal rules of discourse don't apply. I assume you are familiar with the function of quotation marks.

Language is all about context. I wasn't trying to be deceitful. And on HN I've never seen anyone using quotation marks to quote people.

> Writing good documentation is certainly a challenging task.

Doctests isn't same as writing documentation. Doctest are the simplest form of documentation. Given function named so and so write API doc + example. It could not even write example that passed syntax check.

> Actually, I do this all the time and it works great. Keep practicing!

Then you haven't given it interesting/complex enough problems.

Also this isn't about practice. It's about its capabilities.

> In general, the stochastic parrot argument is oft-repeated but fails to recognize the general capabilities of machine learning.

I gave it write YAML parser given Yaml org spec, and it wrote following struct:

   enum Yaml {
      Scalar(String),
      List(Vec<Box<Yaml>>),
      Map(HashMap<String, Box<Yaml>>),
   }

This is the stochastic parrot in action. Why? Because it tried to pass of JSON like structure as YAML.

Whatever LLM's are they aren't intelligent. Or they have attention spans of a fruit fly and can't figure out basic differences.

[-]

williamcotton 9 months ago

That’s not a good prompt, my friend!

soulofmischief 9 months ago

> Language is all about context. I wasn't trying to be deceitful. And on HN I've never seen anyone using quotation marks to quote people.

It's still unclear how this apparent lack of knowledge of basic writing mechanics would justify your use of quotation marks to attempt a straw man argument wherein you deliberately attempted to convince me that OP said something completely different.

> Doctests isn't same as writing documentation. Doctest are the simplest form of documentation. Given function named so and so write API doc + example. It could not even write example that passed syntax check.

That truly sounds like a skill issue. This no-true-Scotsman angle is silly. I said documentation and tests, I don't know how you got "doctests" out of that. I said "documentation", and "tests". I didn't say "the simplest form of documentation", that is another straw man on your behalf.

> Then you haven't given it interesting/complex enough problems.

Wow, the arrogance. There is absolutely nothing to justify this assumption. It's exceedingly likely that you yourself aren't capable of interacting meaningfully with LLMs for one reason or another, not that I haven't considered interesting or complex problems. I bring some extraordinarily difficult cross-domain problems to these tools and end up satisfied with the results far more often than not.

My argument is literally that cutting-edge LLMs excel with complex problems, and they do in many cases in the right hands. It's unfortunate if you can't find these problems "interesting" enough, but that hasn't stopped me from getting good enough results to justify using an LLM during research and development.

> Also this isn't about practice. It's about its capabilities.

Unfortunately, this discourse has made it clear that you do need considerable practice, because you seem to get bad results, and you're more interested in defending those bad results even if it means insulting others, instead of just considering that you might not quite be skilled enough.

> This is the stochastic parrot in action. Why? Because it tried to pass of JSON like structure as YAML.

That proves its stochasticity, but it doesn't prove it is a "stochastic parrot". As long as you lack the capability to realistically assess these models, it's no wonder that you've had such bad experiences. You didn't even bother clarifying which LLM you used, nor did you mention any parameters of your experiment or even if you attempted multiple trials with different LLMs or prompts. You failed to follow the scientific method and so it's no surprise that you got subpar results.

> Whatever LLM's are they aren't intelligent.

You have demonstrated throughout this discussion that you aren't capable of assessing machine intelligence. If you learned how to be more open-minded and took the time to learn more about these new technologies, instead of complaining about contemporary shortcomings and bashing those who do benefit from the technologies, it would likely open many doors for you.

[-]

Ygg2 9 months ago

> That truly sounds like a skill issue. This no-true-Scotsman angle is silly. I said documentation and tests, I don't know how you got "doctests" out of that. I said "documentation", and "tests". I didn't say "the simplest form of documentation", that is another straw man on your behalf.

What are you on about? Doctest is the simplest form of documentation and test. I.e. you don't have to write an in-depth test, you just need to understand what the function does. I expect even juniors can write a doctest that passes the compiler check. Not a good, not a passing one doctest, a COMPILING one. It's rate of writing a passing one was even worse.

> Wow, the arrogance. There is absolutely nothing to justify this assumption.

Ok, then. Prove what exactly hard problems did you give it?

I gave my examples, I noticed it fails at complex tasks like YAML parser in an unknown language.

I noticed when confronted with anything harder than writing pure boilerplate, it fails. E.g. it would fail 10% of the time.

> Unfortunately, this discourse has made it clear that you do need considerable practice

You can practice with a stochastic parrot all you want, it won't make it an Einstein. Programming is all about converting requirements to math, and LLMs aren't good at it. Do I need to link stuff like doing basic calculation and counting 'r' in the word 'strawberries'.

The best you can do is half the error rate, but that follows a power law. You need to double the energy to half the error rate. So unless you intend to boil the surface of the Earth to get it to be decent at programming, I don't think it's going to change anytime soon.

> You have demonstrated throughout this discussion that you aren't capable of assessing machine intelligence.

Pure ad hominem. You've demonstrated nothing outside your ""Trust me bro, it's not a bubble"" and ""You're wrong"". I'm using double double quotes so you don't assume I'm quoting you.

> You didn't even bother clarifying which LLM you used.

For YAML parser, I used Chat GPT-4o at my friend's place. For the rest of the tasks I used JetBrains AI assistant, which is a mix of Chat GPT-4, GPT-4o and GPT-3.

[-]

8 months ago

[deleted]

rockskon 9 months ago

I don't know about you but LLMs spit out garbage nonsense frequent enough that I can't trust their output in any context I cannot personally verify the validity of.

[-]

ab5tract 9 months ago

Same here. The Cody autocomplete is so off base all the time that it’s deactivated.

I serve Cody a direct question about 1-3 times a week. Of that, it gets maybe 50% correct on the first try. I don’t bother with a second try because by then I’ve already spent the equivalent amount of time looking at the relevant library source code and/or docs would have taken.

framapotari 9 months ago

If you're already a competent developer, I think that's a reasonable expectation of impact on productivity. I think the 'life-changing' part comes in helping someone get to the point of building things with code where before they couldn't (or believed they couldn't). It does a lot better job of turning the enthusiasts and code-curious into amateurs vs. empowering professionals.

[-]

jimmaswell 9 months ago

> turning the enthusiasts and code-curious into amateurs vs. empowering professionals.

I'm firmly in #2. My other comment goes over how.

I'm intrigued to see how devs in #1 grow. One might be wary those devs would grow into bad habits and not thinking for themselves, but it might be a case of the ancient Greek rant against written books hindering memorization. Could be that they'll actually grow to be even better devs unburdened by time wasted on trivial details.

datavirtue 9 months ago

If you are in maintenance mode your visits to Copilot will be rare. If you are building greenfield, use goes through the roof. All those test cases, nevermind all the POC and framework scaffolding and other boilerplate that is now completely unacceptable as a use of developer time.

[-]

recursive 9 months ago

I'm building "greenfield". I still use it at least daily, but the benefit just struggles to outweigh the cost of invoking it. Maybe I don't understand how to use it.

[-]

datavirtue 9 months ago

It really depends on what you are doing and what tech you are using. I use it to teach me or build out ideas quickly or solve for complex issues. Mostly these days I use it as a memory aid or to bounce ideas off it. In my job I have to move quickly and stay focused as I'm driving improvements to a tech stack that reaches across four verticals, each having their own quirks and tech stacks. It's great for jogging my memory and helping to flush out ideas and approaches that I then bounce off the dev teams. Super helpful.

giraffe_lady 9 months ago

"Even just in programming" the jury is still out. None of my coworkers using these are noticeably more productive than the ones who don't. Outside of programming no one gives a shit except scammers and hype chasers.

[-]

JonChesterfield 9 months ago

The people writing articles for journals that aggregate and approximate other sources are in mortal terror of LLMs. Likewise graphic designers and anyone working in (human language) translation.

I don't fear that LLMs are going to take my job as a developer. I'm pretty sure they mark a further decrease in the quality and coherence of software, along with a rapid increase in the quantity of code out there, and that seems likely to provide me with reliable employment forever. I'm basically employed in fixing bugs that didn't need to exist in the first place and that seems to cover a lot of software dev.

[-]

giraffe_lady 9 months ago

They're not scared of LLMs because of anything about LLMs. It's just that everyone with power is publicly horny to delete the remaining middle class jobs and are happy to use LLMs as a justification whether it can functionally replace those workers or not. So it's not that everyone has evaluated chatgpt and cannily realized it can do their job, they're just reading the room.

wruza 9 months ago

Are you sure it’s a life changing productivity booster? Sometimes I look at my projects and wonder how would I explain it to an LLM what this code should have done if it didn’t exist yet. Must be a shitton of boilerplate programming for copilot to be a life-changing experience.

[-]

AYBABTME 9 months ago

You haven't used them enough. Everytime an LLM reduces my search from 1min to 5s, the LLM pays.

Just summary features: save me 20min of reading a transcript, turn it into 20s. That's a huge enabler.

[-]

wruza 9 months ago

Overviews aren’t code though. In code, for me, they don’t pass 80/20 tests well enough, sometimes even on simple cases. (You get 50-80% of an existing function/block with some important context prepended and a comment, let it write the rest and check if it succeeds). It doesn’t mean that LLMs are useless. Or that I am antillamist or a denier - I’m actually an enthusiast. But this specific claim I hear often and don’t find true. Maybe true for repetitive code in boring environments where typing and remembering formats/params over and over is the main issue. Not in actual code.

If I paste the actual non-trivial code, it starts deviating fast. And it isn’t too complex, it’s just less like “parallel sort two arrays” and more like “wait for an image on a screenshot by execing scrot (with no sound) repeatedly and passing the result to this detect-cv2.py script and use all matching options described in this ts type, get stdout json as in this ts type, and if there’s a match, wait for the specified anim timeout and test again to get the settled match coords after an animation finishes; throw after a total timeout”. Not a rocket science, pretty dumb shit, but right there they fall flat and start imagining things, heavily.

I guess it shines if you ask it to make an html form, but I couldn’t call that life-changing unless I had to make these damn forms all day.

mecsred 9 months ago

If 20 mins of informations can legitimately be condensed into 20 seconds, it sounds like the original wasn't worth reading in the first place. Could have skipped the llm entirely.

[-]

bostik 9 months ago

I upvoted you, because I think you have a valid point. The tone is unnecessarily aggressive though.

Effective and information-dense communication is really hard. That doesn't mean we should just accept the useless fluff surrounding the actual information and/or analysis. People could learn a lot from the Ignoble Prize ceremony's 24/7 presentation model.

Sadly, it seems we are heading towards a future where you may need an LLM to distill the relevant information out of a sea of noise.

[-]

mecsred 9 months ago

Didn't intend for it to be aggressive, just concise. Spare me from the llm please :)

crazygringo 9 months ago

> it sounds like the original wasn't worth reading in the first place

But if that's the only place that contained the information you needed, then you have no choice.

There's a lot of material out there that is badly written, badly organized, badly presented. LLM's can be a godsend for extracting the information you actually need without wasting 20 minutes wading through the muck.

[-]

mecsred 9 months ago

Yeah I can see that use case, I just wouldn't trust an LLM to decide "is this worth reading". May as well flip a coin.

AYBABTME 9 months ago

Think of the summary of a zoom call. Or of a chapter that you're not sure if you care to read or not.

Not all content is worth consuming, and not all content is dense.

[-]

postalrat 9 months ago

If I had a recording of the zoom call I could generate a summary on demand with better tools than were available at the time the zoom call was made.

andrei_says_ 9 months ago

My experience with overviews is that they are often subtly or not so subtly inaccurate. LLMs not understanding meaning or intent carries risk of misrepresentation.

haakonhr 9 months ago

And here you're assuming that making software engineers more productive would be a service to the world. I think the jury is out on that one as well. At least for the majority of software engineering since 2010.

vrighter 9 months ago

actually, studies seem to show it makes code worse. Just like llms can confidently spout junk, devs using llms confidently check in more bugs.

BolexNOLA 9 months ago

“A bit”?

[-]

bartread 9 months ago

Yes, a large amount for - in the grand scheme of things - a short period of time (i.e., a quantity of energy usage in an intense spike that will be dwarfed by energy usage over time) can accurately be described as “a bit”.

Of course, the impact is that AI will continue to become cheaper to use, and induced demand will continue the feedback loop driving the market as a result.

michaelmrose 9 months ago

This comment lives in a fictional world where there is a singular group that could have collectively acted counterfactually. In the real world any actor that individually went this route would have gone bankrupt while the others collected money by showing actual results even if ineffeciently earned.

[-]

newyankee 9 months ago

Also it is likely that the rise of LLMs gave many researchers in allied fields the impetus to tackle with the problems that are relevant to making it more efficient and people stumbled upon a solution hiding there.

The momentum with LLMs and allied technology may last till it keeps on improving even by a few percentage points and keeps shattering human created new benchmarks every few months

Scene_Cast2 9 months ago

This is a bit like recommending to skip vacuum tubes, think hard and invent transistors.

[-]

fuzzfactor 9 months ago

This is kind of thought-provoking.

That is a good correlation when you think about how much more energy-efficient transistors are than vacuum tubes.

Vacuum tube computers were a thing for a while, but it was more out of desperation than systematic intellectual progress.

OTOH you could look at the present accomplishments like it was throwing more vacuum tubes at a problem that can not be adequately addressed that way.

What turned out to be a solid-state solution was a completely different approach from the ground up.

To the extent a more power-saving technique using the same hardware is only a matter of different software approaches, that would be something that realistically could have been accomplished before so much energy was expended.

Even though I've always thought application-specific circuits would be what really helps ML and AI a lot, and that would end up not being the exact same hardware at all.

If power is truly being wasted enough to start rearing its ugly head, somebody should be able to figure out how to fix it before it gets out-of-hand.

Ironically enough with my experience using vacuum tubes, I've felt that there were some serious losses in technology when the research momentum involved was so rapidly abandoned in favor of "solid-state everything" at any cost.

Maybe it is a good idea to abandon the energy-intensive approaches, as soon as anything completely different that's the least bit promising can barely be seen by a gifted visionary to have a glimmer of potential.

VagabundoP 9 months ago

That's just not how progress works.

Its iteritive, there are plenty of cul-de-sacs and failures. You can't really optimise until you have something that works and its a messy process that is inefficient.

You're looking at this with hindsight.

pcl 9 months ago

Isn’t this paper pretty much about spending a few short years to improve the performance? Or are you arguing that the same people who made breakthroughs over the last few years should have also done the optimization work?

[-]

fuzzfactor 9 months ago

>the same people who made breakthroughs over the last few years should have also done the optimization work

I never thought it would be ideal if it was otherwise, so I guess so.

When I first considered neural nets from state-of-the art vendors to assist with some non-linguistic situations over 30 years ago, it wasn't quite ready for prime time and I could accept that.

I just don't have generic situations all the time which would benefit me, so it's clearly my problems that have the deficiencies ;\

What's being done now with all the resources being thrown at it is highly impressive, and gaining all the time, no doubt about it. It's nice to know there are people that can afford it.

I truly look forward to more progress, and this may be the previously unreached milestone I have been detecting that might be a big one.

Still not good enough for what I need yet so far though. And I can accept that as easily as ever.

That's why I put up my estimation that not all of those 30+ years has been spent without agonizing over something ;)

pnt12 9 months ago

The GPU main advantage is its parallelism - thousands of cores compared handful of cores in CPUs.

If you're training models with billions of parameters, you're still gonna need that.

etcd 9 months ago

I feel like I have seen this idea a few times but don't recall where but stuff posted via HN.

Here https://news.ycombinator.com/item?id=41784591 but even before that. It is possibly one of those obvious ideas to people steeped in this.

To me intuitively using floats to make ultimatelty boolean like decisions seems wasteful but that seemed like the way it had to be to have diffetentiable algorithms.

yogrish 9 months ago

we used to use Fixed point multiplications (Q Format) in DSP algorithms on different DSP architectures. https://en.wikipedia.org/wiki/Q_(number_format). They used to be so fast and near accurate to floating point multiplications. Probably we need to use those DSPs blocks as part of Tensors/GPUs to realise both fast multiplications & parallelisms.

mvkel 9 months ago

Is this effectively quantizing without actually quantizing?

jart 9 months ago

It's a very crude approximation, e.g. 1.75 * 2.5 == 3 (although it seems better as the numbers get closer to 0).

I tried implementing this for AVX512 with tinyBLAS in llamafile.

    inline __m512 lmul512(__m512 x, __m512 y) {
        __m512i sign_mask = _mm512_set1_epi32(0x80000000);
        __m512i exp_mask = _mm512_set1_epi32(0x7F800000);
        __m512i mant_mask = _mm512_set1_epi32(0x007FFFFF);
        __m512i exp_bias = _mm512_set1_epi32(127);
        __m512i x_bits = _mm512_castps_si512(x);
        __m512i y_bits = _mm512_castps_si512(y);
        __m512i sign_x = _mm512_and_si512(x_bits, sign_mask);
        __m512i sign_y = _mm512_and_si512(y_bits, sign_mask);
        __m512i exp_x = _mm512_srli_epi32(_mm512_and_si512(x_bits, exp_mask), 23);
        __m512i exp_y = _mm512_srli_epi32(_mm512_and_si512(y_bits, exp_mask), 23);
        __m512i mant_x = _mm512_and_si512(x_bits, mant_mask);
        __m512i mant_y = _mm512_and_si512(y_bits, mant_mask);
        __m512i sign_result = _mm512_xor_si512(sign_x, sign_y);
        __m512i exp_result = _mm512_sub_epi32(_mm512_add_epi32(exp_x, exp_y), exp_bias);
        __m512i mant_result = _mm512_srli_epi32(_mm512_add_epi32(mant_x, mant_y), 1);
        __m512i result_bits = _mm512_or_si512(
            _mm512_or_si512(sign_result, _mm512_slli_epi32(exp_result, 23)), mant_result);
        return _mm512_castsi512_ps(result_bits);
    }

Then I used it for Llama-3.2-3B-Instruct.F16.gguf and it outputted jibberish. So you would probably have to train and design your model specifically to use this multiplication approximation in order for it to work. Or maybe I'd have to tune the model so that only certain layers and/or operations use the approximation. However the speed was decent. Prefill only dropped from 850 tokens per second to 200 tok/sec on my threadripper. Prediction speed was totally unaffected, staying at 34 tok/sec. I like how the code above generates vpternlog ops. So if anyone ever designs an LLM architecture and releases weights on Hugging Face that use this algorithm, we'll be able to run them reasonably fast without special hardware.

[-]

raluk 9 months ago

Your kernel seems to be incorrect for 1.75 * 2.5. From paper we have 1.75 == (1+0.75)*2^0 for 2.5 == (1+0.25)*2^1 so result is (1+0.75+0.25+2^-4)*2^1 == 4.125 (correct result is 4.375)

[-]

raluk 9 months ago

Extra. I am not sure if that is clear from paper, but in example of 1.75 * 2.5 we can represent 1.75 also as (1-0.125) * 2. This gives good aproximations for numbers that are close but less than power of 2. This way abs(a*b) in (1+a)*(1+b) is allways small and strictly less than 0.25.

Another example, if we have for example 1.9 * 1.9 then we need to account for overflow in (0.9 + 0.9) and this seems to induce similar overhead as expressing numbers as (1-0.05)*2 .

kayo_20211030 9 months ago

Extraordinary claims require extraordinary evidence. Maybe it's possible, but consider that some really smart people, in many different groups, have been working diligently in this space for quite a while; so claims of 95% savings on energy costs _with equivalent performance_ is in the extraordinary category. Of course, we'll see when the tide goes out.

[-]

manquer 9 months ago

It is a click bait headline the claim itself is not extraordinary. the preprint from arxiv was posted here some time back .

The 95% gains is specifically only for multiplication operations, inference is compute light and memory heavy in the first place so the actual gains would be far less smaller .

Tech journalism (all journalism really) can hardly be trusted to publish grounded news with the focus on clicks and revenue they need to survive.

[-]

ksec 9 months ago

>Tech journalism (all journalism really) can hardly be trusted to publish grounded news with the focus on clicks and revenue they need to survive.

Right now the only way to gain real knowledge is actually to read comments of those articles.

kayo_20211030 9 months ago

Thank you. That makes sense.

rob_c 9 months ago

Bingo,

We have a winner. Glad that came from someone not in my lectures on ML network design

Honestly, thanks for beeting me to this comment

throwawaymaths 9 months ago

I don't think this claim is extraordinary. Nothing proposed is mathematically impossible or even unlikely, just a pain in the ass to test (lots of retraining, fine tuning etc, and those operations are expensive when you dont have already massively parallel hardware available, otherwise you're ASIC/FPGAing for something with a huge investment risk)

If I could have a SWAG at it I would say a low resolution model like llama-2 would probably be just fine (llama-2 quantizes without too much headache) but a higher resolution model like llama-3 probably not so much, not without massive retraining anyways.

Randor 9 months ago

The energy claims up to ~70% can be verified. The inference implementation is here:

https://github.com/microsoft/BitNet

[-]

kayo_20211030 9 months ago

I'm not an AI person, in any technical sense. The savings being claimed, and I assume verified, are on ARM and x86 chips. The piece doesn't mention swapping mult to add, and a 1-bit LLM is, well, a 1-bit LLM.

Also,

> Additionally, it reduces energy consumption by 55.4% to 70.0%

With humility, I don't know what that means. It seems like some dubious math with percentages.

[-]

sroussey 9 months ago

Not every instruction on a CPU or GPU uses the same amount of power. So if you could rewrite your algorithm to use more power efficient instructions (even if you technically use more of them), you can save overall power draw.

That said, time to market has been more important than any cares of efficiency for some time. Now and in the future, there is more of a focus on it as the expenses in equipment and power have really grown.

Randor 9 months ago

> I don't know what that means. It seems like some dubious math with percentages.

I would start by downloading a 1.58 model such as: https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens

Run the non-quantized version of the model on your 3090/4090 gpu and observe the power draw. Then load the 1.58 model and observe the power usage. Sure, the numbers have a wide range because there are many gpu/npu to make the comparison.

[-]

kayo_20211030 9 months ago

Good one!

littlestymaar 9 months ago

How does the liked article relate to BitNet at all? It's about the “addition is all you need” paper which AFAIK is unrelated.

[-]

Randor 9 months ago

Yeah, I get what you're saying but both are challenging the current MatMul methods. The L-Mul paper claims "a power savings of 95%" and that is the thread topic. Bitnet proves that at least 70% is possible by getting rid of MatMul.

vlovich123 9 months ago

They’ve been working on unrelated problems like structure of the network or how to build networks with better results. There have been people working on improving the efficiency of the low-level math operations and this is the culmination of those groups. Figuring this stuff out isn’t super easy.

kayo_20211030 9 months ago

re: all above/below comments. It's still an extraordinary claim.

I'm not claiming it's not possible, nor am I claiming that it's not true, or, at least, honest.

But, there will need to be evidence that using real machines, and using real energy an _equivalent performance_ is achievable. A defense that "there are no suitable chips" is a bit disingenuous. If the 95% savings actually has legs some smart chip manufacturer will do the math and make the chips. If it's correct, that chip making firm will make a fortune. If it's not, they won't.

[-]

throwawaymaths 9 months ago

> If the 95% savings actually has legs some smart chip manufacturer will do the math and make the chips

Terrible logic. By a similar logic we wouldn't be using python for machine learning at all, for example (or x86 for compute). Yet here we are.

[-]

kayo_20211030 9 months ago

What's wrong with the logic? A caveat in the paper is that the technique will save 95% energy but that the technique will not run efficiently on current chips. I'm saying that if the new technique needs new chips and saves 95% of energy costs with the same performance, someone will make the chips. I say nothing about how and why we do ML as we do today - the 100% energy usage level.

[-]

throwawaymaths 9 months ago

It's Terrible logic because it doesn't take into account the way this industry works. We don't do things because they are better. We do things because we can convince investors, because it's hirable, because we don't want to learn something new, because we're afraid our built up knowledge base is going to become obsolete, so we pull more people into our technical debt, etc.

stefan_ 9 months ago

I mean, all these smart people would rather pay NVIDIA all their money than make AMD viable. And yet they tell us its all MatMul.

[-]

dotnet00 9 months ago

It's not their job to make AMD viable, it's AMD's job to make AMD viable. NVIDIA didn't get their position for free, they spent a decade refining CUDA and its tooling before GPU-based crypto and AI kicked off.

kayo_20211030 9 months ago

Both companies are doing pretty well. Why don't you think AMD is viable?

[-]

nelup20 9 months ago

AMD's ROCm just isn't there yet compared to Nvidia's CUDA. I tried it on Linux with my AMD GPU and couldn't get things working. AFAIK on Windows it's even worse.

[-]

mattalex 9 months ago

That entirely depends on what AMD device you look at: gaming GPUs are not well supported, but their instinct line of accelerators works just as well as cuda. keep in mind that, in contrast to Nvidia, AMD uses different architectures for compute and gaming (though they are changing that in the next generation)

redleader55 9 months ago

The litimus test would be if you read in the news that Amazon, Microsoft, Google or Meta just bought billions in GPUs from AMD.

They are and have been buying AMD CPUs for a while now, which says something about AMD and Intel.

[-]

JonChesterfield 9 months ago

Microsoft and Meta are running customer facing LLM workloads on AMD's graphics cards. Oracle seems to like them too. Google is doing the TPU thing with Broadcom and Amazon seems to have decided to bet on Intel (in a presumably fatal move but time will tell). We'll find some more information on the order book in a couple of weeks at earnings.

I like that the narrative has changed from "AI only runs on Cuda" to "sure it runs fine on AMD if you must"

jhj 9 months ago

As someone who has worked in this space (approximate compute) on both GPUs and in silicon in my research, the power consumption claims are completely bogus, as are the accuracy claims:

> In this section, we show that L-Mul is more precise than fp8 e4m3 multiplications

> To be concise, we do not consider the rounding to nearest even mode in both error analysis and complexity estimation for both Mul and L-Mul

These two statements together are non-sensical. Sure, if you analyze accuracy while ignoring the part of the algorithm that gives you accuracy in the baseline you can derive whatever cherry-picked result you want.

The multiplication of two floating point values if you round to nearest even will be the correctly rounded result of multiplying the original values at infinite precision, this is how floating point rounding usually works and what IEEE 754 mandates for fundamental operations if you choose to follow those guidelines (e.g., multiplication here). But not rounding to nearest even will result in a lot more quantization noise, and biased noise at that too.

> applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by elementwise floating point tensor multiplications and 80% energy cost of dot products

A good chunk of the energy cost is simply moving data between memories (especially external DRAM/HBM/whatever) and along wires, buffering values in SRAMs and flip-flops and the like. Combinational logic cost is usually not a big deal. While having a ton of fixed-function matrix multipliers does raise the cost of combinational logic quite a bit, at most what they have will probably cut the power of an overall accelerator by 10-20% or so.

> In this section, we demonstrate that L-Mul can replace tensor multiplications in the attention mechanism without any loss of performance, whereas using fp8 multiplications for the same purpose degrades inference accuracy

I may have missed it in the paper, but they have provided no details on (re)scaling and/or using higher precision accumulation for intermediate results as one would experience on an H100 for instance. Without this information, I don't trust these evaluation results either.

_aavaa_ 9 months ago

Original discussion of the preprint: https://news.ycombinator.com/item?id=41784591

[-]

codethief 9 months ago

Ahh, there it is! I was sure we had discussed this paper before.

remexre 9 months ago

Isn't this just taking advantage of "log(x) + log(y) = log(xy)"? The IEEE754 floating-point representation stores floats as sign, mantissa, and exponent -- ignore the first two (you quantitized anyway, right?), and the exponent is just an integer storing log() of the float.

[-]

mota7 9 months ago

Not quite: It's taking advantage of (1+a)(1+b) = 1 + a + b + ab. And where a and b are both small-ish, ab is really small and can just be ignored.

So it turns the (1+a)(1+b) into 1+a+b. Which is definitely not the same! But it turns out, machine guessing apparently doesn't care much about the difference.

[-]

amelius 9 months ago

You might then as well replace the multiplication by the addition in the original network. In that case you're not even approximating anything.

Am I missing something?

[-]

dotnet00 9 months ago

They're applying that simplification to the exponent bits of an 8 bit float. The range is so small that the approximation to multiplication is going to be pretty close.

tommiegannert 9 months ago

Plus the 2^-l(m) correction term.

Feels like multiplication shouldn't be needed for convergence, just monotonicity? I wonder how well it would perform if the model was actually trained the same way.

dsv3099i 9 months ago

This trick is used a ton when doing hand calculation in engineering as well. It can save a lot of work.

You're going to have tolerance on the result anyway, so what's a little more error. :)

convolvatron 9 months ago

yes. and the next question is 'ok, how do we add'

[-]

kps 9 months ago

Yes. I haven't yet read this paper to see what exactly it says is new, but I've definitely seen log-based representations under development before now. (More log-based than the regular floating-point exponent, that is. I don't actually know the argument behind the exponent-and-mantissa form that's been pretty much universal even before IEEE754, other than that it mimics decimal scientific notation.)

dietr1ch 9 months ago

I guess that if the bulk of the computation goes into the multiplications, you can work in the log-space and simply sum, and when the time comes to actually do a sum on the original space you can go back and sum.

[-]

a-loup-e 9 months ago

Not sure how well that would work if you're often adding bias after every layer

robomartin 9 months ago

I posted this about a week ago:

https://news.ycombinator.com/item?id=41816598

This has been done for decades in digital circuits, FPGA’s, Digital Signal Processing, etc. Floating point is both resource and power intensive and using FP without the use of dedicated FP processing hardware is something that has been avoided and done without for decades unless absolutely necessary.

[-]

fidotron 9 months ago

Right, the ML people are learning, slowly, about the importance of optimizing for silicon simplicity, not just reduction of symbols in linear algebra.

Their rediscovery of fixed point was bad enough but the “omg if we represent poses as quaternions everything works better” makes any game engine dev for the last 30 years explode.

ausbah 9 months ago

a lot of things in the ML research space are rebranding an old concept w a new name as “novel”

ujikoluk 9 months ago

Explain more for the uninitiated please.

[-]

robomartin 9 months ago

Not sure there's much to explain. Using integers for math in digital circuits is far more resource and computationally efficient than floating-point math. It has been decades since I did the math on the difference. I'll just guess that it could easily be an order of magnitude better across both metrics.

At basic level it is very simple: A 10 bit bus gives you the ability to represent numbers between 0 and 1 with a resolution of approximately 0.001. 12 bits would be four times better. Integer circuits can do the math in one clock cycle. Hardware multipliers do the same. To rescale the numbers after multiplication you just take the N high bits, where N is your bus width; which is a zero clock-cycle operation. Etc.

In training a neural network, the back propagation math can be implemented using almost the same logic used for a polyphase FIR filter.

didgetmaster 9 months ago

Maybe I am just a natural skeptic, but whenever I see a headline that says 'method x reduces y by z%'; but when you read the text it instead says that optimizing some step 'could potentially reduce y by up to z%'; I am suspicious.

Why not publish some actual benchmarks that prove your claim in even a few special cases?

[-]

dragonwriter 9 months ago

Well, one, because the headline isn't from the researchers, its from a popular press report (not even the one posted here, originally, this is secondary reporting of another popular press piece) and isn't what the paper claims so it would be odd for the paper's authors to conduct benchmarks to justify it. (And, no, even the "up to 95%" isn't from the paper, the cost savings are cited per operation depending on operation and the precision the operation is conducted at, are as high as 97.3%, are based on research already done establishing the energy cost of math operations on modern compute hardware, but no end-to-end cost savings claim is made.)

And, two, because the actual energy cost savings claimed aren't even the experimental question -- the energy cost differences between various operations on modern hardware have been established in other research, the experimental issue here was whether the mathematical technique that enables using the lower energy cost operations performs competitively on output quality with existing implementations when substituted in for LLM inference.

baq 9 months ago

OTOH you have a living proof that an amazingly huge neural network can work on 20W of power, so expecting multiple orders of magnitude in power consumption reduction is not unreasonable.

[-]

etcd 9 months ago

Mitochondria are all you need.

Should be able to go more efficient as the brain has other constraints such as working at 36.7 degrees C etc.

andrewstuart 9 months ago

https://github.com/microsoft/BitNet

"The first release of bitnet.cpp is to support inference on CPUs. bitnet.cpp achieves speedups of 1.37x to 5.07x on ARM CPUs, with larger models experiencing greater performance gains. Additionally, it reduces energy consumption by 55.4% to 70.0%, further boosting overall efficiency. On x86 CPUs, speedups range from 2.37x to 6.17x with energy reductions between 71.9% to 82.2%. Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. More details will be provided soon."

[-]

jdiez17 9 months ago

Damn. Seems almost too good to be true. Let’s see where this goes in two weeks.

[-]

andrewstuart 9 months ago

Intel and AMD will be extremely happy.

Nvidia will be very unhappy.

[-]

l11r 9 months ago

Their GPU will still be needed to do training. As far as I understand this will improve only interference performance and efficiency.

TheRealPomax 9 months ago

Because as disappointing as modern life is, you need clickbait headlines to drive traffic. You did the right thing by reading the article though, that's where the information is, not the title.

[-]

phtrivier 9 months ago

Fair enough, but then I want a way to penalize publishers for abusing clickbait. There is no "unread" button, and there is no way to unsubscribe to advertisement-based sites.

Even on sites that have a "Like / Don't like" button, my understanding is that clicking "Don't like" is a form of "engagement", that the suggestion algorithm are going to reward.

Give me a button that says "this article was a scam", and have the publisher give the advertisement money back. Of better yet, give the advertisement money to charity / public services / whatever.

Take a cut of the money being transfered, charge the publishers for being able to get a "clickbait free" green mark if they implement the scheme.

Track the kind of articles that generate the most clickbait-angry comment. Sell back the data.

There might a business model.

[-]

NineStarPoint 9 months ago

I doubt there’s a business model there because who is going to opt in to a scheme that loses them money?

What could work is social media giving people an easy button to block links to specific websites from appearing in their feed, or something along those lines. It’s a nice user feature, and having every clickbait article be a chance someone will choose to never see your website again could actually reign in some of the nonsense.

[-]

phtrivier 9 months ago

> I doubt there’s a business model there because who is going to opt in to a scheme that loses them money?

Agreed, of course.

In a reasonable world, that could be considered part of the basic, law mandated requirements. It would be blurry and subject to interpretation to decide what is clickbait or not, just like libel or defamation - good thing we're only a few hundred years away from someone reinventing a device to handle that, called "independent judges".

In the meantime, I suppose you would have to bring some "unreasonable" thing to it, like "brands like to have green logos on their sites to brag" ?

> What could work is social media giving people an easy button to block links to specific websites from appearing in their feed, or something along those lines.

I completely agree. It's a feature they have had the technology to implement such a thing since forever, and they've decided against it since forever.

However I wonder if that's something a browser extension could handle ? A merge of AdBlock and "saved you a click" that displays the "boring" content of the link when you hoveron a clickbaity link ?

keybored 9 months ago

Headlines: what can they do, they need that for the traffic

Reader: do the moral thing and read the article, not just the title

How is that balanced.

GistNoesis 9 months ago

Does https://en.wikipedia.org/wiki/Jevons_paradox apply in this case ?

[-]

mattxxx 9 months ago

That's interesting.

Obviously, energy cost creates a barrier to entry, so reduction of cost reduces the barrier to entry... which adds more players... which increases demand.

[-]

bicepjai 9 months ago

This is why I love HN

gosub100 9 months ago

Not necessarily a bad thing: this might give the AI charlatans enough time to actually make something useful.

narrator 9 months ago

Of course. Jevons paradox always applies.

holoduke 9 months ago

I don't think algorithms will change energy consumption. There is always max capacity needed in terms of computing. If tomorrow a new algorithm increases the performance 4 times, we will just have 4 times more computing.

Art9681 9 months ago

In the end the power consumption means the current models that are "good enough" will fit a much smaller compute budget such as edge devices. However, enthusiasts are still going to want the best hardware they can afford because inevitably, everyone will want to maximize the size and intelligence of a model they can run. So we're just going to scale. This might bring a GPT-4 level to edge devices, but we are still going to want to run what might resemble a GPT-5/6 model on the best hardware possible at the time. So don't throw away your GPU's yet. This will bring capabilities to mass market, but your high end GPU will still scale the solution n-fold and youll be able to run models with disregard to the energy savings promoted in the headline.

In other sensationalized words: "AI engineers can claim new algorithm allows them to fit GPT-5 in an RTX5090 running at 600 watts."

gcanyon 9 months ago

This isn't really the optimization I'm think about, but: given the weird and abstract nature of the functioning of ML in general and LLMs in particular, it seems reasonable to think that there might be algorithms that achieve the same, or a similar, result in an orders-of-magnitude more efficient way.

greenthrow 9 months ago

The trend of hyping up papers too early on is eroding people's faith in science due to poor journalism failing to explain that this is theoretical. The outlets that do this should pay the price but they don't, because almost every outlet does it.

panosv 9 months ago

Lemurian Labs looks like it's doing something similar: https://www.lemurianlabs.com/technology They use the Logarithmic Number System (LNS)

ein0p 9 months ago

As a rule, compute only takes less than 10% of all energy. 90% is data movement.

idiliv 9 months ago

Duplicate, posted on October 9: https://news.ycombinator.com/item?id=41784591

hello_computer 9 months ago

How does this differ from Cussen & Ullman?

https://arxiv.org/abs/2307.01415

[-]

selimthegrim 9 months ago

Cussen is an HN poster incidentally.

littlestymaar 9 months ago

Related: https://news.ycombinator.com/item?id=41784591 10 days ago

9 months ago

[deleted]

andrewstuart 9 months ago

Here is the Microsoft implementation:

https://github.com/microsoft/BitNet

syntaxing 9 months ago

I’m looking forward to Bitnet adaptation. MS just released a tool for it similar to llamacpp. Really hoping major models get retrained for it.

creativenolo 9 months ago

Simple question: if true, would power consumption stay at 100% because we’d work the algorithm harder?

I had assumed the latency etc were based on what was desirable for the use case and hardware, rather than power consumption.

asicsarecool 9 months ago

Don't assume this isn't already in place at the main AI companies

svilen_dobrev 9 months ago

i am not well versed in the math involved, but IMO if the outcome depends mostly on the differences between them numbers, as smaller-or-bigger distinction as well as their magnitudes, then exactness might not be needed. i mean, as long as the approximate "function" looks similar to the exact one, that might be good enough.

Maybe even generate a table of the approximate results and use that, in various stages? Like the way sin/cos was done 30y ago before FP coprocessors arrived

m463 9 months ago

So couldn't you design a GPU that uses or supports this algorithm to use the same power, but use bigger models, better models, or do more work?

DennisL123 9 months ago

This is a result on 8 bit numbers, right? Why not precompute all 64k possible combinations and look up the results from the table?

andrewstuart 9 months ago

The ultimate “you’re doing it wrong”.

For he sake of the climate and environment it would be nice to be true.

Bad news for Nvidia. “Sell your stock” bad.

Does it come with a demonstration?

[-]

mouse_ 9 months ago

Hypothetically, if this is true and simple as the headline implies -- AI using 95% less power doesn't mean AI will use 95% less power, it means we will do 20x more AI. As long as it's the current fad, we will throw as much power and resources at this as we can physically produce, because our economy depends on constant, accelerating growth.

[-]

etcd 9 months ago

True. A laptop power pack wattage is probably pretty much unchanged over 30 years for example.

Dylan16807 9 months ago

Bad news for Nvidia how? Even ignoring that the power savings are only on one type of instruction, 20x less power doesn't mean it runs 20x faster. You still need big fat GPUs.

If this increases integer demand and decreases floating point demand, that moderately changes future product design and doesn't do much else.

talldayo 9 months ago

> Bad news for Nvidia. “Sell your stock” bad.

People say this but then the fastest and most-used implementation of these optimizations is always written in CUDA. If this turns out to not be a hoax, I wouldn't be surprised to see Nvidia prices jump in correlation.

Nasrudith 9 months ago

Wouldn't reduced power consumption for an unfulfilled demand mean more demand for Nvida as they now need more chips to max out amount of power usage to capacity? (As concentration tends to be the more efficient way.)

faragon 9 months ago

Before reading the article I was expecting using 1-bit instead of bfloats, and using logical operators instead of arithmetic.

DrNosferatu 9 months ago

Why they don’t implement the algorithm in a FPGA to compare with a classical baseline?

Wheatman 9 months ago

Isnt 90% of the enrgy spent moving bytes around? Why would this have such a great affect?

m3kw9 9 months ago

This sounds similar to someone saying room temp super conductor was discovered

tartakovsky 9 months ago

original paper: https://news.ycombinator.com/item?id=41784591

DesiLurker 9 months ago

validity of the claim aside, why dont they say reduces by 20 times instead of 95%. its much better perspective of a fraction when fraction is tiny.

9 months ago

[deleted]

nprateem 9 months ago

Is it the one where you delete 95% of user accounts?

neuroelectron 9 months ago

Nobody is interested in this because nobody wants less capex.

quantadev 9 months ago

I wonder if someone has feed this entire "problem" into the latest Chat GPT-01 (the new model with reasoning capability), and just fed it in all the code for a Multilayer Perceptron and then given it the task/prompt of finding ways to implement the same network using only integer operations.

Surely even the OpenAI devs must have done this like the minute they got done training that model, right? I wonder if they'd even admit it was an AI that came up with the solution rather than just publishing it, and taking credit. haha.

[-]

chx 9 months ago

You are imaging LLMs are capable of much more than they actually are. Here's the only thing they are good for.

https://hachyderm.io/@inthehands/112006855076082650

> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.

> Alas, that does not remotely resemble how people are pitching this technology.

[-]

quantadev 9 months ago

No, I'm not imagining things. You are, however, imaging (incorrectly) that I'm not an expert with AI who's already seen superhuman performance out of LLM prompts in the vast majority of every software development question I've ever asked them, starting all the way back at GPT-3.5.

[-]

dotnet00 9 months ago

So, where are all of your world changing innovations driven by these superhuman capabilities?

[-]

quantadev 9 months ago

You raise a great point. Maybe I should be asking the AI for more career advice or new product ideas, rather than just letting it merely solve each specific coding challenge.