Transistors are generally at their lowest static power dissipation when they are either fully on or fully off. The analog middle is great if you're trying to process continuous values, but then you're forced to use a bias current to hold them in the middle, which is fine if that's the nature of the circuit.
A chip with billions of transistors can't reasonably work if most of them are operating in analog mode; it would just melt to slag unless you had an amazing cooling system.
Also consider that there is only one threshold between values on a binary system. With a trinary system you would likely have to double the power supply voltage, and thus quadruple the power required just to maintain noise margins.
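A back-of-the-envelope sketch of that last claim, assuming the usual dynamic-power model P ~ alpha * C * V^2 * f (the figures below are made up; only the V^2 scaling matters):

    # Hypothetical numbers for illustration only.
    def dynamic_power(alpha, c_farads, v_supply, freq_hz):
        return alpha * c_farads * v_supply**2 * freq_hz

    p_two_level   = dynamic_power(0.1, 1e-9, 1.0, 3e9)
    p_three_level = dynamic_power(0.1, 1e-9, 2.0, 3e9)  # doubled supply voltage
    print(p_three_level / p_two_level)  # -> 4.0, i.e. roughly 4x the switching power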
This is a great point, and I'll extend it by claiming that there's a more general physical principle underneath: it's significantly easier to build bistable systems than tristable (or higher) systems, so much so that it makes up for the fact that you need more of them.
This is far more general than electronic systems (e.g. quantum computers follow the same principle - it's far easier to build and control qubits than qutrits/qudits).
(technically, it's even easier to build systems that have a single stable configuration, but you can't really store information in those, so they're not relevant)
It can be solved in various ways, not only with a middle level; electricity has negative voltages too. So you can have a third distinct "fully on" state at a negative voltage. This isn't practical with silicon semiconductors but might be possible with other technology. The Soviet ternary computer Setun used custom ternary switches.
There is nothing special about negative voltages; it's all relative to some reference point anyway.
With mixed analog/digital circuits, for example, it's pretty common to treat exactly the same voltages either as -2.5/0/2.5 (relative to the midpoint) or as 0/2.5/5 (relative to the negative rail).
What matters is having multiple threshold voltages with distinct behaviour. Setun used ferrite transformers, which do have multiple thresholds (positive and negative fields) - but modern electronics, including transistors, does not.
It is perfectly viable with silicon. The venerable Fast Ethernet used PAM3, as do USB4 and GDDR7, and Gigabit Ethernet uses PAM5.
Those are analog systems, and thus you have to handle them with transistors operating in a linear mode, which is why there are dedicated circuits to handle the interface and translate it back into something binary as soon as possible, so that conventional logic can use the data.
Basically, every ethernet card is now a modem.
Wouldn't you also get data loss using the linear region of transistors? The output would have some error relative to the input, and that error would propagate through the circuit, perhaps eventually reaching on or off, where it would be stuck.
Trinary is an efficient way of storing lots of -1/0/1 machine learning model weights. But as soon as you load it into memory, you need RAM that can store the same thing (or you're effectively losing the benefits: storage is cheap). So now you need trinary RAM, which, as it turns out, isn't great for doing normal general-purpose computation. Integers, floats, and boolean values don't get stored efficiently in trinary unless you toss out power-of-two sized values. CPU circuitry becomes more complicated to add/subtract/multiply those values. Bitwise operators in trinary become essentially impossible for the average-IQ engineer to reason about. We need all new ISAs, assembly languages, compilers, and languages that can run efficiently without the operations that trinary machines can't perform well, etc.
So do we have special memory and CPU instructions for trinary data that lives in a special trinary address space, separate from traditional data that lives in binary address space? No, the juice isn't worth the squeeze. There's no compelling evidence this would make anything better overall: faster, smaller, more energy efficient. Every improvement that trinary potentially offers results in having to throw babies out with the bathwater. It's fun to think about I guess, but I'd bet real money that in 50 years we're still having the same conversation about trinary.
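To make the storage-efficiency point above concrete, here's a rough sketch (my own illustration, not taken from any particular model format): five ternary weights fit in one byte because 3^5 = 243 <= 256, i.e. about 1.6 bits per weight instead of the 2 bits a naive encoding would need.

    def pack_trits(trits):
        # trits: exactly five values from {-1, 0, 1}
        assert len(trits) == 5
        byte = 0
        for t in reversed(trits):
            byte = byte * 3 + (t + 1)  # map -1/0/1 -> 0/1/2, accumulate in base 3
        return byte  # always fits in one byte: max value is 242

    def unpack_trits(byte):
        out = []
        for _ in range(5):
            out.append(byte % 3 - 1)
            byte //= 3
        return out

    print(unpack_trits(pack_trits([-1, 0, 1, 1, -1])))  # [-1, 0, 1, 1, -1]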
I once tried to start enumerating gate types for Trinary.
In binary, with two inputs, there are 2^2 = 4 possible input combinations (00, 01, 10, 11). Different gate types can give different outputs for each of those four inputs: each output can be 0 or 1, so that's 2^4 = 16 different possible gate types (0, 1, A, B, not A, not B, AND, OR, NAND, NOR, XOR, XNOR, A and not B, B and not A, A or not B, B or not A).
In ternary, with two inputs, there are 3^2 = 9 possible input combinations, so there are 3^9 = 19,683 possible gate types. I'm sure there are some really sensible ones in there, but damn that's a huge search space. That's where I gave up that time around! :-)
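A quick sketch that reproduces both counts (my own, just to show where the numbers come from): every two-input gate is a lookup table from input pairs to an output value.

    from itertools import product

    def count_two_input_gates(radix):
        inputs = list(product(range(radix), repeat=2))  # radix^2 input pairs
        return radix ** len(inputs)                     # one output choice per pair

    print(count_two_input_gates(2))  # 16 binary gate types
    print(count_two_input_gates(3))  # 19683 ternary gate types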
> Trinary didn’t make any headway in the 20th century; binary’s direct mapping to the “on”/”off” states of electric current was just too effective, or seductive; but remember that electric current isn’t actually “on” or “off”. It has taken a ton of engineering to “simulate” those abstract states in real, physical circuits, especially as they have gotten smaller and smaller.
But, I think things are actually trending the other way, right? You just slam the voltage to “on” or “off” nowadays—as things get smaller, voltages get lower, and clock times get faster, it gets harder to resolve the tiny voltage differences.
Maybe you can slam to -1. OTOH, just using 2 bits instead of one... trit(?) seems easier.
Same reason the “close window” button is in the corner. Hitting a particular spot requires precision in 1 or 2 dimensions. Smacking into the boundary is easy.
The lower voltage helps reduce leakage and capacitance in the chip as the wires get closer together.
But it does argue against more states due to the benefits of just making 1 smaller if you can and packing things closer. Though maybe we are hitting the bottom with Dennard scaling being dead. Maybe we increase pitch and double state on parts of the chip, and then generations are measured by bits per angstrom.
Once we invented CMOS this problem pretty much went away. You can indeed just slam the transistor open and closed.
Well, until we scaled transistors down to the point where electrons quantum tunnel across the junction. Now they're leaky again.
Not quite. Leakage current in CMOS circuits became the dominant source of power consumption around the 90 nm and 65 nm nodes, long before quantum tunneling was a major factor, and often exceeded dynamic switching power. This led to the introduction of multiple threshold-voltage devices and body-biasing techniques to dynamically adjust Vt and curb static leakage.
There are a ton of places in modern silicon where a voltage represents far more than just on or off, from the 16 levels of a QLC flash cell to the various PAM technologies used by modern interconnects.
I’ve wondered any number of times if 4 level gates would be useful to increase cache memory in CPUs. They aren’t great for logic, but how much decoding would they need to expand an L3 cache?
What is PAM in this context?
Pulse amplitude modulation
Thanks. That’s a deep rabbit hole upon initial glances to say the least
I remember reading somewhere that because ternary computing is inherently reversible, from an information-theoretic point of view ternary computations have a lower theoretical bound on energy usage, and as such could be a way to bypass heat dissipation problems in chips built with ultra-high density, large size, and high computational load.
I wasn't knowledgeable enough to evaluate that claim at the time, and I'm still not.
Here's a couple of sources that back up what I was talking about:
https://ieeexplore.ieee.org/document/9200021
https://en.wikipedia.org/wiki/Landauer%27s_principle
> Trinary is philosophically appealing because its ground-floor vocabulary isn’t “yes” and “no”, but rather: “yes”, “no”, and “maybe”. It’s probably a bit much to imagine that this architectural difference could cascade up through the layers of abstraction and tend to produce software with subtler, richer values … yet I do imagine it.
You can just have a struct { case yes; case no; case maybe; } data structure and pepper it throughout your code wherever you think it’d lead to subtler, richer software… sure, it’s not “at the hardware level” (whatever that means given today’s hardware abstractions) but that should let you demonstrate whatever proof of utility you want to demonstrate.
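As a minimal sketch of that idea (Python here, since the snippet above looks Swift-flavoured; the names are my own), with a Kleene-style AND thrown in to show "maybe" propagating:

    from enum import Enum

    class Tri(Enum):
        NO = 0
        YES = 1
        MAYBE = 2

    def tri_and(a: Tri, b: Tri) -> Tri:
        # NO dominates; MAYBE propagates uncertainty; otherwise YES.
        if Tri.NO in (a, b):
            return Tri.NO
        if Tri.MAYBE in (a, b):
            return Tri.MAYBE
        return Tri.YES

    print(tri_and(Tri.YES, Tri.MAYBE))  # Tri.MAYBE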
I've never understood the fascination here. Apparently the radix that minimizes some expression relating the number of possible symbols and the length of a message is closest to Euler's number. I don't see why the product of those things is worth optimizing for. The alphabet size that works best is dictated by the storage technology; more symbols usually means they're harder to disambiguate.
2 is the smallest number of symbols needed to encode information, and it makes the symbols the easiest to disambiguate in any implementation. Good enough for me.
The idea is roughly that the effort needed to use a system is, in some situations, roughly proportional to the number of symbols times the number of positions needed.
Here's a concrete example: imagine you need to create some movable type because you are building a printing press, and you need to be able to represent all numbers up to 100 million.
In binary you need to make 53 pieces, in ternary 50, in octal 69 pieces, in decimal 81 and in hexadecimal 101.
> In binary you need to make 53 pieces, in ternary 50, in octal 69 pieces, in decimal 81 and in hexadecimal 101.
These numbers don't quite make sense to me. Hexadecimal should have 16 symbols, and then `log16(whatever) = message length`. I get what you're trying to say though.
That trend continues up until the symbols start looking the same and no one can read them, and now the most important problem is not a position on a tradeoff curve. It's that the system is no longer reliable.
If you wanted each letter to have the highest probability of successfully being read, you would use a grid, and shade or leave blank each grid square.
The hex calculation is a bit like this
100 million in Hex is 5F5E100
You need 6*16 pieces for the trailing digits plus 5 pieces for the leading digit, if you want to be able to print any number from 0 to 100 million.
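Here's a small sketch of that calculation (my own; it follows the same convention as the comment: a full symbol set for every trailing position, and only the reachable symbols for the leading digit):

    def pieces_needed(limit, base):
        digits = []
        n = limit
        while n:
            digits.append(n % base)
            n //= base
        # full symbol set per trailing position, plus 1..lead for the leading digit
        return (len(digits) - 1) * base + digits[-1]

    for base in (2, 3, 8, 10, 16):
        print(base, pieces_needed(100_000_000, base))
    # -> 2 53, 3 50, 8 69, 10 81, 16 101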
Maybe of interest, re: neuromorphic computing that's perhaps more aligned with biological efficiency.
https://github.com/yfguo91/Ternary-Spike
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
> The Spiking Neural Network (SNN), as one of the biologically inspired neural network infrastructures, has drawn increasing attention recently. It adopts binary spike activations to transmit information, thus the multiplications of activations and weights can be substituted by additions, which brings high energy efficiency. However, in the paper, we theoretically and experimentally prove that the binary spike activation map cannot carry enough information, thus causing information loss and resulting in accuracy decreasing. To handle the problem, we propose a ternary spike neuron to transmit information. The ternary spike neuron can also enjoy the event-driven and multiplication-free operation advantages of the binary spike neuron but will boost the information capacity. Furthermore, we also embed a trainable factor in the ternary spike neuron to learn the suitable spike amplitude, thus our SNN will adopt different spike amplitudes along layers, which can better suit the phenomenon that the membrane potential distributions are different along layers. To retain the efficiency of the vanilla ternary spike, the trainable ternary spike SNN will be converted to a standard one again via a re-parameterization technique in the inference. Extensive experiments with several popular network structures over static and dynamic datasets show that the ternary spike can consistently outperform state-of-the-art methods.
Ternary quantized weights for LLMs are a thing. Most of the weights in Bitnet b1.58[1] class models[2][3] are ternary (-1/0/1).
[1]: https://arxiv.org/abs/2402.17764
[2]: https://huggingface.co/tiiuae/Falcon-E-3B-Instruct
[3]: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
I've always thought we could put a bit of general purpose TCAM into general purpose computers instead of just routers and switches, and see what people can do with it.
I know (T)CAMs are used in CPUs, but I am more thinking of the kind of research being done with TCAMs in SSD-like products, so maybe we will get there some day.
There’s a lot of tech in signaling that doesn’t end up on CPUs and I’ve often wondered why.
Some of it is ending up in power circuitry.
TCAM still uses 2-bit binary storage internally; it just ignores one of the four possible values.
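A rough illustration of that "2 bits per ternary cell" point (a software sketch, not how real TCAM hardware is queried): an entry is held as a value plane plus a mask plane, and a masked-off bit is a "don't care".

    def tcam_match(key, value, mask):
        # Bits where mask is 1 must match; bits where mask is 0 are don't-care.
        return (key & mask) == (value & mask)

    # Entry matching the ternary pattern 10xx: value 0b1000, mask 0b1100
    print(tcam_match(0b1011, 0b1000, 0b1100))  # True
    print(tcam_match(0b0011, 0b1000, 0b1100))  # False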
Genuine question: if we can go beyond two, why not go beyond three? What makes three appealing but not a larger number?
In digital circuits there’s “high”, “low”, and “high impedance”.
There's low-impedance and high-impedance. Within low-impedance, there's high and low.
https://en.wikipedia.org/wiki/Ternary_computer
Mapping the three trinary values to yes, no, and maybe is semantic rubbish.
I seem to remember reading about "fuzzy logic" (a now-quaint term), where a trinary state was useful.
"One feature that sets certain rice cookers above the rest is “fuzzy logic,” or the ability of an onboard computer to detect how quickly the rice is cooking or to what level doneness it has reached, then make real time adjustments to time and temperature accordingly. " ... From: https://www.bonappetit.com/story/zojirushi-rice-cooker
It is a term that is still used quite a fair bit for marketing. I think in this case (Zojirushi) it isn't trinary, rather some probabilistic/Bayesian system to derive a boolean from a number of factors (time, temp, and so on).
Back in the late 80s and early 90s fuzzy logic became something of a fad in Japan because several of the leading researchers were at Japanese institutions. So it became a term of hype with a bit of flag waving involved.
I'm reasonably convinced my Zojirushi has nothing more than a way to sense when the evaporation shifts and to start the "steaming" countdown timer then, probably using the resistance of the heating coil. In other words it's just a replacement for the weight/balance mechanism in a traditional "dumb" rice cooker, not something doing more complex modeling as far as I can tell.
It is however built like a tank and "just works" so I'm entirely happy with my purchase.
> weight/balance mechanism in a traditional "dumb" rice cooker
These are far more interesting than that. Technology Connections YouTube channel did a great breakdown of how they really work: https://www.youtube.com/watch?v=RSTNhvDGbYI
This is off topic but how do you build and post to that blog? Homegrown or framework?
Isn't quantum computing "all the aries"
The quantum dream is also the trinary dream.
Well, maybe.
Ternary is indeed an enticing, yet ultimately flawed dream.
Quaternary allows for:
True, "Yes"
False, "No"
Undetermined, "Maybe", "Either", True or False
And:
Contradiction, "Invalid", "Both", True and False
For logical arithmetic, i.e. reducing tree expressions, True and False are enough.
But in algebraic logic, where more general constraint topologies are possible, the other two values are required.
What is the logical value of the isolated expression “(x)”? I.e. “x” unconstrained?
Or the value of the expression “(x = not x)”?
None of 4-valued logic’s values are optional or spurious for logical algebra.
—-
Many people don’t know this, but all modern computers are quaternary, with 4 quaternit bytes. We don’t just let anyone in on that. Too much power, too much footgun jeopardy, for the unwashed masses and Python “programmers”.
The tricky thicket of web standards can’t be upgraded without introducing mayhem. But Apple’s internal-only docs reveal macOS and Swift have been fully quaternary compliant on their ARM since the M1.
On other systems you can replicate this functionality, at your own risk and effort, by accessing each quaternit with their two bit legacy isomorphic abstraction. Until Rust ships safe direct support.
—-
It will revolutionize computing, from the foundations up, when widely supported.
Russell’s paradox in math is resolved. Given a set S = “The set of all sets that don’t contain themselves”, the truth value of “Is S in S” in quaternary logic, reduces to Contradiction, which indeed it is. I.e. True and False. Making S a well formed, consistent entity, and achieving full set and logical completeness with total closure. So consistency is returned to Set theory and Russell’s quest for a unification of mathematics with just sets and logic becomes possible again. He would have been ecstatic. Gödel be damned! [0]
Turing’s Incompleteness Theorem demonstrates that 2-valued bit machines are inherently inconsistent or incomplete.
Given a machine M, applied to the statement S = “M will say this statement is False”, or “M(S) = False”, it has to fail.
If M(S) returns True, we can see that S is actually False. If M(S) returns False, we can see that actually S is True.
But for a quaternary Machine M4 evaluating S4 = “M4(S4) = False”, M4(S4) returns Contradiction. True and False. Which indeed we can see S4 is. If it is either True or False, we know it is the other as well.
Due to the equivalence of Undecidability and the Turing Halting Problem, resolving one resolves the other. And so quaternary machines are profoundly more powerful and well characterized than binary machines. Far better suited for the hardest and deepest problems in computing.
It’s easy to see why the developers of Rust and Haskell are so adamant about getting this right.
[0] https://tinyurl.com/godelbedamned
Most common quaternary storage system is probably DNA.
Not wrong, but I think the hope was more to have "8-trinit" bytes i.e. something with more states than a classic bit.
Thank you for taking my points with exactly the seriousness they deserve.
I respond in that spirit.
Taking the convention that “byte” always means 8 n-valued bits:
One advantage of a convention of 8 quaternit bytes is they can be readily used as 8 ternary valued bytes too, albeit with reduced use of their value range.
8 quaternit bytes also have the advantage of higher resolution addressing, i.e. at the nibble = 4 quaternary bit boundaries. (The last bit of modern memory addresses indicates the upper or lower quaternary nibble.)
Edit/addition: there are some serious points in the Socratic satire there, for those who want to consider.
Despite our natural aesthetic hesitancy to equate a 4-valued bit with two 2-valued bits, we all understand they are the same. Many “binary” storage devices do the reverse, and store multiple “binary” values with higher range cells.
A bit of information (whatever its arity) is the same bit regardless of how it is stored or named.
We get stuck in our familiar frames and names.
Also, the points about Russell’s paradox and Turing Incompleteness are conveyed in an absurdist’s deadpan, but they are in fact actual critiques I am making. In both cases, two-valued logic, suitable only for arithmetic, is used in algebraic contexts where self-referencing and open constraints are both possible, despite the basic inability of two-valued logic to represent the values of either.
It is startling to me what obvious limitations this out-of-the-gate bad assumption of an excluded middle in algebraic contexts places on the generality of the conclusions in both treatments, where the failings of the excluded middle are basically the "proof". Proof of what was assumed, essentially.
Anyone who cares about those topics can work through those points. Neither are as meaningless or trivial as might be expected.
Finally, four valued logic would be very useful to support at CPU instruction levels, for algebraic contexts, beyond arithmetic. Especially since no changes to memory are needed.
Interestingly, with 4-valued logic, there are two different sets of AND, OR and NOT, for two ways they can be treated. And the current bit-wise operators, acting on [tf] 2-bit 4-valued logic (True as [+-], False as [-+], [--] as unknown, and [++] as contradiction) already implement the new versions of those operations. So new instructions are only needed to implement regular AND, OR, NOT operations for 2-bit 4-valued logical values.
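For anyone who wants to poke at that, here's a small sketch of the two-bit encoding described above (a Belnap-style four-valued logic; the encoding and names are mine, chosen to match the [+-] notation): each value is a pair of bits (t, f), with True = (1,0), False = (0,1), Unknown = (0,0), and Contradiction/Both = (1,1). The "regular" truth-order AND/OR/NOT then look like this:

    T, F, U, B = (1, 0), (0, 1), (0, 0), (1, 1)

    def and4(a, b):  # true-evidence ANDs, false-evidence ORs
        return (a[0] & b[0], a[1] | b[1])

    def or4(a, b):   # true-evidence ORs, false-evidence ANDs
        return (a[0] | b[0], a[1] & b[1])

    def not4(a):     # negation swaps the two evidence bits
        return (a[1], a[0])

    print(and4(T, B) == B)  # True AND Both -> Both
    print(or4(U, F) == U)   # Unknown OR False -> Unknown
    print(not4(B) == B)     # NOT Both -> Both

Plain bitwise AND/OR applied to both bit planes at once give a different pair of operations, which is consistent with the comment's claim that existing bitwise instructions already cover those.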
I've liked true, false, unknown, unknowable -- though I think there should be something somewhere for fnord.
Analog is next. Software first, then build the machines. No more models, reductions, loss. Direct perception through measurement and differences.
Analog was before, though. General computing was never realized using those architectures; granted, they were mechanical in nature, so that is a big ask, both figuratively and literally.
Maybe we could create continuous-valued electrical computers, but at least state, stability and error detection are going to be giant hurdles. Also, programming GUIs from Gaussian splats sounds like fun in the negative sense.
You have to withdraw from the binary in all senses to begin to imagine what an analog spatial differences measurement could function as.
Again, think software first. The brain is always a byproduct of the processes, though it is discerned as a materialist operation.
Think big; binary computers are toys in the grand scheme of things.
You've just described vacuum tube computers as well as all the early transistorized computers. Digital computing is a relatively late concept.
Of course there were analog 'computers' but vacuum tubes were also used to realize digital computers in the early days.
https://en.wikipedia.org/wiki/Vacuum-tube_computer
We'd need a real breakthrough in physics to have such a technology that works at a scale even remotely comparable to what a low end digital CPU can do today. The thing is, there's not even any real evidence (at least to my knowledge) that there are useful threads that researchers know to pull on that could yield such a technology. Emulating analog hardware with digital hardware in anticipation of some kind of breakthrough isn't going to have any material benefits in the short to medium term.
ChatGPT 5-Pro, What would it be like if we used trinary instead of binary computers? https://chatgpt.com/s/t_68f53bb9b15c8191b8d732f722243719