Ubitium is developing 'universal' processor combining CPU, GPU, DSP, and FPGA

(tomshardware.com)

25 points | by LorenDB 4 hours ago

11 comments

You can get really cool multi-core chips now for just a few dollars.

https://milkv.io/duo-s This has a RISC-V, a Cortex, a TPU and of course the much-beloved 8051.

wmf 3 hours ago

The CTO Martin Vorbach published some research on reconfigurable processors 20 years ago: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C44&q=Mar...

sillywalk 4 hours ago

"Ubitium claims all of the transistors in its Universal Processor can be reused for everything; no “specialized cores” like those in CPUs and GPUs are required."

One has doubts, especially with only $3.7 million in funding so far.

I recall that Sun's MAJC processor had functional/instruction units that were generic: there weren't dedicated floating-point, integer, or SIMD units; they could all operate on any instruction.

written-beyond 3 hours ago

Could you go into a little more depth about how what they're building is any different from an FPGA?

FPGAs are basically a matrix of interconnected MUXes and LUTs, providing whatever functionality a designer may require that fits on the die.
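To make the LUT idea concrete, here is a minimal sketch, not a model of any real FPGA fabric. The names `make_lut` and `half_adder` are made up for illustration. A LUT is just a small truth table selected by its input bits, and "programming" it means filling in that table.

```python
def make_lut(truth_table):
    """Return a k-input LUT: the input bits form an index into the table."""
    def lut(*bits):
        index = 0
        for b in bits:
            # First input becomes the most significant index bit.
            index = (index << 1) | (b & 1)
        return truth_table[index]
    return lut

# "Configure" two 2-input LUTs by loading different truth tables.
xor_lut = make_lut([0, 1, 1, 0])  # entries for inputs 00, 01, 10, 11
and_lut = make_lut([0, 0, 0, 1])

def half_adder(a, b):
    """Wire the two LUTs together: returns (sum bit, carry bit)."""
    return xor_lut(a, b), and_lut(a, b)
```

The same two LUTs could be reloaded with other tables to compute something else entirely, which is the part a fixed-function core cannot do.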

cdumler an hour ago

I have no specific knowledge, but another approach would be to integrate more unusual very-long-instruction-word micro-instructions, like large scale matrix functions, algorithm encode/decode functions, and very long vector operations.

As I recall, Transmeta's CPU could accept x86 instructions because the software translator, called Code Morphing Software (like Rosetta), would decompose each x86 instruction into a set of steps packed into a very-long-instruction-word. VLIW's design is such that all of the instructions went into separate, parallel pipelines. Each pipeline had a specific set of abilities. Think: the first three pipelines might be able to do integer arithmetic, but 3 and 4 can do floats. Also, the CPU implemented a commit/rollback concept which allowed it to handle "faults," like branch mispredictions, interrupts, and instruction faults. This allowed the Transmeta CPU to emulate the x86 beyond just JIT compilation. In theory, it could emulate any other CPU. They tried going after Intel (and failed), but I think they would have been better off going after anyone trying to jump-start a new architecture.
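The commit/rollback idea can be sketched roughly like this. It is a toy model under my own assumptions, not Transmeta's actual mechanism: the `Machine` class, the register names, and the `Fault` exception are all hypothetical. A translated block runs against working copies of the registers, and the results only become architectural state if the whole block finishes without a fault.

```python
class Fault(Exception):
    """Stands in for anything that forces a rollback (hypothetical)."""

class Machine:
    def __init__(self):
        self.committed = {"r0": 0, "r1": 0}   # last known-good state
        self.working = dict(self.committed)   # speculative copy

    def run_block(self, steps):
        """Run a translated block; commit on success, roll back on fault."""
        try:
            for step in steps:
                step(self.working)
        except Fault:
            # Discard speculative results and restore the committed state.
            self.working = dict(self.committed)
            return False
        self.committed = dict(self.working)
        return True

def set_r0(value):
    def step(regs):
        regs["r0"] = value
    return step

def faulting_step(regs):
    raise Fault("simulated fault in the middle of a block")

m = Machine()
ok = m.run_block([set_r0(5)])                     # commits r0 = 5
failed = m.run_block([set_r0(9), faulting_step])  # r0 = 9 is rolled back
```

After a rollback, the translator could re-run the same guest code more conservatively, for example one instruction at a time, to find exactly where the fault belongs.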

Part of the reason why CPUs aren't good at GPU activities is that instructions are expected to have a pretty small, definite set of inputs and outputs (registers), use a reasonable number of CPU cycles, and must devote logic to ensuring a fault can be unwound (the CPU doesn't crash). FPGAs are cool because you can essentially have wholly independent units with their own internal state. The little units can be wired any way desired. The problem with FPGAs is that all that interconnect means a lot of capacitance in the lines, so much slower clock speeds.

So, maybe they are trying to strike a balance. They have targeted instructions that are more FPGA-like, like "perform algorithm." The instruction receives a set of flags that defines which algorithms to use and in what order (use vector as 8-bit integers, mask with 0x80, compute 16-bit checksum) and a vector register. You keep loading vectors and running them, then finally "read perform algorithm result" with flag "get compute 16-bit checksum." It's FPGA-like, and the registers aren't "polluted" with intermediate state.
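A rough software analogy of what I mean, purely speculative and not based on anything Ubitium has published. `Stage`, `AlgorithmUnit`, `perform`, and `read_result` are invented names. The point is that the running checksum lives inside the unit, so no general-purpose register holds intermediate state.

```python
from enum import Enum

class Stage(Enum):
    AS_U8 = "interpret vector as 8-bit integers"
    MASK_0X80 = "mask with 0x80"
    CHECKSUM16 = "compute 16-bit checksum"

class AlgorithmUnit:
    """Hypothetical unit configured once with an ordered list of stages."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.checksum = 0  # internal state, invisible to the register file

    def perform(self, vector):
        """The 'perform algorithm' instruction: run all stages on one vector."""
        data = vector
        for stage in self.stages:
            if stage == Stage.AS_U8:
                data = list(bytes(data))
            elif stage == Stage.MASK_0X80:
                data = [x & 0x80 for x in data]
            elif stage == Stage.CHECKSUM16:
                # Simple additive checksum, truncated to 16 bits.
                self.checksum = (self.checksum + sum(data)) & 0xFFFF

    def read_result(self, stage):
        """The 'read perform algorithm result' instruction."""
        if stage == Stage.CHECKSUM16:
            return self.checksum
        raise ValueError(f"no readable result for {stage}")

unit = AlgorithmUnit([Stage.AS_U8, Stage.MASK_0X80, Stage.CHECKSUM16])
unit.perform(b"\x81\x7f\xff")  # masked to 0x80, 0x00, 0x80
unit.perform(b"\x80")          # masked to 0x80
result = unit.read_result(Stage.CHECKSUM16)  # 0x0180
```

The additive checksum is just a placeholder; any stage with internal state would make the same point.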

sillywalk 2 hours ago

If you mean about the Ubitium, then no - other than what's in the article.

If you mean more in depth about MAJC, then also no - I read an Ars Technica article about it (and Itanium) around 25 years ago, when it came out, and also the Wikipedia page.

I have no EE or CPU design background; I'd imagine most people would know far more than me. I just remembered the 'generic instruction unit' from MAJC and wondered if this was something superficially similar but at the processor 'core' level.

imtringued an hour ago

Actually, FPGAs are a mix of everything nowadays. They have programmable logic consisting of LUTs and flip-flops in CLBs with integrated carry chains, connection boxes and routing switches, configurable SRAM blocks known as block RAM or sometimes UltraRAM, DSP blocks providing configurable arithmetic units, PLLs, conventional ARM cores, memory controllers, high-speed transceivers and finally also VLIW cores for machine learning inference. Notice how a lot of the silicon area is actually taken up by hard silicon that can be connected to the programmable logic. The problem with the largest FPGAs is that you will reach the point where you are swimming in LUTs and the chip area is better spent on e.g. more memory or other hard-wired logic like a processor core.

not_your_vase 4 hours ago

Which makes me remember Tachyum, and their universal Prodigy CPU, which is in a constant state of "we just need 5 more minutes, and we are done"...

mouse_ 4 hours ago

that's a really long winded way of saying "SoC with FPGA".

jacoblambda 27 minutes ago

Specifically what it looks like they are pursuing is an SoC built around an FPGA that is optimised for runtime reconfiguration speed and presumably with an ISA extension to request and then later release configurations.

westurner 24 minutes ago

From "Universal AI RISC-V processor does it all — CPU, GPU, DSP, FPGA" (2024) https://www.eenewseurope.com/en/universal-ai-risc-v-processo... :

> For over half a century, general-purpose processors have been built on the Tomasulo algorithm, developed by IBM engineer Robert Tomasulo in 1967. It’s a $500B industry built on specialised CPU, GPU and other chips for different computing tasks. Hardware startup Ubitium has shattered this paradigm with a breakthrough universal RISC-V processor that handles all computing workloads on a single, efficient chip — unlocking simpler, smarter, and more cost-effective devices across industries — while revolutionizing a 57-year-old industry standard.

Tomasulo's algorithm: https://en.wikipedia.org/wiki/Tomasulo%27s_algorithm

Intel, AMD, ARM, X-Silicon’s C-GPU RISC architecture, and Cerebras' on-chip SRAM architecture are all Tomasulo-algorithm OOO (out-of-order) execution processor architectures, FWIU.