Tiny hackable CUDA language model implementation

(github.com)

45 points | by markusheimerl 3 days ago

6 comments

oakinnagbe 40 minutes ago

Nice implementation. Have you thought about supporting LoRA fine-tuning on top of this, or is the design too low-level for that kind of extension?

qqqqqlqq 42 minutes ago

$ make run -j 10

CUDA error in attention.c:91: out of memory

Command exited with non-zero status 1

1.38user 0.46system 0:00.75elapsed 246%CPU (0avgtext+0avgdata 226164maxresident)k

0inputs+0outputs (0major+25414minor)pagefaults 0swaps

make: ** [Makefile:34: run] Error 1

clang: warning: CUDA version 12.4 is only partially supported [-Wunknown-cuda-version]

(I'm running Ubuntu with an 8 GB NVIDIA GeForce RTX 3050; current usage is 876MiB / 8192MiB.)

Gred_papa_dance an hour ago

I need more info:

* Where is the data (make data)? How do I create my own data (questions for chat)?

* How do I create a tokenizer (maybe a separate one)?

* How do I stop the code, how much memory does it need, how do I set the context size, etc.?

* How do I create a LoRA or train on new data?

* How do I quantize the model?

In my opinion this is a great idea, but making a Ruby extension would be a good way to increase the number of people using this code.

yobbo 8 hours ago

Looks very nice, but I can't find numerical gradient checks, which are helpful when verifying that the backward pass is correct:

https://github.com/markusheimerl/gpt/blob/main/transformer/a...
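For readers unfamiliar with the term, here is a minimal sketch of a numerical gradient check. It is not taken from the linked repository: the one-neuron tanh model, the function names, and the tolerance are all assumptions chosen for illustration. The idea is to compare the hand-derived gradient against a central finite difference of the loss.

```python
import math

def loss(w, x, y):
    """Squared error of a hypothetical one-neuron model: 0.5 * (tanh(w . x) - y)^2."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (math.tanh(z) - y) ** 2

def analytic_grad(w, x, y):
    """Hand-derived backward pass for the loss above."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    a = math.tanh(z)
    dz = (a - y) * (1.0 - a * a)  # chain rule through the squared error and tanh
    return [dz * xi for xi in x]

def numerical_grad(f, w, eps=1e-5):
    """Central finite difference: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps)."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * eps))
    return grad

def max_rel_error(a, b):
    """Largest relative difference between two gradient vectors."""
    return max(abs(ai - bi) / max(abs(ai) + abs(bi), 1e-12) for ai, bi in zip(a, b))

# Illustrative values only
w = [0.3, -0.7, 0.5]
x = [1.0, 2.0, -1.5]
y = 0.25

err = max_rel_error(analytic_grad(w, x, y),
                    numerical_grad(lambda v: loss(v, x, y), w))
print(f"max relative error: {err:.2e}")
```

A small relative error suggests the analytic gradient matches the loss. A large one usually points to a mistake in the backward pass, or to a step size that is too big or too small for the floating-point precision in use.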

markusheimerl 5 hours ago

I deleted the numerical checks a while back, after confirming the backward pass is correct, to keep the code base lean. Running https://github.com/markusheimerl/gpt/blob/main/transformer/a... is also somewhat of a confirmation that the backward pass is correct, since an analytically incorrect backward pass can't fit synthetic data perfectly.
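As a rough illustration of that argument, the sketch below fits a toy linear model to synthetic data twice: once with the correct gradient and once with a deliberately sign-flipped gradient for one parameter. Everything here (the model, the `train` function, the `grad_sign` parameter) is hypothetical and unrelated to the repository's code. It only shows one kind of bug being exposed by a failure to fit, and subtler errors may still slip through.

```python
def train(grad_sign, steps=200, lr=0.5):
    """Fit y = w*x + b by gradient descent on synthetic data.

    grad_sign = 1.0 uses the correct dL/dw;
    grad_sign = -1.0 simulates a sign bug in the backward pass.
    Returns the final loss 0.5 * mean((w*x + b - y)^2).
    """
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    ys = [2.0 * x + 1.0 for x in xs]  # synthetic targets from a known line
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        dw = grad_sign * sum(e * x for e, x in zip(errs, xs)) / n
        db = sum(errs) / n
        w -= lr * dw
        b -= lr * db
    return 0.5 * sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

good_loss = train(grad_sign=1.0)
buggy_loss = train(grad_sign=-1.0)
print(f"correct backward pass: loss = {good_loss:.3e}")
print(f"sign-flipped dL/dw:    loss = {buggy_loss:.3e}")
```

With the correct gradient the loss should collapse to roughly zero, while the sign-flipped version moves `w` away from the target and the loss grows instead.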

qqqqqlqq 36 minutes ago

Does it work on ARM?