LoRA vs. Full Fine-Tuning: An Illusion of Equivalence

(arxiv.org)

57 points | by timbilt 4 hours ago

4 comments

  • pwillia7 an hour ago

    This tracks with my experience making and using Stable Diffusion LoRAs and fine-tunes. Still, given the speed to train and use, LoRAs have worked for me in most use cases, and it hasn't been worth fine-tuning the entire model.

    • K0balt an hour ago

      Yeah, it reflects the “feel” I get from LoRA as well, especially if I overdo it. The new data becomes the preferred output even for unrelated inputs. I always felt like it was bludgeoning the model to some extent, versus fine-tuning.

      Also, LoRA-tuning an extensively tuned model occasionally provokes full-on delusional “insanity” or gibberish seizures.

      I have had really good luck, though, using a highly tuned model as the training basis for a LoRA and then applying that LoRA mask to the base version of that model. I’m not sure why that seems to work better than training the same LoRA directly on the base model.
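
      What makes this grafting trick possible at all is that a LoRA adapter is just a low-rank additive delta, so it can be added to any model with matching weight shapes. Below is a minimal numpy sketch of the mechanics for a single weight matrix; the sizes and the "trained" adapter values are made up for illustration, not taken from any real model.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      d, r = 16, 2  # hypothetical layer width and LoRA rank

      # Stand-ins for one weight matrix of the base model and a heavily tuned variant.
      W_base = rng.normal(size=(d, d))
      W_tuned = W_base + 0.1 * rng.normal(size=(d, d))

      # A LoRA adapter for this layer is a pair of small matrices whose product
      # B @ A is the update. Pretend these were learned against W_tuned.
      A = rng.normal(size=(r, d))
      B = 0.01 * rng.normal(size=(d, r))
      delta = B @ A  # rank <= r

      # The delta is portable: it can be added to the model it was trained on,
      # or grafted onto the base model, as the comment describes.
      W_tuned_adapted = W_tuned + delta
      W_base_adapted = W_base + delta
      ```

      The only requirement is shape compatibility; nothing in the adapter "knows" which model it is being applied to, which is why the swap works mechanically (even if it's unclear why it works better).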

  • K0balt 2 hours ago

    So, in layman’s terms, LoRA appears to “traumatize” the model to some degree, connecting the vector space with strong “jumpers” (intruder dimensions) to change its behavior, instead of subtly conforming the entire model into a shape that accommodates the new data.

    These jumpers or shortcuts do create connections between the relevant new concepts in the model, but by directly connecting them instead of associating them through the existing network of concepts, nuance is lost and the bypassed areas become deemphasized, leading to forgetting of previously held associations.

    Because of this, fine-tuning generally produces better results than LoRA, especially when forgetting of existing training is detrimental.

    Or, to further oversimplify the issue in SE terms, LoRA == monkeypatching. (Is this a kind of intruder dimension?)
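
    The low-rank-shortcut intuition above can be made concrete with a toy numpy comparison (sizes are arbitrary, and the "full fine-tune" delta is simulated with random noise rather than actual training): a full fine-tune can nudge every direction of a weight matrix a little, while a LoRA update is confined to a handful of directions, which is what the paper's "intruder dimensions" show up in.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    d, r = 64, 4  # hypothetical hidden size and LoRA rank

    # Full fine-tuning can adjust every direction: a dense, full-rank delta.
    full_delta = 0.01 * rng.normal(size=(d, d))

    # LoRA concentrates its entire update into r directions: B @ A has rank <= r,
    # the "jumpers" in the comment's analogy.
    A = rng.normal(size=(r, d))
    B = rng.normal(size=(d, r))
    lora_delta = B @ A

    print(np.linalg.matrix_rank(full_delta))  # close to d: spread across the space
    print(np.linalg.matrix_rank(lora_delta))  # at most r: a few strong shortcuts
    ```

    Everything the adapter learned has to flow through those r directions, which is one way to picture why a LoRA can act more like a bolted-on patch than a model-wide adjustment.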

    • Mockapapella 27 minutes ago

      Thank you for this layman explanation