This article title should have "(2004)" added; this is seriously old information.
For modern use, something about ARM CPUs would be much more useful since that's what microcontrollers all use now. No one's doing ASM programming on x86 CPUs these days (and certainly not Pentium 4 CPUs).
Perhaps it's rare to see full programs written in assembly, but for performance analysis and optimization I think knowledge of these kinds of tricks (probably updated for the N generations since 2004, of course) still has relevance.
For instance Daniel Lemire's blog [1] is quite often featured here, and very often features very low-level performance analysis and improvements.
[1]: https://lemire.me/blog/
A fascinating peek into the fairly deep past (sigh) is Abrash's Zen of Assembly Language. Time pretty much overtook the planned Volume 2, but Volume 1 is still a fascinating read from a time when tuning for prefetch queues and the like was still a thing.
> (Intermediate)1. Adding to memory faster than adding memory to a register
I'm not familiar with the Pentium, but my guess is that a memory store is relatively cheaper than a load in many modern (out-of-order) microarchitectures.
> (Intermediate)14. Parallelization.
I feel like this is where compilers come in handy, because juggling critical paths and resource pressure at the same time sounds like a nightmare to me.
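To make the critical-path point concrete, here's a rough C sketch (mine, not from the article; the function names are made up). The first loop has one dependency chain, so every add waits on the previous one. The second splits the work across four independent accumulators, which an out-of-order core can overlap. I used integers so both versions give exactly the same result; with floats the reordering can change rounding. A decent compiler will often do this for you at higher optimization levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add depends on the result of the previous add,
   so the loop is bound by the latency of a single dependency chain. */
uint64_t sum_serial(const uint32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: four independent dependency chains that the CPU
   is free to execute in parallel. */
uint64_t sum_four_chains(const uint32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Whether the second version is actually faster depends on the CPU and on what the compiler already did to the first one, so measure before believing it.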
> (Advanced)4. Interleaving 2 loops out of sync
Software pipelining!
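For anyone who hasn't run into the term, a minimal hand-written sketch in C (my own illustration with hypothetical function names, not the article's code): the loop body mixes stages from different iterations, so the load for element i is issued while the multiply and store for element i-1 are still being finished. It needs a prologue to start the pipeline and an epilogue to drain it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain loop: load, multiply and store all belong to the same iteration. */
void scale_plain(uint32_t *dst, const uint32_t *src, size_t n, uint32_t k) {
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand-pipelined version. Assumes dst and src do not partially overlap. */
void scale_pipelined(uint32_t *dst, const uint32_t *src, size_t n, uint32_t k) {
    assert((dst != NULL && src != NULL) || n == 0);
    if (n == 0)
        return;
    uint32_t loaded = src[0];        /* prologue: first load */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = src[i];      /* stage 1 of iteration i   */
        dst[i - 1] = loaded * k;     /* stage 2 of iteration i-1 */
        loaded = next;
    }
    dst[n - 1] = loaded * k;         /* epilogue: drain the last element */
}
```

On an out-of-order CPU the hardware largely does this overlap itself, so the manual version mostly matters on in-order cores or when you're reading compiler output.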
What's a good resource like this for modern CPUs (especially ARM)?
Looks like this was written in 2004, or thereabouts.
I was wondering why it said P4. That's an old processor.