Floor and Ceil versus Denormals on CPU and GPU

(asawicki.info)

31 points | by ibobev 4 days ago

5 comments

yosefk an hour ago

Flush denormals to zero. Even their inventor had trouble writing correct code in their presence; see the Appendix to that "what every programmer should know..." paper.

loicd an hour ago

> Even their inventor had trouble writing correct code in their presence

I didn't know that. Could you provide a more specific reference?

kevmo314 an hour ago

> This is not the first time we can see Nvidia taking shortcuts to achieve maximum performance of their GPUs

Why is implementing it correctly not performant? For context, I have no idea how rounding is typically implemented anyway.

crote 3 hours ago

Another thing to keep in mind is that CPU processing of denormals tends to be extremely slow - I vaguely recall running into something like a 10x slowdown a decade ago.

For a lot of applications the difference between a denormal and zero is small enough to be irrelevant, so if you expect near-zero values to be common, enabling a denormals-to-zero compiler flag might give you a pretty nice performance boost for free.
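To make the trade-off concrete, here is a rough Python sketch of what flushing does to a value like the ones the article discusses. It only emulates the effect in software: `is_denormal` and `flush_to_zero` are hypothetical helpers written for this example, not a real compiler flag or hardware mode, and it assumes IEEE 754 doubles.

```python
import math
import sys

# Smallest positive *normal* double; anything nonzero below this is denormal.
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x: float) -> bool:
    """True if x is a nonzero value below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x: float) -> float:
    """Hypothetical software emulation of flushing: denormals become a signed zero."""
    return math.copysign(0.0, x) if is_denormal(x) else x


tiny = 5e-324  # smallest positive denormal double

print(is_denormal(tiny))                 # True
print(math.ceil(tiny))                   # 1: the value is > 0, so ceil rounds up
print(math.ceil(flush_to_zero(tiny)))    # 0: after flushing, the input is zero
```

So "small enough to be irrelevant" holds for sums and products, but functions like ceil turn that tiny difference into a whole unit, which is worth checking before turning flushing on.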

adgjlsfhk1 24 minutes ago

CPUs that aren't Intel are plenty fast on denormals. Intel is the only one where denormals are 100x slower (and Intel has fixed that on their new CPUs, but only on their E-cores).