Inserting a 0 bit in the middle of a value

(fgiesen.wordpress.com)

39 points | by ingve 8 months ago ago

19 comments

tromp 8 months ago

    uint64 insert_zero_bit(uint64 value, int pos) {
        uint64 top_mask = ~0u64 << pos;
        return value + (value & top_mask);
    }

> It only works for inserting exactly 1 bit

But is easily generalized to inserting k bits:

        return value + ((1<<k)-1) * (value & top_mask);

explodingwaffle 8 months ago

The more and more bit-hacks I see, the more convinced I am that they should be handled by a cleverer compiler. Some intrinsic(s) where you say “do x, y, to these bits” and it just figures out the optimum way of doing it for whatever platform you compile for.

[-]

jsheard 8 months ago

This tool is handy if you want a specific bit shuffle:

https://programming.sirrida.de/calcperm.php

It doesn't always find the optimal solution but it usually gets close.

[-]

PhilipRoman 8 months ago

That's a really cool side. Sad that it doesn't allow duplicate indices. I tried implementing the same thing but with support for duplicates and it was definitely a humbling exercise. I'm curious if there is any formal mathematical theory for this (shifted-off bits would probably make it less elegant than pure rotation)

explodingwaffle 8 months ago

that page + the book Hacker's Delight are a big part of the reason I am so convinced of this :)

If compilers/languages/standard libraries provided these bit permutations, and it was just something ~everyone had learned, it would be a lot easier to work with bits without needing to come up with the bitwise ops (or use that generator). In addition it would probably make better use of the hardware: sure, people like to pretend that we’re still programming C for PDP11, but modern hardware supports more operations than C has operators for (RISC-V B extension and co have the right idea <3)

Modern compilers are probably pretty good? but I doubt they are perfect at turning code like that in OP into the best instructions. It is probably a bit late for C/C++ though. maaaybe possible to get it into LLVM and Rust.

sumtechguy 8 months ago

I always wanted to see stuff like this website put into the compiler. https://graphics.stanford.edu/~seander/bithacks.html

The very tough problem is recognizing to emit the correct code.

sixthDot 8 months ago

Well it is already the case, for example things like ROR or ROL are generated by optimizers[1] from certain patterns of bitops.

[1]: https://godbolt.org/z/eE1cx9GT4

[-]

jsheard 8 months ago

That's more like the opposite, OP wants to tell the compiler what transform to do and have it figure out a solution, but in your example you give the compiler a solution and it works backwards to figure out the intent and rewrite it to a completely different solution.

Most languages do have intrinsics for ROL/ROR at least, which you should generally use instead of relying on optimizer magic. I've certainly run into cases where those magic patterns don't get optimized (looking at you MSVC) but intrinsics always work.

pavlov 8 months ago

For smaller integers, SIMD bit permute instructions can be used to do this kind of thing rapidly and with arbitrary reordering of the bits.

badmintonbaseba 8 months ago

Nice bit twiddling hack! There are also the PDEP and PEXT instructions on x86 that could be used for this purpose, but it's possibly not worth it here.

[-]

tubs 8 months ago

On arm you can use https://developer.arm.com/documentation/dui0489/i/arm-and-th...

which is pretty convenient for this use-case.

[-]

dan-robertson 8 months ago

Am I missing something or does that instruction only clear bits in the field and not shift the adjacent bits to contract the field to be empty?

Retr0id 8 months ago

I wonder if the compiler is smart enough to emit those instructions with the right optimization flags

[-]

badmintonbaseba 8 months ago

I never saw a compiler generating these instructions on its own. Compilers do produce popcnt and bswap from recognized algorithms, so I guess it's not entirely unreasonable, but I think the benefit would be marginal.

fmbb 8 months ago

> there’s only ever between 1 and 3 anchor bits

How often are these texturers transfered and how small are they?

Doing all this to avoid sending at most three bits for a texture file sounds like a colossal waste of human life and of money.

[-]

elpocko 8 months ago

Not per file, per block of 16 pixels. They are transferred quite often - BCn is a texture compression scheme for realtime use in GPUs. Those textures are stored and processed in compressed form.

https://en.wikipedia.org/wiki/S3_Texture_Compression

8 months ago

[deleted]

8 months ago

[deleted]

AXbsk14228 8 months ago

[flagged]