COM Like a Bomb: Rust Outlook Add-in

(tritium.legal)

61 points | by piker 6 hours ago ago

28 comments

  • bri3d 4 hours ago

    This is quite interesting: it's easy to blame the use of an LLM to find the interface, but really this is a matter of needing to understand the COM calling conventions in order to interact with it.

    I found the interface and a C++ sample in about two minutes of GitHub searching:

    https://github.com/microsoft/SampleNativeCOMAddin/blob/5512e...

    https://github.com/microsoft/SampleNativeCOMAddin/blob/5512e...

    but I don't actually think this would have helped the Rust implementation; the authors already knew they wanted a BSTR and a BSTR*, they just didn't understand the COM conventions for BSTR ownership.
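
    A minimal sketch of those ownership rules, written here against raw oleaut32 bindings rather than windows-rs (the wrapped calls are the same underneath); the commented-out method call is hypothetical:

        // [in] BSTR: the caller allocates and frees; the callee only borrows.
        // [out, retval] BSTR*: the callee allocates with SysAllocString and the
        // caller takes ownership, freeing it later with SysFreeString.
        type BSTR = *mut u16; // UTF-16 data; the length lives in the 4 bytes before it

        #[link(name = "oleaut32")]
        extern "system" {
            fn SysAllocString(psz: *const u16) -> BSTR;
            fn SysFreeString(bstr: BSTR);
        }

        fn main() {
            let wide: Vec<u16> = "hello\0".encode_utf16().collect();
            unsafe {
                let input = SysAllocString(wide.as_ptr());
                // some_method(input);  // hypothetical [in] call; the callee must not free `input`
                SysFreeString(input);   // still the caller's job
            }
        }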

    • bigstrat2003 an hour ago

      > it's easy to blame the use of an LLM to find the interface, but really this is a matter of needing to understand the COM calling conventions in order to interact with it.

      Sure, but I think that this perfectly illustrates why LLMs are not good at programming (and may well never get good): they don't actually understand anything. An LLM is fundamentally incapable of going "this is COM, so let me make sure that the function signature matches the calling conventions"; it just generates something based on the code it has seen before.

      I don't blame the authors for reaching for an LLM given that Microsoft has removed the C++ example code (seriously, what's up with that nonsense?). But it does very nicely highlight why LLMs are such a bad tool.

  • jlarocco 4 hours ago

    I don't like Windows, but I've always thought COM was pretty cool. It's a nightmare using it directly from low-level languages like C++ and Rust, though. It's a perfect place to use code generation or metaprogramming.

    In Python, Ruby, and the Microsoft languages, COM objects integrate seamlessly into the language as instances of the built-in class types.

    Also, there's a fairly straightforward conversion from C# to C++ signatures, which becomes apparent after you see a few of them. It might be explicitly spelled out in the docs somewhere.

    • asveikau 3 hours ago

      COM is basically just reference counting and interfaces. Also, the HRESULT type tries to give some structure to 32-bit error codes.

      I remember a few years back hearing hate about COM and I didn't feel like they understood what it was.

      I think the legit criticisms include:

      * It relies heavily on function pointers (virtual calls), which has performance costs. Also, constantly checking those HRESULTs for errors, I guess, gives you a lot more branching than exceptions.

      * The idea of registration, polluting the Windows registry. These days this part is pretty optional.
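
      (To make that structure concrete: a small sketch of the documented HRESULT bit layout; the example at the end is E_OUTOFMEMORY.)

          // HRESULT layout: bit 31 is the severity (failure) bit, bits 16-26
          // identify the facility that produced the error, bits 0-15 are the code.
          fn decode_hresult(hr: i32) -> (bool, u16, u16) {
              let failed = hr < 0;                        // severity bit set
              let facility = ((hr >> 16) & 0x7FF) as u16; // e.g. 7 = FACILITY_WIN32
              let code = (hr & 0xFFFF) as u16;            // the underlying error code
              (failed, facility, code)
          }

          // decode_hresult(0x8007000Eu32 as i32) == (true, 7, 14):
          // a failure, FACILITY_WIN32, ERROR_OUTOFMEMORY.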

      • snuxoll 3 hours ago

        As somebody who's been, for whatever reason, toying around with writing a COM-style ABI layer in Rust: there are a lot of good ideas in there, and I think a lot of the hatred comes from the DLL hell spawned by registration, along with the unfortunately necessary boilerplate.

        Virtual dispatch absolutely has an overhead, but absolutely nobody in their right mind should be using COM interfaces in a critical section of code. When we're talking things like UI elements, HTTP clients, whatever, the overhead of an indirect call is negligible compared to the time spent inside a function.

        The one thing I'm personally trying to see if there's any room for improvement on in a clean slate design is error handling / HRESULT values. Exceptions get abused for flow control and stack unwinding is expensive, so even if there were a sane way to implement cross-language exception handling it's a non-starter. But HRESULT leads to IErrorInfo, ISupportErrorInfo, and the thread-local state of SetErrorInfo/GetErrorInfo, which is a whole extra bunch of fun to deal with.

        There's the option of going the GObject and AppKit route, using an out parameter for an Error type - but you have to worry about freeing/releasing this in your language bindings or risk leaking memory.
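
        (Roughly the shape of that out-parameter pattern at the ABI level; the interface and function-pointer names below are invented for illustration, not taken from any real binding:)

            use std::ffi::c_void;

            type HRESULT = i32;

            // Hypothetical error interface; a real one would start with the three
            // IUnknown slots (QueryInterface/AddRef/Release) in its vtable.
            #[repr(C)]
            struct IExtendedError {
                vtbl: *const c_void,
            }

            // GObject/AppKit-style method signature: on failure the callee
            // allocates the error object, and the caller now owns it and must
            // Release it, which is exactly the leak hazard the bindings have to manage.
            type DoWorkFn = unsafe extern "system" fn(
                this: *mut c_void,
                out_error: *mut *mut IExtendedError, // written only when the HRESULT is a failure
            ) -> HRESULT;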

        • dleary 2 hours ago

          > Virtual dispatch absolutely has an overhead, but absolutely nobody in their right mind should be using COM interfaces in a critical section of code.

          I could definitely be wrong, but I think C++-style "virtual dispatch" (i.e., following two pointers instead of one to get to your function) doesn't really cost anything anymore, except for the extra pointers taking up cache space.

          Don't all of the Windows DirectX gaming interfaces use COM? And isn't AAA gaming performance critical?

          • snuxoll an hour ago

            > Don't all of the Windows DirectX gaming interfaces use COM? And isn't AAA gaming performance critical?

            Yes, on both counts. You will also, on average, be making fewer calls to ID3D12CommandQueue methods than one would think - you'd submit an entire vertex buffer for a model (or specific components of it that need the same pipeline state, at least) at once, allocate larger pools of memory on the GPU and directly write textures to it, etc.

            This is the entire design behind D3D12, Vulkan, and Metal - more direct interaction with the GPU, batching submission, and caching command buffers for reuse.

            When I'm talking about "critical sections" of code, I mean anything with a tight loop where you can reasonably expect to pin a CPU core with work. For a game, this would be things like creating vertex buffers, which is why all three major APIs take these as bare pointers to data structures in memory instead of requiring discrete calls to create and populate them.

        • WorldMaker 2 hours ago

          WinRT is certainly not a "clean slate design", but still a useful comparison to see where Microsoft themselves iterated on the COM design with decades of hindsight.

      • bri3d 2 hours ago

        > COM is basically just reference counting and interfaces.

        > I remember a few years back hearing hate about COM and I didn't feel like they understood what it was.

        Even in "core" COM there's also marshaling, the whole client/server IPC model, and apartments.

        And, I think most people encounter COM with one of its friends attached (like in this case, OLE/Automation in the form of IDispatch), which adds an additional layer of complexity on top.

        Honestly I think that COM is really nice, though. If they'd come up with some kind of user-friendly naming scheme instead of UUIDs, I don't even think it would get that much hate. It feels to me that 90% of the dislike for COM is the mental overhead of seeing and dealing with UUIDs when getting started.

        Once you get past that part, it's really fast to do pretty complex stuff in; compared to the other things people have come up with like dbus or local gRPC and so on, it works really well for coordinating extensibility and lots of independent processes that need to work together.

      • recursive 2 hours ago

        You might have been hearing some of that hate from me. I definitely don't understand COM, but I've had to use it once or twice. It's pretty far outside what I normally work on, which is all high-level garbage-collected languages. I don't know if that's even the right dimension to distinguish it. I couldn't figure out how to use COM or what its purpose was.

        The task was some automated jobs doing MS Word automation. This all happened about 20 years ago. I never did figure out how to get it to stop leaking memory after a couple days of searching. I think I just had the process restart periodically.

        Compared to what I was accustomed to, COM seemed weird and just unnecessarily difficult to work with. I was a lot less experienced then, but I haven't touched COM since. I still don't know what the intent of COM is or where it's documented, nor have I tried to figure it out. But it's colored my impression of COM ever since.

        I think there may be a lot of people like me. They had to do some COM thing because it was the only way to accomplish a task, and just didn't understand it. They randomly poked it until it kind of worked, and swore never to touch it again.

        • duped an hour ago

          > I still don't know what the intent of COM is

          COM is an ABI (application binary interface). You have two programs, compiled in different languages with different memory management strategies, potentially years apart. You want them to communicate. You either

          1. use a Foreign Function Interface (FFI) provided to those languages, or

          2. serialize/deserialize data and send it over some channel like a socket.

          (2) is how the internet works so we've taken to doing it that way for many different systems, even if they don't need it. (1) is how operating systems work and how the kernel and other subsystems are exposed to user space.

          The problem with FFI is that it's pretty barebones. You can move bytes and call functions, but there's no standard way of composing those bytes and function calls into higher level constructs like you use in OOP languages.

          COM is a standard for defining that FFI layer using OOP patterns. Programs export objects which have well-defined interfaces. There's a root interface all objects implement called IUnknown, and you can find out if an object supports another interface by calling QueryInterface() with the ID of a desired interface (all interfaces have a globally unique ID). You can make sure the object doesn't lose its data out of nowhere by calling AddRef() to bump its reference count, and Release() to decrement it (thus removing any ambiguity over memory management, for the most part - see TFA for an example where that fails).
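
          (A minimal sketch of what that root interface looks like at the binary level, in Rust; field names are informal, and real code would use generated bindings such as windows-rs:)

              use std::ffi::c_void;

              type HRESULT = i32;

              // The globally unique interface ID passed to QueryInterface.
              #[repr(C)]
              struct GUID {
                  data1: u32,
                  data2: u16,
                  data3: u16,
                  data4: [u8; 8],
              }

              // A COM object is a pointer to a pointer to a table of function
              // pointers; these three slots sit at the top of every interface's vtable.
              #[repr(C)]
              struct IUnknownVtbl {
                  query_interface: unsafe extern "system" fn(
                      this: *mut IUnknown,
                      riid: *const GUID,
                      out: *mut *mut c_void,
                  ) -> HRESULT,
                  add_ref: unsafe extern "system" fn(this: *mut IUnknown) -> u32,
                  release: unsafe extern "system" fn(this: *mut IUnknown) -> u32,
              }

              #[repr(C)]
              struct IUnknown {
                  vtbl: *const IUnknownVtbl,
              }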

          > where it's documented

          https://learn.microsoft.com/en-us/windows/win32/com/the-comp...

          • asveikau 2 minutes ago

            > You have two programs, compiled in different languages with different memory management strategies, potentially years apart

            Sometimes they are even the same language. Windows has a few problems that I haven't seen in the Unix world, such as each DLL potentially having an incompatible implementation of malloc, where allocating with malloc(3) in one DLL and then freeing it with free(3) in another is a crash.
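
            (This is part of why COM specifies a shared task allocator for memory that crosses an interface boundary; a minimal raw-binding sketch in Rust:)

                use std::ffi::c_void;

                // One allocator, exported by ole32.dll and shared by every module
                // in the process: memory handed across a COM boundary is allocated
                // with CoTaskMemAlloc on one side and freed with CoTaskMemFree on
                // the other, so it never matters which C runtime each DLL linked.
                #[link(name = "ole32")]
                extern "system" {
                    fn CoTaskMemAlloc(cb: usize) -> *mut c_void;
                    fn CoTaskMemFree(pv: *mut c_void);
                }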

      • jstimpfle 2 hours ago

        I'd say COM is also run-time type-safe casting, and, importantly, the reference counting is uniform, which might help when writing wrappers for dynamic and garbage-collected languages.

        I'm still not sure that it brings a lot to the table for ordinary application development.

        • asveikau 2 hours ago

          It's been a while since I've written it professionally, but I felt the fact that it has consistent idioms and conventions helped me be somewhat more productive writing C++. In the vast landscape of C++ features it winds up making some decisions for you. You can use whatever you want within your component, but the COM interfaces dictate how you talk to the outside.

    • pjmlp 4 hours ago

      Not if using Delphi or C++ Builder.

      For whatever reason, all attempts to make COM easier to use in Visual C++ keep being sabotaged by internal teams.

      It is like the Windows team feels it is a manhood test to use such low-level tooling.

    • ok123456 4 hours ago

      Using COM in Perl was pretty seamless back in its heyday.

  • CrimsonCape 4 hours ago

    > But using C# required us to contemplate whether and which dotnet runtime our client supported. Or did we need to ship our own? Isn't this just a small launcher stub? This was just too much complexity outside of our wheelhouse to put between our product and the user. This is not to say that the C# approach isn't valid. It is just that our limited understanding of that ecosystem and its requirements counseled against shipping it as a primary entry point into our application.

    You should be able to compile a relatively small, trimmed, standalone, AOT-compiled library that uses native interop. (Correct me if I'm wrong, dotnet users.) Then there would be no dependency on the framework.

    • sedatk 4 hours ago

      Or you could target .NET Framework 4.8, which is supported by all Windows OSes out of the box, albeit quite outdated.

      • Kwpolska 3 hours ago

        Their add-in seems quite simple, I imagine there would be no meaningful difference between using the classic .NET Framework 4.8 and .NET 10.

    • pjmlp 4 hours ago

      Yes, provided you are using modern COM bindings introduced in .NET Core, alongside code generators.

    • merb an hour ago

      You can only use .NET 4.8 when you create an Outlook add-in.

      I mean, yes, you can build it with native interop and AOT. But then you would lose the .NET benefits as well.

  • rconti 3 hours ago

    Reference: Rage Against the Machine song "Calm Like a Bomb"

    https://www.youtube.com/watch?v=h2TLwwrLKbY

  • meibo 5 hours ago

    I will say that I'm surprised no other LLM picked this up, since the issue should be somewhat evident to people familiar with C++ and how COM works. COM APIs cannot represent "owned" strings.

    Still better than whatever JS rat's nest they came up with for the new Outlook.

    • snuxoll 2 hours ago

      What do you mean by "owned" strings?

      WinRT, which is ultimately just an evolution of COM, has HSTRING which can own the data inside it (as well as contain a reference to an existing chunk of memory with fast-pass strings).

  • LegionMammal978 4 hours ago

    A lot of these automatic marshalling systems (in this case, windows-rs) can be annoyingly unintuitive or opaque in how they handle subtler details of memory ownership, character sets, how to allocate and free objects, etc. And then it's made worse by documentation that only gives the output of one marshalling system (in this case, .NET) that's different from the one you're using, so you have to translate it both backwards and forwards. I guess this is mainly a consequence of COM trying to be all things to all people, being used by both unmanaged and managed code.

  • ptx 4 hours ago

    Couldn't the correct function signatures be generated from the COM type library? Using an LLM for this is clearly not a good fit, as the article demonstrates.

    • Kwpolska 3 hours ago

      They would need to know what a COM type library is in the first place.

  • JanneVee 3 hours ago

    Fun fact about BSTR: it uses the memory before the string pointer to store the length.

    From the CComBSTR documentation from microsoft: "The CComBSTR class is a wrapper for BSTRs, which are length-prefixed strings. The length is stored as an integer at the memory location preceding the data in the string. A BSTR is null-terminated after the last counted character but may also contain null characters embedded within the string. The string length is determined by the character count, not the first null character." https://learn.microsoft.com/en-us/cpp/atl/reference/ccombstr...
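
    A rough Rust illustration of that layout (SysStringByteLen performs essentially the same read):

        // The BSTR pointer points at the first UTF-16 character; the 4 bytes
        // immediately before it hold the string's length in bytes, not counting
        // the trailing NUL.
        unsafe fn bstr_byte_len(bstr: *const u16) -> u32 {
            let len_prefix = (bstr as *const u32).offset(-1);
            *len_prefix
        }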

    From the book ATL Internals, which I read about 24 years ago:

    "Minor Rant on BSTRs, Embedded NUL Characters in Strings, and Life in General From the book ATL internals that i read about 24 years ago.

    The compiler considers the types BSTR and OLECHAR* to be synonymous. In fact, the BSTR symbol is simply a typedef for OLECHAR*. For example, from wtypes.h: typedef /* [wire_marshal] */ OLECHAR __RPC_FAR *BSTR;

    This is more than somewhat brain damaged. An arbitrary BSTR is not an OLECHAR*, and an arbitrary OLECHAR* is not a BSTR. One is often misled in this regard because frequently a BSTR works just fine as an OLECHAR*.

        STDMETHODIMP SomeClass::put_Name (LPCOLESTR pName);

        BSTR bstrInput = ...
        pObj->put_Name (bstrInput);    // This works just fine... usually
        SysFreeString (bstrInput);

    In the previous example, because the bstrInput argument is defined to be a BSTR, it can contain embedded NUL characters within the string. The put_Name method, which expects a LPCOLESTR (a NUL-character-terminated string), will probably save only the characters preceding the first embedded NUL character. In other words, it will cut the string short."

    I won't link to the pirated edition, which is newer than the one I read.

    So if there is code in Outlook that relies on the preceding bytes being the string length, that could be the cause of the memory corruption. It would require a session in the debugger to figure it out.