Binary Dependencies: Identifying the Hidden Packages We All Depend On

(vlad.website)

44 points | by PaulHoule 3 days ago

5 comments

woodruffw 19 minutes ago

Seth Larson gave a talk on this (also with a focus on Python) at PyCon US last year[1].

It's a non-trivial issue, in terms of balancing conflicting interests: Python (like most interpreted languages) has a story for integrating native libraries, but that story is not particularly user friendly (in terms of users, Python developers, etc. not having the domain expertise to debug failing native builds). So these ecosystems tend to develop bespoke mechanisms for stashing native binaries inside package distributions, turning a build reliability problem into an introspection problem.

[1]: https://www.youtube.com/watch?v=x9K3xPmi_tg
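To make the "introspection problem" concrete, here is a minimal sketch of listing the native binaries stashed inside a wheel, which is just a zip archive. The function name `find_bundled_binaries` and the suffix list are assumptions for illustration, not part of any packaging tool, and matching on file extension alone will miss binaries with unusual names.

```python
import zipfile
from pathlib import PurePosixPath

# Suffixes that usually indicate a compiled artifact (an assumption, not exhaustive).
NATIVE_SUFFIXES = {".so", ".pyd", ".dll", ".dylib"}


def find_bundled_binaries(archive_path):
    """Return the archive member names that look like native binaries."""
    hits = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            suffixes = PurePosixPath(name).suffixes
            # Versioned shared objects such as libfoo.so.1.2 carry ".so"
            # in the middle of the name, so check every suffix.
            if any(s in NATIVE_SUFFIXES for s in suffixes):
                hits.append(name)
    return sorted(hits)
```

Running it over a downloaded wheel shows which files are compiled, but nothing about which upstream project or version they came from, which is the hidden part.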

yjftsjthsd-h 26 minutes ago

> In almost all ecosystems, it is difficult to keep track of binary dependencies. When you depend on a package’s source code, this is normally recorded in your manifest file — pyproject.toml, package.json and so on. However, when you depend on a package’s precompiled binaries, this information is usually not recorded anywhere. This means that the binary dependency relationship between your project and whatever you’re depending on is hidden — so we can say that you have a phantom binary dependency.

I know it comes up every time... but nix does kinda exist to solve this problem. At least in pure mode.

pabs3 5 hours ago

It's possible to avoid all of those binaries (including the Linux kernel) and build from source instead.

https://bootstrappable.org/ https://lwn.net/Articles/983340/ https://github.com/fosslinux/live-bootstrap https://stagex.tools/

II2II 28 minutes ago

The point of the talk is that it is non-trivial to detect those dependencies.

It looks like most of the time was spent discussing Python. I suspect that is because it is possible to create software without an explicit build stage, so you would not receive warnings about a dependency until the code is called. If the software treats it as an optional dependency, you may not receive any warnings. This sort of situation is by no means unique to interpreted languages. You can write a program in C, then load a library at run time. (I've never tried this sort of thing, so I don't know how the compiler handles unknown identifiers/symbols.) Heck, even the Linux kernel is expected to run "hidden packages" (i.e. the kernel has no means of tracking the origin of software you ask it to run).
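A rough Python sketch of the optional-dependency case, for illustration only. The module name `fastjson_hypothetical` is made up and stands in for some compiled accelerator; `load_optional` and `dumps` are likewise hypothetical helpers.

```python
import importlib
import json


def load_optional(module_name):
    """Import a module by name at call time; return None if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def dumps(obj):
    # "fastjson_hypothetical" is a made-up name for a compiled accelerator.
    fast = load_optional("fastjson_hypothetical")
    if fast is not None:
        return fast.dumps(obj)
    # Silent fallback: no warning is emitted, so nothing reveals that a
    # binary dependency would have been used if it had been installed.
    return json.dumps(obj)
```

Nothing in a manifest has to mention the accelerator, and the import only happens when `dumps` is called, so a static scan of declared dependencies would not see it.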

Yes, you can write software to detect when an inspected application loads external binaries. No, it is not trivial (especially if the software developer was trying to hide a dependency).
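As a sketch of the easy half of that on Linux: the kernel lists a process's file-backed mappings in `/proc/<pid>/maps`, so shared objects loaded so far can be read out of it. The function names here are hypothetical, and this is a minimal illustration under the assumption that libraries have `.so` in their file name; it sees only what is mapped at the moment of inspection and does nothing against a developer who is actively hiding a dependency.

```python
def shared_objects_from_maps(maps_text):
    """Extract shared-object paths from the text of a /proc/<pid>/maps file."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        # Match libfoo.so as well as versioned names such as libfoo.so.6.
        if path.startswith("/") and (name.endswith(".so") or ".so." in name):
            paths.add(path)
    return sorted(paths)


def loaded_shared_objects(pid="self"):
    """Linux only: list shared objects currently mapped into a process."""
    with open(f"/proc/{pid}/maps") as f:
        return shared_objects_from_maps(f.read())
```

Calling `loaded_shared_objects()` before and after importing a package gives a crude diff of the native libraries that import pulled in.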

And just a quibble: even bootstrapping requires the use of a binary (unless you go to unbelievably extraordinary measures).

pabs3 5 hours ago

Personally I like using Debian packages to keep track of source and binary dependencies.