This is awesome! Really great write-up, and solid work by Jessie :^)
The Ladybird codebase is generally very defensive, but like every browser, our JavaScript engine is slightly less so (in the pursuit of performance).
There are architectural lessons to learn here beyond just fixing the bugs found. We've since replaced these allocations (+ related ones) with callee-specific stack memory instead of trying to be clever with heap allocation reuse.
We're also migrating more and more of our memory management to garbage collection, which sidesteps a lot of the traditional C++ memory issues.
As others have mentioned, sandboxing & site isolation will make renderer exploitation a lot less powerful than what's demonstrated here. Even so, we obviously want to avoid it as much as possible!
This particular memory vulnerability, as I understand it, was a result of a `ReadonlySpan<>` targeting a resizable vector. A simple technique used by the scpptool-enforced safe subset of C++ to address this situation is to temporarily move the contents of the resizable vector into a non-resizable vector [1] and target the span at the non-resizable vector instead.
Upon destruction, the non-resizable vector will automatically return the contents back to the original resizable vector. (It's somewhat analogous to borrowing a slice in Rust.)
While it wouldn't necessarily prevent you from doing the flawed/buggy thing you were trying to do, it would prevent it from resulting in a memory vulnerability.
[1] https://github.com/duneroadrunner/scpptool#xslta_vector-xslt...
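For readers who want to see the shape of the idea, here is a minimal sketch of that "borrow the contents" technique. To be clear about assumptions: `BorrowedFixedVector` and the little `ReadonlySpan` struct are hypothetical names made up for this illustration. They are not scpptool's actual API and not Ladybird's AK types, and the real implementations surely differ in the details.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a read-only span type: a pointer plus a length.
template <typename T>
struct ReadonlySpan {
    T const* data;
    std::size_t size;
    T const* begin() const { return data; }
    T const* end() const { return data + size; }
};

// Hypothetical helper: while it is alive, the elements live in private
// storage that exposes no way to resize or reallocate it, and the original
// vector is left empty.
template <typename T>
class BorrowedFixedVector {
public:
    explicit BorrowedFixedVector(std::vector<T>& source)
        : m_source(source)
        , m_storage(std::move(source))
    {
        m_source.clear(); // make the "source is empty during the borrow" state explicit
    }

    // On destruction, hand the elements back to the original vector.
    ~BorrowedFixedVector() { m_source = std::move(m_storage); }

    BorrowedFixedVector(BorrowedFixedVector const&) = delete;
    BorrowedFixedVector& operator=(BorrowedFixedVector const&) = delete;

    ReadonlySpan<T> span() const { return { m_storage.data(), m_storage.size() }; }

private:
    std::vector<T>& m_source;
    std::vector<T> m_storage;
};

// Sums the arguments while simulated "outside code" misbehaves and appends
// to the original vector in the middle of the iteration.
int sum_while_callee_mutates(std::vector<int>& arguments)
{
    BorrowedFixedVector<int> borrowed(arguments);
    assert(arguments.empty());

    ReadonlySpan<int> view = borrowed.span();
    int sum = 0;
    for (int value : view) {
        // This grows (and may reallocate) the original vector, but `view`
        // points into the borrowed storage, so it stays valid.
        arguments.push_back(value * 100);
        sum += value;
    }
    return sum;
    // `borrowed` is destroyed after the return value is computed: the original
    // contents are restored and anything pushed during the borrow is dropped.
}
```

Calling `sum_while_callee_mutates` on a vector holding 1, 2, 3 leaves the vector with those same three elements afterwards. The buggy push during iteration still happens and its effects are silently lost in this sketch, which is the trade-off described above: the logic error remains, but it no longer turns into a dangling span.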
Very interesting, I was not familiar with your project. Thanks for sharing it here!
So is this going to stay in C++, or are you still moving to Swift?
Whatever happens, large parts of the codebase + dependencies will be C++ (or C) for the foreseeable future.
We're working on integrating with Swift, but despite the team's earnest efforts, Swift/C++ interop is still young and unstable.
On a personal note, I'm increasingly feeling like "C++ with a garbage collector" might actually be a reasonable tool for the task at hand. I'm watching the development of Fil-C in this space.
I'm honestly not at all familiar with browsers but I really do wonder if a custom language wouldn't be a reasonable tradeoff. It's not all that insane as that is a path that has been walked before. For instance FoundationDB has their own syntax to manage their actor system which just transpiles to C++: https://github.com/apple/foundationdb/blob/main/flow/README....
V8 also has Torque, which I think to some degree also fits into that type of mindset.
> I'm honestly not at all familiar with browsers but I really do wonder if a custom language wouldn't be a reasonable tradeoff.
careful, last time someone said that we got Rust
Out of curiosity, why not C# at this point? It's pretty hard to marry C++ with a high-performance garbage collector, since the underlying language semantics do not allow for, e.g., compacting GCs.
What would the effect of Swift be on the possibility of a Windows port? I know anything end-user-friendly is ages away, but I don't live in Apple land, and neither does most of the world. Apple has a monopoly on iOS and huge market share on Mac, and is still at 20% or something.
https://x.com/GregKamradt/status/1848045525473677314
https://x.com/wycats/status/973761496277704704
The core Swift language is being made more independent of Apple, and can be compiled for an increasing number of platforms thanks to the LLVM-based compiler.
You can even build SwiftUI apps without opening Xcode at all nowadays (albeit with no code signing), which is great.
I never learned Swift, but I can now easily add features or create one-day projects using SwiftUI that make great native macOS UIs.
[flagged]
This is an FAQ
“Ladybird started as a component of the SerenityOS hobby project, which only allows C++. The choice of language was not so much a technical decision, but more one of personal convenience. Andreas was most comfortable with C++ when creating SerenityOS, and now we have almost half a million lines of modern C++ to maintain.
However, now that Ladybird has forked and become its own independent project, all constraints previously imposed by SerenityOS are no longer in effect.
We have evaluated a number of alternatives, and will begin incremental adoption of Swift as a successor language, once Swift version 6 is released.”
https://ladybird.org/#faq
Swift struggles so much when compiling even moderately sized codebases, I worry this choice will prove untenable in the long term.
Swift or Xcode?
I've experienced very fast Swift compilations, but compiling an app, which invokes additional tooling in Xcode, is the slowest part.
In my experience anyway. I am genuinely curious!
I've only used it within Xcode so I can't say which is to blame, but I did frequently get the message "The compiler is unable to type-check this expression in reasonable time; try breaking up the expression into distinct sub-expressions", which seems to be a problem with the language/compiler/type system, not Xcode per se.
I agree with this. I've been avoiding Xcode as much as possible for my little Swift projects, but now I do wonder if that still holds for large codebases. I guess you could try to find some big Swift codebases on GitHub and see how long they take to compile.
[flagged]
One of which people?
A Rust evangelist!
Why aren't you written in Haskell?
The ones that never wrote a complex project in Rust but advocate that others do.
I actually hate Rust.
That was just a question based on curiosity. [Ladybird's author does a lot of Rust so I thought I would ask him]
In no way was I asking for the project to switch to it. I just wanted to know if it would help with the technicality.
Reentrancy bugs like this one are surprisingly common. Having reviewed lots of unsafe Rust code, unnoticed calls into outside code (that can then reenter your own code or modify your data structures, blowing everything up) are among the most common soundness issues I've found across different projects.
The main solutions seem to be either restricting how possibly-invalidated data can be held (e.g., safe references in Rust), or having some coloring scheme (e.g., "pure" annotations) to ensure that the functions you call are unable to affect your data. Immutable languages can mitigate it somewhat, but only if you have the discipline to maintain a single source of truth for everything, and avoid operating on stale copies.
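To make the first option concrete, here is a rough C++ sketch of one way to restrict how invalidatable data is held: take a private snapshot before calling outside code, instead of keeping a pointer or span into a container the callee can reach. `Interpreter`, `Callback`, and `call_with_snapshot` are hypothetical names invented for this example and are not taken from Ladybird or any other real codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical interpreter-like object with a growable value stack.
struct Interpreter {
    std::vector<int> stack;
};

// Outside code: receives the interpreter and may reenter or mutate it.
using Callback = std::function<int(Interpreter&, int)>;

// The dangerous shape, shown only as a comment:
//
//   int const* args = interp.stack.data();
//   for (std::size_t i = 0; i < count; ++i)
//       total += callback(interp, args[i]); // `args` may dangle after a reallocation
//
// The safer shape below copies the values first, so nothing held across the
// callback can be invalidated by it.
int call_with_snapshot(Interpreter& interp, Callback const& callback)
{
    std::vector<int> const arguments = interp.stack; // private copy
    int total = 0;
    for (int argument : arguments)
        total += callback(interp, argument);
    return total;
}

int demo_reentrant_call()
{
    Interpreter interp;
    interp.stack = { 1, 2, 3 };

    // A callback that grows the stack it was handed, which would very likely
    // reallocate the underlying buffer.
    Callback grow_and_double = [](Interpreter& i, int value) {
        i.stack.resize(i.stack.size() + 1000, value);
        return value * 2;
    };

    int total = call_with_snapshot(interp, grow_and_double);
    assert(interp.stack.size() == 3 + 3 * 1000);
    return total;
}
```

The snapshot buys memory safety at the cost of a copy, and it is exactly the kind of "stale copy" mentioned above: the loop keeps working on the old values even though the real stack has changed underneath it. Whether that is acceptable depends on what the surrounding code expects.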
the solution? #[deny(unsafe_code)]
Any reasonably sophisticated web browser is going to require a decent amount of unsafe {}, if only for performance reasons. It would obviously be much easier to audit, though.
Eh. It will work with your code but at some point your dependencies will have to dive into unsafe (e.g. calling C libs/kernel, SIMD, ASM by hand, etc.).
Minimizing unsafe, auditing libs with Geiger, and limiting outside dependencies to a few reliable vendors are what is practically needed.
This is a big landmark. Ladybird has come far enough to be a worthy target for security research!
If this is all-new development, wouldn't it be good for the emphasis to be on correctness and security, as part of the design and coding itself?
Fuzzing is one way to detect a failure of that emphasis, not the means of achieving correctness and security.
I'm not picking on Ladybird here specifically. Chrome and Firefox provide constant streams of security vulnerabilities. But it would be nice if Ladybird didn't start with the same problems that might be attributed to huge legacy code bases.
Ladybird comes from SerenityOS, which focuses on having fun and being pragmatic while building everything from scratch incrementally.
They do plan to switch to Swift: https://ladybird.org/#:~:text=Why%20build%20a%20new%20browse...
I appreciate their pragmatism though, it's allowed them to catch up to other alternative browsers in WPT coverage very quickly.
Off topic, but I have never seen a link like yours before.
Today, I learned about Text Fragment Identifiers [0]. Thanks, very handy!
[0] https://web.dev/articles/text-fragments#text_fragments
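In case it helps anyone else, here is a small sketch of building such a link by hand. It only covers the simplest form, `#:~:text=` followed by the percent-encoded text, and it assumes the page URL does not already contain a fragment. The function names are made up for this example. It deliberately encodes everything except ASCII letters and digits, which is more than strictly necessary but keeps characters like `-`, `,` and `&` (which have special meaning inside a text directive) out of the way.

```cpp
#include <cassert>
#include <string>

// Percent-encodes every byte that is not an ASCII letter or digit.
std::string percent_encode_for_text_directive(std::string const& text)
{
    static char const hex_digits[] = "0123456789ABCDEF";
    std::string encoded;
    for (unsigned char byte : text) {
        bool const is_ascii_alphanumeric = (byte >= '0' && byte <= '9')
            || (byte >= 'A' && byte <= 'Z')
            || (byte >= 'a' && byte <= 'z');
        if (is_ascii_alphanumeric) {
            encoded += static_cast<char>(byte);
        } else {
            encoded += '%';
            encoded += hex_digits[byte >> 4];
            encoded += hex_digits[byte & 0x0F];
        }
    }
    return encoded;
}

// Appends a text directive to a URL that is assumed to have no fragment yet.
std::string make_text_fragment_url(std::string const& page_url, std::string const& text)
{
    return page_url + "#:~:text=" + percent_encode_for_text_directive(text);
}

// Small self-check using a placeholder URL.
void check_text_fragment_example()
{
    assert(make_text_fragment_url("https://example.com/", "Copy link to highlight")
        == "https://example.com/#:~:text=Copy%20link%20to%20highlight");
}
```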
Chrome and Edge have a context menu item to create a link like this when you select text ("Copy link to highlight").
Firefox 131 and up will highlight the relevant portion on the page but can't create new links in a user-friendly fashion.
> But [firefox] can't create new links in a user-friendly fashion.
It's not built-in, but there is https://addons.mozilla.org/en-US/firefox/addon/link-to-text-...
OK, fun is valid. And it's good to have expectations set.
Open source people who are looking for a more trustworthy browser than Firefox will have to look elsewhere, though.
Elsewhere… where? WebKit?
noscript
Tbh I kinda love how they're just going for it and building from scratch, but I always wonder how much focus on security upfront actually changes things long-term. Do you think building with fun in mind ends up missing critical stuff, or does it keep devs more engaged?
Of academic value, as Ladybird has little in terms of sandboxing yet.
Cool regardless.
Even in a modern browser, a renderer exploit (the most sandboxed portion of the browser) gives you access to a large attack surface - the browser process via IPC, the kernel via syscalls, and loads of data from other websites.
So no, an exploit like this is not just “of academic value” even in a sandboxed browser.
With site isolation, there aren't loads of other websites in the renderer these days, at least.
Assuming your site isolation works, at least. Some browsers were having trouble with it until pretty recently.
Haven't seen anyone using dwm in a while. I forgot how lean and mean it is =)
With decades and decades of memory safety lessons in the books, it's hard to imagine how C++ was the language of choice when starting a new browser from scratch in 2018.
The browser was not started with the idea of taking over the main focus of development; it was just another part of an already pretty large hobby OS project.
Fine. With decades and decades of memory safety lessons in the books, it's hard to imagine how C++ was the language of choice when starting a new operating system from scratch in 2018.
It really isn't that hard to imagine someone starting a fun hobby project in the language they enjoyed and were the most comfortable with.
Dunno. It really is. Debugging memory corruption bugs in complex one-memory-space systems is very much not fun.
Nothing a little printf (or dbgln, as it is known in Serenity-Ladybird land) can't fix.
Answer is here, although the article is outdated and the most recent news is that they are rewriting the browser at least in Swift.
https://awesomekling.github.io/Memory-safety-for-SerenityOS/
How is it outdated??
Their GitHub has 0.3% Swift code. They said they would start once Swift 6 is out. It has been out for months. So either they abandoned Swift, or haven't really started, or they are really, really slow to start using it. All three options are against the article being outdated, wouldn't you agree?
One of the primary Ladybird devs just gave a lightning talk at CppCon about porting their HTML parser from C++ to Swift.
https://www.youtube.com/watch?v=KCRx1jE6DnY
Current blockers to Swift usage are found here: https://github.com/LadybirdBrowser/ladybird/issues/933
A rising tide lifts all boats: by trying to use Swift seriously, they're finding and helping fix bugs in the compiler.
Because the article is from 2022 and says that they will use a custom language called Jakt which didn't pan out, it seems. Yes, I am also eager for the Swift rewrite to get off the ground.
Mostly because the author switched focus to yet another language, and eventually decided to focus on something else instead of programming languages.
https://github.com/sophiajt/june
When they started, the plan was mostly to have fun and see how far you can get when creating an OS from scratch. So picking a language in which they are experienced makes sense in that context.
One would think the same of C, where exploits trace all the way back to the Morris worm in 1988. That is 36 years of thinking the problem is the developers, not the language, with new projects still being started every day.
At least C++ has mechanisms to write safer code, provided one makes use of them, even if there are still issues.
To use a modern example, renaming a JavaScript file's extension to a TypeScript one only gets you so far.
Then one can make use of TypeScript's type system, or switch to Elm to go to the next level.
> One would think the same of C
I'm pretty sure that everyone does and did, because almost nobody wrote a browser in C either, never mind in 2018.
NetSurf from 2002 is the only one I can find?
edit: I should say after the first set, because Lynx and Mosaic are C.
Always good to start the discussion, but the article doesn't seem to link to an issue on the Ladybird GitHub repo, which I would expect in the case of academic disclosure etc.
Obviously nobody is really using Ladybird yet and there will be many more such issues to address, so now is a good time to evaluate how to avoid such mistakes up front.
Ah, the GitHub links are indeed there, my bad. It's a good write-up.