Safe zero-copy operations in C#

(ssg.dev)

106 points | by sedatk 9 hours ago

22 comments

  • bob1029 6 hours ago

    > Spans and slice-like structures are the future of safe memory operations in modern programming languages. Embrace them.

    I've been using Span<T> very aggressively and it makes a massive difference in cases where you need logical views into the same physical memory. All of my code has been rewritten to operate in terms of spans instead of arrays where possible.

    It can be easy to overlook ToArray() or (more likely) code that implies its use in a large codebase. Even small, occasional allocations are all it takes to move your working set out of the happy place and get the GC cranking. The difference in performance can be unreasonable in some cases.

    You can even do things like:

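      // Declare the target as Span<byte>; with `var` this would be a byte* and need `unsafe`.
      Span<byte> arena = stackalloc byte[1024];
      var segment0 = arena.Slice(10);       // bytes 10..1023
      var segment1 = arena.Slice(10, 200);  // bytes 10..209
      ...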
    
    The above will incur no GC pressure/activity at all. Everything happens on the stack.
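
    A minimal sketch of the slicing idea above, assuming .NET Core 2.1 or later. The names `ArenaDemo` and `Run` are made up for illustration. It tries to show that the slices are views over the same stack memory, so a write through one slice is visible through the others:

    ```csharp
    using System;

    static class ArenaDemo
    {
        // Hypothetical helper: writes through one slice and reads the
        // value back through another slice and the original span.
        public static byte Run()
        {
            // Declaring the target as Span<byte> keeps this in safe code.
            Span<byte> arena = stackalloc byte[1024];

            Span<byte> tail = arena.Slice(10);         // bytes 10..1023
            Span<byte> window = arena.Slice(10, 200);  // bytes 10..209

            window[0] = 42;  // same memory location as arena[10]
            return tail[0] == arena[10] ? arena[10] : (byte)0;
        }
    }
    ```

    No array is allocated anywhere in `Run`; the spans are just (reference, length) pairs pointing into the stack buffer.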
    • rurban 3 hours ago

      Too many stack accesses kill performance also, because the stack is always relative. For any variable access you need an offset, whilst for a malloced access you don't.

      Just a few days ago I changed stack access to heap access in a larger generated perfect hash and got 100x better performance. It really hit the stack too hard, and the compiler didn't do the obvious optimizations.

      • uecker an hour ago

        I do not understand this comment. A stack access should never be more expensive than access to malloced data. This is easy to see: malloc gives you a pointer. An address computation to find a variable on the stack also gives you a pointer, but it is much cheaper than malloc. If the compiler recomputes the address of a stack variable, then this must be because it is deemed cheaper than wasting a register to cache the pointer. Nowadays compilers can even transform malloc calls into stack allocations in certain cases.

      • adrian_b an hour ago

        That looks more like the effect of a bad compiler.

        On most modern CPUs, including Intel/AMD & ARM-based, memory accesses with relative-addressing to the stack have at least the same performance if not much better performance than memory accesses through pointers to the heap. The stack is also normally cached automatically by the memory prefetcher.

        So whatever caused your performance increase was not an inefficiency of the stack accesses, but some other difference in the code generated by the compiler. Such an unexpected and big performance difference could normally be caused only by a compiler bug.

        A special case where variables allocated on the stack can lead to low performance is when a huge number of variables or many arrays are allocated, they are only sparsely used, and the initial stack is much smaller. Then any new allocation of an array or a big bunch of variables may exceed the stack size, which will cause a page fault, so that the operating system will grow the stack by one memory page.

        This kind of transparent memory allocation by page faults can be much slower than the explicit memory allocation done by malloc or new. This is why big arrays should normally be allocated either statically or in the heap.

      • davidatbu 2 hours ago

        Super interesting! Was this C# or something? is there a write-up/mini-blogpost about this somewhere?

  • buybackoff 4 hours ago

    For working with array elements without bounds checks, this is the modern alternative to pointers, without object pinning for the GC and without the "unsafe" keyword: MemoryMarshal.GetArrayDataReference<T>(T[]). This is still totally unsafe, but it is the "modern safer unsafe" that works with `ref`s and makes friends with System.Runtime.CompilerServices.Unsafe.

    Funny point: the verbosity of this method and the SRCS.Unsafe ones makes them look slower than pointers at a subconscious level for me, but they are as fast, if not faster, for juggling knives in C#.

    The `fixed` keyword is mostly for fast transient pinning of data. Raw pointers from `fixed` remain handy in some cases, e.g. for alignment when working with AVX, but even this can be done with `ref`s, which can reference an already pinned array from Pinned Object Heap or native memory. Most APIs accept `ref`s and GC continues tracking underlying objects.

    See the subtle difference here for common misuse of fixed to get array data pointer: https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DC2A3ut...

    Spans are great, but sometimes raw `ref`s are a better fit for a task, to get the last bits of performance.
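
    A rough sketch of the `ref`-based access described above, assuming .NET 5 or later (where MemoryMarshal.GetArrayDataReference is available). `RefSum` and `Sum` are hypothetical names. Nothing here is bounds-checked, so a wrong loop limit would silently read past the array:

    ```csharp
    using System;
    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;

    static class RefSum
    {
        // Sums an int array through a ref to its first element.
        // No bounds checks and no pinning; the GC keeps tracking the array.
        public static long Sum(int[] values)
        {
            if (values.Length == 0) return 0;

            ref int first = ref MemoryMarshal.GetArrayDataReference(values);
            long total = 0;
            for (int i = 0; i < values.Length; i++)
            {
                // Unsafe.Add offsets the ref by i elements, like pointer arithmetic.
                total += Unsafe.Add(ref first, i);
            }
            return total;
        }
    }
    ```

    Whether this beats a plain `foreach` over a span depends on what the JIT already elides, so it is worth measuring before adopting it.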

  • progmetaldev 6 hours ago

    I truly appreciate articles like this. I am using the Umbraco CMS, and have written code so that the entire system keeps running on less than the recommended requirements. While I don't see a use for Span<T> yet, I could definitely see it being useful for a website with an enormous amount of content.

    I am currently looking into making use of "public readonly record struct" for the models that I create for my views. Of course, I need to performance profile the code versus using standard classes with readonly properties where appropriate, but since most of my code is short-lived for pulling from the CMS to hydrate classes for the views, I'm not sure how much of a benefit I will get. Luckily I'm in a position to work on squeezing as much performance as possible between major projects.

    I'm curious if anyone has found any serious performance benefit from using a Span<T> or a "public readonly record struct" in a .NET CMS, where the pages are usually fire and forget? I have spent years (since 2013) trying to squeeze every ounce of performance from the code, as I work with quite a few smaller businesses, and even the rest of my team are starting to look into Wix or Squarespace, since it doesn't require a "me" to be involved to get a site up and running.

    To my credit and/or surprise, I haven't dealt with a breach to my knowledge, and I read logs and am constantly reviewing code as it is my passion (at least working within the confines of the Umbraco CMS, although it isn't my only place of knowledge). I used to work with PHP and CodeIgniter pre-2013 (then Kohana a bit while making the jump from PHP to .NET). I enjoy C#, and feel like I am able to gain quite a bit of performance from it, but if anyone has any ideas for me on how to create even more value from this, I would be extremely interested.

    • fabian2k 40 minutes ago

      For a CMS I'd usually suspect the major bottlenecks to be in the DB queries. Especially when the language is already pretty fast by default like C#.

      You really need to measure before going to low level optimizations like this. Odds are in this case that the overhead is in the framework/CMS, and you gain the most by understanding how it works and how to use it better.

      Span<T> is really more of an optimization you should pay attention to when you write lower level library code.

    • MarkSweep 5 hours ago

      > I'm curious if anyone has found any serious performance benefit from using a Span<T> or a "public readonly record struct" in a .NET CMS

      This response is not directly answering that "in a .NET CMS" part of your question. I'm just trying to say how to think about when to worry about optimizations.

      These sorts of micro optimizations are best considered when you are trying to solve a particular performance problem, particularly when you are dealing with a site that is not getting a lot of hits. I've experienced using small business ecommerce websites where each page load takes 5 seconds and given up trying to buy something. In that case profiling the site and figuring out the problem is very worthwhile.

      When you have a site getting a lot of hits, these sorts of performance optimizations can help you save costs. If your service takes 100 servers to run and you can find some performance tweaks to get down to 75 servers, that may be worth the engineering effort.

      My recommendation is to use a profiler of some type, either on your application in aggregate to identify hot spots, or in search of the source of a particular performance problem. Once you identify a hot spot, construct a micro benchmark of the problem in BenchmarkDotNet and try to use tools like Span<T> to fix the problem.

    • jiggawatts 5 hours ago

      For a CMS or any similar situation, you can get huge performance improvements from higher level changes than Span<T>. Using the HTTP cache-control headers correctly in conjunction with a CDN can provide an order of magnitude improvement. Simply sending less HTML/CSS/JS by using a more efficient layout template can similarly have a multiplier effect on the entire site.

      In my experience, the biggest wins by far were achieved by using the network tab of the browser F12 tools. The next biggest was Azure Application Insights profiler running in production. Look at the top ten most expensive database queries and tune them to death.

      The use of Span<T> and the like is much more important for the authors of shared libraries than for "end users" writing a web app. Speaking of which, you can increase your usage of it by simply updating your NuGet package versions, your .NET version to 9 or 10, etc. This will provide thousands of such micro optimisations for very little effort!

  • pjmlp an hour ago

    This is a good example of learning how to use the tools a programming language offers. Just saying that a programming language has a GC and is thus bad is meaningless without understanding what is actually available.

    Regarding,

    > Spans and slice-like structures are the future of safe memory operations in modern programming languages.

    It is sad how long stuff takes to reach mainstream technology. In Oberon, the equivalent declaration of partition would be:

        PROCEDURE partition(span: ARRAY OF INTEGER): INTEGER
    
    And if the type is the special case of ARRAY OF BYTE (need to import SYSTEM for that), then any type representation can be mapped into a span of bytes.
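
    For comparison, a loose C# counterpart of viewing typed data as bytes, sketched with MemoryMarshal.AsBytes. This is an illustration under my own assumptions, not a claim that it matches the semantics of Oberon's ARRAY OF BYTE; `ByteView` and its methods are hypothetical names:

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    static class ByteView
    {
        // Returns how many bytes the byte view of an int[] covers.
        public static int ByteLength(int[] values)
        {
            // Reinterprets the same memory as bytes; no copy is made.
            Span<byte> bytes = MemoryMarshal.AsBytes(values.AsSpan());
            return bytes.Length;
        }

        // Zeroes the first element through the byte view and reads it back as an int.
        public static int ZeroFirst(int[] values)
        {
            Span<byte> bytes = MemoryMarshal.AsBytes(values.AsSpan());
            bytes.Slice(0, sizeof(int)).Clear();
            return values[0];
        }
    }
    ```

    AsBytes only accepts element types that are structs without managed references, so it is narrower than "any type".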

    You will find similar capabilities in Cedar, Modula-2+, Modula-3, among several others.

    Modern memory-safe languages are finally catching up with the research of the 1990s. Pity it always takes this long for cool ideas to be adopted.

    Having said this, I feel modern .NET has all the features that made me like Modula-3 back in the day, even if some are a bit convoluted like inline arrays in structs.

  • hahn-kev 5 hours ago

    I'm a C# dev main and love spans.

    I understand it's not in the same realm as Rust, but how comparable is this to some of the power that Rust gives you?

    • afdbcreid 3 hours ago

      It's a taste. C# allows the more common patterns, but Rust allows much more.

      For example, Rust allows the equivalent of storing `Span<T>` (called slice in Rust) everywhere (including on the heap, although this is rare).

      • sedatk 2 hours ago

        C# has a separate heap storable span equivalent called Memory<T>.
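
        A small sketch of that difference, with hypothetical names (`Frame`, `FrameDemo`): a class can keep a Memory<T> in a field, which the compiler rejects for Span<T> because it is a ref struct, and it asks for a span only while it works on the data.

        ```csharp
        using System;

        // Hypothetical type for illustration: holds a slice of a buffer on the heap.
        sealed class Frame
        {
            private readonly Memory<byte> _payload;

            public Frame(Memory<byte> payload) => _payload = payload;

            // The Span<T> lives only for the duration of this call.
            public int Sum()
            {
                int total = 0;
                foreach (byte b in _payload.Span) total += b;
                return total;
            }
        }

        static class FrameDemo
        {
            public static int Run()
            {
                byte[] buffer = { 1, 2, 3, 4, 5 };
                // View of elements 1..3 of the array, without copying.
                var frame = new Frame(buffer.AsMemory(1, 3));
                return frame.Sum(); // 2 + 3 + 4
            }
        }
        ```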

        • afdbcreid 20 minutes ago

          AFAIK it's less efficient though.

      • Rohansi 2 hours ago

        The restriction in C# comes from its ability to reference stack allocated memory. I'm not familiar with Rust but it probably figures it out based on the lifetime of T.

  • uecker 2 hours ago
    • aw1621107 7 minutes ago

      I don't think that's quite a 1-to-1 match for what's described in the article. Both C#'s Span<T> and your span type are type- and bounds-safe, but the former has additional restrictions placed on its usage thanks to the `ref` keyword that guarantee that it will be free of lifetime errors as well without needing to involve the runtime.

  • Freedom2 5 hours ago

    I recall a specific project involving a network appliance that generated large log streams. Our bottleneck was the log parser, which was aggressively using string.Substring() to isolate fields. This approach continuously allocated new string objects on the heap, which led to excessive pressure on the GC.

    The transition to using ReadOnlySpan<char> immediately addressed the allocation issue. We were able to represent slices of the incoming buffer without any heap allocations and the parser logic was simplified significantly.
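
    A sketch of what such a parser step might look like, assuming .NET Core 2.1 or later and a made-up space-separated log format. `LogFields`, `Field` and `StatusCode` are hypothetical names, not the code from the project described above:

    ```csharp
    using System;

    static class LogFields
    {
        // Returns the zero-based field of a delimited line as a slice of the
        // input. No string is allocated; an empty span means "no such field".
        public static ReadOnlySpan<char> Field(ReadOnlySpan<char> line, int index, char sep = ' ')
        {
            while (true)
            {
                int cut = line.IndexOf(sep);
                if (index == 0)
                    return cut < 0 ? line : line.Slice(0, cut);
                if (cut < 0)
                    return ReadOnlySpan<char>.Empty;
                line = line.Slice(cut + 1); // skip past the separator
                index--;
            }
        }

        // Parses the third field as an integer straight from the span.
        public static int StatusCode(string logLine)
        {
            return int.Parse(Field(logLine.AsSpan(), 2));
        }
    }
    ```

    With string.Substring, each extracted field would be a new heap string; here the only allocation is the original line itself.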

  • smilekzs 5 hours ago

    Anecdote: 9 years ago I was at MSFT. Hands forced by long GC pauses, many teams eventually turned to hand-rolling their own flavor of string_view in C#. It was literally xkcd.com/927 back then when you tried to interface with some other team's packages and each side had the same but different string_view classes. Glad to see it finally enjoying language and stdlib support.

    • ZeroConcerns 3 hours ago

      (ReadOnly)Span<T> has been available for 8 years now, and even before that, in the legacy Framework, there were common readonly-ref string slicers.

      • pjmlp an hour ago

        ArraySegment for example.