Hard Mode Rust (2022)
(matklad.github.io) 136 points by rrampage 18 hours ago | 63 comments
Animats 14 hours ago | root | parent |
> I can't help but feel this is reintroducing some of the problems it claims to solve in working around the difficulties.
When I read "So we do need to write our own allocator", I winced. Almost every time I've had to hunt down a hard bug in a Rust crate, one that requires a debugger, it turns out to be someone who wrote their own unsafe allocator and botched the job.
flohofwoe 13 hours ago | root | parent |
That's why languages like Zig come with several specialized allocators implemented in the stdlib, which can be plugged into any other code in the stdlib that needs to allocate memory. You can still implement your own allocator of course, but often there's already one in the stdlib that fits your needs (especially since allocators are stackable/composable).
jeroenhd 13 hours ago | root | parent |
Rust also has a bunch of tested allocators ready to use, though usually in the form of crates (jemalloc, mimalloc, slab allocators, what have you). The problems caused by people implementing custom allocators are no different in either Rust or Zig.
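For reference, swapping in one of these tested allocators in Rust is a one-liner. A minimal sketch using the mimalloc crate (the jemalloc wrappers follow the same pattern):

```rust
// Cargo.toml would need something like: mimalloc = "0.1"
use mimalloc::MiMalloc;

// Replace the process-wide allocator for the whole binary.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // Every heap allocation below now goes through mimalloc.
    let v: Vec<u64> = (0..1_000).collect();
    println!("{}", v.iter().sum::<u64>());
}
```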
flohofwoe 12 hours ago | root | parent | next |
Thinking about it, with proper library design the 'custom allocator' should never be baked into a dependency, but instead injected by the user of the dependency, all the way up to the top (and the user can then inject a different, more robust allocator). It's a design philosophy and convention though, so a bit hard to enforce at the language level.
littlestymaar 10 hours ago | root | parent |
Not the language, but the standard library can, by requiring the allocator to be supplied to containers at initialization. (Which Zig does, or at least did when I looked at it a few years back.)
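In today's Rust that style lives in crates rather than std (the allocator_api is still unstable). A minimal sketch with the bumpalo crate, where the caller hands the allocator to the container at construction:

```rust
use bumpalo::Bump;
// bumpalo's "collections" feature provides allocator-aware containers.
use bumpalo::collections::Vec as BumpVec;

fn main() {
    let arena = Bump::new();              // allocator created by the caller
    let mut xs = BumpVec::new_in(&arena); // container explicitly tied to it
    xs.push(1);
    xs.push(2);
    // No per-element frees: dropping `arena` releases everything at once.
}
```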
The problem with that approach is that you end up with a "function coloring problem" akin to async/await (or `Result`, for that matter). Functions that allocate become "red" functions that can only be called by functions that themselves allocate.
Like Result and async/await, it has the benefit of making things more explicit in the code. On the flip side, it has a contaminating effect that forces you to refactor your code more than you otherwise would, and it causes a combinatorial explosion in helper functions (iterator combinators, for instance) if the number of such effects gets too high. So there's a balance between explicitness and the burden of adding too many effects like these. (Or you go full "algebraic effects" in your language design, but then the complexity budget of your language takes a big hit; that's unsuitable for either Rust or Zig, which already have their own share of alien-ness: borrowck and comptime, respectively.)
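A small Rust sketch of that coloring effect, with bumpalo standing in for an explicit allocator parameter (the function names are made up):

```rust
use bumpalo::Bump;

// "Red": needs an allocator, so every caller has to thread one through.
fn parse_line<'a>(arena: &'a Bump, line: &str) -> &'a [u8] {
    arena.alloc_slice_copy(line.as_bytes())
}

// Calling a red function turns this one red as well. (The std Vec here
// still uses the global allocator; a stricter version would return a
// bumpalo Vec instead.)
fn parse_all<'a>(arena: &'a Bump, input: &str) -> Vec<&'a [u8]> {
    input.lines().map(|line| parse_line(arena, line)).collect()
}

// "Blue": allocation-free, callable from anywhere.
fn checksum(bytes: &[u8]) -> u8 {
    bytes.iter().fold(0, |acc, b| acc.wrapping_add(*b))
}
```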
flohofwoe 10 hours ago | root | parent |
Odin solves that with an implicit context pointer (which, among other things, carries an allocator) that's passed down the call chain (but that also means all code is 'colored' with the more expensive color of an implicitly passed extra pointer parameter).
A global allocator is IMHO worse than either alternative though.
Also, from my experience with Zig, it's not such a big problem. It's actually a good thing to know which code allocates and which doesn't; it improves trust when working with other people's code.
SleepyMyroslav 5 hours ago | root | parent | next |
>A global allocator is IMHO worse than either alternative though.
Just FYI, console games settled on a global allocator, because everyone allocates and a game needs to run consistently at, say, 7 out of 8 GB used. That makes passing around pointers to allocators a complete waste of time and space. There are small parts of the code with explicit pooling and allocator pointers, but they are maybe 5% of the total.
It is funny that when the C++17 standard got PMR allocators, which made the dream of explicitly passing allocators around come true, folks noticed that an extra 8 bytes in every string object is not that cheap. There are only very small islands of PMR allocator usage in the library ecosystem.
That does not make global allocators a universal truth. It just shows that the tradeoffs are different.
littlestymaar 10 hours ago | root | parent | prev |
I tend to like the more explicit approach in general (I typically like async/await and Result more than exceptions), but at the same time I acknowledge that there's a trade-off: a language cannot make everything explicit without becoming unusable, so each language must make its own choice about which effects get written down explicitly in code and which don't. And that's going to depend a lot on the expected use case for the language.
With Zig aiming at the kind of code you'd write in C, it doesn't surprise me that it works pretty well. (Also, I'm a bit sceptical about the actual future of the language, which IMHO came a good decade too late, if not two: I feel that Zig could have succeeded where D couldn't, but I don't see it achieving much nowadays, as the value proposition is too low IMHO. Except as a toolchain for C cross-compilation, actually, but not as a language.)
galangalalgol 9 hours ago | root | parent |
Austral could use its linear types to deny any dependency the ability to allocate memory, just like it can deny io, giving those dependencies access through a provided type. They did it for supply chain safety, but it also lets you specify how your entire dependency tree handles all the system resources.
Chabsff 7 hours ago | root | parent | prev |
You make the distinction between something being in the standard library vs. an arbitrary external package sound like a minor detail.
It's not. It makes a world of difference. Having a hard guarantee that you aren't dragging in any transitive dependencies is often a big deal. Not to mention the maintenance guarantees.
throwaway2037 15 hours ago | prev | next |
The article starts with:
> This criticism is aimed at RAII — the language-defining feature of C++, which was wholesale imported to Rust as well.
> because allocating resources becomes easy, RAII encourages a sloppy attitude to resources
Then it lists four bullet points about why it is bad. I never once heard this criticism of RAII in C++. Am I missing something? Coming from C to C++, RAII was a godsend for me regarding resource clean-up.
flohofwoe 14 hours ago | root | parent | next |
It's quite common knowledge in the gamedev world (at least the part that writes C++ code; the parts that use C# or JS have roughly the same problem, just caused by unpredictable GC behaviour).
Towards the end of the 90s, game development transitioned from C to C++ and was also infected by the almighty OOP brain virus. The result was often code which allocated each tiny object on the heap and managed object lifetimes through smart pointers (often shared pointers for everything). Then, deep into development, after a million lines of code, there are 'suddenly' thousands of tiny allocs/frees per frame, all over the codebase but hidden from view because the frees happen implicitly via RAII. And I have to admit in shame that I actively contributed to this code style in my youth, before the quite obvious realization (supported by hard profiling data) that automatic memory management isn't free ;)
Also the infamous '25k allocs per keystroke' in Chrome because of sloppy std::string usage:
https://groups.google.com/a/chromium.org/g/chromium-dev/c/EU...
Apart from performance, the other problem is debuggability. If you have a lot of 'memory allocation noise' to sift through in the memory debugger, it's hard to find the one allocation that causes problems.
jpc0 10 hours ago | root | parent | next |
> It's quite common knowledge in the gamedev world (at least the part that writes C++ code, the other parts that uses C# or JS has roughly the same problem, just caused by unpredictable GC behaviour).
Isn't it pretty common in gamedev to use bump allocators in a pool purely because of this? Actually, isn't that pretty common in a lot of performance-critical code, because it is significantly more efficient?
I feel like RAII doesn't cause this. RAII solves resource leaks, but if you turn off your brain you are still going to have the same issue in C. I mean, how common is it in C to have an END: or FREE: or CLEANUP: label with a goto in a code path? That is also an allocation and a free in a scope, just like you would have in C++...
flohofwoe 10 hours ago | root | parent |
> Isn't it pretty common in gamedev to use bump allocators
Yes, but such allocators mostly only make sense when you can simply reset the entire allocator without having to call destruction or housekeeping code for each item; e.g. RAII cleanup wouldn't help much for individually allocated items, at most for discarding the entire allocator. But once you only have a few allocators instead of thousands of individual items to track, RAII isn't all that useful either, since keeping track of a handful of things is also trivial with manual memory management.
I think it's mostly about RAII being so convenient that you stop thinking about the cost of memory management. Garbage collectors have that exact same problem: they give you the illusion of a perfect memory system which you don't need to worry about. And then the cost slowly gets bigger and bigger until it can't be ignored anymore (e.g. I bet nobody on the Chrome team explicitly wanted a keystroke to make 25,000 memory allocations; it just slowly grew under the hood, unnoticed, until somebody cared to look).
Many codebases might never get to the point where automatic memory management becomes a problem, but when it does become a problem, it's often too late to fix, because the problem is smeared over the entire codebase.
jpc0 2 hours ago | root | parent |
Nobody says that an object with memory allocated from a bump allocator can't use RAII for other purposes. The destructor could even call delete on the pointer into the bump allocator, and the bump allocator can happily implement delete as a noop or as a tombstone operation of some sort. Now, this does mean that when you design your objects, you need to make them aware of the allocator that allocated them (see some talks on objects vs. values for the nuance there; oversimplified TLDR: objects exist in a place in memory, while values are things that can be passed around). Values are generally stored in containers; the STL containers are allocator-aware, and you should be making your custom ones allocator-aware too.
Generally the expensive part of memory allocation isn't calling malloc, it's the underlying operations. When the underlying operation is +sizeof(T) and free is a noop, you can happily keep using RAII and not care.
CLARIFICATION: I'm saying that having a T* member is an antipattern; make it a std::unique_ptr with a custom deleter, or it should be in a container, struct of arrays...
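To make the "+sizeof(T), free is a noop" point concrete, here's a minimal safe-Rust sketch of a fixed-buffer bump arena (an illustration, not any particular engine's implementation):

```rust
// Minimal bump arena: allocation is an offset increment, per-item free
// doesn't exist, and reset() reclaims everything at once.
struct Bump {
    buf: Vec<u8>,
    used: usize,
}

impl Bump {
    fn with_capacity(n: usize) -> Self {
        Bump { buf: vec![0; n], used: 0 }
    }

    /// Hand out the next `n` bytes, or None when the arena is exhausted.
    fn alloc(&mut self, n: usize) -> Option<&mut [u8]> {
        let start = self.used;
        let end = start.checked_add(n)?;
        if end > self.buf.len() {
            return None;
        }
        self.used = end; // the "+sizeof(T)" part; no per-item bookkeeping
        Some(&mut self.buf[start..end])
    }

    /// Discard everything at once instead of item-by-item cleanup.
    fn reset(&mut self) {
        self.used = 0;
    }
}

fn main() {
    let mut frame = Bump::with_capacity(64 * 1024);
    let a = frame.alloc(128).unwrap();
    a[0] = 42;
    frame.reset(); // end of frame: one call, not thousands of frees
}
```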
pjmlp 9 hours ago | root | parent | prev |
And yet CryEngine, Unreal, and many others do just fine.
panstromek 8 hours ago | root | parent |
Not sure if you're joking or not, actually. Both of those are pretty well known for performance problems - especially in games from the mentioned era. When Crysis came out, it was notoriously laggy and the lag never really went away. Unreal lag is also super typical, I can sometimes guess if the game is using Unreal just based on that. This might have different causes, or game-specific ones, but the variability of it seems to match the non-deterministic nature of this kind of allocation scheme.
pjmlp 8 hours ago | root | parent |
And naturally every game influencer complaining about those engines has been able to prove that memory management was the cause of their complaints.
panstromek 8 hours ago | root | parent |
As much as you were able to prove that CryEngine, Unreal, and many others do just fine.
pjmlp 7 hours ago | root | parent |
I have sales numbers, and market share adoption across the industry, proving that they do just fine, on my side.
Whereas I am expecting profiler data showing memory issues caused by RAII patterns, from your side, or any of those influencers on whatever platform they complain about.
panstromek 7 hours ago | root | parent | next |
Yea, because sales numbers are a measure of software performance and therefore Jira is the fastest software in the observable universe.
pjmlp 6 hours ago | root | parent |
Then don't complain about what you cannot prove.
itishappy 4 hours ago | root | parent | prev |
> I have sales numbers, and market share adoption across the industry, proving that they do just fine, on my side.
This proves what about their resource usage?
pjmlp 4 hours ago | root | parent |
It proves they stand above the competition, across all levels, selected by AAA studios that keep delivering across the industry.
They should know a couple of things about how to deliver something into production.
Additionally, influencers complaining about them usually never even wrote a basic hello world, let alone understand how memory management works.
itishappy 3 hours ago | root | parent |
I'm sorry, how is that related to the topic of performance?
pjmlp 3 hours ago | root | parent |
It is something AAA game studios tend to care about when picking their tools of trade.
itishappy 3 hours ago | root | parent |
So's popularity. I'm sure they consider a lot when picking tools. You're claiming this proves something that requires hard evidence to disprove, but I don't think the connection is as strong as you imply.
So, in the context of a thread about resource usage, what does the commercial success of the tools prove?
GuB-42 10 hours ago | root | parent | prev | next |
In language performance benchmarks, you often see C++ being slower than C. Well, this is a big part of the reason. There is no real reason for C++ to be slower, since you can just write C code in C++, give or take a few details like "restrict" that aren't even used that much.
The big reason is that RAII encourages fine-grained resource allocation and deallocation, while explicit malloc() and free() encourage batch allocation and deallocation, in-place modification, and reuse of previously allocated buffers, which are more efficient.
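A tiny illustration of that buffer-reuse habit, sketched in Rust but equally applicable to malloc()/free() code (a hypothetical hot loop):

```rust
// Fresh allocation on every iteration: what fine-grained RAII encourages.
fn naive(lines: &[&str]) -> usize {
    let mut total = 0;
    for line in lines {
        let words: Vec<&str> = line.split(' ').collect(); // new heap alloc each time
        total += words.len();
    }
    total
}

// One buffer, reused: clear() resets the length but keeps the capacity.
fn reusing(lines: &[&str]) -> usize {
    let mut words: Vec<&str> = Vec::new();
    let mut total = 0;
    for line in lines {
        words.clear();
        words.extend(line.split(' '));
        total += words.len();
    }
    total
}
```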
The part about "out of memory" situations is usually not of concern on a PC, memory is plentiful and the OS will manage the situation for you, which may include killing the process in a way you can't do much about. But on embedded systems, it matters.
Often, you won't hear these criticisms. C programmers are not the most vocal about programming languages (the opposite of Rust programmers, I would say); I guess that's a cultural thing, with C being simple and stable. C++ programmers are more vocal, but most are happy with RAII, so they won't complain, though I can see game developers as an exception.
conradev 15 hours ago | root | parent | prev | next |
> Poor performance. Usually, it is significantly more efficient to allocate and free resources in batches.
This one is big. It is a lot of accounting work to individually allocate and deallocate a lot of objects.
The Zig approach is to force you to decide on and write all of your allocation and deallocation code, which I found leads to more performant code almost by default. Rust is explicitly leaving that on the table. C obviously works the same way, but doesn't have an arena allocator in the standard library.
Re: C++ vs Rust, it might be more of a pain in Rust because of this: https://news.ycombinator.com/item?id=33637092
ladyanita22 14 hours ago | root | parent |
I am happy that Rust defaults to the easier, saner, safer approach, but lets you bypass RAII if you want to do so.
conradev 4 hours ago | root | parent |
Yeah, I agree. It makes sense and I'm glad that Drop is in no way guaranteed. I am excited for allocator-api to eventually stabilize, too!
Hemospectrum 15 hours ago | root | parent | prev | next |
You're most likely to hear it from kernel programmers and Go/Zig/Odin advocates, but it rarely comes up as a criticism of C++ in particular. Perhaps that's because RAII is merely "distasteful" in that cohort, whereas there are many other qualities of C++ that might be considered total showstoppers for adoption before matters of taste are ever on the table.
There was an HN thread a few months ago[0] debating whether RAII is reason enough to disqualify Rust as a systems language, with some strong opinions in both directions.
SleepyMyroslav 4 hours ago | root | parent |
Thanks for the linked thread. It was an interesting read for me.
I've got a feeling that the division comes from two different kinds of systems programming folks. Hard-realtime folks see unbounded time and, uh-oh, it's fatal. On the other side, soft-realtime folks ask about probabilities and profiling data, and for them it's relatively fine if the probability is low enough. Both camps would agree if the number of effects were bounded somehow.
XorNot 12 hours ago | root | parent | prev |
The "hard mode" described in this article covers an issue I do keep running into with Golang funnily enough: I really hate the global imports, and I really want to just use dependency injection for things like OS-level syscalls...which is really what's happening here largely.
So it's interesting to see something very similar crop up here in a different language and domain: throw a true barrier down on when and how your code can request resources, as an explicit acknowledgement that the usage of them is a significant thing.
nayuki 3 hours ago | prev | next |
> Hard Mode means that you split your program into std binary and #![no_std] no-alloc library. Only the small binary is allowed to directly ask OS for resources. For the library, all resources must be injected. In particular, to do memory allocation, the library receives a slice of bytes of a fixed size, and should use that for all storage.
I did exactly this in my QR Code generator library, C port and second Rust port: https://github.com/nayuki/QR-Code-generator/blob/master/c/qr... , https://github.com/nayuki/QR-Code-generator/blob/master/rust...
Writing the Rust code felt way safer because the language took care of enforcing rules about borrowing parts of the buffer.
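The shape of that split, sketched with made-up names (not the actual qrcodegen API): the binary performs the single up-front allocation, and the library reports exhaustion as a value.

```rust
// Library side (would be #![no_std], no-alloc): all storage is injected.
pub struct OutOfMemory;

pub fn render(scratch: &mut [u8], text: &str) -> Result<usize, OutOfMemory> {
    if scratch.len() < text.len() {
        return Err(OutOfMemory); // report exhaustion instead of aborting
    }
    scratch[..text.len()].copy_from_slice(text.as_bytes());
    Ok(text.len())
}

// Binary side: the only place that actually asks the OS for memory.
fn main() {
    let mut scratch = vec![0u8; 4096]; // one up-front allocation
    match render(&mut scratch, "hello") {
        Ok(n) => println!("wrote {n} bytes"),
        Err(_) => eprintln!("scratch buffer too small"),
    }
}
```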
kelnos 15 hours ago | prev | next |
The criticism of RAII is a little odd to me. The author lists four bullet points as to why RAII is bad, but I don't think I've ever found them to be an issue in practice. When I'm writing C (which of course does not have RAII), I rarely think all that deeply about how much I am allocating. But I also don't write C for embedded/constrained devices anymore.
eptcyka 14 hours ago | root | parent | next |
Allocating in the hot path degrades performance unless you're using an allocator designed for that and you are hand-tuning to avoid allocating more memory. This can be a game or anything else with a hot path you care about; it doesn't have to be deployed on an embedded device.
flohofwoe 14 hours ago | root | parent | prev |
It mostly becomes a problem at scale when you need to juggle tens or hundreds of thousands of 'objects' with unpredictable lifetimes and complex interrelationships. The cases where I have seen granular memory allocation become a problem were mostly game code bases which started small and simple (e.g. a programmer implemented a new system with let's say a hundred items to handle in mind, but for one or another reason, 3 years later that system needs to handle tens of thousands of items).
siev 10 hours ago | prev | next |
Moving all allocations outside the library was also explored in [Minimalist C Libraries](https://nullprogram.com/blog/2018/06/10/)
feverzsj 13 hours ago | prev | next |
So, it's just an arena allocator, which is still RAII.
tialaramex 9 hours ago | prev | next |
Probably wants a (2022) acknowledgement that this is a blog post from 2022.
kookamamie 11 hours ago | prev | next |
> Ray tracing is an embarrassingly parallel task
Yet, most implementations do not consider SIMD parallelism, or they do it in a non-robust fashion, trusting the compiler to auto-vectorize.
agentultra 7 hours ago | prev | next |
I think Haskell also pushes you in this direction. It gets super annoying to annotate every function with ‘IO’ and really difficult to test.
Sure, you still have garbage collection but it’s generally for intermediate, short lived values if you follow this pattern of allocating resources in main and divvying them out to your pure code as needed.
You can end up with some patterns that seem weird from an imperative point of view in order to keep ‘IO’ scoped to main, but it’s worth it in my experience.
Update: missing word
akshayshah 15 hours ago | prev | next |
It’s interesting that the author now works on TigerBeetle (written in Zig). As I understand it, TigerBeetle’s style guide leans heavily on this style of resource management.
gizmondo 10 hours ago | root | parent | next |
Here the author did a small comparison between the languages in the context of TigerBeetle: https://matklad.github.io/2023/03/26/zig-and-rust.html
messe 15 hours ago | root | parent | prev |
Explicitly marking areas of code that allocate is a core part of idiomatic Zig, so that doesn't surprise me all that much.
witx 11 hours ago | prev | next |
These are some really weird arguments against RAII. First and foremost, it is not only used for memory allocation. Second, the fact that we have RAII doesn't mean it is always used like "std::lock_guard", to acquire and free the resource within the same "lifetime". Actually, in 10+ years of C++, that's like 1% of what I use RAII for.
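For the first point, a small Rust sketch of RAII guarding a non-memory resource, where the guard releases the lock on every exit path:

```rust
use std::fs::File;
use std::io::Write;
use std::sync::Mutex;

fn log_line(log: &Mutex<File>, msg: &str) -> std::io::Result<()> {
    let mut file = log.lock().unwrap(); // lock acquired here...
    writeln!(file, "{msg}")?;
    Ok(())
} // ...and released here, even on the early `?` return
```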
The only point I agree with is that deallocation in batches is more efficient.
> Lack of predictability. It usually is impossible to predict up-front how much resources will the program consume. Instead, resource-consumption is observed empirically.
I really don't understand this point.
jokoon 8 hours ago | prev | next |
I know that C++ syntax can be messy, but to me, Rust syntax can reach even greater levels of complexity and become difficult to read.
Maybe I should practice it a bit more.
purplesyringa 7 hours ago | prev | next |
> This… probably is the most sketchy part of the whole endeavor. It is `unsafe`, requires lifetimes casing, and I actually can’t get it past miri. But it should be fine, right?
I just can't. If you're ignoring the UB checker, what are you even doing in Rust? I understand that "But it should be fine, right?" is sarcastic, but I don't understand why anyone would deliberately ignore core Rust features (applies both to RAII and opsem).
forrestthewoods 12 hours ago | prev | next |
Interesting conclusion. I recently tried something similar and gave up. The extra lifetimes are awful, including in the author's post, IMHO.
Rust sucks at arenas. I love Rust. But you have to embrace RAII spaghetti or you’re gonna have an awful, miserable time.
queuebert 6 hours ago | root | parent |
Fortran 77 common blocks are the GOATs of arena allocation.
duped 14 hours ago | prev | next |
I usually like matklad's writings but I have to be overly dismissive for a change.
This is a skill issue.
C, C++, Rust: it doesn't matter. If you have a systems problem where you are concerned about resource constraints and you need to care about worst-case resource allocation, you must consider the worst case when you are designing the program. RAII does not solve this problem, it never did, and anyone who thinks it does needs to practice programming more.
It is not "hard" to solve, though. It's just another thing your compiler can't track and that you need to be reasonably intelligent about. There are sections of a program that cannot tolerate resource exhaustion, for which you set some limit, which is then configurable in some way, and which then uses some shared state to acquire and release things; RAII may help with that. But you still must handle or otherwise report failures.
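In Rust specifically, one stable building block for "report failures instead of assuming success" is Vec::try_reserve. A minimal sketch:

```rust
use std::collections::TryReserveError;

// Allocation failure as a value, not an abort.
fn load(bytes: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(bytes)?; // fails cleanly if the allocator says no
    buf.resize(bytes, 0);
    Ok(buf)
}
```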
This is a bread and butter architecture concern for systems programming and if you're not doing it, get good at it.
flohofwoe 14 hours ago | root | parent | next |
Just like writing memory-safe code in C is a skill issue, right? ;)
(I write mostly C code myself, but I'm also not immune to f*cking up now and then, even though I've been writing code for decades.)
writebetterc 8 hours ago | root | parent | next |
You sure can write C in a way that makes you less prone to memory safety mistakes, yes. Knowing how to do that is a skill issue. As you say, sometimes our skills fail us, and that's where tooling comes in: ASan, the borrow checker, whatever. There's a reason a lot of Rust programmers find their programs filled with Arc<Mutex<T>>: they didn't have the skills to write memory-safe systems programs in the first place.
To be clear, of course I'm also not immune to fucking up :-).
dapperdrake 12 hours ago | root | parent | prev |
ASCII NUL-terminated strings and double frees don't sound quite the same as gauging hardware requirements. Either you need a certain amount of memory or you don't.
dzaima 13 hours ago | root | parent | prev |
As long as the language in question is Turing-complete, everything is equally a skill issue.
dapperdrake 12 hours ago | root | parent |
No. You seem to be thinking about computable functions, not hardware dimensioning. It's about resource usage on a finite physical computer. A Turing machine has a countably infinite tape.
dzaima 9 hours ago | root | parent |
Yeah, I was being overly dismissive of your comment :)
But beyond obvious hard differences in what the languages in question allow on a given topic (of which there are indeed none between C/C++/Rust/Zig on the topic of memory utilization and co.), everything is still just a skill issue.
More seriously, AFAICT the article in no way even mentions RAII solving anything about resource constraints (not even as a hypothetical claim to disprove); only RAII simplifying basic management logic (at the cost of making resources harder to reason about!).
Thinking about resource constraints is unquestionably harder than not thinking about them, which is what the article is about.
WhereIsTheTruth 12 hours ago | prev | next |
C++'s problem isn't just RAII; it's that the language becomes an unreadable mess, just like this Rust code.
C++ and Rust share the same criticisms:
- poor readability
- poor code navigation
- poor compilation speed
I find it interesting how they call simplicity "hard mode"; quite concerning.
mplanchard 9 hours ago | prev | next |
(2022)
tomsmeding 14 hours ago | next |
I can't help but feel this is reintroducing some of the problems it claims to solve in working around the difficulties.
* The fact that there is an `Oom` error that can be thrown means that there is no increase in reliability: you still don't know how much memory you're going to need, but now you have the added problem of asking the user how much memory you're going to be using, which they are going to be guessing at blindly!
* This is because the memory usage is not much more predictable than it would be in easy mode Rust. (Also, that "mem.len()/2" scratch space is kind of crappy; if you're going to do this, do it well. Perhaps in allocating the correct amount of scratch space, you end up with a dynamic allocator at the end of your memory. Does that sound like stack space at the start of memory and heap space at the end of memory? Yes, your programming language does that for you already, but built-in instead of bolted-on.)
* Furthermore, the "easy mode" code uses lots of Box, but if you want, you can get the benefits of RAII without all the boxes by scrupulously allocating owned vectors. Then you get the benefit of an ownership-tracking system in the language's type system, without having to `unsafe` your way to a half-reimplementation of the same. You can get your performance without most of the mess.
* Spaghetti can be avoided (if you so desire) in the same way as the previous point.
What you do achieve is that at least you can test that Oom condition. Perhaps what you actually want is an allocator that allows simulating a particular max heap size.
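A sketch of what such a simulating allocator could look like in Rust: a wrapper over the system allocator behind a hypothetical byte cap, so fallible APIs like Vec::try_reserve surface the simulated OOM as an error:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

const LIMIT: usize = 16 * 1024 * 1024; // pretend the heap is 16 MiB
static USED: AtomicUsize = AtomicUsize::new(0);

// Once LIMIT bytes are live, further allocations fail instead of succeeding,
// which lets tests exercise the Oom code paths deterministically.
struct Capped;

unsafe impl GlobalAlloc for Capped {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if USED.fetch_add(layout.size(), Ordering::SeqCst) + layout.size() > LIMIT {
            USED.fetch_sub(layout.size(), Ordering::SeqCst);
            return std::ptr::null_mut(); // allocation failure, not a crash
        }
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        USED.fetch_sub(layout.size(), Ordering::SeqCst);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Capped = Capped;
```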