• 0 Posts
  • 14 Comments
Joined 1 year ago
Cake day: July 1st, 2023

  • It’s extremely common in enterprise settings, where the cost of a $100k+ server isn’t the most expensive part of running, maintaining, and servicing it. If your home lab isn’t practicing 3-2-1 backups yet (at least three copies of your data: two local (on-site) copies on different media or devices, and at least one off-site copy), I’d spend money on that before ECC.
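    The 3-2-1 rule as described above can be written down as a simple check. This is a minimal sketch under an assumed model: `Copy` and `meets_3_2_1` are hypothetical names, and each copy is described only by its media type and whether it is off-site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical model for illustration)."""
    location: str  # e.g. "home", "cloud"
    media: str     # e.g. "nas-zfs", "usb-hdd", "object-storage"
    offsite: bool


def meets_3_2_1(copies: list[Copy]) -> bool:
    """At least 3 copies, local copies on at least 2 distinct media, at least 1 off-site."""
    onsite = [c for c in copies if not c.offsite]
    offsite = [c for c in copies if c.offsite]
    distinct_local_media = {c.media for c in onsite}
    return len(copies) >= 3 and len(distinct_local_media) >= 2 and len(offsite) >= 1


plan = [
    Copy("home", "nas-zfs", offsite=False),
    Copy("home", "usb-hdd", offsite=False),
    Copy("cloud", "object-storage", offsite=True),
]
print(meets_3_2_1(plan))      # True
print(meets_3_2_1(plan[:2]))  # False: no off-site copy, only two copies
```

    Two copies on the same NAS count as one medium here, which is the point of the "different media/devices" requirement.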


  • From the link:

    @PriorProjectEnglish7

    The answers in this thread are surprisingly complex, and though they contain true technical facts, their conclusions are generally wrong in terms of what it takes to maintain file integrity. The simple answer is that ECC RAM in a networked file server can only protect against memory corruption in the filesystem, but memory corruption can also occur in application code, and that’s enough to corrupt a file even if the file server faithfully records the broken bytestream produced by the app.

    If you run a Postgres container and the non-ECC DB process bitflips a key or value, the ECC networked filesystem will faithfully record that corrupted key or value. If the DB bitflips a critical metadata structure in the DB file format, the DB file will get corrupted even though the ECC networked filesystem recorded those corrupt bits faithfully and even though the filesystem metadata is intact.

    If you run a video transcoding container and it experiences bitflips, that can result in visual glitches or in the video metadata being invalid… again, even if the networked filesystem records those corrupt bits faithfully and the filesystem metadata is fully intact.

    ECC in the file server prevents complete filesystem loss due to corruption of key FS metadata structures (or at least from memory bit-flips; modern checksumming filesystems like ZFS already protect against bit-flips in storage pretty well). It also protects against individual file loss due to bitflips in the file server. It does NOT protect against the app container corrupting the stream of bytes written to an individual file, which is opaque to the filesystem but is nonetheless structured data that the app can corrupt. If you want ECC levels of integrity, you need to run ECC at all points in the pipeline that write data.

    That said, I’ve never run an ECC box in my homelab, have never knowingly experienced corruption due to bit flips, and have never knowingly had a file corruption that mattered despite storing and using many terabytes of data. If I care enough about integrity to care about ECC, I probably also care enough to run multiple pipelines on independent hardware and cross-check their results. It’s not something I would lose sleep over.
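    To make the quoted argument concrete, here is a toy Python sketch under a simplified model: `ChecksummingStore` stands in for a checksumming filesystem such as ZFS, and `produce_record` stands in for an app. Both names are hypothetical. A bit-flip in the app's memory produces bytes that the store checksums and hands back without complaint; only comparing against the output of an independent pipeline exposes the change.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single bit-flip in an application's (non-ECC) memory buffer."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


class ChecksummingStore:
    """Toy model of a checksumming filesystem: it checksums whatever bytes it is given."""

    def __init__(self) -> None:
        self._files: dict[str, tuple[bytes, str]] = {}

    def write(self, name: str, data: bytes) -> None:
        self._files[name] = (data, sha256(data))

    def read(self, name: str) -> bytes:
        data, stored_digest = self._files[name]
        if sha256(data) != stored_digest:  # only detects changes made after the write
            raise OSError(f"storage corruption in {name}")
        return data


def produce_record() -> bytes:
    """Hypothetical app output; both pipelines compute the same thing."""
    return b'{"user_id": 42, "balance": 100}'


# Pipeline A suffers a bit-flip: byte 27 ('1', 0x31) becomes '3' (0x33).
output_a = flip_bit(produce_record(), 27 * 8 + 1)
# Pipeline B runs independently on other hardware and is unaffected.
output_b = produce_record()

store = ChecksummingStore()
store.write("record.json", output_a)  # the store faithfully records the bad bytes
read_back = store.read("record.json")  # no error: the storage checksum matches

print(read_back)                               # b'{"user_id": 42, "balance": 300}'
print(sha256(read_back) == sha256(output_b))  # False: the cross-check catches it
```

    The storage-level checksum is doing its job here; it simply has no way to know the bytes were wrong before they arrived.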


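  • DDR5 has built-in on-die ECC, which corrects single-bit errors inside the memory chips but doesn’t protect data on its way between the memory and the CPU the way traditional (side-band) ECC does. It might still be worthwhile depending on your setup.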
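    To illustrate what single-error correction means, here is a toy Hamming(7,4) code in Python. It only demonstrates the general technique: real DDR5 on-die ECC works on much wider codewords, and the function names here are hypothetical. Parity bits sit at positions 1, 2 and 4, so XOR-ing the positions of all set bits yields zero for a valid codeword and the position of the flipped bit otherwise.

```python
def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (list index 0 = position 1)."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected 4 bits")
    code = [0] * 8  # 1-based positions; index 0 is unused
    for pos, bit in zip((3, 5, 6, 7), data):
        code[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p makes the count of set bits among positions with bit p set even.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def syndrome(codeword: list[int]) -> int:
    """XOR of the positions of all set bits: 0 if valid, else the flipped position."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


def correct(codeword: list[int]) -> tuple[list[int], int]:
    """Fix a single flipped bit; return (corrected codeword, error position or 0)."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s


def decode(codeword: list[int]) -> list[int]:
    """Extract the 4 data bits from positions 3, 5, 6 and 7."""
    return [codeword[i - 1] for i in (3, 5, 6, 7)]


codeword = encode([1, 0, 1, 1])
received = list(codeword)
received[4] ^= 1  # simulate a bit-flip at position 5
fixed, error_pos = correct(received)
print(error_pos)      # 5
print(decode(fixed))  # [1, 0, 1, 1]
```

    A plain Hamming code like this cannot reliably distinguish a double-bit error from a single one and may "correct" the wrong bit, which is why server ECC typically adds detection on top.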

    The ECC on your Pi, I believe, isn’t for the memory chips but for the ARM chip’s on-die cache.

    For me personally, if my racked server supports ECC, I get it. If it doesn’t, I don’t sweat it. Redundancy in drives, power, and networking is much more important to me; in my anecdotal experience, those fail orders of magnitude more often. If I can save those dollars for a more probable failure, I do.

    DNS is a linchpin of my network (and of the wife approval factor), so I splurge a bit on physical redundancy: an identical mini computer also runs it and takes over the same IP if the first box fails. Those considerations come way before whether the server has ECC. Just my $0.02.
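    Setups like this are commonly built on VRRP (for example with keepalived), where the healthy node with the highest priority holds the shared virtual IP. The sketch below models only that election rule in Python; `DnsNode` and `vip_owner` are hypothetical names, and the `healthy` flag stands in for whatever health check you actually run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DnsNode:
    """One DNS box (hypothetical model). Higher priority wins, VRRP-style."""
    name: str
    priority: int
    healthy: bool = True


def vip_owner(nodes: list[DnsNode]) -> Optional[str]:
    """Return the node that should hold the shared IP, or None if all are down."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        return None
    return max(healthy, key=lambda n: n.priority).name


primary = DnsNode("dns-a", priority=200)
backup = DnsNode("dns-b", priority=100)
print(vip_owner([primary, backup]))  # dns-a

primary.healthy = False              # e.g. the health check stops getting answers
print(vip_owner([primary, backup]))  # dns-b
```

    Because clients only ever point at the shared IP, nothing on the network needs reconfiguring when the backup takes over.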


  • I’m going on 25+ years in the field and am at the principal engineer/architect level. My take would be to find something, try it, and see if it excites you. There isn’t a wrong answer. At worst, you’ll become a generalist, fluent in more and more until you find a niche among the array of things you’re conversant in. At best, you’ll dive deep into a specific area and become more and more of an expert in it.

    Right now I’m really into Rust, rewriting a lot of what I’ve done in the past with more experience under my belt, and learning more about WebAssembly. Running Rust in WebAssembly on any platform, including the user’s browser, without really having to think about distribution targets is something that excites me. I think I can glimpse a future that might rival how revolutionary Kubernetes has been, but even if I’m wrong, the things I’ve learned will still hold up.

    If the huge array of options overwhelms you, find a problem and try to solve it. Just the act of doing that and heading down the rabbit hole can open up worlds you never knew existed, and it strengthens what I’d consider one of the best qualities in a good dev: competent independent troubleshooting. I’ve had a lot of fun bypassing AT&T router restrictions, extracting certificates from ROMs, architecting my home network with self-hosted Kubernetes and all the home automation stuff, and doing low-level C embedded systems programming for homemade IoT sensors. The things you can do with tech are almost always within reach of anyone with some time and an Internet connection.

    Also, don’t neglect the open source community. Start a project, or contribute to someone else’s. Probably the biggest leap I took as a dev came from a simple change to a large OSS project. The mentality, guardrails, and rules the project imposed on itself were incredibly impressive to me, and I learned so much about the benefits of code quality, good review, and automation (well, everything, really). It opened my eyes to what a small team can do given a common goal they’re passionate about, something that can at times be missing from enterprises where profit is king.

    Let us know where you end up. You never know; you might inspire another dozen people with something that interests you. Good luck!