TL;DR: I have come to the conclusion that my computer, estimated at 700W, is drawing more than the 1000W my PSU supports. I need confirmation before wasting more $$$.

Once in a while my GPU crashes: small freezes, or windows using the GPU turn into noise. The system keeps running, so I can connect over ssh from another machine and kill everything to restart the UI.

I am running prometheus/node_exporter, which is scraped by a Raspberry Pi, so I have a bunch of standard metrics.

At first I suspected temperature. Yes, it gets hot, but the link doesn’t appear to be clear. Sometimes it works well for long periods while hot.

Looking into the metrics I found “node_hwmon_in_volts”, a gauge described as “Hardware monitor for voltage (input)”. That is the only electrical metric I have; the motherboard doesn’t appear to have a good driver for Linux. I didn’t find

I have 2 other Intel computers and neither reports it, but both my AMD computer and the Raspberry Pi do. It is the “Power” series in the chart. The Raspberry Pi reports 12, which I read as 12V, but on my AMD computer the value is normally below 1.0.

When idling, it stays at 0.7. Under load it fluctuates a bit. Under heavy load it goes over 1.0 many times (red line). Sometimes it goes above 1.0 without issue, but I am starting to see a pattern: when it is above 1.0, the system tends to misbehave and crash, for example when running AI workloads or playing heavy games. When I lower the graphics settings (“playing low”), there is no issue.
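To check that pattern against the data rather than by eye, here is a minimal Python sketch. It assumes the samples have already been exported from Prometheus as (unix timestamp, value) pairs and that crash times were noted by hand; the helper names `spans_above` and `crashes_near_spans` are made up for illustration and are not part of any Prometheus tooling.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp, node_hwmon_in_volts value)


def spans_above(samples: List[Sample], threshold: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of consecutive runs where value > threshold."""
    spans = []
    start = None
    last_ts = None
    for ts, value in samples:
        if value > threshold:
            if start is None:
                start = ts  # a new run above the threshold begins here
            last_ts = ts
        elif start is not None:
            spans.append((start, last_ts))  # the run ended at the previous sample
            start = None
    if start is not None:
        spans.append((start, last_ts))  # the series ended while still above
    return spans


def crashes_near_spans(crash_times: List[float],
                       spans: List[Tuple[float, float]],
                       window: float = 60.0) -> int:
    """Count crashes that fall within `window` seconds of any span."""
    return sum(
        1 for c in crash_times
        if any(start - window <= c <= end + window for start, end in spans)
    )


# Hypothetical samples, one every 15 seconds (not real measurements).
samples = [(0, 0.7), (15, 1.02), (30, 1.05), (45, 0.9), (60, 1.1)]
spans = spans_above(samples)
print(spans)
print(crashes_near_spans([40, 500], spans, window=20))
```

If most crashes land near a span and few spans pass without a crash, the correlation is real; it still would not say whether the voltage reading is a cause or just another symptom of heavy load.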

According to PCPartPicker, my computer should use around 700W. Multiplied by 1.5 (as normally recommended), that gives 1050W. So I bought a Cougar GEX x2 1000W, which the Cultists PSU tier list rates as a recommended B tier. So it should be good.
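The sizing arithmetic above can be written out as a small Python sketch. The 700W estimate and the 1.5 factor are the figures from this post; the function name `psu_headroom` is made up for illustration.

```python
def psu_headroom(estimated_w: float, psu_w: float, factor: float = 1.5):
    """Return (recommended wattage, margin vs. recommendation, PSU/estimate ratio)."""
    recommended = estimated_w * factor
    margin = psu_w - recommended   # negative means below the rule-of-thumb target
    ratio = psu_w / estimated_w    # actual headroom over the estimated draw
    return recommended, margin, ratio


recommended, margin, ratio = psu_headroom(estimated_w=700, psu_w=1000)
print(recommended)      # 1050.0
print(margin)           # -50.0
print(round(ratio, 2))  # 1.43
```

So the 1000W unit is 50W short of the 1.5x rule of thumb, but still rated for about 1.43 times the estimated draw.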

Does my logic make any sense? Does anyone have a better suggestion? Could it be a different problem?

  • stuner@lemmy.world
    8 months ago

    If you exceed the capacity of the PSU and trip one of the protection circuits, it should completely cut power. When that happened to me, it needed a power cycle before it would boot again. So I’d say that something goes wrong after the PSU. It could still be a voltage drop at the GPU (see other comment regarding cables). Maybe even just a driver/software issue.