The general consensus among experienced players on the forums has been that the actual instances of RNG-produced unwinnable situations (where there were no actions you could have possibly taken to avoid dying, or getting into a situation where you would have died) are exceedingly rare, to the point where the noise they produce isn't going to be a significant factor in any evaluation with a robust data set.
That might be broadly true when you take winrates in crawl as a whole, but remember that in this thread some of the discussion is centering on the exact portion of the game where RNG deaths are (by far) most likely to happen: the first few levels of the dungeon (D:1 in particular). When you focus the scope so narrowly on this area, that noise has less opportunity to wash out.
More dangerously, I've not seen conclusive data or even a consistent means of evaluating "RNG death" vs "death to obvious mistakes" vs "death to choices made that a crawl-equivalent of alphazero might have avoided". I do know broadly, as a somewhat experienced player myself now, that very few games are unwinnable, but I reject the idea that anybody without substantive data (as opposed to anecdotes or recency-biased impressions) should have *confidence* in making estimates.
The problem isn't that win rates using players' best efforts to win aren't reflective of difficulty, but rather that the signal-to-noise ratio in the dataset is too low to derive actual difficulty from it. There's flat-out no way to say "in this game they weren't really specifically trying their hardest to win".
We have roughly as much evidence supporting this as we have when players want to blame their losses on RNG though. Our "noise" filters are, at best, very crude.
Is it really unfair to claim that the vast majority of players that begin a game of crawl attempt to win it? Is there a non-arbitrary way to parse a "bad player mistake that gets them killed" from an "80% winrate player mistake that nevertheless still gets them killed when alphaDCSS wouldn't have died"? I don't see a clear reason for the distinction, right now.
I don't know if death by "Eh I might be able to escape this horrible situation if I burn all these consumables and subsequently try to sneak past this too-hard combat by trying enough times and waiting and shouting in enough locations, but I also might beat it with a little luck. F-it, let's give it a try, worst case I lose 10 minutes" can really be considered "difficulty".
It's actually non-trivial to define difficulty in a way that would not count such scenarios but still consistently count things you would consider "real difficulty". Quoted scenario is not unlike a failed early rush attempt in a TBS/RTS for example, complete with the benefit if your gamble pays off (more XP and more consumables still available early should you succeed, but lower expected success rate on average compared to alternative strategies...also lower IRL time investment).
My sense is that there's a very large number of deaths which fall into the category of "I don't feel like expending the effort on what I know is probably more likely to keep me alive, when there's a reasonable chance I'll survive with much, much less effort". There's also a not-insignificant number of cases where someone is specifically trying to lower turncount, which may increase risk and possible maximum score at the expense of win rate. However, there's no way of separating those from the deaths which are caused by a true lack of skill.
If that hypothesis is true, it makes any conclusions about difficulty derived from the data we have available to us fairly suspect.
Yet earlier you claimed that other non-difficulty outcome sources wash out over large sampling. I don't see why this isn't true for the occasional "not trying" outcomes also. Especially when you can't actually estimate % in either case.
Importantly, we can reasonably hold these deaths to be equally likely across most changes made to early/late game crawl, and thus draw meaningful conclusions from shifts in player death rate data after mechanical changes in spite of them. Though making optimized play less of an exercise in excessive tedium isn't a bad side goal/benefit.
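To make that concrete, here is a toy sketch of the reasoning, not a model of real crawl data. It assumes each game first has some chance of a "not really trying" death that is the same before and after a change, and otherwise a chance of dying to genuine difficulty. Every number and name below is made up for illustration.

```python
def observed_death_rate(difficulty_death: float, careless_death: float) -> float:
    """Death rate we would actually see in the logs.

    A game ends in a careless death with probability careless_death;
    otherwise it ends in a difficulty death with probability difficulty_death.
    """
    return careless_death + (1.0 - careless_death) * difficulty_death


# Hypothetical values, chosen only to illustrate the argument.
CARELESS = 0.30  # assumed identical before and after the change
BEFORE = 0.50    # "true" difficulty death rate before a mechanical change
AFTER = 0.40     # "true" difficulty death rate after it

rate_before = observed_death_rate(BEFORE, CARELESS)
rate_after = observed_death_rate(AFTER, CARELESS)

true_shift = AFTER - BEFORE
observed_shift = rate_after - rate_before

print(f"observed before: {rate_before:.2f}, after: {rate_after:.2f}")
print(f"true shift: {true_shift:+.2f}, observed shift: {observed_shift:+.2f}")
```

Under those assumptions the observed shift is just the true shift scaled by (1 - CARELESS): smaller in size, but pointing the same way. The absolute rates are contaminated, yet the comparison survives. If the careless rate itself moved with the change, this would no longer hold.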
It's the same reason that we exclude !boring games from the results (early escapes without the orb, or quits): the point at which the game ends isn't reflective of the best effort of the player to win. There's just a much larger subset of games than can be readily identified which don't reflect players' best efforts to win.
I think you are vastly underestimating what "best effort to win" really means. If we're being honest you could throw out nearly the entire sample set of games. For example, most people don't study the code, fsim weapon damage, look up every monster in every encounter until they know all stats/hd/etc by heart, make sure they are well rested and definitely not on any substances like alcohol, and run long calculations of all potential actions for every non-trivial encounter without exception.
Since they aren't doing those things and more, they're technically not giving their "best effort to win" and can be safely not counted, right? Except that kind of position neglects how the overwhelming majority of crawl players (including many players with very high winrates!) actually play crawl. The moment someone presses "autoexplore" once, they should instantly be removed from consideration? I'm not convinced. But this is implied if we're really talking about "best effort to win".