## Difficulty vs XL

### Difficulty vs XL

How difficult is the game as we go along? What is the relationship between experience level and difficulty? This question has obvious implications for the XP curve in-game.

We can try to answer the question using this measure:

Difficulty at experience level XL = Probability of dying at experience level XL given that we have reached experience level XL.
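Written out as a formula, the measure looks roughly like this. It is a sketch that assumes every game ends at exactly one XL and that wins are ignored; the symbols $D$, $d_k$ and $x$ are introduced here for illustration only.

$$
D(x) \;=\; P(\text{die at XL } x \mid \text{reached XL } x) \;=\; \frac{d_x}{1 - \sum_{k < x} d_k}
$$

Here $d_k$ is the fraction of all games that died at XL $k$, so the denominator is the fraction of games that survived long enough to reach XL $x$.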

The data required can be obtained using Sequell queries. I did this for version 0.22, and only for XL 1-25. Details are in the spoiler text.
Spoiler: show
I ran

"!lg tenpercenters !boring 0.22 / xl=XL"
for each XL. This is not completely accurate because some of these games include "death by winning". But that's a very small portion of games, so I ignored it.
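As a rough illustration of the arithmetic (not the exact procedure used for the tables), the conditional column can be derived from the per-XL death percentages that the queries return. The function name `conditional_death_rates` is hypothetical; the three input values are the first three tenpercenter rows from the table in this post.

```python
def conditional_death_rates(death_pct):
    """Turn 'percentage of all games that died at XL i' into
    'percentage of games that reached XL i and died there'.

    death_pct[0] is XL 1, death_pct[1] is XL 2, and so on.
    Assumes every game that does not die at an XL moves on to the next one.
    """
    rates = []
    alive = 100.0  # percentage of all games still alive on reaching this XL
    for died in death_pct:
        rates.append(100.0 * died / alive)
        alive -= died  # survivors carry over to the next XL
    return rates


# XL 1-3 for tenpercenters, taken from the table in this post.
rates = conditional_death_rates([4.65, 2.48, 4.88])
print([round(r, 2) for r in rates])
```

For XL 2 this reproduces 2.48 / 95.35, which is the 2.60% shown in the table.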

Here's the data for tenpercenters:

| XL | Prob of dying at XL (%) | Prob of surviving to XL+1 (%) | Prob of dying at XL, given that you've reached XL (%) |
|---:|---:|---:|---:|
| 1 | 4.65 | 95.35 | 4.65 |
| 2 | 2.48 | 92.87 | 2.60094389092816 |
| 3 | 4.88 | 87.99 | 5.25465704748573 |
| 4 | 4.88 | 83.11 | 5.54608478236163 |
| 5 | 4.96 | 78.15 | 5.96799422452172 |
| 6 | 3.87 | 74.28 | 4.95201535508637 |
| 7 | 3.95 | 70.33 | 5.31771674744211 |
| 8 | 3.25 | 67.08 | 4.62107208872458 |
| 9 | 3.95 | 63.13 | 5.88849135360763 |
| 10 | 3.41 | 59.72 | 5.40155235228893 |
| 11 | 3.72 | 56 | 6.22906898861353 |
| 12 | 1.94 | 54.06 | 3.46428571428571 |
| 13 | 2.79 | 51.27 | 5.16093229744728 |
| 14 | 2.32 | 48.95 | 4.52506338989663 |
| 15 | 3.02 | 45.93 | 6.16956077630235 |
| 16 | 2.87 | 43.06 | 6.24863923361637 |
| 17 | 1.47 | 41.59 | 3.4138411518811 |
| 18 | 1.16 | 40.43 | 2.78913200288531 |
| 19 | 0.93 | 39.5 | 2.30027207519169 |
| 20 | 0.62 | 38.88 | 1.56962025316456 |
| 21 | 0.93 | 37.95 | 2.39197530864197 |
| 22 | 0.85 | 37.1 | 2.23978919631093 |
| 23 | 1.08 | 36.02 | 2.911051212938 |
| 24 | 1.08 | 34.94 | 2.99833425874514 |
| 25 | 1.94 | 33 | 5.55237550085861 |

For all players:

| XL | Prob of dying at XL (%) | Prob of surviving to XL+1 (%) | Prob of dying at XL, given that you've reached XL (%) |
|---:|---:|---:|---:|
| 1 | 14.26 | 85.74 | 14.26 |
| 2 | 8.49 | 77.25 | 9.90202939118265 |
| 3 | 12.25 | 65 | 15.8576051779935 |
| 4 | 11.1 | 53.9 | 17.0769230769231 |
| 5 | 9.33 | 44.57 | 17.3098330241187 |
| 6 | 7.15 | 37.42 | 16.0421808391295 |
| 7 | 6.03 | 31.39 | 16.1143773383218 |
| 8 | 5.09 | 26.3 | 16.2153552086652 |
| 9 | 5.16 | 21.14 | 19.6197718631179 |
| 10 | 4.62 | 16.52 | 21.8543046357616 |
| 11 | 4.53 | 11.99 | 27.4213075060533 |
| 12 | 2.57 | 9.42 | 21.4345287739783 |
| 13 | 1.78 | 7.64 | 18.895966029724 |
| 14 | 1.33 | 6.31 | 17.4083769633508 |
| 15 | 1.09 | 5.22 | 17.2741679873217 |
| 16 | 0.92 | 4.3 | 17.6245210727969 |
| 17 | 0.61 | 3.69 | 14.1860465116279 |
| 18 | 0.4 | 3.29 | 10.840108401084 |
| 19 | 0.31 | 2.98 | 9.42249240121581 |
| 20 | 0.26 | 2.72 | 8.72483221476511 |
| 21 | 0.2 | 2.52 | 7.35294117647059 |
| 22 | 0.18 | 2.34 | 7.14285714285715 |
| 23 | 0.17 | 2.17 | 7.26495726495727 |
| 24 | 0.15 | 2.02 | 6.91244239631337 |
| 25 | 0.18 | 1.84 | 8.91089108910892 |

Here's a graph of difficulty vs XL, for all players and for "tenpercenters".
### Re: Difficulty vs XL

That looks pretty great overall for tenpercenters. Maybe you could plot the first derivative?

### Re: Difficulty vs XL

First derivative of difficulty (the data is underneath the spoiler tag):

Spoiler: show
All players:

| XL | Prob of dying at XL (%) | Prob of surviving to XL+1 (%) | Prob of dying at XL, given that you've reached XL (%) | First derivative of difficulty |
|---:|---:|---:|---:|---:|
| 1 | 14.26 | 85.74 | 14.26 | |
| 2 | 8.49 | 77.25 | 9.90202939118265 | -4.35797060881735 |
| 3 | 12.25 | 65 | 15.8576051779935 | 5.95557578681088 |
| 4 | 11.1 | 53.9 | 17.0769230769231 | 1.21931789892955 |
| 5 | 9.33 | 44.57 | 17.3098330241187 | 0.232909947195662 |
| 6 | 7.15 | 37.42 | 16.0421808391295 | -1.26765218498928 |
| 7 | 6.03 | 31.39 | 16.1143773383218 | 0.072196499192291 |
| 8 | 5.09 | 26.3 | 16.2153552086652 | 0.100977870343428 |
| 9 | 5.16 | 21.14 | 19.6197718631179 | 3.40441665445269 |
| 10 | 4.62 | 16.52 | 21.8543046357616 | 2.23453277264372 |
| 11 | 4.53 | 11.99 | 27.4213075060533 | 5.56700287029168 |
| 12 | 2.57 | 9.42 | 21.4345287739783 | -5.98677873207495 |
| 13 | 1.78 | 7.64 | 18.895966029724 | -2.53856274425432 |
| 14 | 1.33 | 6.31 | 17.4083769633508 | -1.48758906637321 |
| 15 | 1.09 | 5.22 | 17.2741679873217 | -0.134208976029072 |
| 16 | 0.92 | 4.3 | 17.6245210727969 | 0.350353085475223 |
| 17 | 0.61 | 3.69 | 14.1860465116279 | -3.43847456116903 |
| 18 | 0.4 | 3.29 | 10.840108401084 | -3.3459381105439 |
| 19 | 0.31 | 2.98 | 9.42249240121581 | -1.41761599986821 |
| 20 | 0.26 | 2.72 | 8.72483221476511 | -0.697660186450705 |
| 21 | 0.2 | 2.52 | 7.35294117647059 | -1.37189103829451 |
| 22 | 0.18 | 2.34 | 7.14285714285715 | -0.210084033613445 |
| 23 | 0.17 | 2.17 | 7.26495726495727 | 0.122100122100123 |
| 24 | 0.15 | 2.02 | 6.91244239631337 | -0.352514868643901 |
| 25 | 0.18 | 1.84 | 8.91089108910892 | 1.99844869279555 |

Tenpercenters

| XL | Prob of dying at XL (%) | Prob of surviving to XL+1 (%) | Prob of dying at XL, given that you've reached XL (%) | First derivative of difficulty |
|---:|---:|---:|---:|---:|
| 1 | 4.65 | 95.35 | 4.65 | |
| 2 | 2.48 | 92.87 | 2.60094389092816 | -2.04905610907184 |
| 3 | 4.88 | 87.99 | 5.25465704748573 | 2.65371315655757 |
| 4 | 4.88 | 83.11 | 5.54608478236163 | 0.291427734875898 |
| 5 | 4.96 | 78.15 | 5.96799422452172 | 0.421909442160086 |
| 6 | 3.87 | 74.28 | 4.95201535508637 | -1.01597886943535 |
| 7 | 3.95 | 70.33 | 5.31771674744211 | 0.365701392355739 |
| 8 | 3.25 | 67.08 | 4.62107208872458 | -0.696644658717527 |
| 9 | 3.95 | 63.13 | 5.88849135360763 | 1.26741926488305 |
| 10 | 3.41 | 59.72 | 5.40155235228893 | -0.486939001318705 |
| 11 | 3.72 | 56 | 6.22906898861353 | 0.827516636324601 |
| 12 | 1.94 | 54.06 | 3.46428571428571 | -2.76478327432782 |
| 13 | 2.79 | 51.27 | 5.16093229744728 | 1.69664658316157 |
| 14 | 2.32 | 48.95 | 4.52506338989663 | -0.635868907550655 |
| 15 | 3.02 | 45.93 | 6.16956077630235 | 1.64449738640572 |
| 16 | 2.87 | 43.06 | 6.24863923361637 | 0.079078457314024 |
| 17 | 1.47 | 41.59 | 3.4138411518811 | -2.83479808173528 |
| 18 | 1.16 | 40.43 | 2.78913200288531 | -0.624709148995787 |
| 19 | 0.93 | 39.5 | 2.30027207519169 | -0.488859927693619 |
| 20 | 0.62 | 38.88 | 1.56962025316456 | -0.730651822027132 |
| 21 | 0.93 | 37.95 | 2.39197530864197 | 0.822355055477418 |
| 22 | 0.85 | 37.1 | 2.23978919631093 | -0.15218611233104 |
| 23 | 1.08 | 36.02 | 2.911051212938 | 0.67126201662707 |
| 24 | 1.08 | 34.94 | 2.99833425874514 | 0.087283045807136 |
| 25 | 1.94 | 33 | 5.55237550085861 | 2.55404124211347 |
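
The "first derivative" column in these tables appears to be a plain first difference: each row's difficulty minus the previous row's. A minimal sketch, with the hypothetical helper name `first_difference` and the first three all-players difficulty values from the table above as input:

```python
def first_difference(values):
    """Return values[i] - values[i - 1] for every i >= 1.

    The result is one element shorter than the input, which is why the
    XL 1 row has no derivative entry.
    """
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Difficulty at XL 1-3 for all players, copied from the table above.
difficulty_all = [14.26, 9.90202939118265, 15.8576051779935]
deltas = first_difference(difficulty_all)
print(deltas)
```

Because XL steps are all of size 1, the difference and the discrete derivative coincide here; with unequal step sizes one would also divide by the step.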
### Re: Difficulty vs XL

Another idea could be to calculate difficulty in game turns, instead of XL. Assuming a player plays at roughly the same rate throughout the game, we could then see how "smooth" the difficulty they experience is as they play the game.

The relevant Sequell query would be "!lg * !boring 0.22 / turn<100", and so on. The same methodology used above would work, though the table would be larger. We probably could use smaller buckets at low turncounts and bigger buckets at higher turncounts. We can perhaps take out Chei worshippers and Frogs/Nagas/Spriggans/Centaurs/Felids, but I doubt that if we use game turns, it would make much of a difference either way.

I'll get to it when I have some time.

### Re: Difficulty vs XL

I created plots for difficulty level based on turns. The turns are on a log scale.

Data under the spoiler tag.
Spoiler: show
I used queries of the form: "!lg * !boring 0.22 / !won turn>50 turn<76"

The left end of the graph starts at turn 25. I used small bucket sizes at low turns and bigger buckets at higher turns. The difficulty is normalized based on the size of the buckets.
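The bucketing and normalization can be sketched as follows. This is an illustration only: the helper names are hypothetical, the query template simply copies the form quoted above, and the rescaling to "per 100 turns" is inferred from the numbers in the tables below (for example 0.99% over a 25-turn bucket becomes 3.96).

```python
def bucket_queries(edges, start=0, prefix="!lg * !boring 0.22 / !won"):
    """Build one query string per bucket (lo, hi], in the form quoted above."""
    queries = []
    lo = start
    for hi in edges:
        queries.append(f"{prefix} turn>{lo} turn<{hi + 1}")
        lo = hi
    return queries


def normalized_difficulty(edges, death_pct, start=0, per_turns=100):
    """Conditional death rate of each bucket, rescaled to a common width.

    edges:     upper turn limit of each bucket, in increasing order
    death_pct: percentage of all games that died inside each bucket
    """
    result = []
    alive = 100.0  # percentage of games that survived to see this bucket
    lo = start
    for hi, died in zip(edges, death_pct):
        conditional = 100.0 * died / alive
        result.append(conditional * per_turns / (hi - lo))
        alive -= died
        lo = hi
    return result


edges = [25, 50, 75]
queries = bucket_queries(edges)
difficulty = normalized_difficulty(edges, [0.99, 2.54, 2.6])  # all players
print(queries[2])
print([round(d, 2) for d in difficulty])
```

Dividing by bucket width is what makes a 25-turn bucket and a 2000-turn bucket comparable on the same plot.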

For all players:

| Turns | Bucket size | Prob of dying in this bucket (%) | Prob of surviving till turn T (%) | Prob of dying in this bucket, given you have survived to see it (%) | Difficulty of bucket divided by bucket size (per 100 turns) | First derivative |
|---:|---:|---:|---:|---:|---:|---:|
| 25 | 25 | 0.99 | 99.01 | 0.99 | 3.96 | |
| 50 | 25 | 2.54 | 96.47 | 2.56539743460257 | 10.2615897384103 | 6.30158973841026 |
| 75 | 25 | 2.6 | 93.87 | 2.69513838499015 | 10.7805535399606 | 0.518963801550347 |
| 100 | 25 | 2.08 | 91.79 | 2.21583040374987 | 8.86332161499947 | -1.91723192496114 |
| 150 | 50 | 2.83 | 88.96 | 3.08312452336856 | 6.16624904673712 | -2.69707256826235 |
| 200 | 50 | 1.82 | 87.14 | 2.04586330935252 | 4.09172661870504 | -2.07452242803208 |
| 250 | 50 | 1.25 | 85.89 | 1.43447326141841 | 2.86894652283681 | -1.22278009586822 |
| 300 | 50 | 1.01 | 84.88 | 1.17592269181511 | 2.35184538363022 | -0.517101139206589 |
| 350 | 50 | 0.89 | 83.99 | 1.04853911404336 | 2.09707822808671 | -0.254767155543514 |
| 400 | 50 | 0.76 | 83.23 | 0.904869627336588 | 1.80973925467318 | -0.287338973413536 |
| 500 | 100 | 1.3 | 81.93 | 1.56193680163403 | 1.56193680163403 | -0.247802453039149 |
| 600 | 100 | 1.18 | 80.75 | 1.44025387525937 | 1.44025387525937 | -0.121682926374658 |
| 700 | 100 | 1.26 | 79.49 | 1.56037151702786 | 1.56037151702786 | 0.120117641768496 |
| 800 | 100 | 1.49 | 78 | 1.87444961630394 | 1.87444961630394 | 0.314078099276074 |
| 900 | 100 | 1.72 | 76.28 | 2.20512820512821 | 2.20512820512821 | 0.330678588824268 |
| 1000 | 100 | 1.86 | 74.42 | 2.43838489774515 | 2.43838489774515 | 0.233256692616944 |
| 1200 | 200 | 3.79 | 70.63 | 5.09271701155603 | 2.54635850577802 | 0.107973608032867 |
| 1400 | 200 | 3.5 | 67.13 | 4.95540138751239 | 2.47770069375619 | -0.068657812021822 |
| 1600 | 200 | 3.31 | 63.82 | 4.93073141665425 | 2.46536570832713 | -0.012334985429068 |
| 1800 | 200 | 3.1 | 60.72 | 4.8574114697587 | 2.42870573487935 | -0.036659973447778 |
| 2000 | 200 | 2.92 | 57.8 | 4.80895915678524 | 2.40447957839262 | -0.024226156486726 |
| 2200 | 200 | 2.73 | 55.07 | 4.72318339100346 | 2.36159169550173 | -0.042887882890891 |
| 2400 | 200 | 2.62 | 52.45 | 4.75758126021427 | 2.37879063010714 | 0.017198934605406 |
| 2600 | 200 | 2.44 | 50.01 | 4.65204957102002 | 2.32602478551001 | -0.052765844597127 |
| 2800 | 200 | 2.23 | 47.78 | 4.45910817836433 | 2.22955408918216 | -0.096470696327846 |
| 3000 | 200 | 2.1 | 45.68 | 4.39514441188782 | 2.19757220594391 | -0.031981883238254 |
| 3500 | 500 | 4.6 | 41.08 | 10.0700525394046 | 2.01401050788091 | -0.183561698062999 |
| 4000 | 500 | 3.66 | 37.42 | 8.90944498539435 | 1.78188899707887 | -0.23212151080204 |
| 4500 | 500 | 2.91 | 34.51 | 7.77659005879209 | 1.55531801175842 | -0.226570985320453 |
| 5000 | 500 | 2.47 | 32.04 | 7.15734569689945 | 1.43146913937989 | -0.123848872378528 |
| 5500 | 500 | 2.07 | 29.97 | 6.46067415730337 | 1.29213483146067 | -0.139334307919216 |
| 6000 | 500 | 1.8 | 28.17 | 6.00600600600601 | 1.2012012012012 | -0.090933630259473 |
| 7000 | 1000 | 3.27 | 24.9 | 11.6080937167199 | 1.16080937167199 | -0.04039182952921 |
| 8000 | 1000 | 2.81 | 22.09 | 11.285140562249 | 1.1285140562249 | -0.032295315447092 |
| 9000 | 1000 | 2.43 | 19.66 | 11.0004526935265 | 1.10004526935265 | -0.028468786872252 |
| 10000 | 1000 | 2.16 | 17.5 | 10.9867751780264 | 1.09867751780264 | -0.001367751550003 |
| 12000 | 2000 | 3.33 | 14.17 | 19.0285714285714 | 0.951428571428571 | -0.147248946374073 |
| 14000 | 2000 | 2.4 | 11.77 | 16.9371912491178 | 0.846859562455892 | -0.104569008972679 |
| 16000 | 2000 | 1.7 | 10.07 | 14.4435004248088 | 0.722175021240442 | -0.124684541215451 |
| 18000 | 2000 | 1.19 | 8.88 | 11.8172790466733 | 0.590863952333664 | -0.131311068906778 |
| 20000 | 2000 | 0.92 | 7.96 | 10.3603603603604 | 0.518018018018018 | -0.072845934315646 |
| 22000 | 2000 | 0.71 | 7.25 | 8.91959798994974 | 0.445979899497487 | -0.072038118520531 |
| 24000 | 2000 | 0.6 | 6.65 | 8.27586206896551 | 0.413793103448275 | -0.032186796049212 |
| 26000 | 2000 | 0.52 | 6.13 | 7.81954887218044 | 0.390977443609022 | -0.022815659839253 |
| 28000 | 2000 | 0.47 | 5.66 | 7.66721044045676 | 0.383360522022838 | -0.007616921586184 |
| 30000 | 2000 | 0.44 | 5.22 | 7.773851590106 | 0.3886925795053 | 0.005332057482462 |

For tenpercenters:

| Turns | Bucket size | Prob of dying in this bucket (%) | Prob of surviving till turn T (%) | Prob of dying in this bucket, given you have survived to see it (%) | Difficulty of bucket divided by bucket size (per 100 turns) | First derivative |
|---:|---:|---:|---:|---:|---:|---:|
| 25 | 25 | 0.99 | 99.01 | 0.99 | 3.96 | |
| 50 | 25 | 1.01 | 98 | 1.02009897990102 | 4.08039591960408 | 0.120395919604081 |
| 75 | 25 | 0.46 | 97.54 | 0.469387755102041 | 1.87755102040816 | -2.20284489919592 |
| 100 | 25 | 0.46 | 97.08 | 0.471601394299774 | 1.8864055771991 | 0.008854556790934 |
| 150 | 50 | 0.93 | 96.15 | 0.957972805933251 | 1.9159456118665 | 0.029540034667404 |
| 200 | 50 | 0.39 | 95.76 | 0.405616224648986 | 0.811232449297972 | -1.10471316256853 |
| 250 | 50 | 0.54 | 95.22 | 0.56390977443609 | 1.12781954887218 | 0.316587099574209 |
| 300 | 50 | 0.85 | 94.37 | 0.892669607225373 | 1.78533921445075 | 0.657519665578565 |
| 350 | 50 | 0.31 | 94.06 | 0.328494224859595 | 0.65698844971919 | -1.12835076473156 |
| 400 | 50 | 0.31 | 93.75 | 0.329576865830321 | 0.659153731660642 | 0.002165281941452 |
| 500 | 100 | 1.16 | 92.59 | 1.23733333333333 | 1.23733333333333 | 0.578179601672691 |
| 600 | 100 | 0.62 | 91.97 | 0.669618749324981 | 0.669618749324981 | -0.567714584008352 |
| 700 | 100 | 0.54 | 91.43 | 0.587147983037947 | 0.587147983037947 | -0.082470766287034 |
| 800 | 100 | 1.08 | 90.35 | 1.18123154325714 | 1.18123154325714 | 0.59408356021919 |
| 900 | 100 | 0.85 | 89.5 | 0.940785832872164 | 0.940785832872164 | -0.240445710384973 |
| 1000 | 100 | 1.32 | 88.18 | 1.47486033519553 | 1.47486033519553 | 0.534074502323367 |
| 1200 | 200 | 1.24 | 86.94 | 1.40621456112497 | 0.703107280562486 | -0.771753054633045 |
| 1400 | 200 | 1.63 | 85.31 | 1.87485622268231 | 0.937428111341155 | 0.234320830778669 |
| 1600 | 200 | 1.24 | 84.07 | 1.45352244754425 | 0.726761223772125 | -0.21066688756903 |
| 1800 | 200 | 1.24 | 82.83 | 1.47496134173903 | 0.737480670869513 | 0.010719447097388 |
| 2000 | 200 | 1.55 | 81.28 | 1.87130266811542 | 0.935651334057708 | 0.198170663188195 |
| 2200 | 200 | 0.93 | 80.35 | 1.14419291338583 | 0.572096456692913 | -0.363554877364795 |
| 2400 | 200 | 1.08 | 79.27 | 1.34411947728687 | 0.672059738643435 | 0.099963281950522 |
| 2600 | 200 | 1.24 | 78.03 | 1.5642740002523 | 0.782137000126151 | 0.110077261482716 |
| 2800 | 200 | 0.85 | 77.18 | 1.08932461873638 | 0.544662309368191 | -0.237474690757959 |
| 3000 | 200 | 1.08 | 76.1 | 1.39932625032392 | 0.699663125161959 | 0.155000815793767 |
| 3500 | 500 | 2.55 | 73.55 | 3.35085413929041 | 0.670170827858081 | -0.029492297303878 |
| 4000 | 500 | 2.55 | 71 | 3.46702923181509 | 0.693405846363018 | 0.023235018504937 |
| 4500 | 500 | 1.63 | 69.37 | 2.29577464788732 | 0.459154929577465 | -0.234250916785553 |
| 5000 | 500 | 1.7 | 67.67 | 2.45062707222142 | 0.490125414444284 | 0.03097048486682 |
| 5500 | 500 | 1.55 | 66.12 | 2.29052756021871 | 0.458105512043741 | -0.032019902400543 |
| 6000 | 500 | 0.93 | 65.19 | 1.4065335753176 | 0.281306715063521 | -0.176798796980221 |
| 7000 | 1000 | 3.02 | 62.17 | 4.63261236385949 | 0.463261236385949 | 0.181954521322428 |
| 8000 | 1000 | 2.4 | 59.77 | 3.86038282129644 | 0.386038282129644 | -0.077222954256304 |
| 9000 | 1000 | 1.63 | 58.14 | 2.72712062907813 | 0.272712062907813 | -0.113326219221831 |
| 10000 | 1000 | 1.63 | 56.51 | 2.80357757137943 | 0.280357757137943 | 0.00764569423013 |
| 12000 | 2000 | 2.79 | 53.72 | 4.93717926030791 | 0.246858963015395 | -0.033498794122547 |
| 14000 | 2000 | 2.24 | 51.48 | 4.16976917349218 | 0.208488458674609 | -0.038370504340786 |
| 16000 | 2000 | 1.86 | 49.62 | 3.61305361305361 | 0.180652680652681 | -0.027835778021929 |
| 18000 | 2000 | 1.24 | 48.38 | 2.49899234179766 | 0.124949617089883 | -0.055703063562798 |
| 20000 | 2000 | 1.32 | 47.06 | 2.72840016535758 | 0.136420008267879 | 0.011470391177996 |
| 22000 | 2000 | 1.08 | 45.98 | 2.29494262643434 | 0.114747131321717 | -0.021672876946162 |
| 24000 | 2000 | 1.24 | 44.74 | 2.69682470639408 | 0.134841235319704 | 0.020094103997987 |
| 26000 | 2000 | 0.85 | 43.89 | 1.8998658918194 | 0.09499329459097 | -0.039847940728734 |
| 28000 | 2000 | 1.01 | 42.88 | 2.30120756436546 | 0.115060378218273 | 0.020067083627303 |
| 30000 | 2000 | 0.46 | 42.42 | 1.07276119402985 | 0.053638059701493 | -0.06142231851678 |
### Re: Difficulty vs XL

### Re: Difficulty vs XL

There are labels on the horizontal axes -- you just have to scroll a little bit since the image is a little too big.

### Re: Difficulty vs XL

bel wrote:There are labels on the horizontal axes -- you just have to scroll a little bit since the image is a little too big.

Oh, hehe
### Re: Difficulty vs XL

From the graph above, we can see that there are roughly two parts of the graph: from turn 1-1000, and from turn 1000 onwards.

Here are zoomed-in graphs for the two periods, with significant events labeled. (This is for all players; the tenpercenters graph is similar). Keep in mind that these are all averages.
### Re: Difficulty vs XL

Some more plots, this time of the difficulty of each floor. Dungeon only goes till D:8 because otherwise the calculations for randomized Lair entrances become too hairy.
Spoiler: show
I plotted difficulty of each floor from D:1 to D:8 (later floors become problematic because of randomized Lair entrances).

I calculated the "lairratio" (the probability of reaching Lair). Then I calculated the difficulty of each floor from Lair:1 to Lair:6 based on this lairratio.

I then concatenated the two graphs, to make them easier to visualize.

These calculations ignore portal vaults like Ossuary and Ice Cave.

Sequell queries used are of the form:
!lairratio tenpercenters !boring 0.22
!lg tenpercenters !boring 0.22 / place=Lair:1
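One way the Lair part could be computed is sketched below. This is a guess at the procedure rather than the exact one used: it assumes that everyone who reaches a Lair floor either dies there or continues to the next Lair floor, and the input numbers are made up for illustration. The function name `lair_floor_difficulty` is hypothetical.

```python
def lair_floor_difficulty(lair_ratio_pct, death_pct_by_floor):
    """Conditional death rate for Lair:1, Lair:2, ...

    lair_ratio_pct:     percentage of all games that reach Lair at all
    death_pct_by_floor: percentage of all games that died on each Lair floor

    Simplifying assumption: nobody leaves Lair and dies elsewhere before
    going deeper, so the pool shrinks only through deaths in Lair.
    """
    result = []
    reached = lair_ratio_pct  # percentage of all games reaching this floor
    for died in death_pct_by_floor:
        result.append(100.0 * died / reached)
        reached -= died
    return result


# Made-up inputs: 40% of games reach Lair; 1.0%, 0.8% and 0.6% of all
# games die on Lair:1, Lair:2 and Lair:3 respectively.
lair = lair_floor_difficulty(40.0, [1.0, 0.8, 0.6])
print([round(x, 2) for x in lair])
```

The only difference from the Dungeon calculation is the starting pool: it begins at the lairratio instead of 100%.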

• D:1-4 are fairly tough. There's a sharp drop in difficulty from D:4 to D:5 and the difficulty keeps trending downward.
• While all players experience a sharp drop in difficulty when they enter Lair, the effect is much sharper for Tenpercenters. In other words, the meme "the game is won by Lair" holds much more strongly for the latter group.
### Re: Difficulty vs XL

bel wrote:While all players experience a sharp drop in difficulty when they enter Lair, the effect is much sharper for Tenpercenters. In other words, the meme "the game is won by Lair" holds much more strongly for the latter group.

Note that since your graphs don't go past the Lair, it's hard to draw that conclusion (for all we know from the charts, the difficulty past the Lair goes back up again).
### Re: Difficulty vs XL

I didn't calculate it, but I highly doubt that the game becomes much tougher after Lair, given the shape of the other graphs. All indications are that the game keeps becoming easier.

### Re: Difficulty vs XL

bel wrote:I didn't calculate it, but I highly doubt that the game becomes much tougher after Lair, given the shape of the other graphs. All indications are that the game keeps becoming easier.

I don't personally suspect it does either, but the data doesn't support the conclusion one way or the other, so your statement about the post-lair game doesn't really relate to the graph in question (not to say it's an incorrect statement, just that it's not supported by the graph)
### Re: Difficulty vs XL

[This discussion happened in another thread, but I think it belongs better here.]

tealizard dissents from this entire approach of using Sequell queries to determine game difficulty:

tealizard wrote:Incidentally, the plots from the other thread are essentially a souped up version of the Sequell query arguments that have such a dubious history here. Unfortunately, the data that goes into these arguments is either too sparse to be useful (when it's based on only players who do something that reasonably approximates optimal play) or just straight up garbage (includes games from players who win less than 80% of the time). Fortunately, there are good arguments from first principles that show how crawl difficulty works, namely starts pretty high as computer games go, then craters after about 3 floors.

tealizard wrote:Though bel's methods of arriving at conclusions about dcss's difficulty curve are faulty, his conclusions are in broad agreement with reasoning from the rules of the game -- i.e. the correct way to generate knowledge about crawl. This idea of using sequell data is not new and people have gone over and over this kind of thing for years, working through all these sorts of exclusions and special conditions.

Similarly, bel's theory that you should reason from the practice of players rather than the rules of the game (including optimal play analysis) is faulty. This is a recipe for a game full of cheez, where the designers pretend they don't know the rules until sufficiently many players figure them out. You guys are just relitigating settled questions, either in the mistaken belief of having found something new or in order to forestall impetus for change.

So, here's my response:

First, I see no articulation of what these "first principles" are, nor any alternate measure of difficulty curve which is derived from these "first principles" or "rules of the game". Crawl is a very complicated game with a hundred different mechanics. How are you going to handle all the mechanics starting from "first principles"? I don't like handwaving.

Second, the problem with using "optimal play" is that Crawl isn't very hard if you play optimally. The "optimal winrate" quickly reaches 100%, and as a corollary, the difficulty level quickly reaches 0. So, crawl is a combination of "intractable" and "uninteresting" if you try to analyze from "first principles" (whatever that means exactly).

Third, there is the idea that looking at how people actually play the game is "straight up garbage". I am trying to look at the margin of existing Crawl development, not trying to write my own fork. Why does every Crawl release come with an associated tournament? It's partly for ironing out bugs, but it is also to get a sense of game balance. To give an explicit example of the latter, recall that Gauntlet was added in 0.23, replacing Lab. When people complained (or not) that Gauntlet is too hard, a dev (advil) pointed to Sequell queries in the tournament.

To be clear: I do not claim that Sequell queries inform Crawl development exclusively. I am saying that Crawl development incorporates how people play (quite heavily). On the flip side, I see no evidence whatsoever that Crawl development (or that of any other game to my knowledge, not that my knowledge is very wide) uses this "hypothetical optimal player" very heavily. (Animate Skeleton and Spectral Weapon still exist. Trog gives berserk at 1*.) The devs do use it for some things, sure, but it's not an exclusive criterion. As I point out above, Crawl is not an interesting game if you look at "hypothetical optimal play", so such a quest would be pointless anyway.

Finally, several mentions of Hellcrawl occurred in the other thread. I have played a ton of games of Hellcrawl and I like it very much. However, most of the degenerate mechanics in Crawl are present in Hellcrawl as well, to the point that "hypothetical optimal play" would have a pretty large winrate. For example, luring exists very much in Hellcrawl. Things are hard at the start of the level but once you clear a little part of the level, you can lure as much as you want. The "doom clock" is generous enough to not matter most of the time.

### Re: Difficulty vs XL

Okay, so:

1. An argument about crawl from first principles would work from the definition of crawl, that is to say from the rules of the game, and from general principles about gameplay, usability, and so on.

2. It is true that with optimal play, difficulty quickly approaches zero, which sounds a lot like your conclusion in this thread. Situations that produce an unavoidable death/loss tend to happen in the entry vault of d:1 and almost vanish in probability by d:4. You can estimate these probabilities using vault weights, spawn weights, odds of various combat events, and so forth.

3. If you develop a novel perspective on the game and you have the means, you should write your own fork. The reason dcss has regular tournaments is that it helps promote the game and create a set of practices that define an online culture around the game. I do not agree that the linked comment lends any credence to the use of Sequell queries as a basis for reasoning about crawl.

Finally: It is absolutely true that hellcrawl maintains some of the worst deficiencies of dcss -- many of these are deep problems that require us to rethink crawl at a basic level. That is why we should move forward in our thinking about crawl, not relitigate issues settled by experience with hellcrawl or rehash arguments against things dcss culture gets right, like optimal play analysis. There is still a lot of potential in continuing the work hellcrawl began, but we are at a juncture requiring a greater leap forward than we have yet seen along the way, even more than the upstairs removal.
### Re: Difficulty vs XL

I do not understand what any of that really means. Yeah you can theoretically calculate difficulty based on vault weights, spawn weights, odds of various combat events etc. That's a bit like saying that you can model human behaviour by using quantum mechanics because everything is based on physics after all. I'll believe it when I actually see it. It is definitely possible, after all: AlphaZero learnt how to play expert Go by only starting from the rules of the game. However, I'm not aware of any CrawlZero.

None of the difficulty measures in this thread reach 0 at D:4, so whatever your difficulty curve is, it doesn't look like the ones in this thread. In fact, D:4 is the hardest floor in the game (according to the measures I calculated above). It is true that difficulty, overall, trends downward, but it does not drop down to zero anywhere near that fast.

The problem with a measure which implies that the difficulty level is zero so early is that it makes most of the game completely irrelevant. It also implies something else: you cannot use "optimal play", "tedium" etc. to remove bad mechanics in most of the game. So, suppose the problem with LRD breaking walls was that it provided infinite digging, and thus it was removed because it is a degenerate mechanic. But if the "optimal winrate" was already 100% before learning LRD, then infinite digging makes no difference at all. So the professed reasoning is completely incoherent if we're working in the "hypothetical optimal" model. This is the main reason why most invocations of "optimal play" on Tavern are completely wrong.
### Re: Difficulty vs XL

Going back to the original graph. The problem I always have with the 'lethality by XL' analysis is that XP aptitudes exist. Game progression (and hence difficulty) scales with XP gained far more than it does with XL. And that makes the random distribution of XP aptitudes blur out features on the difficulty graph.
### Re: Difficulty vs XL

It's definitely possible. XL is a rather coarse measure -- using XP would be better. Unfortunately, there is no fine-grained measure of XP which I can get from Sequell data. At least, I don't know how to do it, maybe some Sequell wizards can figure it out. Many things in-game depend on XL, so it's not a bad criterion to use.

The coarseness of XL is one reason why I tried out plots using various other criteria.


### Re: Difficulty vs XL

The comment about AlphaZero above brings another observation to mind. AlphaStar, a related DeepMind system, learnt how to play StarCraft II expertly. I am not a StarCraft player, but from what I understand, it just spammed a cheap and fast unit and then used superior micro to take down human players.

This makes me think that in complicated and sprawling video games (like Crawl is, and probably will always remain), there's almost always some broken mechanic somewhere. Optimal play will consist of exploiting the hell out of the broken mechanic.
Last edited by bel on Wednesday, 3rd July 2019, 13:54, edited 1 time in total.

### Re: Difficulty vs XL

bel wrote:It's definitely possible. XL is a rather coarse measure -- using XP would be better. Unfortunately, there is no fine-grained measure of XP which I can get from Sequell data. At least, I don't know how to do it, maybe some Sequell wizards can figure it out. Many things in-game depend on XL, so it's not a bad criterion to use.

The coarseness of XL is one reason why I tried out plots using various other criteria.

Wouldn't it be possible to query based on race, and correct by their XP aptitude?
### Re: Difficulty vs XL

It's possible, but it would be a lot of work. I might do a little bit of it anyway, because I want to see how some races like Nagas and Felids do in the early game. My impression is that Nagas need to be buffed in the early game.

### Re: Difficulty vs XL

Crawl is not complex enough for your analogy with the relationship between basic physical law and human behavior to hold water, I think. It is certainly true that it's hard to perform explicit calculations, which is why I talk about estimates instead. The reason your results do not match the picture I describe exactly is that they include a lot of noise and nonsense, which goes back to why sequell queries are not a good basis for thinking about crawl.

I actually agree about optimal play being an incomplete guide for exactly the reason that most things make no difference from that perspective. "Optimal play" is not a unique thing, just anything that reaches some theoretical maximum winrate, but reasoning based on the rules of the game is broader than optimal play analysis. A lot of analysis that people say deals with optimal play really deals with some kind of relative advantage versus naive or "normal" play and that advantage is sometimes marginal. Most of this talk is really just about the connection between what's possible according to the rules and best practices for winrate play. There's an admirable perfectionism to this kind of thinking. If anything, it is not applied scrupulously enough.
### Re: Difficulty vs XL

To compare crawl with another game I play (Brogue), I looked at the difficulty by dungeon depth in Brogue. In Brogue, you have to descend to D:26 to find the amulet of yendor and come back. There is an "extended" (below D:26), but I ignored that part for our purposes. About 13% of games either went into extended or won.

This is generally in line with my Brogue experience that difficulty level remains fairly high and roughly constant throughout the game.

Spoiler: show
To compile the data, I went to r/brogueforum and looked at all the results of the weekly contests. I then wrote a simple python script to extract deaths by depth. I then calculated difficulty level using the same method as above. Sample size is about 400.
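The original script isn't shown, so the following is only a sketch of the last step (turning extracted death depths into a difficulty curve). It assumes the extraction has already produced a list of death depths plus a count of games that did not die in the main dungeon; the function name and the sample numbers are hypothetical.

```python
from collections import Counter


def difficulty_by_depth(death_depths, survivors, max_depth):
    """Percentage of games reaching each depth that died there.

    death_depths: depth of death for every game that ended in death
    survivors:    number of games that won or went into extended
    max_depth:    deepest level to report (26 for the main Brogue dungeon)
    """
    deaths = Counter(death_depths)
    alive = len(death_depths) + survivors  # games that reach depth 1
    result = {}
    for depth in range(1, max_depth + 1):
        died = deaths.get(depth, 0)
        result[depth] = 100.0 * died / alive if alive else 0.0
        alive -= died
    return result


# Made-up sample: ten games, four deaths, six survivors.
curve = difficulty_by_depth([1, 1, 2, 3], survivors=6, max_depth=3)
print({depth: round(pct, 1) for depth, pct in curve.items()})
```

With only about 400 games, the deeper levels rest on very few deaths each, so the tail of such a curve will be noisy.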
### Re: Difficulty vs XL

tealizard wrote:Crawl is not complex enough for your analogy with the relationship between basic physical law and human behavior to hold water, I think. It is certainly true that it's hard to perform explicit calculations, which is why I talk about estimates instead. The reason your results do not match the picture I describe exactly is that they include a lot of noise and nonsense, which goes back to why sequell queries are not a good basis for thinking about crawl.

I actually agree about optimal play being an incomplete guide for exactly the reason that most things make no difference from that perspective. "Optimal play" is not a unique thing, just anything that reaches some theoretical maximum winrate, but reasoning based on the rules of the game is broader than optimal play analysis. A lot of analysis that people say deals with optimal play really deals with some kind of relative advantage versus naive or "normal" play and that advantage is sometimes marginal. Most of this talk is really just about the connection between what's possible according to the rules and best practices for winrate play. There's an admirable perfectionism to this kind of thinking. If anything, it is not applied scrupulously enough.

The problem is that I do not see any "estimates" here either. All I see is vigorous handwaving. If you say that the difficulty craters at D:3 (or D:4 or whatever), how did you derive this "estimate" from "first principles"? As far as I can see, this "estimate" is just pulled out from some random bodily orifice.

I am not against argument from the rules of the game; indeed, I do plenty of that sort of thing myself. However, to claim that it is the One True Path to Crawl Enlightenment is nonsense. Crawl is a game, not a mathematical puzzle. This is good, because Crawl makes a pretty good game, but a shitty mathematical puzzle. If anyone actually took "optimal play" seriously, they would advocate having the Orb of Zot on D:4. Or maybe D:2.

Given that Crawl is a game, looking at how people play is a completely valid way of looking at game difficulty. It is tractable and provides real estimates instead of vigorous handwaving. Moreover, you can look at subsets of players (for instance, I looked at "all players" and "tenpercenters" above) to have an idea of how the game feels for various kinds of players.


### Re: Difficulty vs XL

Your methodology, at best, measures something about dcss players, not dcss itself. I'm hearing about handwaving from someone who purports to measure the difficulty of a game by looking at games from people who barely know how to play or who aren't really trying. What you are doing in asserting the validity of this silliness is worse than handwaving.

Estimating "difficulty" in the sense you are talking about would involve probabilities of "unavoidable death" scenarios. You can find examples of those kinds of calculations on this forum. This is the sort of thing you should try thinking through yourself because it's moderately complicated and very tedious to go through on a forum. The point is that when you get a reasonable set of combat capabilities (skills, spells, items) and hp, you have so many options it's hard to contrive a situation that will kill you with significant probability given best tactics -- best practices make a lot of these situations almost impossible in the first place.
This is where mechanical excellence and one-thousand four-hundred horsepower pays off.

For this message the author tealizard has received thanks:
duvessa

Cocytus Succeeder

Posts: 2158

Joined: Tuesday, 3rd February 2015, 22:05

### Re: Difficulty vs XL

I found this cool series of posts which was done by some guy named Colin Morris a couple of years ago.

I half-remembered this from a post on Tavern back then. I had some worries about the approach.

### Re: Difficulty vs XL

Hellcrawl difficulty level in Dungeon. Details are under spoiler tag.

Only looks at games on CPO. Thanks to chequers for logfile help.

Spoiler: show
Sequell doesn't track Hellcrawl games, so I looked at the raw logfile here. I looked at the most recent Hellcrawl version (v=5.7). I only looked at "difficulty=1" games with ktyp != 'quitting'. Also, I filtered for deaths in Dungeon.
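A rough sketch of that filtering, in case anyone wants to redo it. It assumes the logfile is in the usual colon-separated `key=value` form with `::` standing for a literal colon, and that the branch is stored in a `br` field with the value `D` for Dungeon; `v`, `difficulty` and `ktyp` are the fields named above. The sample lines are invented, not real games:

```python
import re
from collections import Counter


def parse_xlog_line(line):
    """Parse one 'key=value:key=value' line; '::' is an escaped colon."""
    fields = {}
    # Split on single colons only, leaving '::' pairs intact.
    for part in re.split(r"(?<!:):(?!:)", line.strip()):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value.replace("::", ":")
    return fields


def is_dungeon_death(game):
    """Apply the filters described in the post (field names partly assumed)."""
    return (
        game.get("v") == "5.7"
        and game.get("difficulty") == "1"
        and game.get("ktyp") != "quitting"
        and game.get("br") == "D"
    )


# Invented sample lines, for illustration only.
sample = [
    "v=5.7:difficulty=1:name=a:br=D:lvl=3:place=D::3:ktyp=mon",
    "v=5.7:difficulty=1:name=b:br=D:lvl=1:place=D::1:ktyp=quitting",
    "v=5.7:difficulty=2:name=c:br=D:lvl=2:place=D::2:ktyp=mon",
    "v=5.7:difficulty=1:name=d:br=Orc:lvl=1:place=Orc::1:ktyp=mon",
]
games = [parse_xlog_line(line) for line in sample]
deaths_by_level = Counter(int(g["lvl"]) for g in games if is_dungeon_death(g))
print(deaths_by_level)
```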

It looks like Hellcrawl shares the difficulty drop of DCSS up to D:8 or so. However, the removal of Lair gives an upward push to the difficulty in the rest of the Dungeon.
### Re: Difficulty vs XL

hellcrawl is so fuckin' good
### Re: Difficulty vs XL

Updated Hellcrawl graph. Since Hellcrawl is more-or-less linear till Vaults:3, I plotted the difficulty assuming the canonical order (D -> Orc -> Sbranch -> Vaults).

Spoiler: show
Mean difficulty is 8.23 and is shown by horizontal line on the plot. Standard deviation is 4.7.

I also merged logfiles from CKO. CBRO only seems to have an old version of Hellcrawl.
### Re: Difficulty vs XL

Nethack 3.6.2 difficulty plot. The turns are on a log scale.

Spoiler: show
Games played on NAO. Logfile here. Removed boring games (quit and escaped). Sample size is about 4800.
For this message the author bel has received thanks:
### Re: Difficulty vs XL

bel wrote:Nethack 3.6.2 difficulty plot. The turns are on a log scale

This is a beautifully concise summary of why I don't like nethack.

### Re: Difficulty vs XL

To be fair, the nethack plot doesn't look much different from the DCSS plots here.

### Re: Difficulty vs XL

Yes.

### Re: Difficulty vs XL

bel wrote:The comment about AlphaZero above brings to my mind another observation. AlphaZero also learnt how to play StarCraft expertly. I am not a StarCraft player, but from what I understand, AlphaZero just spammed a cheap and fast unit and then used superior micro to take down human players.

This makes me think that in complicated and sprawling video games (like Crawl is, and probably will always remain), there's almost always some broken mechanic somewhere. Optimal play will consist of exploiting the hell out of the broken mechanic.

I watched those games vs the two pro players, and during Wings of Liberty I was a Diamond 1 player myself (not elite, but very good at the time). Alphazero was significantly more advanced than that at the strategic level, I guess the machine learning picked up on enough reactions during its training. It's true that it used inhuman unit control in a way that effectively distorted unit balance though.

Anyway, I'm not convinced "theoretical optimal" is a valid basis for evaluating difficulty. There are levels designed in Mario Maker or even Chicken Horse that are too difficult for anybody on this forum to beat...yet "optimal" play (such as what you'd get out of TAS) would beat them every single time. This is demonstrated to an extreme with hack-specific levels of Super Mario World like "item abuse TAS". Those are STILL 100% winrate under "optimal play", but no human alive today can beat them without tool-assists.

Using player data and death rates is a superior method of evaluating difficulty, assuming we're evaluating difficulty for human beings. You do run into a noise factor with literally impossible positions, however, due to RNG or bad design (some matches in Chicken Horse create temporarily impossible levels until people deploy obstacle destruction items or create a new path - it's not fair to call these "difficult" because there is literally no outcome variance between elite play and true newbies). Same goes for "true" RNG deaths in crawl - those rare events are not an example of difficulty, they are a noise factor complicating an evaluation of difficulty.

### Re: Difficulty vs XL

TheMeInTeam wrote:Using player data and death rates is a superior method of evaluating difficulty, assuming we're evaluating difficulty for human beings. You do run into a noise factor with literally impossible positions, however, due to RNG or bad design [....] Same goes for "true" RNG deaths in crawl - those rare events are not an example of difficulty, they are a noise factor complicating an evaluation of difficulty.

The general consensus among experienced players on the forums has been that the actual instances of RNG-produced unwinnable situations (where there were no actions you could have possibly taken to avoid dying, or getting into a situation where you would have died) are exceedingly rare, to the point where the noise they produce isn't going to be a significant factor in any evaluation with a robust data set.

The problem isn't that win rates using players' best efforts to win aren't reflective of difficulty, but rather that the signal-to-noise ratio in the dataset is too low to derive actual difficulty from it. There's flat-out no way to say "in this game they weren't really trying their hardest to win".

The much larger problem is that a very large amount of the time people get into situations where they could avoid death, have some idea of what it might take to do so, and choose not to because it would be flat out more trouble than dying and starting a new game. I don't know if death by "Eh I might be able to escape this horrible situation if I burn all these consumables and subsequently try to sneak past this too-hard combat by trying enough times and waiting and shouting in enough locations, but I also might beat it with a little luck. F-it, let's give it a try, worst case I lose 10 minutes" can really be considered "difficulty".

My sense is that there's a very large number of deaths which fall into the category of "I don't feel like expending the effort on what I know is probably more likely to keep me alive, when there's a reasonable chance I'll survive with much, much less effort". There's also a not-insignificant number of cases where someone is specifically trying to lower turncount, which may increase risk and possible maximum score at the expense of win rate. However, there's no way of separating those from the deaths which are caused by a true lack of skill.

If that hypothesis is true, it makes any conclusions about difficulty derived from the data we have available to us fairly suspect.

It's the same reason that we filter the results with !boring (dropping early escapes without the orb, and quits): the point at which the game ends isn't reflective of the best effort of the player to win. There's just a much larger subset of games than can be readily identified which don't reflect players' best efforts to win.
### Re: Difficulty vs XL

Zerothly, I should say that I primarily made these plots for fun. If they lead to some useful insights, that's a bonus.

That said, we should keep in mind the adage that "all models are false, some are useful". So, what is it that I'm trying to do here? I'm trying to look at places where Crawl's difficulty curve is out of whack with the "ideal" difficulty curve -- whatever that might be.

If we keep this goal in mind, the "optimal play" difficulty measure may or may not be true, but it's completely useless -- because it says that everything after D:2 (or D:3) is completely irrelevant, because "optimal winrate" quickly reaches 100% (or close enough to not matter).

Using Sequell data is just another way of "playtesting" the game and seeing how hard it is. Instead of a few people playing the game and offering feedback -- saying this branch is too hard, the AC on this monster could be reduced, and so on -- I am looking at a bigger data set. Is the data set dirty and full of noise? Obviously. But there is some detectable signal in the noise.

I looked back at the Hellcrawl thread in CYC. Suppose we look at the point where the monster spawns were being tuned. What I see is people playing the game and offering their thoughts on whether X is too easy, Y is too hard, this branch needs cuts etc. If people worked on the basis of "upstairs exist, consumables exist, so the game is completely trivial after D:2, so there's no point in tuning monster spawns", then nothing would be done.

To again clarify: I am not opposed to arguments from the rules of the game etc. I do plenty of that sort of thing myself. I only claim that this kind of argument is not the only way to analyze game mechanics. Other data, appropriately used, can be useful -- indeed, much more useful in many cases.

The general consensus among experienced players on the forums has been that the actual instances of RNG-produced unwinnable situations (where there were no actions you could have possibly taken to avoid dying, or getting into a situation where you would have died) are exceedingly rare, to the point where the noise they produce isn't going to be a significant factor in any evaluation with a robust data set.

That might be broadly true when you take winrates in Crawl as a whole, but remember that in this thread some of the discussion is centering on the exact portion of the game where RNG deaths are (by far) most likely to happen: the first few levels of the Dungeon (D:1 in particular). When you focus the scope so narrowly on this area, the noise has less opportunity to wash out.

More dangerously, I've not seen conclusive data or even a consistent means of evaluating "RNG death" vs "death to obvious mistakes" vs "death to choices made that a crawl-equivalent of alphazero might have avoided". I do know broadly, as a somewhat experienced player myself now, that very few games are unwinnable, but reject that anybody w/o substantive, non-anecdotal/recency biased data should have *confidence* in making estimates.

The problem isn't that win rates using players' best efforts to win aren't reflective of difficulty, but rather that the signal-to-noise ratio in the dataset is too low to derive actual difficulty from it. There's flat-out no way to say "in this game they weren't really trying their hardest to win".

We have roughly as much evidence supporting this as we have when players want to blame their losses on RNG though. Our "noise" filters are, at best, very crude.

Is it really unfair to claim that the vast majority of players that begin a game of crawl attempt to win it? Is there a non-arbitrary way to parse a "bad player mistake that gets them killed" from an "80% winrate player mistake that nevertheless still gets them killed when alphaDCSS wouldn't have died"? I don't see a clear reason for the distinction, right now.

I don't know if death by "Eh I might be able to escape this horrible situation if I burn all these consumables and subsequently try to sneak past this too-hard combat by trying enough times and waiting and shouting in enough locations, but I also might beat it with a little luck. F-it, let's give it a try, worst case I lose 10 minutes" can really be considered "difficulty".

It's actually non-trivial to define difficulty in a way that would not count such scenarios but still consistently count things you would consider "real difficulty". Quoted scenario is not unlike a failed early rush attempt in a TBS/RTS for example, complete with the benefit if your gamble pays off (more XP and more consumables still available early should you succeed, but lower expected success rate on average compared to alternative strategies...also lower IRL time investment).

My sense is that there's a very large number of deaths which fall into the category of "I don't feel like expending the effort on what I know is probably more likely to keep me alive, when there's a reasonable chance I'll survive with much, much less effort". There's also a not-insignificant number of cases where someone is specifically trying to lower turncount, which may increase risk and possible maximum score at the expense of win rate. However, there's no way of separating those from the deaths which are caused by a true lack of skill.

If that hypothesis is true, it makes any conclusions about difficulty derived from the data we have available to us fairly suspect.

Yet earlier you claimed that other non-difficulty outcome sources wash out over large sampling. I don't see why this isn't true for the occasional "not trying" outcomes also. Especially when you can't actually estimate % in either case.

Importantly, we can reasonably hold these deaths to be equally likely for most changes made to early/late game crawl, and thus make meaningful conclusions about player death rate data changes based on mechanical changes in spite of them. Though reducing optimized play = excessive tedium isn't a bad side goal/benefit.
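To make that concrete with a toy model (entirely an assumption for illustration, not something measured): suppose a fixed fraction of games end in a "couldn't be bothered" death regardless of the change being evaluated, and the rest die at a rate that reflects actual difficulty. Then a change in the underlying rate still shows up in the observed rate, just scaled down:

```python
def observed_death_rate(skill_death_rate, low_effort_rate):
    """Toy model: low-effort deaths happen first and independently;
    the remaining games die at the 'real difficulty' rate."""
    return low_effort_rate + (1.0 - low_effort_rate) * skill_death_rate


# Hypothetical numbers: 20% low-effort deaths in both versions,
# underlying death rate going from 10% to 15% after a change.
before = observed_death_rate(0.10, 0.20)
after = observed_death_rate(0.15, 0.20)
shift = after - before
print(f"before={before:.2f} after={after:.2f} shift={shift:.2f}")
```

Under these assumptions the observed shift is the true shift times (1 - low-effort rate), so the direction of the change survives. If the low-effort rate itself changes between versions, this no longer holds.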

It's the same reason that we filter the results with !boring (dropping early escapes without the orb, and quits): the point at which the game ends isn't reflective of the best effort of the player to win. There's just a much larger subset of games than can be readily identified which don't reflect players' best efforts to win.

I think you are vastly underestimating what "best effort to win" really means. If we're being honest you could throw out nearly the entire sample set of games. For example most people don't study the code, fsim weapon damage, look up every monster in every encounter until they know all stats/hd/etc by heart, and ensure their body is well rested and definitely not on any substances like alcohol with long calculations of all potential actions for every non-trivial encounter without exception.

Since they aren't doing those things and more, they're technically not giving their "best effort to win" and can be safely not counted, right? Except that kind of position neglects how the overwhelming majority of crawl players (including many players with very high winrates!) actually play crawl. The moment someone presses "autoexplore" once, they should instantly be removed from consideration? I'm not convinced. But this is implied if we're really talking about "best effort to win".

### Re: Difficulty vs XL

Not directed at one specific person: if you take "difficult" to mean only "difficult for the best players", you're talking about something which is not what most people think of when they hear the word "difficult".

Normally when people say something is difficult they mean it is hard for an average player, and then things that are hard for a skilled player are REALLY difficult.
Even in places where people discuss other RLs, fighting games, shmups etc., I don't feel like people talk past each other the way they do here. The average player matters if you're wondering what is difficult. Situations that kill the average player are difficult, and if you don't think so you're really asking a different (more specific) question than "is this difficult".

### Re: Difficulty vs XL

Back to DCSS.

How much harder is the second rune S-branch as compared to the first rune S-branch? By my rough calculation, the second rune branch is about twice as easy as the first one.

Spoiler: show
I make some rough assumptions, but they shouldn't be too far from the truth.

First, I check how many people enter the first rune branch. Queries of the form:
Code:
`!lm * !boring 0.22 br=Snake urune<1 x=count(gid)`
`<Sequell> 17565 milestones for * (!boring 0.22 br=Snake urune<1): count(game_key)=5722`

Total attempts to get first S-branch rune = 21454

Then I see how many people die in the first S-branch without finding a rune.
Code:
`!lg * !boring 0.22 br=(Snake|Shoals|Spider|Swamp) urune<1 x=count(gid)`
`<Sequell> 4821 games for * (!boring 0.22 br=(Snake|Shoals|Spider|Swamp) urune<1): count(game_key)=4821`

Total deaths in first S-branch = 4821

So about 22.4% die in their attempt

We now look at second S-branch (with one rune)
Code:
`!lm * !boring 0.22 br.enter=(Swamp|Shoals|Snake|Spider) urune=1 x=count(gid)`
`<Sequell> 8867 milestones for * (!boring 0.22 br.enter=(Swamp|Shoals|Snake|Spider) urune=1): count(game_key)=8857`

About 8857 games attempt the second S-branch.

Deaths in second S-branch:
Code:
`!lg * !boring 0.22 br=(Snake|Shoals|Spider|Swamp) urune=1 x=count(gid)`
`<Sequell> 1067 games for * (!boring 0.22 br=(Snake|Shoals|Spider|Swamp) urune=1): count(game_key)=1067`

So about 12% of people die in their attempt.

22.4 / 12 = 1.86
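The same arithmetic as a small script, using only the four counts quoted above (variable names are just for illustration):

```python
# Counts taken from the Sequell output quoted above (version 0.22).
first_attempts, first_deaths = 21454, 4821
second_attempts, second_deaths = 8857, 1067

first_rate = first_deaths / first_attempts      # deaths per first-rune attempt
second_rate = second_deaths / second_attempts   # deaths per second-rune attempt
ratio = first_rate / second_rate

print(f"first: {first_rate:.1%}, second: {second_rate:.1%}, ratio: {ratio:.2f}")
```

Without the intermediate rounding, the rates come out at roughly 22.5% and 12.0%, and the ratio at roughly 1.87, so the conclusion is unchanged.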

### Re: Difficulty vs XL

bel wrote:Then I see how many people die in the first S-branch without finding a rune.
Code:
`!lg * !boring 0.22 br=(Snake|Shoals|Spider|Swamp) urune<1 x=count(gid)`
`<Sequell> 4821 games for * (!boring 0.22 br=(Snake|Shoals|Spider|Swamp) urune<1): count(game_key)=4821`
That isn't what this query gives you. It gives you how many people died in either S-branch while having no runes. Players often do levels 1-3 (and part of 4) of an S branch without getting the rune, then do levels 1-3 of the other S branch - this query catches deaths that happened in the second S branch as well. You're taking a bunch of the second S-branch deaths and counting them as first S-branch deaths.

(There's also a survivorship bias issue here; players that get the rune from the first S-branch are likely to be better than the players that don't, so the "win" rate for the second S-branch is going to be higher even if there's no difficulty difference.)
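A quick way to see the size of that effect is a two-group toy calculation. The numbers here are invented: half the players die in any given S-branch 40% of the time, the other half 10% of the time, and both branches are equally hard for a given player:

```python
def branch_death_rates(groups):
    """groups: list of (share_of_players, death_prob_per_branch).

    Every player faces the same death probability in both branches,
    so any gap between the two rates is pure selection.
    """
    first = sum(share * p for share, p in groups)
    survivors = sum(share * (1 - p) for share, p in groups)
    second = sum(share * (1 - p) * p for share, p in groups) / survivors
    return first, second


# Hypothetical population: 50% weaker players, 50% stronger players.
first, second = branch_death_rates([(0.5, 0.4), (0.5, 0.1)])
print(f"first branch: {first:.1%}, second branch: {second:.1%}")
```

With these made-up numbers the second branch looks easier (22% vs 25%) purely because the weaker players were filtered out by the first one.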

### Re: Difficulty vs XL

Yes, both of the points are true.

As I said, I made some approximations in my calculations. In particular, I assumed that whenever a person enters an S-branch, they do it till the end (or till they die). This is sometimes not true because branch ends are much tougher than the rest of the branch.

However, because DCSS is highly non-linear at the time of the rune branches, I don't know of any easier way to perform the calculations. Someone could just as well decide to do Elf in between, or Depths or whatever.

I suppose I could look at milestone data within a game, but my Sequell skills aren't that good.
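If the per-game milestone order were available, the classification itself would be simple. A sketch, assuming a hypothetical input of branch names in the order a game first entered them (this is not a Sequell feature, just what one could do offline with milestone data):

```python
S_BRANCHES = {"Snake", "Shoals", "Spider", "Swamp"}


def s_branch_order_at_death(entered, death_branch):
    """entered: branch names in the order the game entered them.

    Returns 1 if the game died in the first S-branch it entered,
    2 if it died in the second, or None if it did not die in an S-branch.
    """
    order = []
    for branch in entered:
        if branch in S_BRANCHES and branch not in order:
            order.append(branch)
    if death_branch not in order:
        return None
    return order.index(death_branch) + 1


# Invented examples.
print(s_branch_order_at_death(["D", "Lair", "Snake", "Swamp"], "Swamp"))
print(s_branch_order_at_death(["D", "Lair", "Orc", "Spider"], "Spider"))
```

This fixes the "dipped into both branches with no rune" misattribution, though not the survivorship issue.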

Another way could be to just look at deaths on Sbranch:$ (with and without a rune), because most of the action happens there. However, that has other problems: the first few floors of the first S-branch are (probably) more difficult than the first few floors of the second S-branch, so ignoring both of them doesn't seem right.

Another way could be to incorporate the XP of the character in the deaths somehow -- the XP of a character doing their second rune would be higher. But higher XP characters die less anyway because DCSS becomes easier as the game goes on.

As for survivorship bias, well that is always present, in all of the queries in this thread. It's probably bigger here than in many other queries because most people who have played DCSS have never got a rune.

Another way could be to just look at deaths on Sbranch:$ (with and without a rune), because most of the action happens there. However, that has other problems: the first few floors of the first S-branch are (probably) more difficult than the first few floors of the second S-branch, so ignoring both of them doesn't seem right.

I'm not sure we can make this conclusion so easily. People frequently select S branch order based on what they anticipate will be the easier one for their build to do first. In extreme cases like a mummy drawing spider as an S branch, the 2nd branch is so much harder/more dangerous than the first that you skip it in favor of elf/vaults/depths first. Frail species w/o a shield or Dmsl might feel the same way about shoals. This will of course make dying before 2nd rune technically more likely (3 areas compared to 1), but the build that enters the branch for its 2nd rune will be much stronger than a typical 2nd S branch.

There's a lot of noise for this query in particular. Enough that I expect it's less predictive than ignoring it and just using XL.

### Re: Difficulty vs XL

Shoals might be an outlier because it's often harder than Vaults: 1-4. The other three are of roughly similar difficulty imo.

I wanted to develop some kind of quantitative measure for my general feeling that "there should be only one S-branch in the game because the first one is interesting while the second one is a slog."

After running the queries, I'm not so sure about my feeling. By the rough measure above, about 12% of attempts for the 2nd rune in the S-branches end in deaths; so it's a bit of a stretch to call it a "slog". Still, I think it would be good overall if there was only one S-branch.

For this message the author bel has received thanks:
powergame

### Re: Difficulty vs XL

I cannot contribute to the main point of this discussion (and I hope I'm not derailing anything). But the following might be tangential: I do not think that a flat difficulty curve helps keep an RPG interesting.

I think this is generally the dilemma of RPGs as opposed to, say, adventure games or strategic board games: On the one hand, "character improvement" is their very core and their strategic point of interest. On the other hand, that "character progression" essentially goes like this: At level 1, you do 1 point of damage per round to a goblin with 10 HP. At level 100, you do 100 damage per round to a dragon with 1000 HP.

Since, personally, I'm unable to suspend my disbelief in that illusion, I really have difficulty continuing to play any RPG with any sort of attention in the long run. Most games, including Crawl, deal with this dilemma by making things more complicated as the game continues. That means that gaining accurate knowledge of game mechanics and their interactions becomes the actual objective in the long run, far more than play itself. I don't know if there is a better solution. But I do think that if you keep difficulty constant, you're dealing with a game design paradox.
### Re: Difficulty vs XL

bel wrote:Shoals might be an outlier because it's often harder than Vaults: 1-4. The other three are of roughly similar difficulty imo.

I wanted to develop some kind of quantitative measure for my general feeling that "there should be only one S-branch in the game because the first one is interesting while the second one is a slog."

After running the queries, I'm not so sure about my feeling. By the rough measure above, about 12% of attempts for the 2nd rune in the S-branches end in deaths; so it's a bit of a stretch to call it a "slog". Still, I think it would be good overall if there was only one S-branch.

Players can skip an S branch if they want. I've actually done this recently as I mentioned in another thread (skipped spider as a mummy).

I'm not convinced Shoals is more dangerous than other S branches. Your game data is aggregated across versions, correct? (Edit: seems not, it's just for 0.22, but the point is similar.) Swamp is considerably more dangerous now than it used to be, with long-range smite-targeted grasping roots and dangerous projectiles. It has only slightly less stair pull/push than Shoals too.

I suspect shoals might have more difference in survival between experienced and inexperienced players compared to other branches. This is because there are many extra tricks in shoals:

- Wand of flame on many things to damage with steam + block LoS
- Hexing wands or hex spells tend to be very effective on many of the enemies, including threatening ones
- Very few things in shoals see invis, a trait shared with spider but not swamp/snake
- Very few things in shoals have rPois, so magic like OTR or mephitic cloud can trivialize many of its encounters if you happen not to have hexes

In addition to this, snake/shoals tend to have shops and shoals in particular has the best loot drops of the S branches, so it power spikes the player more. However, newer players taking their MiBe there the first time might not anticipate the dangers of barbs, mesmerize, getting blown off stairs, getting stairs flooded, or getting pin-cushioned by mighted ranged enemies...so they die with otherwise strong setups because they don't know the above.

Contrast this to alternative S branches. You can fly over deep water to avoid grasping roots and still use wand of flame in swamps similar to shoals...but there's not a lot of extras there. In snake the predominant strategy is to walk away, mostly complicated by guardian serpents. Spider is mostly fast melee stuff, so once you learn to respect tarantellas and not to chase orb spiders deep into the fog your only real consideration is ghost moths.

Once you learn the extra variety in Shoals, I don't think it's more dangerous than Swamp. But it does have a larger learning curve initially and this might skew its results, further complicated by shambling mangroves/thorn hunters being more dangerous in swamp now than previously...especially the mangroves.

IMO Vaults 1-4 would only have a lower death rate than shoals in a vacuum. People go into shoals at significantly lower levels.

### Re: Difficulty vs XL

The first is this article: How Slay the Spire's devs use data to balance their roguelike deck-builder. Slay the Spire is a hybrid roguelike/deck-building game where you choose cards to fight against enemies. Since there are too many interactions to try to analyze, the devs looked at the game logs to determine balance. For instance, one simple measure they used is to see how often a player chose a card when they were given the option to choose it; and then the devs connected this choice to whether the player won or not. If a particular card is chosen by most winning players, it's likely to be overpowered.
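A minimal sketch of that kind of measure, assuming a made-up log format of (card, was it picked, did the run win) tuples. The article doesn't spell out the devs' actual pipeline, so treat this only as the general idea:

```python
from collections import defaultdict


def card_stats(offers):
    """offers: iterable of (card, was_picked, run_won) tuples (hypothetical format).

    Returns per-card pick rate and win rate among runs that picked the card.
    """
    offered = defaultdict(int)
    picked = defaultdict(int)
    wins_when_picked = defaultdict(int)
    for card, was_picked, run_won in offers:
        offered[card] += 1
        if was_picked:
            picked[card] += 1
            if run_won:
                wins_when_picked[card] += 1
    stats = {}
    for card in offered:
        stats[card] = {
            "pick_rate": picked[card] / offered[card],
            # None when the card was never picked, so no win rate exists.
            "win_rate_when_picked": (
                wins_when_picked[card] / picked[card] if picked[card] else None
            ),
        }
    return stats


# Invented data, for illustration only.
offers = [
    ("card_a", True, True),
    ("card_a", False, False),
    ("card_a", True, False),
    ("card_b", False, True),
]
stats = card_stats(offers)
print(stats)
```

The obvious caveat carries over from this thread: a card picked mostly by strong players will look good whether or not it is.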

From the perspective of the players themselves, something similar was created by one of the top players in the world (Jorbs). He created a Google Doc with the cards chosen from his runs, using a similar methodology. He assigned each card an Elo rating, with a higher rating meaning that he values the card more.
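How the ratings in that doc are actually computed isn't described here, so the following is only the textbook Elo update, treating each pick as a "match" between the card taken and a card passed over. The starting rating of 1500 and K = 32 are arbitrary assumptions:

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner, loser, k=32.0):
    """Return the new (winner, loser) ratings after one comparison."""
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


# Two cards with equal ratings; the first one is picked over the second.
picked, skipped = elo_update(1500.0, 1500.0)
print(picked, skipped)
```

Picking a low-rated card over a high-rated one moves both ratings more than the reverse, which is what makes the scheme self-correcting over many picks.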

------------------------------

To connect it to what we were discussing above, let me quote my post above, where I contrasted two ways of looking at Crawl's difficulty level. One is arguments using the rules of the game, the other is looking at player data.

bel wrote:Zerothly, I should say that I primarily made these plots for fun. If they lead to some useful insights, that's a bonus.

That said, we should keep in mind the adage that "all models are false, some are useful". So, what is it that I'm trying to do here? I'm trying to look at places where Crawl's difficulty curve is out of whack with the "ideal" difficulty curve -- whatever that might be.

If we keep this goal in mind, the "optimal play" difficulty measure may or may not be true, but it's completely useless -- because it says that everything after D:2 (or D:3) is completely irrelevant, because "optimal winrate" quickly reaches 100% (or close enough to not matter).

Using Sequell data is just another way of "playtesting" the game and seeing how hard it is. Instead of a few people playing the game and offering feedback -- saying this branch is too hard, the AC on this monster could be reduced, and so on -- I am looking at a bigger data set. Is the data set dirty and full of noise? Obviously. But there is some detectable signal in the noise.

I looked back at the Hellcrawl thread in CYC. Suppose we look at the point where the monster spawns were being tuned. What I see is people playing the game and offering their thoughts on whether X is too easy, Y is too hard, this branch needs cuts etc. If people worked on the basis of "upstairs exist, consumables exist, so the game is completely trivial after D:2, so there's no point in tuning monster spawns", then nothing would be done.

To again clarify: I am not opposed to arguments from the rules of the game etc. I do plenty of that sort of thing myself. I only claim that this kind of argument is not the only way to analyze game mechanics. Other data, appropriately used, can be useful -- indeed, much more useful in many cases.

If you watch Jorbs's streams, you'll see that he uses a lot of reasoning using the rules of the game. But he (and the Slay the Spire devs) also do a lot of player-data based analysis. As I tried to argue above, this method is not illegitimate because it cannot give the "true" or "optimal" difficulty level -- Crawl and Slay the Spire are games, not mathematical puzzles.

### Re: Difficulty vs XL

Re: bel

Slay the Spire is a clone of the popular and influential board game Dominion, which started a new board gaming genre. What they use for balancing reminds me of neural networks. Another card-based board game, Race for the Galaxy, has a computer version with neural networks. You can download it for free. The way I understand neural networks, the AI doesn't try to understand the game at all. Rather, the programmer sets up a bunch of "sensors" to watch for specific conditions and rate each card in each situation. So, for example, it may work like this: if I have This card in hand, and there are 9 victory points left, and the end of the game is 4 cards away, then playing This card gives a 0.273443321 chance of winning. In the same situation, playing That card would give a 0.27653462 chance of winning. Therefore, play That card.

The downside of neural-network-based AI, I believe, is that it can't justify its actions to save its life. It just plays those cards in a very specific situation because thousands of simulated plays have shown that this leads to victory with such and such probability. I made extensive use of the computer version of Race for the Galaxy when learning to play it. It's a brutal teacher and it's very effective, but you need to analyze its actions very carefully.

Even so, I've found that the implementation using neural networks has a bit of a weakness at the strategic level (planning a couple of turns ahead). There are combos it consistently undervalues. I guess the programmer didn't catch all the factors that influence the outcome of the game, so the neurons/sensors couldn't catch them.

Neural networks tend to work especially well in card games, because the data is fuzzy but the states are relatively well defined. A player might or might not have specific cards in hand, so exhaustive enumeration is prohibitively expensive, and time-limited algorithms can produce unsatisfactory results (Las Vegas / Monte Carlo, I don't remember which is which). Also, in card games the available decisions and factors tend to be much better defined than in a game like Crawl, which has a great many strategic factors in skill training and spells, and in which monster or item might or might not show up. I would be highly surprised if someone could define an effective neural network for DCSS. The complexity reminds me of 4X games, which are infamous for bad AI (and that's why they ultimately failed).

-----------

I think the way people play DCSS has a bit of herd mentality. Tavern, reddit, the wiki all suggest certain approaches which are not necessarily true or up to date. I used to play Crawl before draconians were cold-blooded. Nowadays the most common ranged enemy in Lair is a rime drake (frost), not firedrake. I very much doubt Lair is a great choice for a young draconian to train.

Swamp HAS become a lot harder compared to a couple of years ago. It has a distinct semi-open layout, whereas the old Swamp was very open. On the plus side, it limits vision. The downside is that stealth might not be as good in there because fighting noise is going to attract stuff anyway (I recently won a DsFi with antennae, so I got to experience first-hand how much fighting noise actually does, and my stealth was two stars short of perfect). The same character reaped great benefits from Stealth in Shoals. Swamp is quite tricksy now: a spriggan druid can cast Might on a hydra. Old Swamp was infamous for requiring poison and perhaps electricity resistance. In new Swamp, I find that area damage/summoning is more useful because there are more choke points, and so is a flaming weapon because hydras are harder to spot from farther away.

The branch that has probably changed the least is Spider Nest (it only gained entropy weavers). Even Snake Pit got naga sharpshooters and shock serpents.

---------

Also, players tend to over-react to nerfs. Nerfs have a big psychological effect, in that many players seem to default to the second-best option rather than actually trying the nerfed thing.

--------

My final 2 cents: judging by these forums, it seems Necromutation and especially Statue Form have become something of an "ascension kit" in extended. I don't remember that being the case several years ago. A new meta has formed. The meta is not always right, but people tend to follow others when in doubt.

### Re: Difficulty vs XL

Most people say Necromutation is bad. Opinion on Statue Form is divided: some people really like it in extended, others don't bother or can take it or leave it.

Many recent nerfs have been:

a) removing the option completely on shaky/overly general justification (no way to "actually try the thing").
b) nerfs to already-questionable options on the stated/unsupported basis that they were too strong (confusing touch, agony, dispel undead). Dispel undead was then made less expensive to learn/use, but amusingly the main issue with it has barely changed.

Stuff like BVC was nerfed but remains a very effective spell. Stabbers were brought down somewhat even though they were already on the weak side. It's a mixed bag; some of the other magic changes were good (VM is likely too complete with its starting book, IE is a challenge pick unless you want to just barely train it and use Freeze until you transition to a weapon).

I'd be interested to see how alphaX would do with Crawl and 4X games. I expect it could probably become very good at the former, eventually outperforming top players. The latter would be an issue with modern 4X games, because their performance is so *expletive* awful that it might take even our best computers ages to play, say, a million games to learn from. For example, you'd first have to fix the game to quit animating crap that is off-screen from the player's perspective. Even during a turn with animations turned off, I can somehow make inputs faster than the game can handle them, and run afoul of input buffering. Webtiles has an excuse (the connection); 4X does not. That's pathetic. There's no way you can sim tons of games with an AI using a game in that sorry state. IMO that, more so than their complexity or AI, is why 4X games have mostly failed.

### Re: Difficulty vs XL

b0rsuk wrote: The way I understand neural networks work, it doesn't try to understand the game at all. Rather, the programmer set a bunch of "sensors" to watch for specific conditions and rate each card in each situation.

This video is a great 20-minute explanation of how neural networks work: https://www.youtube.com/watch?v=aircAruvnKk. I'd also highly recommend the rest of the 3Blue1Brown series; it's very clear and digestible.
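For anyone who prefers code to video, here is a minimal sketch of what a small feed-forward network computes: each layer takes a weighted sum of its inputs plus a bias and squashes it with a sigmoid. Nobody hand-sets "sensors" here; the weights are just numbers, and in a real network they would be adjusted by training, which this sketch does not show. The layer sizes, input values, and helper names are arbitrary choices for illustration.

```python
import math
import random


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One layer: weighted sum of inputs plus bias, then sigmoid, per neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights (a stand-in for learned ones)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


rng = random.Random(0)
# Hypothetical shape: 4 inputs -> 3 hidden neurons -> 2 outputs.
network = [random_layer(4, 3, rng), random_layer(3, 2, rng)]
outputs = forward([0.0, 1.0, 0.5, 0.25], network)
print(outputs)
```

With random weights the outputs are meaningless; the whole point of training is to nudge the weights until the outputs become useful ratings for the situation described by the inputs.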
