Save Files Issues

The save format has been rewritten: it now uses a single file consisting of interleaved chunks compressed with zlib. Everything is done with transactional safety, and it appears to work without problems so far.
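For illustration, the crucial property of such a container is that chunk data is appended and flushed before the chunk directory is updated, so a crash at any point leaves the previous consistent state readable. A minimal sketch of that invariant follows; the names and layout are illustrative, not the actual on-disk format:

  #include <cstdint>
  #include <cstdio>
  #include <cstring>
  #include <string>
  #include <vector>

  struct chunk_dir_entry
  {
      char     name[16];   // e.g. "you", "lev-d-3"
      uint32_t offset;     // where the compressed payload starts
      uint32_t length;     // compressed length
  };

  // Append a chunk's (already compressed) payload; the directory is
  // rewritten only after the data is safely on disk.
  void append_chunk(FILE *save, const std::string &name,
                    const std::vector<uint8_t> &payload,
                    std::vector<chunk_dir_entry> &dir)
  {
      fseek(save, 0, SEEK_END);
      chunk_dir_entry ent = {};
      strncpy(ent.name, name.c_str(), sizeof(ent.name) - 1);
      ent.offset = (uint32_t)ftell(save);
      ent.length = (uint32_t)payload.size();
      fwrite(payload.data(), 1, payload.size(), save);
      fflush(save);        // plus fsync(fileno(save)) on POSIX
      dir.push_back(ent);  // the commit happens when the directory is written
  }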

I've deleted the now-irrelevant proposals, and most of the following discussion is outdated. — KiloByte 2010-09-09 01:16

Plan (Jan 2010)

I (sorear) was going to roll a transactional container format by reusing bits and pieces of SQLite, but Darshan proposed simply using sqlite as is - CREATE TABLE levels (place STRING, data BLOB).

Databases are a terrible way to store blobs: they suffer from command overhead, fragmentation and so on. There are far better solutions. We could have used tokyo-cabinet, but it's made mostly for short pieces of data, which is why I implemented my own solution.

Each blob will be internally compressed using zlib, of course.

Done.
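For reference, per-chunk compression with zlib's one-shot API looks roughly like the sketch below. zip_chunk is an illustrative name, the real code may use the streaming deflate interface instead, and the uncompressed size has to be recorded somewhere so the chunk can be inflated later:

  #include <zlib.h>

  #include <cstdint>
  #include <vector>

  std::vector<uint8_t> zip_chunk(const std::vector<uint8_t> &raw)
  {
      uLongf zlen = compressBound(raw.size());
      std::vector<uint8_t> out(zlen);
      if (compress2(out.data(), &zlen, raw.data(), raw.size(),
                    Z_DEFAULT_COMPRESSION) != Z_OK)
          return {};       // caller treats an empty result as failure
      out.resize(zlen);    // compress2 wrote the actual size into zlen
      return out;
  }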

I have a few other things I want to do with saves:

* Move to storing enums as compressed strings. The code is already there, but it needs to be integrated into the save code and debugged.

The vast majority of save compatibility issues come from sources other than enums changing. And since almost every field is an enum, we'd suffer a massive increase in file size for no gain.
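(For context, the rejected proposal amounts to something like this hypothetical sketch, trading file size for immunity to enum reordering; the marshalling helpers are assumed to be Crawl's usual ones.)

  // Hypothetical sketch: write a stable string name instead of the raw
  // integer, so reordering the enum can't break old saves.
  enum potion_type { POT_CURING, POT_HEAL_WOUNDS /* , ... */ };

  static const char *potion_names[] = { "curing", "heal wounds" /* , ... */ };

  void marshall_potion(writer &out, potion_type p)
  {
      marshallString(out, potion_names[p]);  // rather than marshallByte(out, p)
  }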

* (Maybe) eliminate tags. Judging from old Crawl code (3.x), tags were introduced to simplify save compatibility before minor versions were invented; they only help us as long as they are never edited, a constraint we've long since abandoned. Getting rid of them would remove a layer of indirection from the save code and maybe make it easier to understand.

Good idea, I've done so.

* Store metadata in the saves! We can make a second table mapping strings to (untyped) values, and store generically useful data in a self-describing format that can be easily read with external tools. Save version, game version, and the whereis block are the big ones.

Storing the save version in a separate block doesn't give us anything. I moved the game version, player's name and such to a chunk of their own. Whereis data is happy in its current place, where it can be retrieved without running Crawl/sqlite/whatever.

* More on a pie-in-the-sky note, write-ahead command logging! If you store the state of the RNG in the save file, then if you restore the same save twice and run the same commands, the same things will happen. So, we don't actually have to rewrite the entire level and character info - we only need to store the commands which have been entered since the last full save. The practical upshot of this is that we can autosave every few dozen turns for only one sector's worth of I/O. This will be great for people with unstable computers. It will also allow the SIGHUP handling to be made much less hackish.

Good idea. That would be implausible without transactions that can ensure a checkpoint is consistent, but we now have those, so if someone wants to implement this, it would be great! The save format would need to be extended to allow non-transactional chunks, but a buffer like this is an obvious addition. Unfortunately, the biggest problem is that we can't store raw keypresses – they depend on local settings and are incompatible with tiles. We'd need a way to describe commands, but this problem is already shared with the repeat command.
We'd have to ensure all calls to the random generator that affect the game are separated from UI stuff, too. This is partially done, but not really enforced in any way, so there's a fair bit of work involved.
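A rough sketch of the shape this could take, assuming a fully deterministic game loop; every function below is a hypothetical hook, not existing code:

  #include <cstdint>
  #include <vector>

  // Hypothetical hooks into the existing save/game loop:
  void append_to_log_chunk(int cmd);               // cheap non-transactional append
  void load_save(const std::vector<uint8_t> &sav); // restore full checkpoint
  void seed_rng(uint64_t state);                   // restore RNG state
  void process_command(int cmd);                   // run one game command

  struct checkpoint
  {
      std::vector<uint8_t> full_save;  // the usual transactional save
      uint64_t             rng_state;  // RNG state at checkpoint time
  };

  std::vector<int> command_log;        // abstract commands, not keypresses

  void on_player_command(int cmd)
  {
      command_log.push_back(cmd);
      append_to_log_chunk(cmd);        // one sector of I/O, not a full save
  }

  void restore_after_crash(const checkpoint &cp)
  {
      load_save(cp.full_save);
      seed_rng(cp.rng_state);
      for (int cmd : command_log)      // determinism: same state plus same
          process_command(cmd);        // commands yields the same game
  }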

Save files are not the only way Crawl stores data, and not the only one that needs attention. I can think of nine places Crawl stores information that potentially need revision:

* (1) Compiled map descriptions. These are currently stored in a custom binary format with extremely fiddly locking requirements; they prevent Crawl from being updated while processes are running, or more generally prevent any .des change. I have not done sufficient research on these, but Darshan wants them improved, so I'll do something here.

* (2) Compiled speech and description databases. These are already stored in SQLite, but using ndbm compatibility wrappers, so the code is a lot more complicated than it needs to be. I will try to move as much of the existing logic from the Crawl database layer into the schema where it belongs.

* (3) Newgame pref files. These are stored in the same format as .crawlrc files, and are read by the same parser; they keep your old name and last chosen race/class.

* (4) Whereis records. These files are stored one per player and provide the information the servers use to learn each player's location. These bug me for some reason, but I don't have a better idea.

* (5) Logfile. This self-describing append-only file records details of every game finished on a computer; it scales easily to millions of entries and can be replicated with a simple wget -c. Used for all of the servers' statistical analyses. This is probably the best designed of the current files; it's basically immune to crash damage, is indefinitely extensible, and compatible with standard text tools.
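For illustration, each entry is a single line of colon-separated key=value fields (a literal colon inside a value is doubled), roughly:

  v=0.7.1:lv=0.1:name=Player:race=Minotaur:cls=Berserker:xl=12:place=D::10:turn=15000:sc=24680:tmsg=slain by an orc warrior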

* (6) Milestones. This file is equivalent to the logfile, but for events other than deaths. Other than merging it with the logfile (which isn't worth the disruption, IMO), I see nothing to do with this.

* (7) High score table. This file is formatted much like the logfile, but is kept in sorted order and is rewritten after every death. It is the probable explanation of the lag on death in server games. It also occasionally gets corrupted, causing major problems for the servers as subsequent games crash on end. This is a very good candidate for moving to an indexed transactional database; scores could then be updated in logarithmic time.
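A minimal sketch of what that could look like with sqlite's C API; the schema and names are hypothetical, with a fields column keeping the existing logfile-style line:

  #include <sqlite3.h>

  void init_scores(sqlite3 *db)
  {
      sqlite3_exec(db,
          "CREATE TABLE IF NOT EXISTS scores ("
          "  score  INTEGER NOT NULL,"
          "  fields TEXT    NOT NULL);"
          "CREATE INDEX IF NOT EXISTS scores_by_score"
          "  ON scores (score DESC);",
          nullptr, nullptr, nullptr);
  }

  // Inserting touches only O(log n) index pages instead of rewriting the
  // whole sorted file, and the update is transactional.
  void add_score(sqlite3 *db, long long score, const char *fields)
  {
      sqlite3_stmt *st = nullptr;
      sqlite3_prepare_v2(db,
          "INSERT INTO scores (score, fields) VALUES (?, ?)",
          -1, &st, nullptr);
      sqlite3_bind_int64(st, 1, score);
      sqlite3_bind_text(st, 2, fields, -1, SQLITE_TRANSIENT);
      sqlite3_step(st);
      sqlite3_finalize(st);
  }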

* (8) Saved game fragments. Covered above.

* (9) Bone fragments. These are much like save files and will use the same system, at least initially. I have some ideas for taking advantage of a database's ability to store non-unique data, storing each ghost in a separate record; this will make the ghost reuse proposals easier to implement.

No, bones bear no similarities to saves whatsoever. They used tags, but that was the only thing in common, and now even that is gone. Unlike NetHack, which stores the entire level, Crawl's bones contain only a simple static structure that can at most be repeated for multiple ghosts. In fact, ghosts are pretty close to score entries, and would share some properties if we wanted to ensure that only ghosts which actually got killed are removed.