Nemesis wrote:I am indeed working on a S2 disassembly. What makes this disassembly so different from just chucking the rom through IRApc? Well, the more informed here will know the fundamental problem with disassembling a rom like S2 for the genesis. That is that no disassembler can ever tell the difference between code and data. If you disassemble the rom wholesale, it treats everything as code, and you end up with 14MB or so of "code", only about 10% of which is real.
Ok, so why don't you just trace through the program from the start, and follow every possible path that the code might go? Well, one thing always screws that idea up. Take this for example:
CODE
jmp off_80CC(pc,d1.w)
This basically means "run some code at location (off_80CC + D1)", where off_80CC is an absolute location, and D1 is the current contents of the register D1 (the .w means only its low word is used). D1 could have anything in it at all. You can never know what D1 might be, and hence, where the code goes after this point. The value of D1 is only finally known at runtime, and is probably different at this point many times during the life of the program. This means if you try to disassemble a rom this way, you'll end up with around 0.1% disassembled, and the rest being counted as data, because it can't trace any further.
So, that's what's wrong with conventional disassembly techniques. These problems can't be gotten around, and the only alternative is to work through the code by hand, figuring out yourself what is data and what is code, and that takes a long time. An automatic process can't figure it out accurately, and you can only really know where the program will go when it reaches one of those jumps at run-time. Well, we have emulators, people. They're required to be able to follow program flow at runtime. They trace through these jumps.
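To make the dead end concrete, here is a minimal sketch in Python rather than 68k asm. The toy instruction table, the addresses, and the name trace_static are all made up for illustration; a real disassembler decodes actual opcodes instead of reading a dictionary.

```python
# Toy model: address -> (kind, operand). Not real 68k decoding.
# "op" falls through, "bra" jumps to a fixed target, "bcc" may jump or
# fall through, "jmp_indexed" depends on a register, "rts" ends the path.
TOY_ROM = {
    0x100: ("op", None),
    0x102: ("bcc", 0x110),
    0x104: ("jmp_indexed", None),   # stands in for jmp off_80CC(pc,d1.w)
    0x110: ("op", None),
    0x112: ("rts", None),
    # Only reachable through the indexed jump at 0x104:
    0x200: ("op", None),
    0x202: ("rts", None),
}
INSTRUCTION_SIZE = 2  # assumption: every toy instruction is 2 bytes


def trace_static(rom, entry):
    """Follow every statically known path from entry; return visited addresses."""
    seen = set()
    pending = [entry]
    while pending:
        addr = pending.pop()
        if addr in seen or addr not in rom:
            continue
        seen.add(addr)
        kind, target = rom[addr]
        if kind == "op":
            pending.append(addr + INSTRUCTION_SIZE)
        elif kind == "bra":
            pending.append(target)
        elif kind == "bcc":
            pending.append(target)
            pending.append(addr + INSTRUCTION_SIZE)
        # "jmp_indexed": the target is unknown until run time, so the trace stops.
        # "rts": end of this path.
    return seen


found = trace_static(TOY_ROM, 0x100)
missed = sorted(set(TOY_ROM) - found)
```

In this toy rom the two instructions at 0x200 and 0x202 end up in missed, because the only way to reach them is through the indexed jump.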
The process:
1. I made a quick and dirty hack of gens to track the current value of the PC, which contains the address of the code that is currently being executed. It tags each address as code when it is executed, maintaining a record of every address that contains real code as it executes it while a rom is running.
2. I played S2 for a few hours, with vsync off, and running with frameskip. I tell ya, that really works up the reflexes. During that time, I took care to interact with every sprite in the entire game, skimmed over every level, died, continued, didn't continue, played through 2 player winning and losing for each player, etc, until I couldn't think of a single thing in the game I hadn't done.
3. I then got gens to output a nice little script for me that contained all the addresses that had been executed as code.
4. I fed this script (which was frigging massive) into IDA Pro, and it churned away for a few minutes, beginning a deep disassembly at each and every location that had been executed (including all those impossible to trace jumps). This ensured (around 500000 times over) that any code path in the rom it could find a way to get to was disassembled, whether or not it had actually been executed while I was playing.
<b>The result? A very deep and thorough disassembly of S2, with absolutely no data incorrectly disassembled as code, and only requiring around 0.0001% of the time required to do it by hand.</b>
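Steps 1 and 3 of the process can be sketched roughly like this. It is only an illustration in Python: the actual gens hack and the script format fed to IDA Pro are not described here, so the PC trace, the function names, and the one-hex-address-per-line output are all assumptions.

```python
def record_execution(pc_trace):
    """Tag every address the emulated CPU fetched an instruction from."""
    executed = set()
    for pc in pc_trace:
        executed.add(pc)  # tagging twice is harmless; a set keeps one copy
    return executed


def to_script(executed):
    """Emit one sorted hex address per line (hypothetical script format)."""
    return "\n".join(f"{addr:06X}" for addr in sorted(executed))


# A made-up PC trace; the loop at 0x000206 runs twice but is tagged once.
trace = [0x000200, 0x000202, 0x000206, 0x00020A,
         0x000206, 0x00020A, 0x0080CC]
script = to_script(record_execution(trace))
```

Each address in the resulting script is a known-good starting point for the disassembler, including targets that were only ever reached through an indexed jump.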
Once that was done, I started going through the whole disassembly line by line anyway, doing my own analysis. While doing this, I linked every offset and pointer, and traced through the few pieces of code that did not get disassembled (and there really weren't that many). I've also been formatting all the datablocks of known structure properly, as well as marking the datablocks that are known. This disassembly can be converted to an ASM file easily, and when compiled, produces a rom identical in every way to S2. When I've finished, it'll be possible to easily relocate blocks in the rom, while still making it possible to recompile a working S2 rom.
This disassembly does a lot of things. First of all, it allows any unknown datablock in the game (such as sprite mappings, hardly any of which have been dug out), to be quickly located and identified. It also allows you to easily see what areas of code reference a particular block of data, so when you're planning a more advanced hack, you have a better idea of what will be affected by your changes. Most of all, it gives you the freedom to not have to worry about sizes. Say you want to update a set of tiles that was compressed, and you find that your modified art is longer than the previous set of tiles. What do you do? Do you just insert the block of data, then manually fix up the 10000 pointers and offsets that are now incorrect? Impossible. Well, with all the offsets and pointers replaced by labels, you just hit compile, and there are no problems.
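Why labels make the size problem go away can be shown with a small sketch. This is Python standing in for what an assembler does when it resolves labels; the block names, sizes, and the function assemble are hypothetical and not taken from the S2 disassembly.

```python
def assemble(blocks, pointer_table):
    """Lay blocks out back to back and resolve label references.

    blocks: list of (label, data) pairs in rom order.
    pointer_table: labels whose addresses some code needs.
    """
    addresses = {}
    offset = 0
    for label, data in blocks:
        addresses[label] = offset
        offset += len(data)
    return {label: addresses[label] for label in pointer_table}


blocks = [
    ("Art_Tiles", bytes(16)),
    ("Map_Sprite", bytes(8)),
    ("Pal_Main", bytes(4)),
]
before = assemble(blocks, ["Map_Sprite", "Pal_Main"])

# Replace the art with a longer version; nothing else is edited by hand.
blocks[0] = ("Art_Tiles", bytes(24))
after = assemble(blocks, ["Map_Sprite", "Pal_Main"])
```

Growing Art_Tiles by 8 bytes moves both later blocks by 8 bytes, and every pointer that was written as a label follows along on the next build.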
That's phase 1 anyway. It has a problem. All your data needs to be included in this one massive asm file, and it all needs to be formatted as data. That is simply not practical. That's where phase 2 comes in.
I'll be releasing two versions of the source. One is the complete, single file disassembly, in an IDA Pro file format. This'll allow you to do some analysis of the rom in IDA Pro easily, making use of its quite powerful features. The second one will be in a raw asm file, with all the blocks of data in separate files in their raw form. Want to edit some piece of art? Edit the separate file in whatever the hell you want, then just hit build, and you have a compiled rom, with the different datablock.
I'll be making my own little addition to the asm file. A nice directive that tells a program that I'll write to format the data contained in the file at an already specified location, and insert it at that point. It's really quite simple. You can then feed that single, complete asm file into snasm68k, and out comes a modified S2 rom. Taking the data out of the asm file will also allow you to rearrange the order of data in the rom with ease. Just switch two lines, and you've just switched two blocks of data in the rom, and it'll still work. It'll make the asm file a million times smaller and easier to work with, and it'll make modifying the data quick and easy too.
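A rough idea of what such a pre-processing step could look like, sketched in Python. The directive name @insertdata, the file name, and the function expand are placeholders invented for this example; the real directive and tool are not specified here. The output uses ordinary dc.b lines with $ hex values.

```python
DIRECTIVE = "@insertdata"  # hypothetical directive name, not from the post


def expand(asm_lines, read_file, per_line=8):
    """Replace each directive line with dc.b lines holding the file's bytes."""
    out = []
    for line in asm_lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] == DIRECTIVE:
            data = read_file(parts[1])
            for i in range(0, len(data), per_line):
                chunk = data[i:i + per_line]
                out.append("\tdc.b " + ",".join(f"${b:02X}" for b in chunk))
        else:
            out.append(line)
    return out


# Stand-in for the file system so the sketch runs on its own.
files = {"pal_main.bin": bytes([0x0E, 0xEE, 0x00, 0x0A])}
source = ["Pal_Main:", "@insertdata pal_main.bin", "\trts"]
expanded = expand(source, files.__getitem__)
```

Swapping two directive lines in the source swaps the two blocks in the output, which is the "just switch two lines" idea in miniature.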
This method can easily be applied to other sonic roms, and what's more, any genesis game. I'm not done yet, but when I am, this will all be available. In the mean time, I'd like to ask anyone out there who works on any kind of sonic editor to allow their editors to load the data they work with from individual files. If you've made a palette editor, add a feature to allow the user to load a file that contains just the palette alone. If you've written a level editor, add an option to allow the user to specify each data element manually (eg, load a level file, load a 16x16 block mapping file, etc), and even better, allow for some kind of simple script file that tells your program what files to load for those elements, so someone can just open the EHZ definition file, and they'll have all the files that relate to EHZ. Most importantly, allow an option to save back to these individual files. No more having your program have to worry about where to place anything in the rom, you can just focus on editing the data itself. If you've already got a program that does this, saving back to the separate files should seem easy by comparison.
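For editor authors, a definition file like that could be as simple as the following sketch. The "element = path" syntax, the ';' comment marker, the element names, and the file paths are all assumptions made up for the example, not an agreed format.

```python
def parse_definition(text):
    """Parse 'element = path' lines; blank lines and ';' comments are skipped."""
    files = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        key, _, path = line.partition("=")
        files[key.strip()] = path.strip()
    return files


ehz_definition = """
; hypothetical EHZ definition file
level      = ehz/level.bin
blocks16   = ehz/blocks16.bin
palette    = ehz/palette.bin
"""
ehz = parse_definition(ehz_definition)
```

An editor would load each listed file for the matching element and write changes back to the same paths, never touching the rom itself.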
Also, I know there are a few other people in this community who have done work on sonic disassemblies. Any documentation or sourcecode comments you have done will contribute greatly to this project. There's no way in hell I'm gonna analyse and comment all the S2 source. In fact, I'm not even planning to try and figure out the engine structure. If all the info out there can be combined into this single disassembly, it will make it a much more comprehensive resource. While I've started this disassembly, I'm open to input from the whole community to help document it. I don't want this to be "my" thing. Let's work on this one together. If you know some asm and want to chip in, comment the code for a sprite or something. It can be the death egg or a ring, it doesn't matter.
So yeah. This, if done properly, and supported by some utilities out there, will eliminate a massive amount of the work that goes into making a hack of a sonic rom, while opening up a whole new level of hacking. Want to program a sprite? It's much easier in asm than machine code, especially when you have all the other sprites to refer to. I expect to be releasing something this time next week. Maybe a bit earlier, maybe a bit later. Depends what happens. Still, it'll be quite soon.