msdos dissasembly

Any developer realated stuff
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

If I start the game and take a memory dump before I enter the city, the map isn't loaded into memory yet.
If I enter the city, my debugger tells me following files are accessed:
bard.exe
K:\BARD.EXE
bard.exe
CITY.PATH
bard.exe
CITY.NAME
BIGPIC

Strange thing, CITY.PATH and CITY.NAME do not exist in my bards folder.. there is a CITY.PAT and CITY.NAM file however.

Both CITY.PAT and CITY.NAM start with 00 00 03 84 00 00
the LEVS file starts with 00 00 00 40 00 00 followed by similar code to point to the location of the different dungeon levels. The actual level starts with a 00 00 08 00 00 00 (as described earlier in this thread)
User avatar
ZeroZero
Posts: 286
Joined: Tue Mar 10, 2009 9:10 pm
Location: Germany

Post by ZeroZero »

Ok, DOS truncates the file extension to 3 letters, thats why.
SInce the wanted name matches into DOS'es 8.3 convention,
it is no problem to call such a file.
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

So, that leaves us finding a way to decode the various files. Preferably, the BIGPIC and LEVS file.

incidently, I took a memory dump of the game just before entering the city and one just after entering the city.
I ran a file compare on both memory dumps and there were some interesting differences.

there's a big difference in code starting at 039c:7c95 and ending at 039c:8f80... no idea what it represents
There's an interesting difference at 039c:1bc70, where a sequence of code blocks start:
039C:1BC70 00 00 00 00 23 73 23 27 27 27 2B BB BB BB BB BB
039C:1BC80 BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BC90 BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BCA0 BB BB BB BB B6 66 11 6E FE FC 6C 66 32 72 32 32
039C:1BCB0 72 72 32 BB BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BCC0 BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BCD0 BB BB BB BB BB BB BB BB BB BB BB BB 66 61 16 EF
039C:1BCE0 EF E6 C6 C6 23 77 73 27 27 27 23 2B BB BB BB BB
039C:1BCF0 BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BD00 BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BD10 BB BB BB B6 66 16 65 FE FE CC 6C 66 32 77 32 32
039C:1BD20 37 22 32 32 BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BD30 BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB BB
039C:1BD40 BB BB BB BB BB BB BB BB BB BB BB
...

at 039c:1fe29, the whole CITY.NAM file gets loaded. I think the CITY.PAT file got loaded there as well, but overwritten with CITY.NAM once it got processed.

Ofcourse, the decoded city map can be found in my 2nd dump, as well as the street map. Both of which are not in the 1st dump.

At 039c:2ebda, the .TPW files get loaded (without the actual name of the character). Since I can only find 1 .tpw file in my memory dump, I believe they all get loaded at the same location and overwritten.

Still want to decode those BIGPIC and LEVS files :lol:
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

Just like the LEVS file, the BIGPIC file starts with an offset into itself. This time it's 63 offsets of 4 bytes, pointing to address locations within the BIGPIC file.
Each offset starts with an 8-byte serie, starting and ending with 2x 00 bytes and in between different data.
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

Managed to open the extracted bard.exe with:

http://www.hex-rays.com/idapro/

Seriously need to learn assembler language :roll:
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

If you open the dosbox debugger you'll get 2 screens: the debugging screen and the actual DOSBox.
If you mount your gamefolder in DOSBox and then type:
> debug bard.exe

you enter a debugging mode. Now you switch to your debugging screen and you type the following:

> log xx (where xx is a value of how many CPU cycles you want to log)
If I use log 100000, then in my DOSBox program folder a logfile of 256mb gets created. This file contains the actual assembler codes being used at the time of logging.

Trying to create such a log file before entering the city to see how CITY.PATH and CITY.NAME gets processed by CPU.

EDIT: more info at http://kannan.jumbledthoughts.com/index ... assembler/
Maven
Posts: 138
Joined: Sat Apr 16, 2011 9:39 pm

Post by Maven »

About decoding the levs file...

As I recall, the Huffman coding tree is the first thing in the data for each level. You have already figured out the offsets where each level is located. The first part of the data for each level is the tree info, followed by the actual data.

Unfortunately, I had a hardware crash several months ago, so my program for decoding the maps and printing them out is on a hard drive that isn't normally accessible. However, I periodically need data from that drive, so I hook it up and suck data off. Next time I hook it up I'll try to find that program as well.
Maven
Posts: 138
Joined: Sat Apr 16, 2011 9:39 pm

Post by Maven »

The first 64 bytes of the levs file are offsets where the data for each level begin.

Code: Select all

000000000 00 00 00 40  00 00 05 AE  00 00 0B CE  00 00 11 DF
000000010 00 00 18 06  00 00 1D EE  00 00 24 02  00 00 2A 2B
000000020 00 00 30 2D  00 00 36 2B  00 00 3C 20  00 00 42 3A
000000030 00 00 48 90  00 00 4E C2  00 00 55 11  00 00 5A D1
The offset for level 9 (Third level of the castle) is 362B, and it ends where the next level begins, at 3C20.

Code: Select all

000003620                                    00  00 08 00 00
000003630 00 2F 68
The first 4 bytes of the levs data are the size of the decompressed data, in bytes. So Level 9 is 2K when decompressed. The next 4 bytes are the size, in bits, of the compressed data. So level 9 is 2F6B bits long.

Next is the Huffman tree. The bits are used highest order first. The algorithm is pretty easy:
If the bit is a zero, add another zero to the code.
If the bit is a one, the next 8 bits are the value associated with that code. The next bit after that will be a zero, and can be ignored. Drop all the ones off the right end of the code and change the last zero to a one.

So here's what level 9 looks like:

Code: Select all

000003630          01  5A 4E 0F 63  10 45 09 89  13 43 89 00
000003640 39 4E 41 DC  75 86 19 64  36 4D 70 A1  A6 63 F2 F1
Bit 0 is a 0. Code = 0
Bit 1 is a 0, Code = 00
Bit 2 is a 0, Code = 000
Bit 3 is a 0, Code = 0000
Bit 4 is a 0, Code = 00000
Bit 5 is a 0, Code = 000000
Bit 6 is a 0, Code = 0000000
Bit 7 is a 1. the next 8 bits are 5A. The code for 5A is 0000000.
Drop all the 1's (there are none) and increment. Code = 0000001
Bit 10 is a 0. Ignore it.
Bit 11 is a 1, the next 8 bits are 38. The code for 38 is 0000001.
Drop all the 1's and increment. Code = 000001.
Bit 1A is a 0. Ignore it.
Bit 1B is a 0, Code = 0000010.
Bit 1C is a 1, the next 8 bits are EC. The code for EC is 0000010.
Drop all the 1's on the right (there are none) and increment. Code = 0000011.
Bit 25 is a 0. Ignore it.
Bit 26 is a 1, the next 8 bits are 88. The code for 88 is 0000011.
Drop the 1's off the right and increment. Code = 00001.
Bit 2F is a 0, ignore it.
Bit 30 is a 0, Code = 000010.
Bit 31 is a 1, the next 8 bits are 14. The Code for 14 is 000010.
Drop all the 1's and increment. Code = 000011.
...

Code: Select all

000003700 1E 41 0B 20  E4 7D 1C 7B  C5 45 34 37  4C 09 4F 51
000003710 10 08 20 D6  30 D6 30 D6  08 DD AE BC  7D 3C 1C 2B
Bit 6D7 (the last bit of the 09 at byte 370D) is a 1, the next 8 bits are 4F. Code for 4F is 1011110.
Increment. Code = 101111.
Bit 6E0 is a 0, ignore it.
Bit 6E1 is a 1. the next 8 bits are 44. Code for 44 is 1011111.
Drop all the 1's off the right end and increment. Code is 11.
Bit 6EA is a 0. Ignore it.
Bit 6EB is a 1. Next 8 bits are 00. Code for 00 is 11.
Drop all the 1's and we're done with the tree.

The actual data starts at bit 6F4, which is half way through the 08 at byte 3711. Decoding the data is even easier, as long as you have the tree built. You just add one bit at a time until you find a match.

Bit 6F4 is a 1. Code = 1. No match.
Bit 6F5 is a 0. Code = 10. No match.
Bit 6F6 is a 0. Code = 100. No match.
Bit 6F7 is a 0. Code = 1000. No match.
Bit 6F8 is a 0. Code = 10000. No match.
Bit 6F9 is a 0. Code = 100000. No match.
Bit 6FA is a 1. Code = 1000001. No match..
Bit 6FB is a 0. Code = 10000010. No match.
Bit 6FC is a 0. Code = 100000100. Match = 65
Bit 6FD is a 0. Code = 0.
Bit 6FE is a 0. Code = 00.
Bit 6FF is a 0. Code = 000.
Bit 700 is a 1. Code = 0001.
Bit 701 is a 1. Code = 00011.
Bit 702 is a 0. Code = 000110.
Bit 703 is a 1. Code = 0001101.
Bit 704 is a 0. Code = 00011010.
Bit 705 is a 1. Code = 000110101. Match = 64.
etc.
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

That's pretty amazing, Maven. Thanks for posting! :D

I tried the same with the CITY.PAT file (since it's not such a big file)

so,...

00 00 03 84 00 00 08 F8; 900 bytes decompressed and 8F8 bits long

Huffman tree next:

translate
10 34 08 80 82 D0 35 0D 08 C4 B0 26 6D
91 58 04 48 23 88 59 78 4A D7 14 85 04 40
in binary (already grouped together to get the codes):
0001 {00000011} 01 {00000010} 001 {00000001} 000001 {01101000} 0001 {10101000} 01 {10100001} 0001 {10001001} 01 {10000001} 001 {10011011} 01 {10010001} 01 {01100000} 0001 {00010010} 00001 {00011100} 01 {00001011} 001 {01111000} 01 {00101011} 01 {01110001} 01 {00100001} 01 {00000100} 01 {00000000}
gives me:
000 = 03
001 = 02
010 = 01
0110000 = 68
011000100 = A8
011000101 = A1
0110001100 = 89
0110001101 = 81
0110001110 = 9B
0110001111 = 91
011001 = 60
0110100 = 12
0110101000 = 1C
0110101001 = 0B
0110101010 = 78
0110101011 = 2B
01101011 = 71
011011 = 21
0111 = 04
1 = 00
All the values are there:
01, 02, 03 and 04 for houses, 68 for gates, 0B for the guild,...
I'm trying to use these codes now to recreate the map with the rest of the code.

EDIT: had to correct, since I translated to binary wrong :evil:
User avatar
ZeroZero
Posts: 286
Joined: Tue Mar 10, 2009 9:10 pm
Location: Germany

Post by ZeroZero »

@ Maven:

EXCELLENT job, tyvm!!!

By any chance, are you able to provide the pseudo code to build
the hufman table when encoding?
Maven
Posts: 138
Joined: Sat Apr 16, 2011 9:39 pm

Post by Maven »

@ZeroZero, do you mean decide which Huffman codes to assign to which data bytes, or build the bit stream to put in the file once the codes have been assigned, or both?

Are you planning to build your own dungeons? That could be pretty fun, but you're limited to specials. They're hard coded in the IBM PC version.

@Caracas, one interesting thing to look at after you have decoded the levels... Text has the high order bit set. If you look at the data you dumped from the third level of the Castle, and drop the high order bits, you can see the text messages that appear when you enter certain squares.

Code: Select all

C1 A0 F3 E9 E7 EE A0 EF EE A0 F4 E8 E5 A0 F7 E1
EC EC A0 F2 E5 E1 E4 F3 AC A0 A2 D4 E8 E5 A0 C2
E1 F2 F2 E1 E3 EB F3 AE A2 DC
A sign on the wall reads, "The Barracks"\
User avatar
ZeroZero
Posts: 286
Joined: Tue Mar 10, 2009 9:10 pm
Location: Germany

Post by ZeroZero »

@ Maven

Hi Maven,

I just mean to be able to recompress the file after it has been edited.
And yes, sadly the "events", as I called the specials that were called by loading small ML files on the C64 versions, are hardcoded in the DOS version.

However, city and dungeon maps could be rearranged, but the adventure as it cannot be extended/modified for the given reasons above.
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

I'm first going to focus on CITY.PAT file, which is the map for the city.

The Huffman tree I posted earlier seems to be correct. I am able to completely decode the file with that code.

That means I should be able to modify the city, but only in a limited way.
If I look for example at the different statues in the city: there is a samurai, a stone statue, a grey dragon,... but on the map they all have $60 as value, so the actual event will probably be hardcoded somewhere.

I'm quite sure that I'm able to change the different houses:
01 (first house) appears 109 times on the map
02 (second house) 105 times
03 (third house) 91 times
04 (fourth house) 87 times

01 is represented by 010 binary
02 is represented by 001 binary
03 is represented by 000 binary
04 is represented by 0111 binary
00 is represented by 1 binary (empty square)

In the city.pat file, address location $23 has $3A as value. The actual data of the city starts after the first 2 bits of the '3'
Following binary value can be found there:

Code: Select all

11 1010 0010 0000 0001 0100 0001...
(3) (A) (2) (0)  (1)  (4)  (1)
first bit we see is a '1', so we put 00 on the map
2nd bit is again '1', so another 00 on the map
3rd bit another '1', so another 00 on the map
next we get '010', which is a 01 on the map
next we get '001', which is a 02 on the map
followed by '000', which is a 03 on the map
another '000', so another 03 on the map
next we see '001', so another 02 on the map
...

This way, you can decode the entire map.
You can change the first '010' (01 on the map) to '001' (02 on the map)
This would then change you binary to:

Code: Select all

11 1001 0010 0000 0001 0100 0001...
(3) (9) (2) (0)  (1)  (4)  (1)
So, in the city.pat file, the $3A at address location $23 can be modified to $39, and you should see house number 2 instead of house number 1 in the northwest part of Skara Brae.

EDIT: I have not tested this yet, so I'm not sure about this :D

Also, if you would change the binary for house 1 (010) to the binary of house 4 (0111), you'll add an extra bit to your file. This wil completely change the hex values of you entire city.pat file.

One last thing: I'm not an expert in the Huffman encoding, but I don't think you can just start changing binaries endlessly. binary '1' (00 on the map - empty square) shows up 458 times. '010' (house 3) shows up 109 times. If you change the number of times they appear on the map you might have to create a new huffman tree since I think the code that appears the most has to be highest up in the Huffman tree.

EDIT2: ok, I just tested it and it worked :D
Changing $3A to $39 in address location $23, gives me a different picture of the house in the northwestern part of Skara Brae
Maven
Posts: 138
Joined: Sat Apr 16, 2011 9:39 pm

Post by Maven »

Actually, you don't really have to change the tree at all, unless you need to add values that aren't already in the tree. In the CITY.PAT file, for example, you can move houses and gates and stables around all you want. Even if you change the frequencies of the values, the tree will still work--it just won't be optimally compressed. Not that we're especially worried about being optimally compressed at this point. Hard drive space has become a lot cheaper since the 1980's. Besides, they could have saved a lot of space if they had padded 0's instead of all that garbage about OMNINET and such at the end of the maps.

In any case, it's almost trivial to put out the new tree once you have it built. You just walk your tree recursively. WalkTree(Root).

Code: Select all

void WalkTree(HUFFMAN_NODE *Node)
{
    if (Node->LeftLink != NULL)  // Is this node a leaf?
    {
        EmitBit(0);
        WalkTree(Node->LeftLink);
        WalkTree(Node->RightLink);
    }
    else
    {
        EmitBit(1);
        EmitByte(Node->Value);
        EmitBit(0);
    }
}
Except then you have to back off the last 0 bit.
Caracas
Posts: 89
Joined: Thu Jan 20, 2011 9:16 am
Location: Belgium

Post by Caracas »

Maven wrote:@Caracas, one interesting thing to look at after you have decoded the levels... Text has the high order bit set. If you look at the data you dumped from the third level of the Castle, and drop the high order bits, you can see the text messages that appear when you enter certain squares.

Code: Select all

C1 A0 F3 E9 E7 EE A0 EF EE A0 F4 E8 E5 A0 F7 E1
EC EC A0 F2 E5 E1 E4 F3 AC A0 A2 D4 E8 E5 A0 C2
E1 F2 F2 E1 E3 EB F3 AE A2 DC
A sign on the wall reads, "The Barracks"\
Hi Maven

cool stuff, thanks!
any particular reason why the high order bit is set? or is it just to confuse people?

I've also been looking at the BIGPIC file over the weekend.
There are 63 offsets in there. My guess is that that file contains all the different events. If I go fight the samurai statue, BIGPIC gets accessed 3 times. So I think that the $60 on the map points to an event in the BIGPIC file and based on coordinates on the map (?) a specific event is called?
Each event also has its own huffman tree and data. I need a way to decode stuff more quickly :)
Was able to create a script to get the codes out of the binaries after discovering the huffman tree.
Does anyone know of an application that can quickly translate lots of hex values to binary?

All in all, pretty amazing stuff thought.. I seem to be able to decode and even modify a lot of things in the game. Feels kinda weird after playing this game so many years ago. :D
Post Reply