
An explanation of Halo Map Files

Published in 
xbox
 · 13 Jun 2020
Halo Combat Evolved for xbox

Ok, this tutorial is LONG overdue. In this tutorial, I intend to explain how the halo mapfile format works. I will try to point out the differences between Halo Xbox and HaloPC where I can. My target audience is advanced modders, but intermediate modders will get some good info out of it as well. After reading this, you should get a general idea of how the mapfile works, and a better understanding of why some things are corrupt, or why your rebuild doesn't work. Ok, so on with the info dump...

Table of Contents

MAPFILE ORGANIZATION

  • Header
  • Tag Index Header
  • Scenario
  • Supporting Tags
  • Raw Model Data
  • Raw Bitmap Data
  • Raw Sound Data
  • BSP(s)


THE MAPFILE HEADER

  • what it does
  • xbox and compression, fixed length
  • header format


THE TAG INDEX HEADER

  • model raw data offsets
  • magic
  • tag count
  • base tag
  • tag ids


THE SCENARIO

  • header and reflexives
  • scenario defines everything in the GAME, everything else is supporting
  • scenario spawns
  • high level tags
  • scenario reference pools
  • bsp data


SUPPORTING TAGS

  • metadata...what it really is
  • major tag types
  • recursive types
  • tag hierarchy
  • raw data


MODEL RAW DATA

  • model tags
  • how the model raw data works
  • submeshes and submesh headers
  • LODs and variations
  • differences in offsets for xbox and pc
  • raw data corruption with model injection and map rebuilding


BSP

  • what is a BSP
  • how the bsp is found
  • most difficult part of reverse engineering halo
  • bsp tags having a zero offset
  • collision bsp and visible meshes

DIFFERENCES BETWEEN HALO XBOX and HALO PC

There are a great number of differences between xbox and pc maps. Of course, the version number in the header is 5 for xbox, and 7 for PC. People often say that xbox maps are fixed length, so you can't really add anything to them. This is partially correct, but mostly wrong. It is true that the cache files are all the exact same size. However, all of the mapfiles in their "cache" format use garbage padding from the end of the useful halo data to the end of the cache file. When halo runs, it decompresses the map files on the original game disk onto the hard drive. The decompressed data is usually several megabytes shorter than the cache file length, so the "slack space" from the end of the useful data to the end of the actual cache file is garbage. The length of the decompressed file is defined by the mapfile header.

Another major difference between the PC and Xbox versions is that a lot of the floating point data on the xbox version is in an encoded/compressed format. This is to save disk space and memory on the console, which is much more restrictive than on a normal PC capable of running halo. Since the PC has much more free memory, it is more efficient to use up more disk space and avoid converting all those compressed floats (this saves CPU cycles by avoiding the conversion altogether). In fact, this is the major difference between PC models and xbox models. It would probably be possible to someday convert PC models to xbox.

The biggest difference between PC and xbox is that xbox cache files contain ALL of the necessary resources to run a map. On PC, the raw data for sound and bitmaps is moved into separate .map files to save on disk space. This is the major reason that rebuilding with HMT works on PC, but not on xbox. The code to handle the raw data is not perfect, and it is still not perfectly understood.

MAPFILE ORGANIZATION

The mapfile consists of thousands of tags and some headers. However, there is an organization to the data. If you were to run a utility like Filemon (www.sysinternals.com) you could watch HaloPC's file activity in detail. You would notice that it reads 4 major sections:

  • The mapfile header
  • The Tag index and metadata
  • The model raw data
  • The BSP(s)

The layout of these structs looks like this:

Header | BSP(s) | Raw Data | Tag index and meta

--The mapfile header--
The mapfile header is the first thing that gets read when halo loads a map. The header format looks like this:

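typedef struct STRUCT_MAPFILE_HDR
{
    int id;                  /* "head" */
    int Version;             /* xbox = 5, pc = 7 */
    int decomp_len;          /* length used to decompress the mapfile */
    int Unknown1;
    int TagIndexOffset;      /* offset to the Tag Index Header */
    int TagIndexMetaLength;  /* size of the tag index and meta combined */
    int Reserved1[2];
    char Name[32];
    char BuildDate[32];
    int MapType;             /* single player, multi-player, or User Interface */
    int Unknown4;
    int Reserved2[485];      /* pads the header out to 2048 (0x800) bytes */
    int Footer;              /* "foot" */
}MAPFILE_HDR;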

Some parameters of interest:

  • decomp_len - this is the length used by the compression algorithm to decompress the mapfile.
  • TagIndexOffset - this is the offset to the Tag Index Header.
  • TagIndexMetaLength - this is the size of the tag index and meta combined.
  • MapType - determines whether the map is single player, multi-player, or User Interface
  • Version - xbox = 5, pc = 7
  • id = "head"
  • Footer = "foot"

The header is never compressed, and it is always 2048 bytes (0x800 bytes). On the xbox game disk, the header is uncompressed, but the data following it is compressed using the open-source zlib compression library. For HaloPC, nothing is compressed.

So Halo reads in the map header. If we are talking about xbox, Halo then decompresses the map data into a cachefile and appends garbage to make the mapfile fit into one of the xbox-defined cachefile lengths.
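As a rough sketch of the header check described above, the following C fragment copies the first 0x800 bytes of a mapfile into a struct and looks at the Version field. It assumes a little-endian host and that the start of the file is already in a buffer. The names `MapfileHeader`, `parse_mapfile_header` and the `MAP_*` constants are made up for this illustration; only the field layout and the version values 5 and 7 come from the text.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as MAPFILE_HDR above, using fixed-width types. */
typedef struct {
    int32_t id;
    int32_t Version;
    int32_t decomp_len;
    int32_t Unknown1;
    int32_t TagIndexOffset;
    int32_t TagIndexMetaLength;
    int32_t Reserved1[2];
    char    Name[32];
    char    BuildDate[32];
    int32_t MapType;
    int32_t Unknown4;
    int32_t Reserved2[485];
    int32_t Footer;
} MapfileHeader;

/* The header is always 0x800 bytes; refuse to compile otherwise. */
_Static_assert(sizeof(MapfileHeader) == 0x800, "header must be 2048 bytes");

/* Hypothetical return codes; 5 and 7 are the version numbers from the text. */
enum { MAP_UNKNOWN = 0, MAP_XBOX = 5, MAP_PC = 7 };

/* Copies the header out of buf and reports which platform the Version
 * field implies. Returns MAP_UNKNOWN if the buffer is shorter than a
 * header or the version is neither 5 nor 7. */
static int parse_mapfile_header(const unsigned char *buf, size_t len,
                                MapfileHeader *out)
{
    if (len < sizeof *out)
        return MAP_UNKNOWN;
    memcpy(out, buf, sizeof *out);
    if (out->Version == MAP_XBOX || out->Version == MAP_PC)
        return out->Version;
    return MAP_UNKNOWN;
}
```

After a successful call, `out->TagIndexOffset` is where you would go next to find the Tag Index Header.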


--The tag index header--
Next, Halo reads in the Tag Index Header and the Tag Metadata. The Tag Header tells Halo how many tags there are in the index (usually there are thousands). It also tells the offset and length of the model raw data.

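typedef struct STRUCT_INDEX_HDR
{
    int index_magic;           /* used to calculate the map "magic" */
    int BaseTag;               /* value at which the tag ids start */
    int unknown2;
    int tagcount;              /* number of items in the tag index */
    int vertex_object_count;
    UINT ModelRawDataOffset;   /* where the model raw data begins */
    int indices_object_count;
    UINT indices_offset;
    int ModelRawDataSize;      /* length of the model raw data */
}INDEX_HDR; /* index_header_t */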

Ok, this structure requires a bit of explaining. Aside from the mapfile header, the index header is the roadmap to a Halo mapfile. It tells you where the model raw data begins, how many tags are in the map, and it contains one of the critical components for calculating “magic”. For those of you that don’t know, magic is a special number that you subtract from those raw offset numbers in Halo. On xbox, if you see a number like 0x80245CA4, that number is very likely an offset (numbers on 4-byte boundaries that start with 0x8). On HaloPC, the numbers are more like 0x423A8C40 (starting with 0x4). You might ask yourself why they wouldn’t just use the correct offset to begin with. Well, there is a very good reason.


--An explanation of magic--
The biggest reason that Halo loads so much faster than other games is that Halo reads the mapfile straight into memory. Those “magic” offsets can easily be converted to memory pointers, which means that when you read the mapfile in, you don’t have to do excessive memory setup. This is a huge advantage. If you allocate memory only once instead of 6,000 times, and you don’t spend your time doing file opens, closes, seeks and reads, you will save a lot of time.

There is one problem with this setup. When you allocate a memory buffer, you can’t control where the memory is located. This means after you read in the data, your object located at 0x80245CA4 is not in a valid memory address. It has to be adjusted, as does every other raw offset in the mapfile. You do this by adjusting the offset:

Memory_offset = raw_offset – calculated_magic + memory_base_pointer
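The adjustment above can be sketched in C like this. It is only an illustration: `fix_up_offset` and its parameter names are hypothetical, the magic is assumed to have been calculated already, and the result is returned as an integer address rather than a real pointer so the arithmetic stays visible.

```c
#include <stdint.h>

/* Memory_offset = raw_offset - calculated_magic + memory_base_pointer
 *
 * raw_offset:   value stored in the mapfile (e.g. 0x80245CA4 on xbox)
 * magic:        the calculated map magic
 * memory_base:  address where the data was actually loaded */
static uintptr_t fix_up_offset(uint32_t raw_offset, uint32_t magic,
                               uintptr_t memory_base)
{
    return (uintptr_t)(raw_offset - magic) + memory_base;
}
```

The subtraction is done first, in 32 bits, so the intermediate value is a plain distance into the loaded data before the base address is added.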

Ok, so you might ask yourself why bother with magic at all? Why not just keep the values at the actual file offsets and add the memory base pointer directly?

Memory_offset = actual_offset + memory_base_pointer

I mean, if the engine knows where all the offsets are anyway, why bother adding a huge number (like index_magic = 0x80004380) to all the values?

There is a very simple explanation for this. It has to do with map creation and the tools for creating mapfiles. I’ve told you already that mapfiles are composed of the following major components:

  • The mapfile header
  • The Tag index and metadata
  • The model raw data
  • The BSP(s)

Well, the artists are continuously tweaking the bsp, the models and the scenario. Now, I don’t know how long it takes to compile a map, but I would guess that it takes several minutes at least. BSP compiling could probably take hours. What the designers want to avoid is unnecessary holdups in the art pipeline. If someone changes a model, you don’t want to recompile the entire map, just the part with the models. One problem: When you change a model or a bsp, the sizes of these sections can change depending on what you are working on:

  • The Tag index and metadata
  • The model raw data
  • The BSP(s)

This means that the offsets inside the mapfile must shift every time you make a minor change to a model’s size, or if you add a little bit to the bsp, or whatever. Well, to avoid this problem altogether, Bungie uses magic.

Here is an example:

Map Component            Original Map    Modified Map
Mapfile header offset    0               0
Mapfile header size      0x800           0x800
BSP offset               0x800           0x800
BSP size                 0x10000         0x11000
Model raw data offset    0x10800         0x11800
Model raw data size      0x1000          0x1000
Tag Index/Meta offset    0x11800         0x12800  <----overflow
Tag Index/Meta size      0x30000         0x30000

You can see that the BSP has grown by 0x1000 bytes, which pushes the model raw data into the space where the Tag Index/Meta used to start. To fix the problem, all you have to do is shift the Meta and the Index up. Without magic, you would need to modify all of the offset references in the meta (which could be 10,000 or more). With magic, all you have to do is adjust the magic number and you are done.

With magic, adding new assets to maps takes very little time and the artists can concentrate on what they do, rather than wait for map compilations.
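To make the example above concrete, here is a small C sketch of the idea, using the Tag Index/Meta offsets from the table (0x11800 moving to 0x12800). The function names `file_offset` and `rebase_magic` are hypothetical, and this is just the arithmetic implied by the explanation, not Bungie's actual tool code.

```c
#include <stdint.h>

/* File offset of whatever a stored raw offset refers to. */
static uint32_t file_offset(uint32_t raw_offset, uint32_t magic)
{
    return raw_offset - magic;
}

/* New magic after the Tag Index/Meta block moves from old_pos to new_pos
 * in the file. None of the raw offsets stored in the meta are touched;
 * only this one number changes. */
static uint32_t rebase_magic(uint32_t magic, uint32_t old_pos,
                             uint32_t new_pos)
{
    return magic - (new_pos - old_pos);
}
```

With the table's numbers, a raw offset that resolved to 0x11800 under the old magic resolves to 0x12800 under `rebase_magic(magic, 0x11800, 0x12800)`, and every other raw offset in the meta shifts by the same 0x1000 without being rewritten.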

This would also be a good time to mention the slack space in the scenario (slack space is where you see all of the 0xCACACACA garbage). The scenario is the tag which defines the placement of scenery, vehicles, dynamic lights, devices…a number of things. The slack space is there so the same problem can be avoided. The level designers can add to the scenario without recompiling the map because there is room to grow.

Ok, so last time we discussed the mapfile header, the layout of mapfiles, and what “magic” means to offsets. Now we start to get into the really interesting stuff...the tags.

Tags are the basic building blocks of Halo. Each map is composed of thousands of these blocks of data. Examples of tags are vehicles, bitmaps, sounds, effects, physics, shaders, and scenery – there are about 80 different types of tags. If you recall the mapfile layout discussed earlier:

Header | BSP(s) | Raw Data | Tag index and meta

We touched briefly on the tag index header, which basically tells Halo how many tags there are in the tag index. In the tag index header, there are 3 fields of particular interest for this discussion: index_magic, BaseTag, and tagcount.

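typedef struct STRUCT_INDEX_HDR
{
    int index_magic;           /* used to calculate the map "magic" */
    int BaseTag;               /* value at which the tag ids start */
    int unknown2;
    int tagcount;              /* number of items in the tag index */
    int vertex_object_count;
    UINT ModelRawDataOffset;   /* where the model raw data begins */
    int indices_object_count;
    UINT indices_offset;
    int ModelRawDataSize;      /* length of the model raw data */
}INDEX_HDR; /* index_header_t */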

The field called index_magic is used to calculate the overall map “magic”. That, in turn, is used to calculate the correct file offsets found in metadata. The tagcount field, obviously, tells us how many tags there are in the map. The last field I refer to, called “BaseTag”, is somewhat interesting.

If you’ve ever gone through mapfiles with a hex editor, you know about tag-swapping. You might go through a mapfile and find a value like “E1880014” (in Blood Gulch, this is the “scenery\rocks\boulder\shaders\boulder” tag). This number is what we like to call a “Tag ID”. Now if you are an experienced modder at all, you know that you can swap a tag id with another tag id of the same type (sometimes of different types) and you can do some rather simple, but interesting things. Well, the “BaseTag” is the number at which these tag ids start. For Blood Gulch, this number is “E1740000”. This number varies from map to map. You will notice an interesting property of the tag id and the base tag when you subtract one from the other:

E1880014 - E1740000 = 00140014

If you’ve used the “Offset List Export” feature of SparkEdit, you will notice that this tag is the 0x15th element in the tag index. If you were referencing the array in C, it would be TagIndex[0x14]. Pretty cool eh? Well, actually, in a few cases this property breaks down. The upper 16 bits of that number do not always equal the lower 16 bits, and the reason why is still a mystery. Some of us guess that it may be some kind of “revision” number to keep track of assets, but we don’t really know.

What you need to learn from this rambling is that 32-bit numbers that begin with “E” are almost always Tag IDs. This will become more important when I discuss metadata in a moment.
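The subtraction above is easy to sketch in C. The helper names are hypothetical, and treating the low 16 bits as the array index is an assumption based on the TagIndex[0x14] example; as noted, the upper half does not always match the lower half.

```c
#include <stdint.h>

/* Low 16 bits of (tag_id - base_tag): the position in the tag index. */
static uint16_t tag_index_from_id(uint32_t tag_id, uint32_t base_tag)
{
    return (uint16_t)((tag_id - base_tag) & 0xFFFFu);
}

/* High 16 bits of the same difference; usually equal to the low half. */
static uint16_t tag_id_high_half(uint32_t tag_id, uint32_t base_tag)
{
    return (uint16_t)((tag_id - base_tag) >> 16);
}
```

For the boulder shader, `tag_index_from_id(0xE1880014, 0xE1740000)` gives 0x14, matching the example. Comparing the two halves is a cheap sanity check when you are guessing whether a 32-bit value in the meta is really a Tag ID.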

Ok, so Halo reads the index header, followed by reading in “tagcount” number of index items. Usually there are 2000-3000 tags for multiplayer maps in the index. SP maps can have far more. Each item in the tag index has the following format:

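typedef struct STRUCT_INDEX_ITEM
{
    char tagclass[3][4];  /* up to three 4-character class codes */
    int tag_id;           /* the "E" number */
    UINT stringoffset;    /* raw offset of the ASCII tag name */
    int offset;           /* raw offset of the tag's metadata */
    int zeros[2];
}INDEX_ITEM;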

The “tag_id” is what we discussed before, the “E” number. The stringoffset field is a raw offset that points to the location in the mapfile (after accounting for magic of course) that contains the ascii name of the tag (i.e., “scenery\rocks\boulder\shaders\boulder”). tagclass contains up to three 4-character descriptors of the tag’s purpose. Remember when I said there were about 80 different types of tags? This determines what type of tag the index item is. For instance, “vehi” is a Vehicle tag, “bitm” is a bitmap tag, and “scen” is a scenery tag. Now tags can be members of a larger class of objects. This means that they share the same kind of properties. If we look at the output from SparkEdit to compare some different tags:

343 vehi  unit  obje E4B70343  vehicles\ghost\ghost_mp 
2FF proj obje ÿÿÿÿ E47302FF vehicles\scorpion\tank shell
093 bipd unit obje E2070093 characters\cyborg_mp\cyborg_mp

The first three 4-character strings describe the object’s classes. An unused class slot is filled with 0xFF bytes, which shows up as “ÿÿÿÿ” in the listing.
If you’ve ever wondered why you can swap some objects with other types of objects, this is why (or at least that’s the theory). You can shoot MC’s out of a pistol because they share the “obje” class. Or you can swap vehicles with scenery for the same reason.
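The "shared class" theory can be written down as a small C check. This is a sketch of the idea only: `TagClasses` and `shares_class` are hypothetical names, the codes are compared in the order SparkEdit prints them (the byte order inside the mapfile may differ), and nothing here claims that Halo itself decides swap compatibility this way.

```c
#include <string.h>

/* One tag's class list: up to three 4-character codes, with unused
 * slots filled with 0xFF bytes. */
typedef struct {
    char tagclass[3][4];
} TagClasses;

/* Returns 1 if the two tags have at least one used class code in
 * common, 0 otherwise. */
static int shares_class(const TagClasses *a, const TagClasses *b)
{
    static const char unused[4] = { '\xFF', '\xFF', '\xFF', '\xFF' };
    for (int i = 0; i < 3; i++) {
        if (memcmp(a->tagclass[i], unused, 4) == 0)
            continue;  /* skip empty slots so they never count as a match */
        for (int j = 0; j < 3; j++) {
            if (memcmp(a->tagclass[i], b->tagclass[j], 4) == 0)
                return 1;
        }
    }
    return 0;
}
```

With the listing above, the ghost (vehi/unit/obje) and the tank shell (proj/obje) share “obje”, so the check returns 1 for that pair. A plain bitmap tag shares nothing with either of them.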

I suppose this would be a good time to discuss tag hierarchy. If you examine the output from SparkEdit some more, you can see a pattern to the way tags are organized:

===================================================================== 
12 scen obje ÿÿÿÿ E1860012 scenery\rocks\boulder\boulder
meta_offset: 008EBA18
meta_size: 00000204
IndicesStart: 00851544
IndicesEnd: 00851594
VerticesStart: 007F9994
VerticesEnd: 007F9EE4
ModelTag: E1870013 scenery\rocks\boulder\boulder
CollisionTag: E18A0016 scenery\rocks\boulder\boulder

----------------------------------------------------------------
13 mod2 ÿÿÿÿ ÿÿÿÿ E1870013 scenery\rocks\boulder\boulder
meta_offset: 008EBC1C
meta_size: 000002FC
IndicesStart: 00851544
IndicesEnd: 00851594
VerticesStart: 007F9994
VerticesEnd: 007F9EE4

----------------------------------------------------------------
14 soso shdr ÿÿÿÿ E1880014 scenery\rocks\boulder\shaders\boulder
meta_offset: 008EBF18
meta_size: 000001B8
BaseTextureTag: E1890015 scenery\rocks\boulder\bitmaps\boulder

15 bitm ÿÿÿÿ ÿÿÿÿ E1890015 scenery\rocks\boulder\bitmaps\boulder
meta_offset: 008EC0D0
meta_size: 000000DC

16 coll ÿÿÿÿ ÿÿÿÿ E18A0016 scenery\rocks\boulder\boulder
meta_offset: 008EC1AC
meta_size: 000012C4
=====================================================================

You will notice that the tags are grouped together in a special way. The high-level tag, the scenery object for the boulder, is grouped with all of the supporting objects that go along with it. These include the model (mod2), the shader (soso), the bitmap (bitm) and the collision model (coll). This is everything you need to use this particular scenery tag in the game.

If you’ve used the “recursive” feature on HMT, this is what it does. It takes the high level tag you want to use (the “boulder” scenery object), then it finds all the supporting tags it needs to make the scenery object work properly.

You will also notice that complex objects almost always have multiple tag classes (“scen” and “obje”). Supporting tags have only one class (such as a bitmap). Then there are tags that are intermediate...not simple, but not high level either. A good example is the “soso” tag, which is a model shader. The boulder model depends on the “soso” tag, and the “soso” tag depends on the bitmap tag. If you notice, the bitmap tag for the shader immediately follows the shader itself. Spend some time examining offset lists, and you will see how these recursive dependencies work. It’s pretty cool stuff, and it’s one of the reasons HMT rebuilding is so impressive.

The last thing I would like to discuss is the “offset” field of the INDEX_ITEM structure. This is an important field because it points to the metadata offset for that tag. Yes, the infamous metadata. Well, metadata is not a big deal, it’s just a word we use to describe binary data that describes the properties of the tags. Each tag has its own data structure that tells Halo everything about that tag. For example, the metadata for a warthog describes the shaders it uses, the model, the antenna, the physics, the collision model...the list goes on. The reason that everyone gets so excited about metadata is because you can do some really cool stuff with it like change weapon rate of fire, swap shaders, or change the projectiles of a weapon. Tons of stuff.

I’ve made a diagram showing the hierarchy of relationships in the mapfile. The colors are coded to show where each “section” exists and its role in the hierarchy.

[Diagram: hierarchy of relationships in the mapfile, color-coded by section]

The top part of the diagram shows the physical layout of the file and the major sections. Note, this isn’t exactly to scale, but it’s pretty close for PC. In xbox maps, the raw data section is larger because it contains not only model raw data, but bitmap and sound raw data as well. In HaloPC, this data is moved to separate shared mapfiles.

So I think the best way to do this is just walk you through the hierarchy of the mapfile.

  1. Halo reads in the Mapfile Header. It points to the Tag Index Header.
  2. Halo reads in the Tag Index Header. It describes the Tag Index Items (there are usually 2000 to 3000 of these items).
  3. Each tag index item points to metadata. This is true for ALL tags (except the BSP, which is a special case).
  4. For most tags, it ends with metadata. For models, bitmaps and sounds, the metadata will point to additional data that we call “raw data”. Raw data is just a name we use to describe additional data that supports the tag. More on this later.
  5. One special tag is located at the front of the tag index...the scenario tag (“scnr”). This special tag defines where the BSP is, as well as what is contained in the map. I will be discussing the scenario later as well.
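The first three steps of that walk can be sketched in C as follows. This is a rough illustration under several assumptions: the whole mapfile is already in memory, the host is little-endian, the index items start right after the 36-byte index header (nine 32-bit fields, as in INDEX_HDR above), and the magic has already been calculated. The names `IndexItem`, `read_index_item` and `raw_to_file` are hypothetical, and `tag_id` is declared unsigned here purely to make comparisons against “E” numbers convenient.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as INDEX_ITEM above, using fixed-width types. */
typedef struct {
    char     tagclass[3][4];
    uint32_t tag_id;
    uint32_t stringoffset;  /* raw offset of the tag name (needs magic) */
    uint32_t offset;        /* raw offset of the metadata (needs magic) */
    int32_t  zeros[2];
} IndexItem;

enum { INDEX_HDR_SIZE = 36 };  /* assumed: nine 32-bit fields */

/* Copies index item i out of a mapfile held in memory. Returns 0 on
 * success, -1 if the item would fall outside the buffer. */
static int read_index_item(const unsigned char *map, size_t map_len,
                           uint32_t tag_index_offset, uint32_t i,
                           IndexItem *out)
{
    size_t pos = (size_t)tag_index_offset + INDEX_HDR_SIZE
               + (size_t)i * sizeof *out;
    if (pos > map_len || map_len - pos < sizeof *out)
        return -1;
    memcpy(out, map + pos, sizeof *out);
    return 0;
}

/* Converts a raw offset to a file offset, or -1 if it lands outside
 * the mapfile. */
static int64_t raw_to_file(uint32_t raw, uint32_t magic, size_t map_len)
{
    uint32_t off = raw - magic;
    return off < map_len ? (int64_t)off : -1;
}
```

Given an item, `raw_to_file(item.stringoffset, magic, map_len)` would locate the tag's name and `raw_to_file(item.offset, magic, map_len)` its metadata. The BSP is the exception mentioned in step 3, so a result of -1 there is not necessarily corruption.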

Ok, so each tag has metadata. And some types of tags have raw data. We defined metadata above, but I want to explain something a little further – something called a REFLEXIVE. I know we talked about reflexive offsets before, but this is a little different. Most of the tags have metadata containing reflexives. I have created an example tag containing 2 reflexives. A REFLEXIVE structure is a group of 3 32-bit numbers:

typedef struct STRUCT_REFLEXIVE
{
    UINT Count;    /* number of blocks in the array */
    UINT Offset;   /* raw offset of the first block (includes magic) */
    UINT unknown;
}REFLEXIVE;

In the example metadata, lets say there is a header at the beginning of the data, and somewhere buried in that header are 2 reflexives. They have a format like this:

reflexive1.Count = 3
reflexive1.Offset = 0x4721000
reflexive1.unknown = 0

reflexive2.Count = 2
reflexive2.Offset = 0x4721600
reflexive2.unknown = 0

The metadata looks like this:

[Diagram: example metadata block containing two reflexives]

After accounting for the magic in our reflexive offsets, the raw offsets could be something like this (we don’t care about the unknown stuff):

reflexive1.Count = 3
reflexive1.Offset = 0x321000

reflexive2.Count = 2
reflexive2.Offset = 0x321600

What this means is that, starting at location 321000, there is an array of 3 blocks of data. How big is each block? Well, that is simple to calculate (in most cases):

Block 1 size = (reflexive2.Offset - reflexive1.Offset)/reflexive1.Count

Block 1 size = (0x321600 - 0x321000)/3 = 0x600/3 = 0x200

So each of the three blocks in Block 1 is 0x200 bytes long (all of these numbers are hex). Now most of the time this works, but sometimes you have nested Reflexives, which complicates things. For example, if each of the blocks in “Block 1” contained their own reflexives, then you would have to figure that part out first and work backwards.

You also run into a new problem: what about the last reflexive? How do you determine its size? Well, most of the time you just use the start of the next tag’s metablock (the next item in the tag index).
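The block size estimate can be sketched as a small C helper. The name `reflexive_block_size` and the choice to return 0 when no estimate is possible are assumptions for this illustration; the arithmetic is the formula given above, and it ignores the nested-reflexive case.

```c
#include <stdint.h>

/* Same three 32-bit fields as REFLEXIVE above. */
typedef struct {
    uint32_t Count;
    uint32_t Offset;
    uint32_t unknown;
} Reflexive;

/* Estimated size of one block in r, given the offset where the next
 * chunk of data starts: the next reflexive's Offset, or the start of
 * the next tag's metadata when r is the last reflexive. Returns 0 if
 * the size cannot be estimated. */
static uint32_t reflexive_block_size(const Reflexive *r, uint32_t next_offset)
{
    if (r->Count == 0 || next_offset <= r->Offset)
        return 0;
    return (next_offset - r->Offset) / r->Count;
}
```

Because both offsets carry the same magic, it cancels out in the subtraction, so the helper gives the same answer whether you pass raw offsets (0x4721000 and 0x4721600) or adjusted ones (0x321000 and 0x321600).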


----Raw Data----
Ok, so you probably have a good idea about metadata and how it works now. What about raw data? Sometimes the metadata in tags points to data that is not in the metadata section. This is Raw Data. Raw data is additional data that helps define the tag.

There are 3 kinds of raw data that we know about: raw data for sounds, bitmaps and models. On xbox maps, the raw data for sounds and bitmaps comes after the BSP but before the model raw data. On PC, they moved the raw data for sounds and bitmaps into their own datafile to save on hard disk space. This is the reason why if you patch a bitmap on PC, you affect all maps that use that bitmap. On xbox, however, this is not the case. You can patch whatever you want, and it cannot affect other maps.

There is another difference between PC and xbox. On xbox, there can be an intermediate “reflexive” that the metadata points to. This intermediate reflexive, in turn, points to the raw data.

metadata -> intermediate reflexive -> raw data

I seem to remember the model vertex data on the xbox being this way. There may be other situations where this also occurs, but in any case it’s something you should be aware of if you are working on xbox maps.

On PC, it is much simpler...there is no intermediate reflexive, the meta points to the raw data directly.

So what is in this raw data? For sounds, the meta is like a soundfile header, and the raw data contains the compressed audio. For bitmaps, it is a similar situation: the meta contains a modified DXT header (DXT is a family of compressed texture formats used by DirectX) and the raw data contains the compressed image data.

-Grenadiac
