Writing my own engine in C/C++, using no libraries, I wanted to go on a little tangent and be able to load FBX files, which are files used to store 3D models and animations, invented by Kaydara and now owned and maintained by Autodesk.
Autodesk does provide a C++ SDK that you can incorporate into your project, and there are open projects on GitHub, but the rule in my engine is to not use any outside code, so I decided to roll my own.
This is a closed but popular format, so I thought getting some information on it would not be too hard. Luckily, the Blender team did a small post, 9 years ago, that is a good starting point, but other than that you're kind of on your own, so I decided to write about it.
FBX files exist in either text or binary form. This article will look at the binary form, which is more compact.
Quick types overview
Before seeing some code, a quick explanation of the types you'll see in the code snippets. They all have the format <letter><number>, with the letter being:
u for unsigned integer
s for signed integer
r for real (aka decimal numbers)
And the number is the number of bits used by the type.
So u8 is an 8-bit unsigned integer, so a byte, and r64 is a 64-bit real value (aka double in the C or C++ standard).
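For reference, in C or C++ these aliases map directly to the fixed-width types of <cstdint> (r32/r64 assume the usual IEEE 754 float and double):

```cpp
#include <cstdint>

// Shorthand types used throughout this article.
typedef uint8_t  u8;
typedef int16_t  s16;
typedef int32_t  s32;
typedef int64_t  s64;
typedef uint32_t u32;
typedef uint64_t u64;
typedef float    r32; // 32-bit real
typedef double   r64; // 64-bit real
```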
Make sense of this binary blob
The first thing to do is try to decode anything, and file formats usually have some kind of header. Thanks to the Blender post, we know that this header is 27 bytes long: a signature as the first 21 bytes, two reserved bytes, and a version in the last 4 bytes. Be sure to disable padding, or your compiler will insert some before the version field to keep it 4-byte aligned. On MSVC, you can do this with a #pragma pack(push, 1)
(and remember to restore it after with a #pragma pack(pop))
. Here's what it looks like as a struct.
// 27 bytes
struct FBXBinaryHeader
{
    u8 signature[21];
    u8 reserved[2];
    u32 version;
};
u8* fileContent;
// Magically fill fileContent by loading the file
FBXBinaryHeader* header = (FBXBinaryHeader*)fileContent;
The signature (magic number) should match "Kaydara FBX Binary ", a little long for a magic number, as file formats usually use 4- or 8-byte magic numbers (WAV, PNG, …), but nothing wrong with it. I don't know the meaning of the two reserved bytes. Then come 4 bytes for the version, for a total of 27 bytes. Not a lot of useful information for the byte count, but we're just starting.
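Putting that together, a minimal validation sketch (ReadFBXHeader is just an illustrative helper name; the exact signature bytes, two trailing spaces and a terminating 0, come from the Blender write-up, and reading the version this way assumes a little-endian host since the file stores it little-endian):

```cpp
#include <cstdint>
#include <cstring>

typedef uint8_t  u8;
typedef uint32_t u32;

#pragma pack(push, 1)
struct FBXBinaryHeader
{
    u8  signature[21]; // "Kaydara FBX Binary  " plus a terminating 0
    u8  reserved[2];
    u32 version;       // e.g. 7400 for a 7.4 file
};
#pragma pack(pop)

// Checks the magic number and extracts the version.
static bool ReadFBXHeader(const u8* fileContent, size_t size, u32* outVersion)
{
    if (size < sizeof(FBXBinaryHeader))
        return false;
    const FBXBinaryHeader* header = (const FBXBinaryHeader*)fileContent;
    if (memcmp(header->signature, "Kaydara FBX Binary  ", 20) != 0)
        return false;
    *outVersion = header->version;
    return true;
}
```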
FBX Nodes
To store all the different kinds of information (metadata, vertices, materials, textures, normals, UVs, meshes, …), the format uses a single Node type, which can have children, forming a tree structure. Let's look at the fixed-size part of these nodes:
// 13 bytes
struct FBXBinaryNode
{
    u32 endOffset; // See warning, u64 for more recent format version
    u32 numProperties;
    u32 propertyListLen;
    u8 nameLen;
};
Quick warning
This is based on an old FBX version, which used 32-bit values for offsets in the file, allowing a maximum file size of 2 GB. It's my understanding that since version 7.5, the format uses 64-bit values for offsets, so you would need to switch those 32-bit reads to 64-bit reads based on the header version to be correct for newer files. As a matter of fact, even numProperties and propertyListLen become 64 bits in newer versions. That seems a little overkill, but I imagine some crazy big FBX file once needed it.
First you have the global offset (from the start of the file, not from your current location) of the end of this node, so if you do not want to parse this node, you can skip over it. Then you have the number of properties in this node, which are pieces of information of any type. The propertyListLen is the size of the property block, so that, once again, you can skip over it. After that, you've got nameLen bytes for the name of the node (not 0 terminated). Here is a rough diagram of how this is laid out:
A Parameter is a structure consisting of a byte (u8) for the type of the data, followed by the data, which can be:
if byte type is ‘D’ then a r64 value
if byte type is ‘L’ then a s64 value
if byte type is 'I' then a s32 value
if byte type is ‘F’ then a r32 value
if byte type is ‘Y’ then a s16 value
if byte type is ‘C’ then a u8 value
if byte type is 'S' or 'R' then the data will be a u32 length value, followed by length bytes
if byte type is 'd' or 'i' or 'f' or 'l' then the data will be an array version of the above, with a common header like:
struct ArrayParameter { u32 arrayLength; u32 encoding; u32 encodedLength; };
You first read the arrayLength value, which tells you how many values are in the array. Then comes an encoding value which, from what I saw, can have two values, 0 or 1, meaning no encoding or zlib compression. Then follow 4 bytes for the size of the data, useful if the values are compressed, as you'll have to inflate the next encodedLength bytes into your array. Luckily, I had already implemented the zlib inflate function for my PNG parser, so no harm there.
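As a sketch, here is how reading one of these array parameters could look for the 'f' (r32 array) case, handling only the raw encoding; the zlib path is left as a comment since it needs an inflate implementation:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint8_t  u8;
typedef uint32_t u32;

struct ArrayParameter { u32 arrayLength; u32 encoding; u32 encodedLength; };

// Reads an 'f' array parameter; `data` points just after the type byte.
static std::vector<float> ReadFloatArray(const u8* data)
{
    ArrayParameter header;
    memcpy(&header, data, sizeof(header));
    std::vector<float> values(header.arrayLength);
    if (header.encoding == 0) // raw values follow the 12-byte header
        memcpy(values.data(), data + sizeof(header),
               header.arrayLength * sizeof(float));
    // else encoding == 1: zlib-inflate the next encodedLength bytes
    // into values.data() instead
    return values;
}
```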
And that is it for a node. After a node can come its child nodes, if they are in the range defined by the node record, meaning their start address is before your endOffset.
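A sketch of reading that fixed record plus the name, using the pre-7.5 32-bit layout (ReadNodeHeader is a hypothetical helper, not something from the format):

```cpp
#include <cstdint>
#include <cstring>
#include <string>

typedef uint8_t  u8;
typedef uint32_t u32;

#pragma pack(push, 1)
struct FBXBinaryNode
{
    u32 endOffset;       // absolute offset of the end of this node
    u32 numProperties;
    u32 propertyListLen; // byte size of the property block
    u8  nameLen;
};
#pragma pack(pop)

// Reads the node record at `offset`; the properties start right after
// the name, and `outNode->endOffset` lets you skip the whole node.
static std::string ReadNodeHeader(const u8* file, u32 offset,
                                  FBXBinaryNode* outNode,
                                  u32* outPropertiesOffset)
{
    memcpy(outNode, file + offset, sizeof(FBXBinaryNode));
    // The name is not 0 terminated in the file
    std::string name((const char*)(file + offset + sizeof(FBXBinaryNode)),
                     outNode->nameLen);
    *outPropertiesOffset = offset + sizeof(FBXBinaryNode) + outNode->nameLen;
    return name;
}
```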
Now that we can parse the building block of the file, let’s see how it is architectured and how we can extract the information we need.
Data, where are you ?
Before diving into specifics, let's have a quick overview of the file structure. We have five big blocks that we need to parse, interpret and link together:
Geometry, which contains the vertices, normals, and UV data.
Models, which contain all the transformations (rotation, scale and translation).
Materials, with different values for lighting purposes.
Textures, with the paths of the different images to use as textures.
Connections, which contains all the different links needed to reconstruct the structure (which texture to use in each material, which geometry to use in each model, and the tree structure of the scene, linking models together).
Here is an overview of what data we want to extract, and under which sections it lives:
Geometry
Geometry nodes have two important parameters: a s64 value corresponding to the id, and the geometry's name, useful when debugging or if you want to display it in a tool.
Under each Geometry node, there are a certain number of child nodes.
The most important are:
Vertices, which contains a raw parameter with all the r64 vertex data, often zlib encoded
PolygonVertexIndex is an array of polygons/surfaces. This is an s32 array whose values are indices into your vertices array. A surface is defined as a sequence of these indices, terminated by a negative value. For example:
0, 1, -3 defines a surface. If an index is negative, you just apply <index> * -1 - 1 to get back the correct index, so for the example this gives you: 0, 1, 2, meaning the surface is composed of the first three vertices of your vertices array. Surfaces can have as many vertices as you want, so if you only want triangles in your renderer, you'll have to triangulate your surfaces.
We then have 3 LayerElement nodes, with very similar structures but different uses. They all share the same notions:
ReferenceInformationType is pretty simple: it is either Direct or IndexToDirect. If it is Direct, just parse the parameter and the ith element maps to the ith vertex, meaning Parameter[i] maps to Vertex[i]. If you have IndexToDirect, you have to parse an additional node that contains the indices for the mapping. So you'll have an <id>Index node containing an array the length of your vertex array, and the mapping becomes Vertex[i] maps to Parameter[parameterIndex[i]].
MappingInformationType can have multiple values, according to the Autodesk documentation, namely:
ByControlPoint, never used it
ByPolygonVertex, every vertex has a value for each Polygon it is included in.
ByPolygon, one value for each polygon
ByEdge, never used it
AllSame, the whole geometry uses only one value.
Let's see the three node types that contain the information we need.
LayerElementNormal is a root node, under which a Normals node resides, which also contains r64 data (often zlib encoded) to reconstruct the normals. There are other nodes under LayerElementNormal that you need to check, namely MappingInformationType and ReferenceInformationType, which we covered above.
LayerElementUV is very similar to LayerElementNormal but contains a UV node with your UV array, and indices under a UVIndex node if the ReferenceInformationType is IndexToDirect.
LayerElementMaterial contains information on which material to bind to each polygon. Each model can have multiple bound materials, which will be defined later, but this node tells you which material to use with which surface. You get an array of indices into the material array of the model using this geometry.
Model
Models are another kind of node in the file; this is where you'll get the transformation information: translation, scaling and rotation.
The Model node has 3 parameters:
its id, which will be used for the connections
A name
A type parameter (either Mesh or LimbNode).
For now, let’s only parse Mesh types.
We want to find the Properties child node, under which 3 nodes are of interest to us: Lcl Rotation, Lcl Scaling and Lcl Translation, each with r64 parameters to reconstruct the corresponding vector.
As you can see, you can interpret models as transformation matrices, but they will also be used to create a tree structure to compute the correct transformation (based on the hierarchy), and each one will be linked to a corresponding Geometry.
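As a sketch, here is how a local matrix could be assembled from those three vectors, assuming an XYZ rotation order in degrees (real FBX files can specify other rotation orders and pivot/offset properties, which are ignored here):

```cpp
#include <cmath>

// Column-major 4x4 matrix: element (row, col) lives at m[col * 4 + row].
struct Mat4 { double m[16]; };

static Mat4 Identity()
{
    Mat4 r{};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0;
    return r;
}

static Mat4 Mul(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int c = 0; c < 4; c++)
        for (int row = 0; row < 4; row++)
            for (int k = 0; k < 4; k++)
                r.m[c * 4 + row] += a.m[k * 4 + row] * b.m[c * 4 + k];
    return r;
}

// Builds the local transform as T * Rz * Ry * Rx * S from the
// Lcl Translation / Rotation (degrees) / Scaling vectors.
static Mat4 LocalTransform(const double t[3], const double rDeg[3],
                           const double s[3])
{
    const double toRad = 3.14159265358979323846 / 180.0;
    double cx = std::cos(rDeg[0] * toRad), sx = std::sin(rDeg[0] * toRad);
    double cy = std::cos(rDeg[1] * toRad), sy = std::sin(rDeg[1] * toRad);
    double cz = std::cos(rDeg[2] * toRad), sz = std::sin(rDeg[2] * toRad);

    Mat4 rx = Identity(); rx.m[5] = cx; rx.m[9] = -sx; rx.m[6] = sx; rx.m[10] = cx;
    Mat4 ry = Identity(); ry.m[0] = cy; ry.m[8] = sy;  ry.m[2] = -sy; ry.m[10] = cy;
    Mat4 rz = Identity(); rz.m[0] = cz; rz.m[4] = -sz; rz.m[1] = sz; rz.m[5] = cz;

    Mat4 scale = Identity(); scale.m[0] = s[0]; scale.m[5] = s[1]; scale.m[10] = s[2];
    Mat4 trans = Identity(); trans.m[12] = t[0]; trans.m[13] = t[1]; trans.m[14] = t[2];

    return Mul(trans, Mul(rz, Mul(ry, Mul(rx, scale))));
}
```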
Materials
Materials are pretty easy to parse too. There are two types, Phong and Lambert; for now I only parse Phong. You need to find the Properties child, and among its children you'll find DiffuseColor, AmbientColor, SpecularColor and SpecularFactor. There are other children (like Shininess, …) but my renderer only uses Diffuse and Specular in the lighting equations right now, so I just parse those.
Textures
Textures are another easy one. The data resides under a Texture node with two parameters: a s64 id, and a name. It has several child nodes, but I only parse the FileName one.
Connections
Connections is a root node, on the same level as the Objects one. The information to extract here is all the links between the Geometry/Texture/Material/Model objects. The annoying part is that it is a fully generic format, with a type that is either 'OO' (Object to Object) or 'OP' (Object to Parameter) and two identifiers, so you'll have to track the ids and types of objects yourself, instead of having multiple specific connection types.
Anyway, the connections are as follows:
Model to Geometry, which connects the vertices/UVs/normals to a transformation matrix.
Texture to Material, to assign a texture to the material. Order of appearance is important, as a normal texture is the second texture bound to the material.
Model to Model, to create a tree-like structure to compute the correct transformation, as you need to apply the parent's transformation before your own.
Material to Model, so that we can know which material to use with which geometry (assigned to the model). You can have multiple materials per model.
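A sketch of that bookkeeping: while parsing Objects you record each id and its kind in a registry, then resolve the generic links in a second pass (all the names here are hypothetical, not part of the format):

```cpp
#include <cstdint>
#include <map>
#include <vector>

typedef int64_t s64;

enum class Kind { Geometry, Model, Material, Texture };

// One 'OO' connection record: the child object links to the parent.
struct Connection { s64 child; s64 parent; };

struct Links
{
    std::map<s64, s64> geometryOfModel;              // model id -> geometry id
    std::map<s64, std::vector<s64>> materialsOfModel; // order matters
    std::map<s64, s64> parentOfModel;                // model id -> parent model
};

static Links ResolveConnections(const std::map<s64, Kind>& kinds,
                                const std::vector<Connection>& connections)
{
    Links links;
    for (const Connection& c : connections)
    {
        auto childIt = kinds.find(c.child);
        auto parentIt = kinds.find(c.parent);
        if (childIt == kinds.end() || parentIt == kinds.end())
            continue; // e.g. the scene root (id 0) or an object we skipped
        Kind child = childIt->second, parent = parentIt->second;
        if (child == Kind::Geometry && parent == Kind::Model)
            links.geometryOfModel[c.parent] = c.child;
        else if (child == Kind::Material && parent == Kind::Model)
            links.materialsOfModel[c.parent].push_back(c.child);
        else if (child == Kind::Model && parent == Kind::Model)
            links.parentOfModel[c.child] = c.parent;
    }
    return links;
}
```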
And with all that parsed, we now have the correct data to display it (vertices, UVs, normals, transformations, textures, materials).
Here is a character from Kenney’s website loaded in my engine.
There is of course more information in the file that I did not parse, but now that I have a good sense of the format, it is pretty easy to come back and parse more stuff if I need to.
Final Thoughts
The FBX file format is a really generic format, too generic for my taste. Parsing is slowed down by layers of nodes and lots of string comparisons. I get that this genericity may be why it was widely adopted, as tools could easily add nodes for everything you may need in a more complex scene: cameras, lighting, and new techniques or data not known at the time the format was created. But, like the u64 offsets, those could have been added in later versions instead. You could keep the generic layout for optional data and have a very fast path for the main data (geometry, transforms, materials), with known offsets from the header and no need for node parsing and string comparisons. Anyway, the format also contains information for animations, but we'll keep that for another post, as this one is long enough already.