The recent release of 0.11 marks the inclusion of the long awaited serialization feature. Python and Java only for now. I've been using it for a while for Python and although it has some rough edges, it works pretty well and I'm super excited for the project.
My dream for a parsing library / language is that it would be able to read, manipulate, and then re-serialize the data. I'm sure there are a ton of edge cases there, but the round trip would be so useful for fuzzing and program analysis.
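The read, manipulate, re-serialize round trip can be sketched with a tiny, hypothetical type-length-value (TLV) format: a 1-byte tag, a 2-byte big-endian length, then the payload. The sketch parses bytes into records, mutates one, and serializes it back. It also checks that an untouched parse/serialize is lossless, which is the property a fuzzer would rely on. None of this is Kaitai's API; the format and function names are made up for illustration.

```python
import struct
from dataclasses import dataclass

@dataclass
class Record:
    tag: int        # 1-byte record type (hypothetical format)
    payload: bytes  # raw payload; its length is derived on write, not stored

def parse(data: bytes) -> list[Record]:
    """Parse a sequence of TLV records: u1 tag, u2be length, payload."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError(f"truncated header at offset {pos}")
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        if pos + length > len(data):
            raise ValueError(f"truncated payload at offset {pos}")
        records.append(Record(tag, data[pos:pos + length]))
        pos += length
    return records

def serialize(records: list[Record]) -> bytes:
    """Inverse of parse(); the length field is recomputed from each payload."""
    out = bytearray()
    for r in records:
        out += struct.pack(">BH", r.tag, len(r.payload))
        out += r.payload
    return bytes(out)

original = bytes([1, 0, 3]) + b"abc" + bytes([2, 0, 0])
recs = parse(original)
assert serialize(recs) == original  # lossless round trip

recs[0].payload = b"hello, fuzzer"  # manipulate a field...
mutated = serialize(recs)           # ...and the length is fixed up on write
assert parse(mutated)[0].payload == b"hello, fuzzer"
```

Deriving the length from the payload on write, rather than making it a separately editable field, is a design choice of this sketch. It keeps mutated records internally consistent without the caller having to patch sizes by hand.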
Kaitai is absolutely one of my favorite projects. I use it for work (parsing scientific formats, prototyping and exploring those formats, etc.) as well as for fun (reverse engineering games, formats for DOSBox core dumps, etc.).
I gave a guest lecture in a friend's class last week where we used Kaitai to back out the file format used in "Where in Time is Carmen Sandiego", and it was a total blast. (For me. Not sure the class agreed? Maybe.) The Web IDE (https://ide.kaitai.io/) made this super easy.
(On my YouTube page I've got recordings of streams where I work with Kaitai on projects like these, but somehow I can't work up the courage to link them here.)
I'm curious, how do you use it for Game RE?
Kaitai is one of many different tools that do this, there is a list of them here:
https://github.com/dloss/binary-parsing
Personally I like GNU Poke.
To quote from the page:

    id: flags
    type: u1
This seems to say flags is a sort of unsigned integer.
Is there a way to break the flags into big-endian bits, where the first two bits are either 01 or 10 (but not 00 or 11), with 01 meaning DATA and 10 meaning POINTER, the next five bits are a segment counter, and the last bit is 1 if the default is BLACK and 0 if the default is WHITE?
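Kaitai does have bit-sized integer types (`b1`, `b2`, `b5`, and so on), read in big-endian bit order by default. So a byte like this can be declared as three consecutive bit fields instead of one `u1`. To show what those fields would hold, here is a minimal hand-rolled Python sketch. It assumes 0b01 = DATA, 0b10 = POINTER, and a set final bit = BLACK. The names `Kind` and `decode_flags` are hypothetical.

```python
from enum import Enum

class Kind(Enum):
    DATA = 0b01
    POINTER = 0b10

def decode_flags(flags: int) -> tuple[Kind, int, str]:
    """Split one byte into big-endian bit fields: kind (2), segments (5), default (1)."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one byte")
    kind_bits = (flags >> 6) & 0b11     # bits 7-6, most significant first
    segments = (flags >> 1) & 0b11111   # bits 5-1: segment counter, 0..31
    default_bit = flags & 0b1           # bit 0
    if kind_bits not in (0b01, 0b10):   # 00 and 11 are invalid
        raise ValueError(f"invalid kind bits {kind_bits:02b}")
    default = "BLACK" if default_bit else "WHITE"  # assumed mapping
    return Kind(kind_bits), segments, default

print(decode_flags(0b01_00011_1))  # -> (<Kind.DATA: 1>, 3, 'BLACK')
```

In a `.ksy` spec, this would roughly correspond to fields of types `b2`, `b5`, and `b1` in that order.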
Kaitai is pretty nice. Hex editors with structure-parsing support used to be rarer than they are now, so I've used https://ide.kaitai.io/ instead a few times.
Also, the newest Kaitai release added (long awaited) serialization support! I haven't had a chance to try it out.
https://kaitai.io/news/2025/09/07/kaitai-struct-v0.11-releas...
Even if you don't want to use it because it isn't as efficient as a hand-written, specialized parser, Kaitai Struct is a perfect way to document file formats. I love the idea and every bit of the project!
I like using it to parse structs, then interspersing procedural code for loops/containers so that not everything gets read into RAM at once.
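This hybrid approach might look roughly like the following in Python. Each fixed header is decoded declaratively with `struct`, which stands in for a generated Kaitai type here. A plain generator drives the container loop and reads one entry at a time from a file object, so memory stays bounded by the largest single entry. The container layout (a u4le count, then u2le length-prefixed entries) is hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator

def read_exact(f: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or raise, so truncated input fails loudly."""
    buf = f.read(n)
    if len(buf) != n:
        raise EOFError(f"wanted {n} bytes, got {len(buf)}")
    return buf

def iter_entries(f: BinaryIO) -> Iterator[bytes]:
    """Yield entries one at a time instead of materializing the whole container."""
    (count,) = struct.unpack("<I", read_exact(f, 4))        # container header
    for _ in range(count):
        (length,) = struct.unpack("<H", read_exact(f, 2))   # per-entry header
        yield read_exact(f, length)                          # only this entry is in RAM

blob = struct.pack("<I", 2) + struct.pack("<H", 3) + b"foo" + struct.pack("<H", 2) + b"hi"
for entry in iter_entries(io.BytesIO(blob)):
    print(entry)
```

The trade-off is that anything needing random access or back-references across entries becomes the caller's job, because earlier entries are no longer held in memory.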
Is the main difference from https://github.com/google/wuffs being that Kaitai is declarative?
See https://github.com/google/wuffs/blob/main/doc/related-work.m...
> Kaitai Struct is in a similar space, generating safe parsers for multiple target programming languages from one declarative specification. Again, Wuffs differs in that it is a complete (and performant) end to end implementation, not just for the structured parts of a file format. Repeating a point in the previous paragraph, the difficulty in decoding the GIF format isn't in the regularly-expressible part of the format, it's in the LZW compression. Kaitai's GIF parser returns the compressed LZW data as an opaque blob.
Taking PNG as an example, Kaitai will tell you the image's metadata (including width and height) and that the compressed pixels are in the such-and-such part of the file. But unlike Wuffs, Kaitai doesn't actually decode the compressed pixels.
---
Wuffs' generated C code also doesn't need any capabilities, including the ability to malloc or free. Its example/mzcat program (equivalent to /bin/bzcat or /bin/zcat, for decoding BZIP2 or GZIP) self-imposes a SECCOMP_MODE_STRICT sandbox, which is so restrictive (and secure!) that it prohibits any syscalls other than read, write, _exit and sigreturn.
(I am the Wuffs author.)
Wuffs looks pretty awesome. Thanks for making it.
Wuffs is intended for files, but would it be a bad idea to use it to parse network data from untrusted endpoints?
They overlap, but neither does strictly more than the other.
Kaitai is for describing, encoding and decoding file formats. Wuffs is for decoding images (which includes decoding certain file formats). Kaitai is multi-language; Wuffs compiles to C only. If you wrote a parser for PNGs, your Kaitai implementation could tell you what the resolution was, where the palette information was (if any), what the comments look like, and at which byte the compressed pixel chunk started. Your Wuffs implementation would give you back the decoded pixels (OK, and the resolution).
Think of Kaitai as an IDL generator for file formats, perhaps. It lets you parse the file into some sort of language-native struct (say, a series of nested objects) but doesn't try to process it beyond the parse.
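To make the PNG comparison concrete, here is a minimal Python sketch of the "structure only" half. It walks the PNG chunks, checks each CRC, reads width and height from IHDR, records each chunk's byte offset, and returns the concatenated IDAT data as an opaque blob. That is roughly where the thread says Kaitai stops. Inflating that blob with `zlib` is only the first step of what a full decoder like Wuffs does; undoing the per-scanline filters and converting pixels are still left out. The function names and the tiny test image built inline are assumptions for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def parse_png_structure(data: bytes) -> dict:
    """Walk PNG chunks; return metadata and the still-compressed IDAT bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    info = {"chunks": [], "idat": b""}
    while pos < len(data):
        if pos + 12 > len(data):
            raise ValueError(f"truncated chunk header at offset {pos}")
        length, ctype = struct.unpack_from(">I4s", data, pos)
        end = pos + 12 + length  # length(4) + type(4) + body + crc(4)
        if end > len(data):
            raise ValueError(f"truncated {ctype!r} chunk at offset {pos}")
        body = data[pos + 8 : pos + 8 + length]
        (crc,) = struct.unpack_from(">I", data, pos + 8 + length)
        if zlib.crc32(ctype + body) != crc:  # CRC covers type + body
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        info["chunks"].append((ctype.decode("ascii"), pos))  # type and byte offset
        if ctype == b"IHDR":
            w, h, depth, color = struct.unpack_from(">IIBB", body)
            info.update(width=w, height=h, bit_depth=depth, color_type=color)
        elif ctype == b"IDAT":
            info["idat"] += body  # opaque zlib stream, may span several chunks
        elif ctype == b"IEND":
            break
        pos = end
    return info

def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one chunk (used only to create test input)."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

# A 2x1 8-bit grayscale image: the scanline starts with filter type 0 (None).
raw = b"\x00\x10\x20"
png = (PNG_SIGNATURE
       + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 2, 1, 8, 0, 0, 0, 0))
       + make_chunk(b"IDAT", zlib.compress(raw))
       + make_chunk(b"IEND", b""))

info = parse_png_structure(png)
print(info["width"], info["height"], info["chunks"])
# One step beyond the structural parse: inflate the opaque blob.
print(zlib.decompress(info["idat"]) == raw)
```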
Looking at that repo, I have no clue how to get started.
The top-level README has a link called "Getting Started".
I wanted to use this a long time ago, but the Rust support wasn't there. I can see now that it's on the front page with apparently first-class support, so it looks like I can give it another go.
I had a ton of fun using Kaitai to write an unpacking script for a video game's proprietary pack file format. Super cool project.
I did NOT have fun trying to use Kaitai to pack the files back together. I'm not sure if this has improved at all, but a year or so ago you had to build the dependencies yourself, and the process was so cumbersome that it ended up being easier to just write imperative code to do it myself.
It hasn't improved that much: you need to know the final size and fill in all attributes, since there are no defaults, at least in Python.
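The "just write imperative code" route for packing often comes down to a two-pass layout. First measure everything, then write headers whose offset and size fields are computed rather than filled in by hand. Here is a minimal Python sketch for a hypothetical pack format. It has a magic, a u4le file count, a directory of (16-byte name, u4le offset, u4le size) entries, and then the file blobs. This is not the game format from the comment, which isn't described, and the names are made up.

```python
import struct

MAGIC = b"PAK0"                  # hypothetical magic
ENTRY = struct.Struct("<16sII")  # name (NUL-padded), offset, size

def pack(files: dict[str, bytes]) -> bytes:
    """Pass 1 computes offsets from known sizes; pass 2 writes everything."""
    header_size = len(MAGIC) + 4 + ENTRY.size * len(files)
    entries = []
    offset = header_size
    for name, blob in files.items():
        encoded = name.encode("ascii")
        if len(encoded) > 16:
            raise ValueError(f"name too long: {name}")
        entries.append((encoded, offset, len(blob)))
        offset += len(blob)
    out = bytearray(MAGIC + struct.pack("<I", len(files)))
    for encoded, off, size in entries:
        out += ENTRY.pack(encoded, off, size)  # "16s" pads short names with NUL bytes
    for blob in files.values():
        out += blob
    assert len(out) == offset  # sanity check: final size matches pass 1
    return bytes(out)

def unpack(data: bytes) -> dict[str, bytes]:
    """Read the directory back and slice out each file."""
    if data[:4] != MAGIC:
        raise ValueError("bad magic")
    (count,) = struct.unpack_from("<I", data, 4)
    files = {}
    for i in range(count):
        name, off, size = ENTRY.unpack_from(data, 8 + i * ENTRY.size)
        files[name.rstrip(b"\0").decode("ascii")] = data[off:off + size]
    return files

archive = pack({"a.txt": b"hello", "b.bin": b"\x00\x01"})
print(unpack(archive))  # {'a.txt': b'hello', 'b.bin': b'\x00\x01'}
```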
Does it support incremental parsing? For example, when I am parsing a network protocol, can it still consume some data from the head of the buffer even if the data is incomplete? This would not only avoid multiple attempts to restart parsing from the beginning but also prevent the buffer from growing excessively.
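Independent of any particular library, the incremental pattern being asked about can be sketched like this. The parser keeps a buffer and consumes complete frames from its head as soon as they are available. A trailing partial frame stays buffered until more bytes arrive, and consumed bytes are dropped so the buffer doesn't keep growing. The framing (a u2be length prefix) and the class name `FrameParser` are hypothetical.

```python
import struct

class FrameParser:
    """Incremental parser for u2be length-prefixed frames (hypothetical protocol)."""

    HEADER = struct.Struct(">H")

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append newly received bytes and return every frame that is now complete."""
        self._buf += chunk
        frames = []
        pos = 0
        while len(self._buf) - pos >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buf, pos)
            end = pos + self.HEADER.size + length
            if end > len(self._buf):
                break  # partial frame: keep it buffered until more data arrives
            frames.append(bytes(self._buf[pos + self.HEADER.size:end]))
            pos = end
        del self._buf[:pos]  # drop consumed bytes from the head of the buffer
        return frames

    @property
    def pending(self) -> int:
        """Number of buffered bytes belonging to an incomplete frame."""
        return len(self._buf)

p = FrameParser()
print(p.feed(b"\x00\x03ab"))       # [] (frame still incomplete)
print(p.feed(b"c\x00\x02hi\x00"))  # [b'abc', b'hi']
print(p.pending)                   # 1
```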
I also like Protodata [1]. It's complementary as an exploration and transformation tool when working with binary data formats.
[1]: https://github.com/evincarofautumn/protodata
Kaitai Struct is really great. I've used it several times over the years to quickly pull in a parser that I'd otherwise have to hand-roll (and almost certainly get subtly wrong).
Their reference parsers for Mach-O and DER work quite nicely in abi3audit[1].
[1]: https://github.com/pypa/abi3audit/tree/main/abi3audit/_vendo...
Great timing! I just published https://github.com/fzakaria/nix-nar-kaitai-spec and contributed kaitai C++ STL runtime to nixpkgs https://github.com/NixOS/nixpkgs/pull/454243
Wow, this is good. My only complaint is the annoyingly verbose YAML. What if I'd like to use Kaitai instead of protobufs? My .proto file is already a thousand lines, and splitting each of those lines into 3-4 indented YAML lines hurts readability.
What was the Python based binary parsing library from around 2010? Hachoir?
https://hachoir.readthedocs.io/en/latest/index.html
Hachoir was rad, just not very fast.
Construct?
I discovered this project recently and used it for Himawari Standard Data format and it made it so much easier. Definitely recommend using this if you need to create binary readers for uncommon formats.
https://en.wikipedia.org/wiki/Data_Format_Description_Langua...
DFDL is heavily encroaching on Kaitai Struct's territory.
No pure C backend?
It's not C, but we have sponsored a Zig target for Kaitai. If anyone reading this knows Zig well, please comment; we would love to get a code review of the generated code!
This would be great for most projects, since the Swift target, for example, is abandoned, with 6+ years since the last commit.