See examples (hachoir-urwid screenshots).
A perfect parser has no "raw" field: it knows the meaning of *every* bit. Some good (but not perfect ;-)) parsers:
- Matroska video
- Microsoft RIFF (AVI video, WAV audio, CDA file)
- PNG picture
- TAR and ZIP archive
hachoir-parser is used by other programs: hachoir-metadata, hachoir-subfile, hachoir-grep, etc.
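To illustrate what "no raw field" means, here is a minimal sketch that names every byte of a PNG signature and IHDR chunk. It uses only Python's standard struct and zlib modules, not the hachoir API. The function names, the field names and the sample image size are assumptions made for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_png_prefix(width, height):
    """Build the first 33 bytes of a PNG file: signature plus IHDR chunk."""
    # 8-bit depth, color type 2 (truecolor), default compression/filter, no interlace
    data = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + data) & 0xFFFFFFFF
    return (PNG_SIGNATURE + struct.pack(">I", len(data)) + b"IHDR"
            + data + struct.pack(">I", crc))


def parse_png_header(buf):
    """Return (name, offset, size, value) tuples for the signature and IHDR."""
    fields = []
    offset = 0

    def take(name, fmt):
        # Decode one named field at the current offset, then advance past it.
        nonlocal offset
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, buf, offset)
        fields.append((name, offset, size, value))
        offset += size

    take("signature", ">8s")
    take("ihdr/size", ">I")
    take("ihdr/tag", ">4s")
    take("ihdr/width", ">I")
    take("ihdr/height", ">I")
    take("ihdr/bit_depth", ">B")
    take("ihdr/color_type", ">B")
    take("ihdr/compression", ">B")
    take("ihdr/filter", ">B")
    take("ihdr/interlace", ">B")
    take("ihdr/crc32", ">I")
    return fields


if __name__ == "__main__":
    sample = build_png_prefix(640, 480)
    parsed = parse_png_header(sample)
    for name, offset, size, value in parsed:
        print(offset, size, name, value)
    # No byte is left unexplained, so there is nothing to put in a "raw" field.
    print(sum(size for _, _, size, _ in parsed) == len(sample))
```

The sizes of the named fields add up to the length of the buffer, which is the property a "perfect" parser would have for the whole file.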
Download
Code example
See code examples.
List of parsers
Total: 70 parsers
Archive
- 7zip: Compressed archive in 7z format
- ace: ACE archive
- bzip2: bzip2 archive
- cab: Microsoft Cabinet archive
- gzip: gzip archive
- mar: Microsoft Archive
- rar: Roshal archive (RAR)
- rpm: RPM package
- tar: TAR archive
- unix_archive: Unix archive
- zip: ZIP archive
Audio
- aiff: Audio Interchange File Format (AIFF)
- fasttracker2: FastTracker2 module
- itunesdb: iPod iTunesDB file
- midi: MIDI audio
- mod: Uncompressed Amiga module
- mpeg_audio: MPEG audio version 1, 2, 2.5
- ptm: PolyTracker module (v1.17)
- real_audio: Real audio (.ra)
- s3m: ScreamTracker3 module
- sun_next_snd: Sun/NeXT audio
Container
- asn1: Abstract Syntax Notation One (ASN.1)
- matroska: Matroska multimedia container
- ogg: Ogg multimedia container
- ogg_stream: Ogg logical stream
- real_media: RealMedia (rm) Container File
- riff: Microsoft RIFF container
- swf: Macromedia Flash data
File System
- ext2: EXT2/EXT3 file system
- fat12: FAT12 filesystem
- fat16: FAT16 filesystem
- fat32: FAT32 filesystem
- iso9660: ISO 9660 file system
- linux_swap: Linux swap file
- msdos_harddrive: MS-DOS hard drive with Master Boot Record (MBR)
- ntfs: NTFS file system
- reiserfs: ReiserFS file system
Game
- lucasarts_font: LucasArts Font
- spiderman_video: The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV video
- zsnes: ZSNES Save State File (only version 143)
Image
- bmp: Microsoft bitmap (BMP) picture
- gif: GIF picture
- ico: Microsoft Windows icon or cursor
- jpeg: JPEG picture
- pcx: PC Paintbrush (PCX) picture
- png: Portable Network Graphics (PNG) picture
- psd: Photoshop (PSD) picture
- targa: Truevision Targa Graphic (TGA)
- tiff: TIFF picture
- wmf: Microsoft Windows Metafile (WMF)
- xcf: GIMP (XCF) picture
Misc
- 3do: renderdroid 3D model
- 3ds: 3D Studio Max model
- chm: Microsoft's HTML Help (.chm)
- lnk: Windows Shortcut (.lnk)
- ole2: Microsoft Office document
- pcf: X11 Portable Compiled Font (pcf)
- pdf: Portable Document Format (PDF) document
- tcpdump: Tcpdump file (network)
- torrent: Torrent metainfo file
- ttf: TrueType font
Program
- elf: ELF Unix/BSD program/library
- exe: Microsoft Windows Portable Executable
- java_class: Compiled Java class
- python: Compiled Python script (.pyc/.pyo files)
Video
- asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
- flv: Macromedia Flash video
- mov: Apple QuickTime movie
- mpeg_ts: MPEG-2 Transport Stream
- mpeg_video: MPEG video, version 1 or 2
Supported file extensions
File extensions: 3do, 3ds, 7z, a, ace, aif, aifc, aiff, ani, apm, asf, au, avi, bin, bmp, bz2, cab, cda, chm, class, cur, deb, der, dll, doc, dot, emf, exe, flv, gif, gz, ico, jar, jpeg, jpg, laf, lnk, m4a, m4b, m4p, m4v, mar, mid, midi, mka, mkv, mod, mov, mp1, mp2, mp3, mp4, mpa, mpe, mpeg, mpg, msi, nst, oct, ocx, odb, odc, odf, odg, odi, odm, odp, ods, odt, ogg, ogm, otg, otp, ots, ott, pcf, pcx, pdf, png, pot, pps, ppt, ppz, psd, ptm, pyc, pyo, qt, ra, rar, rm, rpm, s3m, sd0, snd, so, stc, std, sti, stw, swf, sxc, sxd, sxg, sxi, sxm, sxw, tar, tga, tif, tiff, torrent, ts, ttf, vob, wav, wma, wmf, wmv, wow, xcf, xla, xls, xm, zip, zs1, zs2, zs3, zs4, zs5, zs6, zs7, zs8, zs9, zst.
Total: 135 file extensions.
Supported MIME types
MIME types: application/java-archive, application/java-vm, application/msexcel, application/mspowerpoint, application/msword, application/ogg, application/pdf, application/vnd.ms-cab-compressed, application/vnd.oasis.opendocument.chart, application/vnd.oasis.opendocument.database, application/vnd.oasis.opendocument.formula, application/vnd.oasis.opendocument.graphics, application/vnd.oasis.opendocument.graphics-template, application/vnd.oasis.opendocument.image, application/vnd.oasis.opendocument.presentation, application/vnd.oasis.opendocument.presentation-template, application/vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.spreadsheet-template, application/vnd.oasis.opendocument.text, application/vnd.oasis.opendocument.text-master, application/vnd.oasis.opendocument.text-template, application/vnd.rn-realmedia, application/vnd.sun.xml.calc, application/vnd.sun.xml.calc.template, application/vnd.sun.xml.draw, application/vnd.sun.xml.draw.template, application/vnd.sun.xml.impress, application/vnd.sun.xml.impress.template, application/vnd.sun.xml.math, application/vnd.sun.xml.writer, application/vnd.sun.xml.writer.global, application/vnd.sun.xml.writer.template, application/wmf, application/x-7z-compressed, application/x-ace-compressed, application/x-archive, application/x-bittorrent, application/x-bzip2, application/x-coredump, application/x-debian-package, application/x-dosexec, application/x-dpkg, application/x-executable, application/x-executable-file, application/x-gimp-image, application/x-gtar, application/x-gzip, application/x-jar, application/x-ms-shortcut, application/x-msmetafile, application/x-object, application/x-ogg, application/x-rar-compressed, application/x-rpm, application/x-sharedlib, application/x-shockwave-flash, application/x-tar, application/x-wmf, application/x-zip, application/zip, audio/basic, audio/mime, audio/mod, audio/module-xm, audio/mpeg, audio/ogg, audio/s3m, audio/x-aiff, audio/x-cda, audio/x-matroska, 
audio/x-mod, audio/x-ms-wma, audio/x-ogg, audio/x-pn-realaudio, audio/x-pn-realaudio-plugin, audio/x-real-audio, audio/x-realaudio, audio/x-s3m, audio/x-wav, audio/x-xm, audio/xm, image/gif, image/jpeg, image/photoshop, image/png, image/psd, image/targa, image/tga, image/tiff, image/wmf, image/x-3do, image/x-3ds, image/x-bmp, image/x-emf, image/x-ico, image/x-ms-bmp, image/x-pcx, image/x-photoshop, image/x-png, image/x-tga, image/x-win-metafile, image/x-wmf, image/x-xcf, video/mp2p, video/mpeg, video/ogg, video/quicktime, video/theora, video/x-flv, video/x-matroska, video/x-ms-asf, video/x-ms-wmv, video/x-msvideo, video/x-ogg, video/x-pn-realvideo, video/x-theora.
Total: 116 MIME types.
TODO
- #37: Parser: GIF, able to parse image content or at least compute image length
- #66: Rewrite ar archive parser
- #74: Improve MPEG video parser
- #75: Write 'demuxer' using EncodedFile, Link, Fragment for video containers (RIFF, Ogg, MPEG, ...)
- #93: Rewrite/improve MPEG video parser using Kaa sourcecode
- #100: Use magic number to choose parser when use of filename fails
- #116: Possible DoS with bz2 files.
- #120: Allow to load user parser
- #138: OLE2 Parser Changes + Updates
- #145: CHM parser updates
- #152: Parse other MAR format
- #163: MPEG audio: write LYRICS parser
- #164: MPEG audio: write APE2 parser
- #166: Mach-O executable file format
- #167: ELF parser improvements
- #171: Assorted hachoir_parser additions
- #175: Rewrite MPEG-4/MOV parser from scratch
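The idea behind ticket #100 above (fall back to a magic number when the filename does not identify the format) could look roughly like the sketch below. It is not hachoir code: the two lookup tables and the guess_parser function are hypothetical and cover only a handful of formats. The parser ids are taken from the list of parsers on this page, and the magic numbers are the well-known signatures of those formats.

```python
import os

# Hypothetical lookup tables, written only for this sketch.
PARSER_BY_EXTENSION = {
    "png": "png",
    "gif": "gif",
    "gz": "gzip",
    "bz2": "bzip2",
    "zip": "zip",
    "pdf": "pdf",
}

PARSER_BY_MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),
    (b"%PDF-", "pdf"),
]


def guess_parser(filename, head):
    """Pick a parser id from the file extension, else from the first bytes.

    `head` holds the first bytes of the file content. Returns None when
    neither the extension nor the magic number is recognised.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    parser_id = PARSER_BY_EXTENSION.get(ext)
    if parser_id is not None:
        return parser_id
    # The filename did not help: compare the start of the data with known magics.
    for magic, candidate in PARSER_BY_MAGIC:
        if head.startswith(magic):
            return candidate
    return None


if __name__ == "__main__":
    print(guess_parser("photo.png", b""))
    print(guess_parser("download.tmp", b"\x1f\x8b\x08\x00"))
    print(guess_parser("unknown.xyz", b"\x00\x00"))
```

A real implementation would probably also validate the match (a short magic such as the two gzip bytes can appear by chance), which this sketch does not attempt.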
See also
- TodoParsers
- Reverse engineering
- File format resources
- Microsoft Office file formats (for Word, Excel, PowerPoint applications)