Just what is an e-book anyway? Part two

Check out Part one of Just what is an e-book anyway?

Now that we have looked at how an e-book functions, let’s look at what’s under the hood of a typical e-book file.

There are many different types of e-book file formats. Here are some:

  • EPUB, EPUB3 – the open, de facto standard file format and its newer sibling.
  • MOBI, AZW, KF8, PRC – proprietary formats owned by Amazon, similar to EPUB.
  • IBA – proprietary format owned by Apple, similar to EPUB.
  • BBeB – proprietary format owned by Sony and Canon, similar to EPUB. BBeB files can have LRS, LRF or LRX file extensions.
  • PDF, TXT, RTF and others – static file formats that can be read on various e-book readers and computers.

At the highest level, an e-book file is an archive, much like a ZIP file. An archive bundles multiple files into a single file and usually compresses them to save space. An EPUB is in fact a ZIP archive with a different file extension, and other e-book formats like MOBI likewise pack multiple files together into one file.

Let’s break one apart and look at what’s inside. In this example, I’m using a test EPUB file that I created specifically for this blog post. Here’s what we get:


The file directory of an expanded EPUB file, showing the component files.
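
If you would rather not install a separate unzip tool, the same listing can be produced with a few lines of Python. This is only a minimal sketch: it relies on Python’s standard zipfile module and on the EPUB not being DRM-protected, and both the function name list_epub_contents and the file name test.epub are made up for illustration.

```python
import zipfile


def list_epub_contents(epub_path):
    """Return (name, size_in_archive, original_size) for each file in an EPUB.

    epub_path may be a file name or an open binary file object.
    Assumes the EPUB is a plain, readable ZIP archive.
    """
    with zipfile.ZipFile(epub_path) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]


# Example usage with a hypothetical file name:
# for name, stored, original in list_epub_contents("test.epub"):
#     print(f"{name}: {original} bytes ({stored} bytes inside the archive)")
```

Comparing the two sizes shows which files were compressed and which were stored as-is.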

What’s all this? Remember that an EPUB file (like MOBI, AZW, PRC and others) is like a mini website, written in XHTML and CSS.

XHTML (or Extensible HyperText Markup Language) is a stricter, XML-based variation of HTML (or HyperText Markup Language), the main markup language for creating web pages. Each section of the book is kind of like its own web page.

CSS (or Cascading Style Sheets) is the code that describes and controls the look and formatting of the web page and its elements (like how the fonts look).
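
To make that concrete, here is a rough sketch of what one chapter and its stylesheet might look like, wrapped in a little Python that reads the chapter back. The chapter text, the class name chapter-title and the file name styles.css are all invented for this example and are not taken from my test file. Because XHTML is well-formed XML, Python’s standard XML parser can read it directly.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A hypothetical chapter file (invented for illustration).
CHAPTER_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <link rel="stylesheet" type="text/css" href="styles.css"/>
  </head>
  <body>
    <h1 class="chapter-title">Chapter 1</h1>
    <p>It was a dark and stormy night.</p>
  </body>
</html>"""

# A hypothetical stylesheet: the XHTML says what each piece of text is,
# the CSS says how it should look.
STYLES_CSS = """h1.chapter-title { font-family: serif; text-align: center; }
p { text-indent: 1.5em; margin: 0; }"""


def chapter_summary(xhtml_text):
    """Parse a chapter and report its heading, stylesheets and paragraphs."""
    root = ET.fromstring(xhtml_text)
    heading = root.find(f".//{{{XHTML_NS}}}h1")
    stylesheets = [
        link.get("href")
        for link in root.iter(f"{{{XHTML_NS}}}link")
        if link.get("rel") == "stylesheet"
    ]
    paragraphs = ["".join(p.itertext()) for p in root.iter(f"{{{XHTML_NS}}}p")]
    return {
        "heading": heading.text if heading is not None else None,
        "stylesheets": stylesheets,
        "paragraphs": paragraphs,
    }
```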

You’ll also notice some other important files for the EPUB format:

  • content.opf – this file contains all the e-book’s metadata, the file manifest and the linear reading order. In other words, the file contains all the descriptive data for the book, lists all the parts of the book and directs the order of the parts.
  • toc.ncx – this file describes and controls the hierarchical, navigational table of contents: the one that drives navigation on the device. For example, it supplies the list of chapters that appears on the left-hand side in Adobe Digital Editions.

    The NCX view in Adobe Digital Editions…see the table of contents on the left?

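  • container.xml – this file, stored in the META-INF folder, tells the reading software where to find the content.opf file.
  • mimetype – this file, which needs to be the first file in the archive and must be uncompressed and unencrypted for the e-book to work, basically tells other programs and devices that this is an EPUB file. It contains a single line of text: application/epub+zip.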
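
Here is how those files fit together, sketched in Python: read mimetype, follow container.xml to the OPF file, then pull the title, the reading order and the navigation labels. Treat it as an illustration rather than a complete reader. It assumes an unencrypted EPUB 2 file that uses the standard XML namespaces for these files, and the function name read_epub_structure is my own invention.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by container.xml, the OPF file and the NCX file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ncx": "http://www.daisy.org/z3986/2005/ncx/",
}


def read_epub_structure(epub_path):
    """Summarize the key files of an EPUB (assumes EPUB 2, no encryption)."""
    with zipfile.ZipFile(epub_path) as archive:
        # mimetype: identifies the archive as an EPUB.
        mimetype = archive.read("mimetype").decode("ascii").strip()

        # container.xml: points to the OPF file, wherever it lives.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        opf_dir = posixpath.dirname(opf_path)

        # OPF file: metadata, manifest (all files) and spine (reading order).
        package = ET.fromstring(archive.read(opf_path))
        title = package.findtext("opf:metadata/dc:title", namespaces=NS)
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        spine = package.find("opf:spine", NS)
        reading_order = [
            posixpath.join(opf_dir, manifest[ref.get("idref")])
            for ref in spine.findall("opf:itemref", NS)
        ]

        # NCX file: the navigational table of contents, if the spine names one.
        toc_labels = []
        ncx_id = spine.get("toc")
        if ncx_id in manifest:
            ncx_path = posixpath.join(opf_dir, manifest[ncx_id])
            ncx = ET.fromstring(archive.read(ncx_path))
            toc_labels = [
                text.text
                for text in ncx.findall(".//ncx:navPoint/ncx:navLabel/ncx:text", NS)
            ]

    return {
        "mimetype": mimetype,
        "opf_path": opf_path,
        "title": title,
        "reading_order": reading_order,
        "toc_labels": toc_labels,
    }
```

Note that the sketch never assumes the OPF file is called content.opf. It asks container.xml where the file is, which is how reading software finds it too.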

So, those are the parts of an EPUB file. MOBI and other proprietary formats have similar structures and use XHTML and CSS as well.

For those who are curious, I created the test EPUB file by creating a book file (and all supporting chapter files) in InDesign CS4 (Mac) and exporting the book to Adobe Digital Editions. To break apart the EPUB file, I used EPUB Unzip 1.0, a free application for Mac. For more info on how to break apart your own EPUB files, see this great article from Anne-Marie Concepcion at InDesignSecrets.

In the final installment, we’ll explore how all this affects your approach to designing an e-book.


