Introduction to Colibrio Concepts
Introduction
This is a high-level overview of how we define the concept of a Publication and how Colibrio handles different formats in a generalized way.
The Colibrio Reader Engine is a "multi-format" reading system engine. At this point you can load both EPUB and PDF Publications and get back a common API to work with. This is possible thanks to Format Adapters that abstract away format-specific details.
This does not mean, however, that you lose format-specific data and metadata when working with the Colibrio API: you can always access the underlying source Publication through its high-level representation. But before we get too technical, let's lay some groundwork.
The Colibrio definition of a Publication
A Publication, in both the digital and the physical world, can be defined as a versioned unit of content, packaged and optimized for delivery. In contrast to a web site, a Publication such as a magazine or a book has a distinct set of metadata attributes and a body of content that are, at least in theory, guaranteed not to change for the duration of its lifespan. If a Publication needs to be revised, a new, uniquely identifiable version is published.
Most digital Publication formats you come across, whether PDF, EPUB, audio book or Word document, will in some way contain the following:
Metadata, such as title, authors, revision, classification etc.
Manifest, a listing of all Resources needed to render the Publication
Resources, files such as images, video, fonts etc. used within the documents
Documents, the files that contain the actual body of content. Each document can represent a page, a chapter, an article etc.
Spine, that specifies the order of documents within the Publication
Navigation, a list of different landmarks within the Publication such as table of contents, page list, etc.
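The parts listed above can be pictured as a small data model. The sketch below is illustrative only: every type and function name in it is hypothetical and not part of the Colibrio API, and it assumes, for simplicity, that documents are the manifest entries referenced by the spine.

```typescript
// Hypothetical, simplified model of the parts listed above (not the Colibrio API).
interface ResourceSketch {
  path: string;      // location inside the packaged publication
  mediaType: string; // e.g. "image/png" or "application/xhtml+xml"
}

interface NavigationEntrySketch {
  label: string;
  target: string; // path of the document the entry points to
}

interface PublicationSketch {
  metadata: { title: string; authors: string[]; revision?: string };
  manifest: ResourceSketch[];          // every resource needed for rendering
  spine: string[];                     // document paths in reading order
  navigation: NavigationEntrySketch[]; // e.g. a table of contents
}

// Resolve the spine against the manifest to get the documents in reading order.
function documentsInReadingOrder(pub: PublicationSketch): ResourceSketch[] {
  const byPath = new Map<string, ResourceSketch>();
  for (const resource of pub.manifest) {
    byPath.set(resource.path, resource);
  }
  return pub.spine.map((path) => {
    const doc = byPath.get(path);
    if (doc === undefined) {
      throw new Error(`Spine references a path missing from the manifest: ${path}`);
    }
    return doc;
  });
}

// Made-up sample data for illustration.
const samplePublication: PublicationSketch = {
  metadata: { title: "Sample Book", authors: ["A. Author"], revision: "1" },
  manifest: [
    { path: "cover.png", mediaType: "image/png" },
    { path: "ch2.xhtml", mediaType: "application/xhtml+xml" },
    { path: "ch1.xhtml", mediaType: "application/xhtml+xml" },
  ],
  spine: ["ch1.xhtml", "ch2.xhtml"],
  navigation: [{ label: "Chapter 1", target: "ch1.xhtml" }],
};

const readingOrder = documentsInReadingOrder(samplePublication).map((d) => d.path);
```

Keeping the spine as a list of paths into the manifest means the manifest order does not have to match the reading order.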
Depending on the format of the Publication, the elements described above are stored and encoded in different ways.
A PDF stores most information about how to access its resources in its so-called trailer. Metadata is stored in an XMP-compatible format, the resource manifest is stored in the cross-reference table, and the spine and navigation are stored in its catalog section.
An EPUB Publication stores its metadata, manifest and spine in its package file. Navigation information, such as table of contents and landmarks, is stored in its navigation document.
The IPublication interface
In the Colibrio Reading System Engine the general definition of a Publication is represented by the IPublication interface. Any new format that is added to the Colibrio Reader Framework needs to implement this interface in order to be recognized as a valid Publication by the Reading System Engine. Each format can then add additional, format-specific features on top of IPublication.
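As a rough illustration of a common interface with format-specific additions on top, consider the sketch below. The members of the real IPublication interface are not described in this article, so `PublicationLike`, its methods and both classes are hypothetical stand-ins chosen for the example.

```typescript
// Hypothetical stand-in for IPublication; the real interface's members are not shown in this article.
interface PublicationLike {
  readonly format: string;
  getTitle(): string;
}

class EpubPublicationSketch implements PublicationLike {
  readonly format = "epub";
  private title: string;
  private renditionPaths: string[];

  constructor(title: string, renditionPaths: string[]) {
    this.title = title;
    this.renditionPaths = renditionPaths;
  }

  getTitle(): string {
    return this.title;
  }

  // Format-specific addition: an EPUB can contain one or more renditions.
  getRenditionPaths(): string[] {
    return this.renditionPaths.slice();
  }
}

class PdfPublicationSketch implements PublicationLike {
  readonly format = "pdf";
  private title: string;

  constructor(title: string) {
    this.title = title;
  }

  getTitle(): string {
    return this.title;
  }
}

// Code written against the common interface works for every format,
// and can narrow to a concrete class when it needs format-specific data.
function summarize(pub: PublicationLike): string {
  let summary = `${pub.getTitle()} (${pub.format})`;
  if (pub instanceof EpubPublicationSketch) {
    summary += `, renditions: ${pub.getRenditionPaths().length}`;
  }
  return summary;
}
```

The point of the design is that generic code only depends on the shared interface, while the format-specific extras stay reachable for callers that know which format they hold.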
PdfPublication
A PDF publication is a partly ASCII-based, partly binary format optimised for random access to its Resources. Many of the Resources within a PDF file, such as images and fonts, are stored as binary objects. All Resources are directly accessible by reading information from the PDF's so-called "trailer". In the trailer there is a cross-reference section where byte offsets point out the location of each Resource within the PDF.
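To make the idea of byte offsets concrete, here is a minimal sketch that reads a classic, uncompressed cross-reference table and maps object numbers to byte offsets. It is not how Colibrio parses PDFs; the function name is hypothetical, and the sketch ignores cross-reference streams (used by many newer PDFs) and incremental updates.

```typescript
// Parse a classic PDF cross-reference table into a map of
// object number -> byte offset. Free entries (marked "f") are skipped.
function parseXrefTable(text: string): Map<number, number> {
  const offsets = new Map<number, number>();
  const lines = text
    .split(/\r\n|\r|\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  let nextObjectNumber = 0;
  for (const line of lines) {
    if (line === "xref") {
      continue; // keyword that opens the section
    }
    if (line === "trailer") {
      break; // the trailer dictionary follows; stop reading entries
    }
    // Subsection header: "<first object number> <entry count>"
    const header = /^(\d+) (\d+)$/.exec(line);
    if (header !== null) {
      nextObjectNumber = parseInt(header[1], 10);
      continue;
    }
    // Entry: 10-digit offset, 5-digit generation number, "n" (in use) or "f" (free)
    const entry = /^(\d{10}) (\d{5}) ([nf])$/.exec(line);
    if (entry === null) {
      throw new Error(`Unrecognized cross-reference line: ${line}`);
    }
    if (entry[3] === "n") {
      offsets.set(nextObjectNumber, parseInt(entry[1], 10));
    }
    nextObjectNumber += 1;
  }
  return offsets;
}

// Made-up table with one free entry and two in-use objects.
const sampleXref = [
  "xref",
  "0 3",
  "0000000000 65535 f ",
  "0000000017 00000 n ",
  "0000000081 00000 n ",
  "trailer",
].join("\n");

const objectOffsets = parseXrefTable(sampleXref);
```

With such a table in hand, a reader can seek straight to the byte offset of an object instead of scanning the whole file, which is what makes random access cheap.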
The PdfPublication class is still very simple in comparison to its EPUB equivalent. It contains all the features necessary for rendering, searching and navigating the publication, but does not expose the PDF internals.
EpubPublication
The EPUB file is actually a ZIP archive that can contain one or more renditions of a Publication. A high-level overview of the EPUB file format is presented in the illustration below.

An illustration of the EPUB file format.
The EPUB Container
The EPUB Container uses the OCF (Open Container Format) to describe a number of important aspects of the EPUB and its contents such as:
MIME type (must be "application/epub+zip").
Entry points for all the "renditions" of the Publication.
Digital Rights Management information to help authorize the access of the publication.
Digital Signature information to help verify any signed Resources.
Encryption information to help decrypt any encrypted Resources.
With the help of the information contained in the OCF, the Reading System knows how to authorize, validate and parse the Publication. Additionally, while the Publication is being rendered, the OCF provides what the Reading System needs to decrypt any encrypted Resources.
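The rendition entry points mentioned above are listed as rootfile elements in the container's META-INF/container.xml file. The sketch below pulls them out with regular expressions so that it runs without an XML parser. It is an illustration rather than Colibrio's implementation: the function names are hypothetical, and a real reader should use a proper XML parser (this version only handles double-quoted attributes).

```typescript
interface RootfileSketch {
  fullPath: string;  // path of the package file, relative to the archive root
  mediaType: string; // "application/oebps-package+xml" for EPUB packages
}

// Read one double-quoted attribute value from the inside of a start tag.
function readAttribute(attributes: string, name: string): string | null {
  const match = new RegExp(`\\b${name}\\s*=\\s*"([^"]*)"`).exec(attributes);
  return match === null ? null : match[1];
}

// Collect every <rootfile> element (one per rendition) from container.xml.
function findRootfiles(containerXml: string): RootfileSketch[] {
  const rootfiles: RootfileSketch[] = [];
  const tagPattern = /<rootfile\b([^>]*)>/g;
  let tag: RegExpExecArray | null;
  while ((tag = tagPattern.exec(containerXml)) !== null) {
    const fullPath = readAttribute(tag[1], "full-path");
    const mediaType = readAttribute(tag[1], "media-type");
    if (fullPath !== null && mediaType !== null) {
      rootfiles.push({ fullPath, mediaType });
    }
  }
  return rootfiles;
}

// Made-up container.xml with a single rendition.
const sampleContainerXml = `<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>`;

const sampleRootfiles = findRootfiles(sampleContainerXml);
```

Each returned path identifies the package file of one rendition, which is where the metadata, manifest and spine for that rendition are found.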
As any Resource requested by the Reading System must first be checked against the OCF, Colibrio's EpubPublication has a Resource Provider implementation specifically for this purpose. More on Resource Providers later in this article.
IContentDocument
Just as the IPublication interface describes the generalized model of a Publication, the IContentDocument interface describes the actual units of content that make up the body of the Publication.
An IContentDocument can be in pretty much any format, depending on the type of Publication it is part of. In an EPUB, a Content Document is either an XHTML or an SVG document. In an audio book, a Content Document would be represented by an audio file.
Resource Providers
To load a Publication such as an EPUB into the Colibrio Reading System you must tell the Engine how to access the publication file contents. This is where Resource Providers come in.

A practical example
An EPUB Publication is contained within a ZIP archive. The ZIP has a so-called "central directory" where all the files in the archive are listed with their respective byte offsets. Inside the archive, as mentioned above, you find the EPUB Container. This Container in turn points out the location of the Publication's Package, which holds the Publication's manifest where all its resources are listed. To grab resources from an EPUB you need to access all these layers of packaging.
Sounds like a lot of work? No worries, Colibrio has a Resource Provider for that: the OCF Resource Provider. Once you have your file, array buffer or blob, you hand it to the OCF Resource Provider. It in turn creates a ZIP Resource Provider behind the scenes and takes care of all the provisioning of Resources from within the Package. To fetch a resource, the EPUB Publication only needs to use the resource's path as defined in the Package Manifest.
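The layering described above can be sketched with two small classes. This is a simplified illustration, not the Colibrio Resource Provider API: all names are hypothetical, an in-memory map stands in for the ZIP archive, the inner provider is passed in rather than created behind the scenes, access is synchronous to keep the example short, and hrefs containing "../" or percent-encoding are not handled.

```typescript
// Hypothetical provider shape; a real provider would most likely be asynchronous.
interface ResourceProviderSketch {
  fetchText(path: string): string;
}

// Stands in for a ZIP provider: maps archive paths to file contents.
class InMemoryArchiveProvider implements ResourceProviderSketch {
  private files: Map<string, string>;

  constructor(files: Map<string, string>) {
    this.files = files;
  }

  fetchText(path: string): string {
    const content = this.files.get(path);
    if (content === undefined) {
      throw new Error(`No such archive entry: ${path}`);
    }
    return content;
  }
}

// Wraps an archive provider and resolves manifest hrefs relative to the package file,
// so callers only need the path as written in the package manifest.
class OcfProviderSketch implements ResourceProviderSketch {
  private archive: ResourceProviderSketch;
  private packageDirectory: string;

  constructor(archive: ResourceProviderSketch) {
    this.archive = archive;
    const container = archive.fetchText("META-INF/container.xml");
    const match = /full-path="([^"]*)"/.exec(container); // first rendition only
    if (match === null) {
      throw new Error("container.xml does not list a rootfile");
    }
    const lastSlash = match[1].lastIndexOf("/");
    this.packageDirectory = lastSlash === -1 ? "" : match[1].slice(0, lastSlash + 1);
  }

  fetchText(manifestHref: string): string {
    return this.archive.fetchText(this.packageDirectory + manifestHref);
  }
}

// Made-up archive contents for illustration.
const archiveFiles = new Map<string, string>([
  ["META-INF/container.xml", '<rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>'],
  ["EPUB/package.opf", "<package/>"],
  ["EPUB/text/ch1.xhtml", "<html>chapter 1</html>"],
]);

const ocfProvider = new OcfProviderSketch(new InMemoryArchiveProvider(archiveFiles));
const chapterText = ocfProvider.fetchText("text/ch1.xhtml");
```

Because both classes share one interface, the outer provider does not care whether the inner one reads from memory, a local file or a remote server.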
Streaming Resources over HTTP
Similarly to the example above, Colibrio can fetch resources from an EPUB or PDF that is located on the web.
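This article does not describe how Colibrio reads remote files. One common technique for reading part of a remote archive is the HTTP Range header, sketched below as an assumption rather than as Colibrio's actual mechanism. The `RangeFetcher` type is a hypothetical stand-in for whatever HTTP client is used, which keeps the example independent of any particular runtime.

```typescript
// Build the value of an HTTP Range header for `length` bytes starting at `offset`.
function byteRangeHeader(offset: number, length: number): string {
  if (!Number.isInteger(offset) || !Number.isInteger(length) || offset < 0 || length <= 0) {
    throw new RangeError("offset must be an integer >= 0 and length an integer > 0");
  }
  // Both positions are inclusive, so the last byte is offset + length - 1.
  return `bytes=${offset}-${offset + length - 1}`;
}

// Hypothetical minimal HTTP client shape used by this sketch.
type RangeFetcher = (
  url: string,
  headers: Record<string, string>
) => Promise<{ status: number; body: Uint8Array }>;

// Request only the bytes that hold one resource, using offsets known
// from the archive's directory or the PDF's cross-reference data.
async function fetchByteRange(
  url: string,
  offset: number,
  length: number,
  send: RangeFetcher
): Promise<Uint8Array> {
  const response = await send(url, { Range: byteRangeHeader(offset, length) });
  // 206 Partial Content means the server honored the range;
  // a plain 200 would mean it sent the whole file instead.
  if (response.status !== 206) {
    throw new Error(`Expected 206 Partial Content, got ${response.status}`);
  }
  return response.body;
}
```

Combined with the byte offsets that ZIP and PDF files record for their contents, this allows a reader to fetch a single resource without downloading the whole publication, provided the server supports range requests.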

IReaderPublication
Where the implementations of IPublication, in the Colibrio Core namespace, are designed to be as close to their respective specifications as possible, IReaderPublication implementations are aimed specifically at optimising and augmenting the Publication to make things easy for the internals of the Reading System Engine. It is in the various IReaderPublication implementations that format-specific methods to handle layout, navigation, interaction, fragmentation etc. are implemented.
Format Adapters
The Format Adapter is what actually "transforms" the underlying format to fit the generic Colibrio definitions. For example, the EpubFormatAdapter parses the EPUB's Nav file and the PdfFormatAdapter parses the PDF's Catalog; both return INavigationCollections with INavigationItems.
The image below shows the "pipeline" from source file to IReaderPublication. Each step adds domain specificity to the Publication model.
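The adapter idea described above, two different source structures mapped to one common navigation shape, can be sketched as follows. Everything here is hypothetical: the input shapes are heavily simplified (a real Nav file is an XHTML document and a real PDF outline is an object graph), and `NavItemSketch` only loosely mirrors the idea of an INavigationItem.

```typescript
// Hypothetical common output shape.
interface NavItemSketch {
  title: string;
  target: string;
  children: NavItemSketch[];
}

// Assumed, simplified source shapes for the two formats.
interface EpubNavEntry {
  text: string;
  href: string;
  sub?: EpubNavEntry[];
}

interface PdfOutlineEntry {
  title: string;
  pageIndex: number; // zero-based
  kids?: PdfOutlineEntry[];
}

interface FormatAdapterSketch<TSource> {
  toNavigation(source: TSource[]): NavItemSketch[];
}

const epubAdapterSketch: FormatAdapterSketch<EpubNavEntry> = {
  toNavigation(source) {
    return source.map((entry) => ({
      title: entry.text,
      target: entry.href,
      children: epubAdapterSketch.toNavigation(entry.sub ?? []),
    }));
  },
};

const pdfAdapterSketch: FormatAdapterSketch<PdfOutlineEntry> = {
  toNavigation(source) {
    return source.map((entry) => ({
      title: entry.title,
      target: `page=${entry.pageIndex + 1}`, // made-up locator format
      children: pdfAdapterSketch.toNavigation(entry.kids ?? []),
    }));
  },
};

// Format-agnostic consumer: renders any navigation tree as indented titles.
function listTitles(items: NavItemSketch[], depth: number = 0): string[] {
  const lines: string[] = [];
  for (const item of items) {
    lines.push("  ".repeat(depth) + item.title);
    for (const line of listTitles(item.children, depth + 1)) {
      lines.push(line);
    }
  }
  return lines;
}
```

Code such as `listTitles` never needs to know which format produced the tree, which is the benefit the Format Adapters provide to the rest of the Reading System.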
