How Linearization Works - Fast On-Demand Access of Document Pages
Linearization, introduced with PDF 1.2, has a 20+ page appendix dedicated to it in the core PDF reference.
But if you prefer a quicker explanation…
Linearization works by reorganizing the PDF file’s internal structure so that partial content can be streamed quickly on demand.
Every PDF document, linearized or not, is an object tree that starts at a root node and branches out from there. Pages can reference other objects on that tree by object number. In non-linearized PDFs, these objects, such as embedded fonts, are often scattered across the file. As a result, there is no quick way to identify and grab a given page’s resources, and most document viewers need to download the entire document before they can render it.
In contrast, linearized PDFs are reorganized so that page resources are grouped together logically according to document page order. A Linearization Dictionary and “Hint tables” are also added near the start of the file. These act as an inventory specifying the location of the objects needed to render any given page, essentially enabling random access to pages over a network.
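To make the "inventory" idea concrete, here is a minimal Python sketch that looks for the linearization dictionary in the first 1024 bytes of a file, which is where the PDF specification requires it to sit. The function name and the sample bytes are made up for illustration, and the regex matching is a simplification rather than a real PDF parser.

```python
import re
from typing import Optional

def read_linearization_params(head: bytes) -> Optional[dict]:
    """Return linearization parameters found at the start of a PDF, or None.

    Hypothetical helper: uses naive pattern matching, not a full PDF parser.
    """
    window = head[:1024]  # the dictionary must fit in the first 1024 bytes
    if b"/Linearized" not in window:
        return None  # no marker found, so treat the file as non-linearized

    params = {}
    # /L file length, /O first page object number, /E end of first page,
    # /N page count, /T offset of the main cross-reference table
    for key in (b"L", b"O", b"E", b"N", b"T"):
        m = re.search(rb"/" + key + rb"\s+(\d+)", window)
        if m:
            params[key.decode()] = int(m.group(1))

    # /H holds the offset and length of the primary hint stream
    hint = re.search(rb"/H\s*\[\s*(\d+)\s+(\d+)", window)
    if hint:
        params["H"] = (int(hint.group(1)), int(hint.group(2)))
    return params

# Invented sample header, for illustration only
sample = (b"%PDF-1.7\n1 0 obj\n"
          b"<< /Linearized 1 /L 123456 /H [ 1024 312 ] /O 5 /E 20480 /N 2000 /T 122000 >>\n"
          b"endobj\n")
params = read_linearization_params(sample)
print(params["N"], params["H"])  # 2000 (1024, 312)
```

With the page count and the hint stream location in hand, a viewer knows where to look next without reading the rest of the file.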
A viewer designed to handle linearized content, like eViewer, can then request linearized PDF content from the web server via a URL. The server delivers that content as a sequence of binary PDF “packets”.
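The text does not say which transport mechanism eViewer uses. One common way to fetch part of a file over HTTP is a byte-range request, sketched below in Python. The URL, the function name, and the offsets are placeholders, and this is an assumption about how such a viewer might work, not a description of eViewer's internals.

```python
import urllib.request

def build_range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Build an HTTP GET asking for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})

# Placeholder URL and offsets, for illustration only
req = build_range_request("https://example.com/docs/report.pdf", 1024, 312)
print(req.get_header("Range"))  # bytes=1024-1335
```

Passing the request to urllib.request.urlopen would send it. A server that supports ranges answers with status 206 (Partial Content) and only the requested bytes.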
When eViewer detects linearization, it pauses the sequential download after receiving the hint tables and the first few pages. Remaining content packets are then prioritized based on how the user navigates. For example, if the user skips ahead to page 750 in a 2000-page document, the viewer can request the resources for page 750 and the surrounding pages, and these will render first. The remainder of the document then downloads and renders progressively as the user session continues. Unneeded pages can be cleared from the device’s memory when required.
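The prioritization described above can be sketched as a simple ordering function in Python. The function name, the window size of two pages on either side, and the "closest first" rule are all assumptions chosen for illustration. The actual scheduling inside eViewer is not described here.

```python
from typing import List

def fetch_order(target_page: int, page_count: int, window: int = 2) -> List[int]:
    """Return page numbers in fetch order: the target and its neighbours first,
    then every other page in document order."""
    if not 1 <= target_page <= page_count:
        raise ValueError("target_page is outside the document")
    lo = max(1, target_page - window)
    hi = min(page_count, target_page + window)
    # Closest pages first; ties go to the lower page number
    nearby = sorted(range(lo, hi + 1), key=lambda p: (abs(p - target_page), p))
    # Everything outside the window keeps its original document order
    rest = [p for p in range(1, page_count + 1) if p < lo or p > hi]
    return nearby + rest

order = fetch_order(750, 2000)
print(order[:7])  # [750, 749, 751, 748, 752, 1, 2]
```

Each time the user jumps to a new page, a viewer could recompute this order and skip any pages it has already received.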