About Our Digitization Process

The Sexton Digital team is proud of the high-quality digital products we produce. We are committed to Open Access, and all of our digitized products are available free online from this website.

Below is a short Prezi created in 2014 to accompany a presentation about our process. The accompanying text is adapted from the original presentation.

About the Royal Architectural Institute of Canada and the Journal

The Royal Architectural Institute of Canada (RAIC) is a professional association of Canadian architects and of the faculty and graduates of Canadian schools of architecture. Founded in 1907, it represents the standard of architectural development and excellence in Canada. Its journal, which went through several title changes, was published from 1924 to 1973 and is the authoritative record of the history of architectural development and practice in Canada.

About the RAIC Digitization Project

The primary goal of our project is to create an open-access, keyword-searchable web edition of the RAIC journal. The project breaks down into four main steps: Scan, Clean, Compile and Publish.

Scanning and Cleaning (Illustrated by Video)

Steps 1 and 2 are closely linked: we digitize every available back issue of the journal by scanning each page and then cleaning it in Photoshop. First, we straighten the page, which makes it easier for OCR (optical character recognition) software to recognize the text. Next, we crop out the edges.
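Cropping is done by hand in Photoshop. Purely as an illustration, the Python sketch below shows how the same idea could be automated: find the bounding box of everything darker than a brightness threshold and trim the blank border around it. The grayscale list-of-rows image format, the `crop_to_content` function, and its `threshold` and `margin` values are all hypothetical, and straightening (deskewing) is not attempted here.

```python
from typing import List

# Hypothetical image format: a list of rows of grayscale values, 0 = black, 255 = white.
Image = List[List[int]]


def crop_to_content(img: Image, threshold: int = 200, margin: int = 1) -> Image:
    """Crop away blank edges, keeping `margin` pixels around the content.

    A pixel counts as content if it is darker than `threshold`.
    """
    rows = [y for y, row in enumerate(img) if any(p < threshold for p in row)]
    cols = [x for x in range(len(img[0])) if any(row[x] < threshold for row in img)]
    if not rows:  # a completely blank page: nothing to crop to
        return [row[:] for row in img]
    # Expand the bounding box by the margin, without running off the page.
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(img) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, len(img[0]) - 1)
    return [row[left:right + 1] for row in img[top:bottom + 1]]


# Example: a 5-row, 6-column white page with a single dark pixel.
page = [[255] * 6 for _ in range(5)]
page[2][3] = 0
print(crop_to_content(page))  # the dark pixel plus a one-pixel white border
```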

We then run a script we developed that makes use of Photoshop’s ability to stack images in layers, like sheets of Mylar in an animator’s studio. The script essentially lifts all text and images from the original scan and places them on a new, cleaner layer on top of the original. As you can see, however, the script degrades the images. To get around this, we first use the selection tools to select every image in the new layer. Then we delete the selection, leaving holes in the layer through which the original images show.
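The layer trick can be modelled in a few lines of code. In this sketch, which is our own illustration rather than the team's actual Photoshop script, a layer is a grid of grayscale values in which `None` marks a transparent pixel. The hypothetical `punch_hole` function deletes a rectangular selection from the cleaned layer, and `composite` flattens the stack so the original scan shows through wherever the top layer has a hole. The rectangular selection is a simplifying assumption; real selections can be any shape.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]             # grayscale rows, 0 = black, 255 = white
Layer = List[List[Optional[int]]]   # None marks a transparent pixel (a "hole")


def punch_hole(layer: Image, box: Tuple[int, int, int, int]) -> Layer:
    """Return a copy of `layer` with box = (top, left, bottom, right) made transparent.

    `bottom` and `right` are exclusive, like Python slices.
    """
    top, left, bottom, right = box
    out: Layer = [list(row) for row in layer]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = None
    return out


def composite(top_layer: Layer, bottom_layer: Image) -> Image:
    """Flatten two layers: where the top layer is transparent, the bottom shows through."""
    return [
        [b if t is None else t for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(top_layer, bottom_layer)
    ]


original = [[10, 20], [30, 40]]   # scan with good image detail
cleaned = [[0, 0], [0, 0]]        # script output, image detail degraded
flattened = composite(punch_hole(cleaned, (0, 1, 2, 2)), original)
```

Here the right-hand column is punched out of the cleaned layer, so the flattened page keeps the cleaned left column and the original right column.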

Next, we merge (flatten) the two layers into a single image. We then set everything that isn’t text or images to a solid background colour, in this case white. After that, we select everything that is not the background colour, which reveals any remaining spots or blemishes, and remove them with the eraser tool. We make two passes to ensure that every blemish has been erased.
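The background fill and blemish search are manual steps, but a hedged sketch of how they might be automated looks like this: `whiten_background` turns every pixel lighter than a threshold pure white, and `remove_specks` uses a breadth-first flood fill to find connected groups of non-white pixels and erases any group at or below a size limit. Both function names and their default values are hypothetical. A size filter like this can also erase real content, such as periods or the dots over an i, which is one reason a person still needs to check each page.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def whiten_background(img: Image, threshold: int = 180) -> Image:
    """Set every pixel lighter than `threshold` to pure white (255)."""
    return [[255 if p > threshold else p for p in row] for row in img]


def remove_specks(img: Image, max_size: int = 2) -> Image:
    """Erase connected groups of non-white pixels with at most `max_size` pixels.

    Uses 4-connectivity (up, down, left, right) and a breadth-first flood fill.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] == 255 or seen[y][x]:
                continue
            # Collect the whole connected group that contains (y, x).
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and out[ny][nx] != 255:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) <= max_size:
                for cy, cx in component:
                    out[cy][cx] = 255  # "erase" the speck
    return out
```

Running `remove_specks` twice has the same effect as running it once; the two manual passes described above are about catching what the eye missed, not about the algorithm.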

Finally, we enhance the colour of the text, which improves the accuracy of OCR when it is applied. The page is now clean.
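The text does not say exactly how the text colour is enhanced. One common way to darken text and strengthen its contrast against a white background is a levels-style contrast stretch, sketched below under that assumption: values at or below a black point become pure black, values at or above a white point become pure white, and values in between are spread linearly across the full range. The `levels` function and its default points are illustrative, not the settings used for the RAIC pages.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def levels(img: Image, black_point: int = 60, white_point: int = 200) -> Image:
    """Linear contrast stretch, similar in spirit to a Levels adjustment.

    Values <= black_point become 0, values >= white_point become 255,
    and values in between are stretched linearly across 0..255.
    """
    span = white_point - black_point
    out = []
    for row in img:
        new_row = []
        for p in row:
            if p <= black_point:
                new_row.append(0)
            elif p >= white_point:
                new_row.append(255)
            else:
                new_row.append(round((p - black_point) * 255 / span))
        out.append(new_row)
    return out


# Grey text (100) becomes noticeably darker relative to the white page.
print(levels([[50, 100, 210]]))
```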

Creating PDFs and Applying OCR

Once an entire issue has been cleaned, we batch resize the images so that every page has the same width and the pages fit together neatly in a PDF. The issue is then ready to be compiled into a PDF using Adobe Acrobat Pro: we select the folder containing the issue, and Acrobat creates a new PDF from it. We look over the PDF to make sure we haven’t missed any pages or obvious blemishes.
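Acrobat handles the compiling, but the purpose of the batch resize is easy to show in code. The sketch below assumes the same hypothetical list-of-rows image format and uses simple nearest-neighbour sampling rather than Photoshop's own resampling. It scales each page to a common target width while preserving the aspect ratio, so pages of slightly different sizes line up in the finished PDF.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def resize_to_width(img: Image, target_width: int) -> Image:
    """Scale an image to `target_width`, keeping its aspect ratio (nearest neighbour)."""
    h, w = len(img), len(img[0])
    target_height = max(1, round(h * target_width / w))
    # Each output pixel copies the source pixel that falls at the same relative position.
    return [
        [img[y * h // target_height][x * w // target_width] for x in range(target_width)]
        for y in range(target_height)
    ]


def batch_resize(pages: List[Image], target_width: int) -> List[Image]:
    """Resize every page of an issue to the same width."""
    return [resize_to_width(page, target_width) for page in pages]
```

A 2-by-2 page and a 4-by-8 page resized to width 4 come out 4 and 2 rows tall respectively, both exactly 4 pixels wide.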

The last step is to run the text recognition (OCR) tool so that the PDF is keyword searchable. Once we have checked the recognized text for errors, the PDF is ready to be published on the website.

Progress Report

There are 594 issues of the RAIC journal, averaging 60 pages each. Scanning and cleaning have been ongoing since February 2012, and as of September 2017, 394 issues have been converted to PDF and made available on the website.
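These figures allow a quick estimate of the project's scale. The calculation below uses only the numbers given above, plus the assumption that every issue has exactly the 60-page average, so the page counts are approximations. The monthly pace is a simple average from February 2012 to September 2017 and does not reflect any changes in pace along the way.

```python
# Figures from the progress report; pages per issue is an average, so page totals are estimates.
TOTAL_ISSUES = 594
AVG_PAGES_PER_ISSUE = 60
DONE_ISSUES = 394  # converted to PDF as of September 2017

total_pages = TOTAL_ISSUES * AVG_PAGES_PER_ISSUE
done_pages = DONE_ISSUES * AVG_PAGES_PER_ISSUE
remaining_issues = TOTAL_ISSUES - DONE_ISSUES
remaining_pages = remaining_issues * AVG_PAGES_PER_ISSUE
percent_done = 100 * DONE_ISSUES / TOTAL_ISSUES

# February 2012 to September 2017, counted in whole months.
months_elapsed = (2017 - 2012) * 12 + (9 - 2)
issues_per_month = DONE_ISSUES / months_elapsed

print(f"Estimated total pages:     {total_pages:,}")
print(f"Estimated pages completed: {done_pages:,}")
print(f"Issues remaining:          {remaining_issues} (~{remaining_pages:,} pages)")
print(f"Share of issues completed: {percent_done:.1f}%")
print(f"Average pace:              {issues_per_month:.1f} issues/month over {months_elapsed} months")
```

Run as written, it reports roughly 35,640 pages in total, about 66.3% of issues completed, and an average of about 5.9 issues per month.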

Challenges and Improvements

Sexton Digital is committed to finding the best digitization method for conserving the original quality of the scans. Over the course of the RAIC project, we have made significant changes to our methods that have greatly improved the quality of our product.

Every page has its own personality and requires minor adjustments to the standard method. We are constantly problem-solving and experimenting with Photoshop’s features to create the best possible product while still working efficiently.

A recent challenge has been working within our server space limits. Our digitization process produces very large PDF files, and eventually compressing them became urgent. The challenge was to compress the PDFs while retaining the high quality the project strives for. We did the compression in Adobe Acrobat Pro. The original, uncompressed PDF files are still available, though not through the website. Smaller PDFs have the added benefit of shorter loading times, which is especially important when viewing the files on a tablet.
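The text does not record which Acrobat settings were used. One technique PDF optimizers commonly apply is image downsampling, and the arithmetic behind its trade-off is simple: the number of pixels, and roughly the amount of uncompressed image data, grows with the square of the resolution. The sketch below assumes a hypothetical 8.5 by 11 inch page and hypothetical resolutions of 600 and 300 pixels per inch; none of these figures come from the RAIC project.

```python
def pixel_count(width_in: float, height_in: float, ppi: int) -> int:
    """Number of pixels in a page of the given size (inches) scanned at `ppi`."""
    return round(width_in * ppi) * round(height_in * ppi)


def reduction_factor(old_ppi: int, new_ppi: int) -> float:
    """How many times fewer pixels remain after downsampling from old_ppi to new_ppi."""
    return (old_ppi / new_ppi) ** 2


# Hypothetical page size and resolutions, for illustration only.
before = pixel_count(8.5, 11, 600)
after = pixel_count(8.5, 11, 300)
print(before, after, before / after, reduction_factor(600, 300))
```

Halving the resolution leaves a quarter of the pixels, which is why downsampling can shrink files dramatically, and also why it has to be weighed against the visual quality the project aims to keep.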

Next Steps

While we continually update our website, we are also working toward completing the project. Sexton Digital is committed to Open Access and is invested in digitization for the long run.