The Least Word: 2014

Friday, May 2, 2014

My final metadata blog post (for classes, at least).

This semester has been a truly fascinating experience. I've learned a heck of a lot in all my classes, and this one is far from an exception. And so, I suppose in honor of the ignorance with which I entered this particular semester, I will post a few videos I found, one of which sums up the general public's perception of what this class is about.

What the public thinks metadata means:

And then part of what we learned it means:

I hope that in this respect, I can at least leave y'all with a laugh this semester. Now if you'll excuse me, I need to go brush up on speaking Alabama.

More Archival repositories I found while researching my EAD presentation.

Calames is an online, multi-institutional repository of archives and manuscripts held in French university and research libraries. It defaults to a keyword search, with the option to narrow results by library.

And then there's Archives Hub. It's a multi-institutional repository in the UK. I'd give more information on it, but frankly, the sheer breadth of it is amazing. I sincerely doubt you could log onto that and fail to find at least one participating collection you find fascinating.

In my case, these were the ones that caught my attention:

D H Lawrence Collection
Cor, blust, squit! Dialects and accents of the British Isles contains everything you need to know to research English dialects. (Just in case you want your fake British accent to be authentic.)
There's the collection of finding aids entitled A Cabinet of Curiosities and Curiouser and Curiouser for people who prefer their archives weird.

Was looking for EAD instance examples and found a few that were interesting.

Apparently WorldCat is demoing a find-an-archive function called ArchiveGrid. It...well, either archives are under-reported among their sources in Alabama, or they're being very specific about what qualifies.

See for yourself.

The making of a finding aid.

Having come across the Here and There blog elsewhere on this site, one of her posts caught my attention in relation to the investigation into EAD I've been doing lately.

In this case, it's her description of creating a finding aid for historical records about a N.J. landfill. In it, she goes through her process of encountering the problem and creating the aid. I thought it was interesting that she mentioned a different content standard, which she used for finding aids. Apparently the standard, DACS (Describing Archives: A Content Standard), uses bits of MARC and bits of EAD encoding in it. It was interesting to read about and, best of all, she provides a link to the finished aid.

EAD, the semantic web, and archives

I was looking about for more EAD resources and discovered a number of ones I found interesting. One of them was this discussion on Encoded Archival Description, the semantic web, and archives:

Note: the video above is licensed under a Creative Commons Attribution license by Debra Schiff for the Here and There blog.

That blog is located here.

EAD Basics 8: Archival Description

The <archdesc> tag brackets the bulk of an EAD instance. All of the contents of your finding aid are included in it, as is managerial and supplemental information. Info is organized in unfolding levels of hierarchy: a descriptive overview of the whole comes first, followed by detailed information about the parts of the whole. That set of detailed information goes in the <dsc> tag, which stands for Description of Subordinate Components. For obvious reasons, there is no example of this tag being used in its entirety outside of the fully encoded examples of all of EAD available in Appendix C to the LoC site's Tag Library (from whence the bulk of the information and examples in this introductory series of blog posts came).
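To make the "unfolding levels of hierarchy" a bit more concrete, here is a minimal Python sketch of my own (it is not from the Tag Library). It walks an <archdesc> and lists each component of its <dsc>, indented by depth. The sample finding aid is invented, the function name outline is hypothetical, and I'm assuming unnumbered <c> components that each carry a <did> with a <unittitle>.

```python
import xml.etree.ElementTree as ET

# Invented sample for illustration; not a real finding aid.
SAMPLE = """
<archdesc level="collection">
  <did><unittitle>Example Family Papers</unittitle></did>
  <dsc type="combined">
    <c level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c level="file">
        <did><unittitle>Letters, 1911-1920</unittitle></did>
      </c>
    </c>
    <c level="series">
      <did><unittitle>Photographs</unittitle></did>
    </c>
  </dsc>
</archdesc>
"""

def outline(node, depth=0):
    """Return (depth, level, title) rows for node and its nested <c> components."""
    title = node.findtext("did/unittitle", default="[untitled]")
    rows = [(depth, node.get("level"), title)]
    # Top-level components sit inside <dsc>; deeper ones nest inside each other.
    for child in node.findall("dsc/c") + node.findall("c"):
        rows.extend(outline(child, depth + 1))
    return rows

rows = outline(ET.fromstring(SAMPLE))
for depth, level, title in rows:
    print("  " * depth + f"{title} ({level})")
```

The overview of the whole collection comes out at depth 0, and each step down into the <dsc> adds a level of indentation, which is the collection-to-item narrowing in miniature.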

EAD Basics 7: Title Page

The tag <titlepage> groups together the finding aid's bibliographic information: things such as addresses, author, block quotations, dates, edition, volume number, publisher, sponsor, subtitles, and the actual title of the finding aid.

It occurs inside the <frontmatter> tag.

An example:

<frontmatter>
        <titlepage>
            <titleproper>Inventory of The Arequipa Sanatorium Records,
            <date>1911&ndash;1958</date></titleproper>
            <num type="Collection number:">BANC MSS 92/894 c</num>
            <publisher>The Bancroft Library<lb/>University of California,
             Berkeley<lb/>Berkeley, California
            </publisher>
            <list type="deflist">
                <defitem>
                    <label>Processed by:</label>
                    <item>Lynn Downey</item>
                </defitem>
                <defitem>
                    <label>Completed by:</label>
                    <item>Mary Morganti and Katherine Bryant</item>
                </defitem>
                <defitem>
                    <label>Date Completed:</label>
                    <item><date>May 1994</date></item>
                </defitem>
                <defitem>
                    <label>Encoded by:</label>
                    <item>Gabriela A. Montoya</item>
                </defitem>
            </list>
            <p>&copy; 1996 The Regents of the University of California.
            All rights reserved.</p>
        </titlepage>
    </frontmatter>

EAD Basics 6: Text Division

Having gone over the <frontmatter> tag last time, this post is for the tag <div>.

<div> designates subdivisions of the text in the <frontmatter> and uses <head> to show the purpose of each.

It likewise has its own list of elements it can contain, ranging from more divisions like it to addresses, lists, notes, or tables.

Examples of <div> :

<frontmatter>
            <titlepage>[...]</titlepage>
            <div>
            <head>Acknowledgements</head>
                <p>The University of California, Irvine Libraries wishes to
                acknowledge the generosity of the family of Edgar Holden for an endowment
                in support of the processing and maintenance of this collection and the
                University of California Office of the President for grant funding in
                support of the encoding of this and other finding aids using the Encoded
                Archival Description standard.</p>
            </div> . . .
        </frontmatter>

and

 <frontmatter>
            <titlepage>
                <titleproper>Inventory of the Rietta Hines Herbert Papers, 1940-1969</titleproper>
                <author>Processed by: Debra Carter</author>
                <publisher>Schomburg Center for Research in Black Culture<lb/>
                The New York Public Library</publisher>
                <date>August, 1977</date>
                &schtp;
                <p> &copy; <date>1999 </date> The New York Public
                Library, Astor, Lenox and Tilden Foundations.  All rights reserved.</p>
            </titlepage>
            <div>
                <head>Preface</head>
                <p>This inventory is one of several prepared as a part of the archival
                preservation program at the Schomburg Center for Research in Black Culture,
                a research division of The New York Public Library.</p>
                <p>The Schomburg archival preservation program involves the organization
                and preservation of primary source material held by the Center and of significance
                to the study of the Black Experience. It furthermore includes the preparation of
                detailed inventories of these records, making the information contained therein
                accessible as well as available to scholars.</p>
                <p>The necessary staff and supplies for this program were made available
                through a combination of Library, National Endowment for the Humanities grant,
                and State of New York grant funds.</p>
            </div>
        </frontmatter>
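As a small illustration of how <head> labels each <div>, here is a hedged Python sketch that pulls the headings out of a <frontmatter>. The sample XML is my own invention (I left out entities like &copy; so the standard-library parser will accept it), and it only looks at <div> elements that are direct children of <frontmatter>.

```python
import xml.etree.ElementTree as ET

# Invented sample; entity references are omitted so it parses on its own.
SAMPLE = """
<frontmatter>
  <titlepage><titleproper>Inventory of an Example Collection</titleproper></titlepage>
  <div><head>Preface</head><p>Sample text.</p></div>
  <div><head>Acknowledgements</head><p>More sample text.</p></div>
</frontmatter>
"""

frontmatter = ET.fromstring(SAMPLE)
# Each <div> states its purpose in its own <head>.
heads = [div.findtext("head") for div in frontmatter.findall("div")]
print(heads)  # ['Preface', 'Acknowledgements']
```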

Does anyone know who the figures in this image might be? I'm tempted to say the one on the right might be Mark Twain, but I'm not sure.

A Break from EAD: Back to Indexing

Well, I have managed to encounter something I truly was not expecting with my DePol Engravings. Of the images I was tasked with describing, I have discovered that this image has absolutely no information in the digital archive notes. I know it's an engraving of a building. Somewhere. Some time. Beyond that...I really don't know what to put for many of the fields on this one.

EAD Basics 5:

Having covered <eadheader> in my last post, I'll continue with <frontmatter>, which bundles together preface text found before the actual Archival Description (which is in the <archdesc> tag). The tags it contains tend to provide information about the creation, use, or publication of the aid itself, rather than anything to do with the contents thereof.

It can contain two subsequent tags:

<div> - optional, contains text divisions

<titlepage> - Optional, contains what it sounds like it contains.

Here is an example of the <frontmatter> tag in use:

<frontmatter>
        <titlepage>
            <titleproper>Register of the Gibbons (Stuart C.) Papers,
                <date>1955-1964</date>
            </titleproper>
            <num>Collection number: Ms28</num>
            <publisher>San Joaquin County Historical Society and Museum
                <lb/>
                <extptr actuate="onload" show="embed" entityref="sjmlogo"/>
                <lb/>
            Lodi, California</publisher>
            &tp-cstoh;
            <list type="deflist">
                <defitem>
                    <label>Processed by: </label>
                    <item>Don Walker</item>
                </defitem>
                <defitem>
                    <label>Date Completed: </label>
                    <item>1997</item>
                </defitem>
                <defitem>
                    <label>Encoded by: </label>
                    <item>Don Walker</item>
                </defitem>
            </list>
            <p>&copy; 2000 San Joaquin County Historical Society &amp; Museum. All rights reserved.</p>
        </titlepage>
    </frontmatter>

EAD Basics, Part 4:

Last post, I discussed the <ead> tag briefly. In that post I mentioned the <eadheader> tag was a required tag that was nested within the <ead> tag.

One thing I forgot to post then (which I will add to said post later) is the list of minimum required elements for an instance of EAD.

Moving to the <eadheader> tag, it's another wrapper element, another strictly structural tag. Because a lot of the identification and file description elements which it can contain are pretty vital to creating a searchable set of metadata, it itself is a required tag.

It can contain four subsequent elements (this being XML, the order they appear in is important):

<eadid> - mandatory

<filedesc> - mandatory

<profiledesc> - optional

<revisiondesc> - optional

In the vein of the minimum valid content example for the <ead> tag mentioned above, here is such an example for the <eadheader> tag:

<eadheader>
    <eadid>[...]</eadid>
    <filedesc>
        <titlestmt>
            <titleproper>[...]</titleproper>
        </titlestmt>
    </filedesc>
</eadheader>
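Since the order of those four elements matters, here is a rough Python sketch of how one might check an <eadheader> fragment against the list above. It is only an approximation of what validating against the DTD would tell you; the function name check_eadheader and the two test fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Child order and mandatory/optional status as listed in this post.
ORDER = ["eadid", "filedesc", "profiledesc", "revisiondesc"]
REQUIRED = {"eadid", "filedesc"}

def check_eadheader(xml_text):
    """Return a list of problems found in an <eadheader> fragment (empty if none)."""
    header = ET.fromstring(xml_text)
    tags = [child.tag for child in header]
    problems = [f"missing <{t}>" for t in ORDER if t in REQUIRED and t not in tags]
    # Compare the order the known children appear in with the expected order.
    known = [t for t in tags if t in ORDER]
    if known != sorted(known, key=ORDER.index):
        problems.append("children out of order")
    return problems

good = "<eadheader><eadid>x</eadid><filedesc/></eadheader>"
bad = "<eadheader><filedesc/><eadid>x</eadid></eadheader>"
print(check_eadheader(good))  # []
print(check_eadheader(bad))   # ['children out of order']
```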

EAD Basics, Part 3:

EAD is hierarchical because its area of concern is organized by hierarchical principles. The hierarchy of its subject matter is reflected quite well by its nested series of tags.

In practice, EAD has three distinctive key tags which create the overall structure of an instance of EAD.

The first of these is the <ead> tag. Its primary function is to provide a structural frame in which the various other tags, which provide detailed information about or structure to the informational contents of the finding aid being encoded, are nested.

There are three tags / elements which are contained within the nesting of the <ead> tag.

<eadheader> : A required tag

<frontmatter> : An optional tag

<archdesc> : Another required tag.

Here is an example of the minimal content required for a valid use of this tag:

<ead>
    <eadheader>
        <eadid>[...]</eadid>
        <filedesc>
            <titlestmt>
                <titleproper>[...]</titleproper>
            </titlestmt>
        </filedesc>
    </eadheader>
    <archdesc level="fonds">
        <did>[...]</did>
        <dsc type="combined">[...]</dsc>
    </archdesc>
</ead>
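For anyone who wants to poke at this, here is a minimal Python sketch that reports which of the required pieces an <ead> instance is missing. The list of paths is my reading of the minimal example above, not a substitute for validating against the DTD, and the function name missing_required is hypothetical. I left <dsc> out of the checks on the assumption that it is optional.

```python
import xml.etree.ElementTree as ET

# Paths taken from the minimal example in this post (my reading, not the DTD).
REQUIRED_PATHS = [
    "eadheader",
    "eadheader/eadid",
    "eadheader/filedesc/titlestmt/titleproper",
    "archdesc",
    "archdesc/did",
]

def missing_required(xml_text):
    """List the required paths that cannot be found under the <ead> root."""
    root = ET.fromstring(xml_text)
    return [path for path in REQUIRED_PATHS if root.find(path) is None]

minimal = (
    "<ead><eadheader><eadid>x</eadid><filedesc><titlestmt>"
    "<titleproper>t</titleproper></titlestmt></filedesc></eadheader>"
    "<archdesc level='fonds'><did/></archdesc></ead>"
)
print(missing_required(minimal))  # []
```

Note that <frontmatter> never shows up in the results, since it is optional.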

EAD Basics, Part 2: MaRC's archival equivalent

The Function of EAD

EAD has a lot of elements: 146 separate ones.

http://www.loc.gov/ead/ is the official website.

As an XML DTD, EAD is structured via a series of nested tags, creating hierarchical metadata.

It serves to encode archival finding aids

Think the Archival equivalent of Union Catalogs.

Intended to create interoperable, universal encoding principles to allow similar cooperation as MaRC-encoded union catalogs do for libraries.

EAD vs MaRC

· both are encoding standards to provide descriptive and structural metadata to collections of information or materials.

· Both enable interoperability and unionization of searchable databases.

· MaRC-encoded union catalogs tend to be organized principally at the item level, both because it is the initial level with which the cataloger applying the standard interacts and because it is the principal goal of most MaRC users.

· EAD finding aids tend to approach materials from the opposite end of the scale. Archives tend to interact with materials from a many-to-few perspective, and many archival users interact with the archives starting at the collection level and narrowing it down only after that.

EAD Basics, Part 1

What is EAD?

· XML-based DTD.

What does it do?

· Provides structural metadata for electronic archival finding aids

Why Do I need it?

· If you are tasked with using or creating a finding aid, it will most likely be encoded in or need to be encoded in EAD.

History?

Developed starting in 1993 at the UC Berkeley library.

Intended to be non-proprietary standard for encoding machine-readable finding aids.

Wanted functionality MaRC did not provide.

Started it in Standard Generalized Markup Language (SGML).

During early development, the Library of Congress agreed to maintain the standard and its web presence.

4 release versions:

· Feb 1996 Alpha

· Dec. 1996 Beta

· August 1998: Version 1.0

· 2002: EAD DTD 2002.

A set of resources I completely forgot to mention during my presentation, the XML Cover Pages provide a great deal of information about XML and its use. I will have to look through the site's content about the standard to better familiarize myself with it.

XML Cover Pages are located here: http://xml.coverpages.org/

EAD Resources Part 3: Great Examples of EAD repositories

These are repositories whose content is encoded in EAD. I will be adding to this list over the Summer on an ongoing basis, so please check back in from time to time.

National Union Catalog of Manuscript Collections: http://www.loc.gov/coll/nucmc/

OAC: http://www.oac.cdlib.org/

AOMS: Archive numériques d’Objets et de Matériaux iconographiques Scientifiques

EAD Resources Part 2: The Tag Library

After gathering information on the setup and basic structure of EAD from the Best-Practices Guides, I proceeded to look over the information on the Library of Congress' EAD webspace with more understanding.

EAD homepage: http://www.loc.gov/ead/index.html

EAD Tag Library: http://www.loc.gov/ead/tglib/index.html

EAD Resources part 1: Two Great Best Practices Guides.

As I mentioned during my presentation on it, EAD is a very important schema to know if you're going to have direct contact with archival repositories in the future. Given that, I would like to take the time to make a number of posts laying out more information about EAD than I was able to during the 10-minute presentation. This series of posts will likely end up continuing well into the Summer, as I intend to use them as an opportunity to learn more in-depth information about EAD myself.

Given that preface, I'd like to start the series off with the point where I really felt like I first started understanding EAD at all, the two best-practices guides for EAD Version 2002 published by the Online Archive of California and the Research Libraries Group.

OAC guide: http://www.cdlib.org/services/access_publishing/dsc/contribute/docs/oacbpgead_v2-0.pdf

RLG guide: http://www.oclc.org/content/dam/research/activities/ead/bpg.pdf?urlm=161431

More Indexing

Well, I've updated the images I asked about in my past posts, and am now moving on to discover my ineptitude for describing portraiture and architecture with the DePol Engravings.

Thursday, May 1, 2014

Adventures in Indexing, Volume: Who Remembers? : Questioning My Aptness for Description of Sports

In the course of indexing football images, there are three particular elements I have difficulty with on a consistent basis. Those three are the Subject, Description, and Title elements. My problems with them all stem from the same root: I know relatively little about football. Unusual, I know, for someone who grew up in the state of Alabama (and who attends the University of Alabama), but it's true nonetheless. I don't really like football. Never could get into it. So where most people who see this image would find something they instantly recognize and can make sense of, the first possible description to pop into my head was: "Both teams pursue as referee holding invisible football attempts escape." Obviously, that's not the correct description. Unfortunately, because of the way the shot is staged, I'm not entirely sure what is. I can identify a number of the players in the image, but I have no idea precisely who the ball-carrier for the play is, beyond a supposition based on the facings of the various players.

Likewise, my first thought for describing this image was "Lucy van Pelt (14, Alabama) fools Charlie Brown (10, Alabama) yet again." With the Subject element of course including Hoaxes, Charles Schulz, and Performance Art.

Admittedly, that probably wasn't as bad as my first idea for a description for this: "Referee scratches his own crotch, both teams panic." The accompanying title being, of course: "Iron Bowl 75 Referee Scratches Crotch, Both Teams Panic."

It doesn't take a genius to realize that none of these titles, descriptions, or subject identifiers are what the client would want, but beyond the case of the one my impulse suggested a Peanuts reference for, I have little to no clue as to what is actually going on in these images. Like the person who describes a video of a Sprint Cup race at Talladega in the following fashion: "Dozens of men drive in circles on same stretch of road for hours, most refusing to stop for directions," there's a sense in which the description is, technically, apt. There is also, as in that case, a much greater sense in which the description is entirely wrong.

In that sense, I was hoping for some help or feedback. Anyone more sports-savvy than I (read: anyone at all) want to offer feedback or suggestions? They would be greatly appreciated. For now, beyond putting up the actual identifications of the relevant players, I'm inclined to leave these alone until I can get help deciphering them. Which, I suppose, makes it an excellent time to try indexing some DePol engravings.

Further Adventures in Indexing while Editing

While looking at another of my images, it occurred to me that another item was lacking in our guideline for Player Names: specifically, the ability to use identification acquired in related images in an image in which the identification could not otherwise be made.

The reason I mention it is that this image is clearly the companion-piece to the image I discussed in my last post, occurring moments apart (as confirmed by timestamp). The reason this is relevant to my last post is that in the earlier image in the sequence, a player obscured in the later image is visible, and the helmet the ball carrier (I think? He seems to be the focal point of the images.) had lost is so rotated in the air that his number may be ascertained. As such, I am adding a line to clarify that helmets do not have to currently be on the player's head to be used for identification, should a better source of identification be lacking. I'll also be adding to the guideline that where relational data may be ascertained, identification in any of the related images is sufficient for identification in the others.

More Issues in Indexing: It's Deja Vu All Over Again

For a third instance, the conflict between the Abstract and Description Elements in our repository has left me at something of an impasse. As with the images mentioned in my last two posts, this image I am indexing leaves something to be desired in terms of identifiable players. Several are identifiable, but a number of the players at the core of the action are problematic. Given how many of my images have this problem thus far, I am inclined to decide that part of the Abstract Element guideline's wording leaves something to be desired. Fortunately for me, I co-administrate that guideline. As such, I can edit it to resolve the problem.

My first thought in this instance was to edit the guideline by adding the following to the notes section:

Additionally, the guidelines for the Description, Title, and Subject elements require key players involved in the action of the image to be identified. In those instances, where a reasonable supposition of an obscured number may be made for a key player (i.e., the ball-carrier or the blocker tackling them), that supposed number may be used to identify the player.
In such cases, the player's name should be entered in a manner indicating the suppositional nature of the identification. As such, in those cases, the name should be entered as follows:

Last, First (supp.)

That way, users may know the identification is not certain.

However, as we are dealing with metadata content, I realized that would only identify them as an entirely different entity than entering the name the standard way. As such I am removing the section beginning with "In such cases," and just indicating they should identify them as if it were certain.
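Here is a minimal Python sketch of why the "(supp.)" suffix would have been a problem. It assumes, as a simplification on my part, that player names are indexed by exact string match; the names, image ids, and the build_index function are all made up for illustration and are not how our repository actually works.

```python
from collections import defaultdict

def build_index(records):
    """Group image ids under the exact name string entered for each player."""
    index = defaultdict(list)
    for image_id, names in records:
        for name in names:
            index[name].append(image_id)
    return dict(index)

# Hypothetical records: the same player, entered two different ways.
records = [
    ("img-001", ["Doe, John"]),
    ("img-002", ["Doe, John (supp.)"]),
]
index = build_index(records)
print(index["Doe, John"])  # ['img-001']
print(len(index))          # 2
```

A search on the standard form of the name finds only the first image, because the suffixed form is treated as a second, separate player.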

More Issues in Indexing: Different Image, Same Problem

Those who've read my previous post will know that I had problems with filling in the description section of one of my images. Unfortunately, I ran into another instance of this same problem in another image I was responsible for indexing. In this case, I can successfully identify a number of the players in the image, but not the ball carrier. His helmet, which would normally present his number, has been knocked free. Similarly, one of the players tackling him is obscured by a teammate.

Issues in Indexing: When Guidelines Contradict

As with everyone in my LS 566 class, I have been working upon indexing a handful each of DePol Engravings and UA Football images. This was going fairly well, up until I read the Guidelines for the Description Element. In particular, the section that gave me pause was this one: "Additionally, the players involved should be included (include the name, number, and team in parentheses if visible), as well as the action (punt, pass, interception, etc.) and any relevant contextual information (down, statistics, importance of the game, etc.). IMPORTANT: This information will be available in the Abstract section, so make sure you index the Player Names Element before you index this one!"

That might seem odd, given that it's a sensible set of guidelines to describe an image of a football game. The reason it occurred to me that this could be an issue was that I knew our guidelines for the Abstract element had this line, "Partially covered or covered numbers will make a player unidentifiable and will be excluded from the data entered," in it. The combination of the two means that in images where no Abstract may be assigned, neither can a Description.

Upon checking back over the images I was tasked with indexing, I discovered further that I am responsible for just such an image. This image contains two players. The numbers on each player's jersey are obscured. The Auburn player's number is obscured by the fact that he is partially out of frame, while the ball-carrier's number is obscured on the front of his jersey by his hands and on his helmet by the glare of the stadium lights. Given that, by the guidelines-as-written, the only descriptions I can give this image 1) require knowledge of football that I lack (I don't know if this is a run, a pass, a kick return or what) and 2) are vague in the extreme. As in, "Alabama player carries ball while referee looks on," vague.
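The bind described above can be sketched in a few lines of Python. This is my simplified reading of how the two guidelines interact, not their actual wording, and the field name number_visible and both function names are hypothetical.

```python
def identifiable(players):
    """Abstract rule (simplified): only players with a fully visible number count."""
    return [p for p in players if p["number_visible"]]

def can_describe(players):
    """Description rule (simplified): needs at least one identified player to name."""
    return len(identifiable(players)) > 0

# The image described above: two players, both numbers obscured.
image = [
    {"team": "Alabama", "number_visible": False},
    {"team": "Auburn", "number_visible": False},
]
print(can_describe(image))  # False
```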

As such, I would be grateful for any help or clarifications that could be offered for this image.

Wednesday, April 9, 2014

A Tale of Two Explications: SCORM Explanations Past and Present

While reading up on what SCORM was, I noticed that Rustici Software's one-minute overview of it had a note in its header signifying that their new SCORM Explained page provided more information. Clicking that link, I found myself taken to a splash page of sorts for their newer explanation. On it, it seemed that the writer took significant efforts to dumb down his previous one-minute overview, adding analogies, a colorful bar graph, and removing virtually all technical language. Whereas the prior overview explained and attempted to clarify far more than the new splash page does, it also provided less detailed technical and business-related information than the company's SCORM Explained page did overall.
Past the splash page, the new page provides information and context about SCORM from a business and technical perspective, as well as a glossary of SCORM-related terms. While it is certainly informative, it also lacks the elegance and ease of digestibility of the simple overview presented by the previous page. Within two paragraphs the overview explains that SCORM is a particular way of constructing Learning Management Systems and educational content in order to allow interoperability between SCORM-conformant systems. The second says it is the industry standard for this, and then proceeds to present an analogy of DVD format compatibility to explain SCORM's function. It follows this up with a red and blue bar graph (simply titled "thegraph") with images of clocks and dollar signs pasted over each bar, indicating content integration's costs before and after SCORM. The graph lists no units, sets no scale whatsoever, and otherwise does nothing to indicate its particulars but its somewhat ham-fisted assertion that: BIG RED BAR BAD! MANY CLOCKS AND DOLLARS COST. SMALL BLUE BAR BETTER. LESS CLOCKS. LESS DOLLARS.

Works Cited

“One Minute Scorm Overview for Anyone.” SCORM.com. http://scorm.com/scorm-explained/one-minute-scorm-overview/

“Scorm Explained.” SCORM.com. http://scorm.com/scorm-explained/

Educational Objects via digitally-archived articles.

This may be an odd way to open a post, but while reading Norm Friesen's explanation of Educational Objects, I noted something that struck me as quite interesting. Specifically, the version of the document to which we were directed was a copy of the page which had been stored via the electronic archiving efforts of the Internet Archive's Wayback Machine. I found it fascinating, given the frequency with which scholarly articles are locked behind paywalls and the like, that a publicly available source such as the Wayback Machine would have a copy stored in their records. It makes me curious as to what copyright entanglements might crop up as a result of it. Likewise, I wonder if the service's operators had simply worked out an accord for their reproduction of the content.

Moving on from that curiosity, however, I should dig into the meat of the article itself. In it, Friesen examines the definition of an educational object. He begins by citing the Learning Technology Standards Committee's definition: "any entity, digital or non-digital, which can be used, re-used or referenced during technology supported learning." He goes on to point out that such a definition is often replaced with a narrower one based upon the sort of programming which granted the concept its name. He proceeds to denote three characteristics of objects:

Discoverable: able to be discovered, accessed or searched due to the metadata which describes and categorizes it.
Modular: able to be adapted by outside parties without assistance of its originators, yet nonetheless able to stand on its own.
Interoperable: in a general sense, workable with a variety of hardware and software. In a specific sense, able to let programs and their components cooperate and share data.

Work Cited

Friesen, Norm. “What Are Educational Objects?” Interactive Learning Environments 9, no. 3 (Dec. 2001): 1. http://web.archive.org/web/20041015064204/http:/www.careo.org/documents/objects.html (accessed April 9, 2014).

Sunday, March 9, 2014

Flickr and Image pattern-recognition

I enjoyed the opportunity to look into how Flickr tracked patterns within their images here: http://www.asis.org/Bulletin/Oct-07/Beaudoin_OctNov07.pdf

Back in my undergraduate days, I'd often used the site when looking for inspirational images for my creative writing classes, taking advantage of its ability to pull often-unexpected combinations of ideas together and display just the right image for inspiration.

I thought the inclusion of emotion-based tags such as happy or depressed to indicate the mood shown in a piece was an interesting choice. While I wouldn't have likely thought to include it myself, much of my use of the site involved finding pieces exhibiting certain moods I wished to evoke in the story I had been writing. Likewise, I hadn't expected to find categorical tags for humor, poesy, or events...even including assassination as an example.
I also found it interesting that one of the tag categories I found most useful in personal use was also the single least-often used category, that being the emotion tag.

Determining what art is "Of" and "About".

I admit, I hadn't truly considered the subjective nature of some of the more famous historical photographs, prior to reading this article: http://www.loc.gov/rr/print/tgm1/iib.html

In particular, I found the notion that assigning an "About" to images with a complexity of meaning to them is an extremely tenuous activity, rather than an aggregate one of assigning each meaning to it, a fascinating one. With most photography and art, barring that in which abstraction obscures it, the "Of" attribute is generally clear. But even in a photo with the clearest "Of" quality to it, without a statement of intent from the author, it can be difficult to make a determination what it's "About."

Wednesday, February 26, 2014

Digitization and Dystopian Fiction

As a preface to this post (and in some senses, this blog in general), let me say this: I was originally ill-disposed to digital media in a general sense. I think some of it had to do with a stubborn streak regarding media formats. I love books. Not just reading and what it entails, but the format itself. The smell of old books, the feel of an old, well-worn tome. Though digitized content gains much in convenience, as my brushes with it after graduation from the University of Mississippi with my B.A. in English taught me, I still feel there is something of the essence or the character of accessing a digitized work that is fundamentally altered from that of reading it in book format. Perhaps it is simply the unfamiliarity, but there's something that feels less human somehow about accessing digital works, to me. There is no arguing, however, that as the field of information out there continues to grow at increasing rates, digitization is needed.
Though necessity may be the mother of invention, she doesn't always provide the best quality of results. And in a sense, that is part of my concern regarding it. Google's mass-digitization efforts provide an extremely efficient method to process works into a digital format but, as Karen Coyle points out here: http://www.kcoyle.net/jal-32-6.html , it apparently offers little in the way of human oversight or correction to the OCR results. Similarly, as opposed to non-mass digitization techniques, the results lack detailed mark-up, beyond the searchable nature of the OCR results.

Put another way, the sheer scale and pace of the undertaking inhibits efforts to truly understand what is being processed. This problem reminds me of some of the background elements from the novel Prospero Burns by Dan Abnett. In it, one of the focal characters heads an archaeological effort to salvage human knowledge nearly lost to catastrophic warfare, which greatly set back the state of that knowledge. The character's work becomes sponsored by a large governmental agency, enabling a rapid processing of the materials, but he finds that he has begun to question the quality of the results. He continues to circle back to the question of "How do we know what we don't know?" even as his organization becomes ever-more prominent. Eventually, he finds his distaste for the unthinking aggregation being carried out has grown to the point that he feels compelled to depart the organization he created.

Mind, I do not think this is especially predictive. The Warhammer 40,000 setting is a dystopia, after all. Still, it's a far more interesting consideration of the process than I'd have expected to appear in a series typically only concerned with the exploits of over-the-top super-soldiers in an absurdly large-scaled science fiction setting.