'What, No Automation?' Some Principles of the DigiPal Project

Perhaps the most frequent misunderstanding of the DigiPal project is the role of technology in what we are doing, specifically the automatic identification of scribal hands, and/or automatic identification of letters and features. Despite what many seem to think, we are not using the computer for either of these at this point, and we will not necessarily use either at all. Instead we are drawing hundreds of thousands of boxes on images of manuscript pages by hand, and making connections between scribal hands ourselves based on our own 'old-fashioned' observations and memory. Are we crazy? Quite possibly yes, but this post is my explanation of why we work this way.

Automatic Scribal Identification

In the past I was very interested in automatic identification of scribal hands, as demonstrated by my article on 'Palaeography and Image-Processing' in Digital Medievalist 3 (2007/8). So, isn't that part of what DigiPal is about, then: using computers to identify scribal hands in objective ways? No, not at all. On the contrary, I have since had significant doubts about the merits of these automated approaches, as some of you will have heard me say in conference presentations. These doubts are discussed in another article of mine, 'Computer-Aided Palaeography' (at p. 322 ff.), but they had already been expressed by several others, most clearly by Tom Davis in his article in The Library 7 (2007), an article which I have found highly influential:

These [automated] methods are unlikely to replace, though they may supplement, the work of the document analyst, because, however powerful computers will (surely) become, it will probably not be possible to cross-examine them. (p. 266 n. 7)

Similar comments about the 'black box' of automated systems have been made by Lambert Schomaker in §7 of his 'Advances in Writer Identification and Verification', and also by Sculley and Pasanek in Literary and Linguistic Computing 23 (2008, at p. 421).

The problem, to put it crudely, is this: the computer can tell me that two samples of handwriting are very similar, but what does this similarity mean? In what way (if at all) does it reflect any sort of 'ground truth'? Is it that the samples were written by the same scribe? That they were written at the same scriptorium? That they were photographed by the same camera? That they were photographed at the same resolution, or using the same colour profile? That they have much the same pattern of dirt on them? Who knows? How can I tell? Of course I am being a little flippant here, in that some of these things we can correct for, but the underlying question remains the same. I am not yet convinced that we really can be sure that we have controlled for all the assumptions that underlie our models in these systems (e.g. that the cameras for all the images are the same). Sculley and Pasanek seem to demonstrate quite clearly that the results of these systems are extremely sensitive to small changes in these assumptions, and my discussions with experts in image processing seem to confirm this. After all, if a computer reports that two samples are by the same scribe and we think they are not, then what do we do? Do we believe the computer? Or do we change the software so that the computer matches our preexisting beliefs? As it is, these approaches have not yet succeeded in closing down any debate about authorship attribution, for example, as you can see very easily if you search for the number of articles 'proving' that a given text was or was not written by Shakespeare.
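To make the worry about cameras concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the 'images' are just short lists of made-up pixel values, the distance measure is plain Euclidean distance, and the function names are my own. It is not how any real writer-identification system works, but it shows how a naive similarity score can respond to the lighting rather than to the hand.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length lists of pixel values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brighten(image, offset):
    """Simulate a brighter photograph by adding a constant, capped at 255."""
    return [min(255, p + offset) for p in image]


def centre(image):
    """One possible correction: subtract the mean brightness of the image."""
    mean = sum(image) / len(image)
    return [p - mean for p in image]


# Hypothetical data: high values are parchment, low values are ink.
scribe_a = [200, 60, 200, 60, 200, 60]
scribe_b = [200, 80, 200, 50, 200, 70]      # a different hand, same camera
scribe_a_rephotographed = brighten(scribe_a, 40)  # same hand, other lighting

# On raw pixels, the different hand looks closer than the same hand re-shot.
raw_same_hand = distance(scribe_a, scribe_a_rephotographed)
raw_other_hand = distance(scribe_a, scribe_b)
print(raw_same_hand > raw_other_hand)

# After correcting for overall brightness, the ordering is reversed.
fixed_same_hand = distance(centre(scribe_a), centre(scribe_a_rephotographed))
fixed_other_hand = distance(centre(scribe_a), centre(scribe_b))
print(fixed_same_hand < fixed_other_hand)
```

In this toy case the correction works because I knew in advance which assumption had been violated. The difficulty in practice is knowing that every such assumption has been found.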

So, is this approach useless? No, not at all. I think it makes sense to use if one of two conditions is met:

  1. The corpus is overwhelmingly large, and so any clue of where to start – even a false clue – is better than nothing.
  2. The nature of the problem is such that a proposed match can be verified with a high degree of certainty from other means.

The danger with the first condition alone is that the computer's suggestion may be misleading, as it may encourage us to see connections which are not there and which we would not otherwise have seen. For the second condition, I am thinking of examples such as joins of fragments: it is often (or at least sometimes) easy to verify if two fragments were once joined together, since we can consider a range of factors such as the size, shape, colour, number of lines, style of writing, text, and so on, rather than just the script alone. However, neither of these conditions applies to the core DigiPal corpus (or, for that matter, to Shakespeare). We do not have so many scribal hands that we cannot work through them ourselves, particularly with the help of the DigiPal framework. Furthermore, as discussed above, the identification of scribal hands is by no means easy to verify except in the most obvious cases, and it is fairly safe to think that the obvious cases in the DigiPal corpus are ones that we will find ourselves anyway. For these reasons, I personally think that such an approach is not applicable to our work, either now or in the immediately foreseeable future (although some very promising discussions took place at the Dagstuhl Perspectives Workshop on Computation and Palaeography last September).

(Incidentally, the set of problems for which this approach makes sense to me is reminiscent of the famous set of NP-complete problems in Computer Science, a set which, among other things, is characterised by having solutions which are hard to find but, once found, quick to verify. I am not qualified to tell whether the problems I have in mind really are NP-complete, but if so then we are in extremely good company: these problems have been studied since the 1970s without an efficient solution being found, and finding one (or proving that none exists) remains perhaps the most sought-after goal in Computer Science, so much so that the Clay Mathematics Institute has listed the question as one of the seven Millennium Prize Problems.)
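For readers who have not met this idea before, the contrast between finding and verifying can be sketched with 'subset sum', a standard NP-complete problem: given a list of numbers, is there a selection of them that adds up to a given target? The example is my own choice and has nothing to do with manuscripts, and the function names and numbers below are hypothetical. Checking a proposed selection takes one quick pass, whereas the obvious way of finding one tries every possible selection, and the number of selections doubles with each number added to the list.

```python
from itertools import combinations


def verify(numbers, subset, target):
    """Quick check: is `subset` drawn from `numbers`, and does it hit `target`?"""
    remaining = list(numbers)
    for x in subset:
        if x not in remaining:
            return False
        remaining.remove(x)  # each number may be used only as often as it occurs
    return sum(subset) == target


def find(numbers, target):
    """Slow search: try every non-empty selection, smallest selections first."""
    for size in range(1, len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return list(combo)
    return None  # no selection adds up to the target


numbers = [3, 34, 4, 12, 5, 2]   # hypothetical input
proposal = find(numbers, 9)      # searching is the expensive step
print(proposal, verify(numbers, proposal, 9))
```

The analogy with fragments is loose but, I hope, clear enough: a proposed join is like `proposal`, expensive to come by but cheap to check once somebody (or something) has suggested it.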

Automatic Image-Annotation and Letter Detection


What about automatic annotation, then? Can't a computer at least identify letters (i.e. graphs) for us and draw the boxes accordingly? Well, it turns out that this problem is much more difficult than most people allow for. Medieval manuscripts are messy: they are dirty, fragmentary, written in widely varying scripts, with additions, lacking straight lines, and so on. There is a lot of very good software out there which goes some way to approaching this problem, such as T-PEN, SPI, and many others. They do very well, and they get very close to solving the problem, but they are not yet good enough to work on their own, and again experts in the field tell me that they are not likely to be any time soon (see, again, discussions at the Dagstuhl Perspectives Workshop). The problem here is managerial: many people tell me that this problem is 'solved', but not one can show me a workable solution. People say 'give me time and money and I will do it for you', but what happens if my time and money have run out and I still don't have a workable solution? Then DigiPal will have no content, and I think I will have squandered my grant. I know that my team and I can draw boxes and produce a useful result. Furthermore, there is real value in looking at this material for days, months, years on end, in getting to know it much better than one could by automatic analysis. Besides, we do not want to select all letters on every page – that would produce an unusable quantity of material – and so we, the experts, need to select which letters to include. That is not something I would want to leave to a computer; not yet, anyway.

Again, this approach does have promise, and already Brian Maher has done some work as an MSc project to develop a semi-automated system which suggests boxes that we can accept, reject, or modify. This is very helpful and I am very glad to have it, but again it is an aid to the human operator, the results of which are designed to be adjusted by hand; it is crucially not an attempt to provide an automatic 'answer'. Perhaps I will be proven wrong and you will successfully implement a system that can annotate the letters for me perfectly. If so I will be very happy indeed. But I hope you will forgive us if we do not wait for you to do it, and if we continue to annotate by hand in the meantime.
