The ARCHway Project: Architecture for Research in Computing for Humanities through Research, Teaching, and Learning

Information Technology Research, No. 0219924


Principal Investigator

Kevin S. Kiernan
Department of English
University of Kentucky
1377 Patterson Office Tower
Lexington, KY 40506
(859) 257-6989
(859) 323-1072
kiernan@uky.edu
http://www.rch.uky.edu


Co-PI

Alexander M. Dekhtyar
Department of Computer Science
University of Kentucky
773 Anderson Hall
Lexington, KY 40506
(859) 257-1839
(859) 323-1971
dekhtyar@cs.uky.edu
http://www.cs.uky.edu/~dekhtyar


Co-PI

Jerzy W. Jaromczyk
Department of Computer Science
University of Kentucky
773 Anderson Hall
Lexington, KY 40506
(859) 257-1187
(859) 323-1971
jurek@cs.uky.edu
http://www.cs.uky.edu/~jurek


Collaborator

Ionut E. Iacob
Department of Computer Science
University of Kentucky
352 W.T. Young Library
Collaboratory for Research in Computing for Humanities
Lexington, KY 40506
(859) 257-9549
(859) 323-1971
ionut@ms.uky.edu
http://www.rch.uky.edu


Collaborator

Dorothy C. Porter
Department of English
University of Kentucky
351 W.T. Young Library
Collaboratory for Research in Computing for Humanities
Lexington, KY 40506
(859) 257-9549
(859) 323-1072
dporter@rch.uky.edu
http://www.rch.uky.edu

Keywords

Image-integrated XML
Document-centric XML
Concurrent XML hierarchies
Database support for XML
Architecture
Edition Production Technology (EPT)

Project Summary

Our goal is to identify and solve problems of mutual importance for humanities scholars and Computer Science researchers in building image-based electronic editions of significant cultural and historic materials. The project will result in the development of an Edition Production Technology (EPT), both a methodology and an integrated software suite, that will allow us to construct and implement a digital library of previously unedited Old English manuscripts as well as to upgrade legacy electronic editions. Among the problems addressed in the project is database support for complex, document-centric XML markup with concurrent hierarchies, including the integration of manuscript images with the XML encoding of the manuscript content. The project relies on extensive collaboration between Computer Science researchers and humanities scholars.

Publications and Products

Publications

Kevin S. Kiernan, "Digital Facsimiles in Editing: Some Guidelines for Editors of Image-based Scholarly Editions," forthcoming in Electronic Textual Editing, a volume of essays jointly sponsored by the Modern Language Association and the TEI Consortium, funded by the Mellon Foundation, and co-edited by John Unsworth, Katherine O'Brien O'Keeffe, and Lou Burnard.

Alex Dekhtyar and Ionut Emil Iacob, "A Framework For Management of Concurrent XML Markup," International Workshop on XML Schema and Data Management (XSDM'03), forthcoming in LNCS (Springer-Verlag).

Kevin S. Kiernan, "Hand-written Materials and the Science of Information Management," Online Proceedings of The Wave of the Future: National Science Foundation, Post Digital Library Futures Workshop, Wequassett Inn, Cape Cod, 15-17 June 2003.

Kevin S. Kiernan and Alex Dekhtyar, "EPT: Edition Production Technology for Multimedia Contents in Digital Libraries," Online Proceedings of the Workshop on Multimedia Contents in Digital Libraries, sponsored by DELOS (EU Network of Excellence for Digital Libraries) and the National Science Foundation. Chania, Crete, Greece, 2-3 June 2003.

Kevin S. Kiernan and Ching-chih Chen, eds., "Report of the DELOS-NSF Working Group on Digital Imagery for Significant Cultural and Historical Materials," National Science Foundation, Digital Libraries Initiative Phase Two, International Projects, 10 December 2002.

Kevin Kiernan, Charles Rhyne, and Ron Spronk. "Digital Imagery for Works of Art: Report of the Co-Chairs." Harvard University Art Museums, Cambridge, Massachusetts. 19-20 November 2001.

Presentations

Alex Dekhtyar, "Management of Document-centric XML Markup in the ARCHway Project," invited talk, IBM Almaden Center, 12 September 2003 (forthcoming).

Kevin S. Kiernan, "The ARCHway Project and Damaged Old English Manuscripts in the Cotton Collection," invited lecture, The British Library, London, 29 July 2003.

Kevin S. Kiernan, "EPT: Edition Production Technology for Multimedia Contents in Digital Libraries," Workshop on Multimedia Contents in Digital Libraries, sponsored by DELOS (EU Network of Excellence for Digital Libraries) and the National Science Foundation. Chania, Crete, Greece, 2-3 June 2003.

Dorothy Carr Porter, "Introduction" to The ARCHway Project: Architecture for Research in Computing for Humanities through Research, Teaching, and Learning. Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. The University of Georgia, 29 May-2 June 2003.

Kevin S. Kiernan and Kenneth Carr Hawley, "An Image-Based Electronic Edition of Alfred the Great's Old English Version of Boethius's Consolation of Philosophy," Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. The University of Georgia, 29 May-2 June 2003.

Jerzy Wl. Jaromczyk and Sandeep Bodapati, "An Architecture Promoting Collaborative Research, Teaching and Learning," Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. The University of Georgia, 29 May-2 June 2003.

Alexander Dekhtyar and Ionut Emil Iacob, "Management of Data for Building Electronic Editions of Historic Manuscripts," Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. The University of Georgia, 29 May-2 June 2003.

Kevin S. Kiernan, "Electronic Editions: Using Digital Facsimiles." Modern Language Association Convention, New York City, 28 December 2002.

Kevin S. Kiernan, Discussion leader, "Image-Based Electronic Editions: A Discussion." Media Cloisters, Vassar College Libraries, 15 November 2002.

Kenneth Hawley (English doctoral student, presenter) and J. Adam Turner (C.S. masters student), "Designing an Interface for Searching Image-Based Electronic Editions: An Interdisciplinary Student Research Project," special session on "Building an Electronic Edition of Alfred's Boethius," sponsored by the Richard Rawlinson Center for Anglo-Saxon Studies, Thirty-Seventh International Congress on Medieval Studies, 2-5 May 2002.

Project Impact

ARCHway will develop a system for building digital libraries of image-based scholarly editions for the humanities. We will use this system to produce new electronic editions of a number of previously unedited or inadequately edited Old English manuscripts and to upgrade legacy electronic editions. The architecture should enable both the creation of image-based electronic editions by editors in other fields and the use of those editions by research scholars, students, and the general public. The research results of this project will lay the groundwork for sophisticated technical tools to interpret, assemble, disseminate, and maintain image-based scholarly editions on a continuing basis by humanities scholars with limited or no access to programming resources.

Goals and Objectives

  1. Develop methodology for Edition Production Technology (EPT) and its enabling software
  2. Integrate images and text in XML markup for image-based electronic editions of cultural documents
  3. Develop database support (search and retrieval for editor and end-user) for image-text integration and EPT
  4. Develop interdisciplinary teaching and training of undergraduate and graduate student teams from Computer Science and humanities disciplines

Activities

  1. Design and develop the architectural framework for Edition Production Technology (EPT) with individual editorial tools, including a data-centric glossary tool, a document-centric XML tool that integrates image content, an overlay tool for simultaneous tagging of image content, and a paleographical tool
  2. Develop and prepare image-based XML markup for electronic editions of Alfred the Great's Old English translation of Boethius's Consolation of Philosophy, Ælfric's Lives of Saints, and Beowulf
  3. Design and develop database support for EPT with solutions for concurrent XML hierarchies, including those arising from detailed integration of images
  4. Design a framework that fosters interdisciplinary teaching and learning and promotes exchange of experience and domain-specific knowledge among scholars and students in Computer Science and humanities disciplines

Area Background

There has tended to be a sharp cleavage between the research agendas of computer scientists and humanities scholars in the area of digital libraries. Computer scientists have shown great interest in images and in automatic methods, such as CBIR (Content-Based Image Retrieval) and QBIC (Query By Image Content), for searching them by low-level semantic features (e.g., color, shape, texture). On the other hand, as is most evident in the massive Guidelines for the TEI (Text Encoding Initiative), humanities scholars have mostly concentrated on text encoding, usually ignoring the images of the handwritten documents on which the texts are based, or including images only as HTML links. By this approach, humanities scholars have devised an efficient method for comprehensive searches of high-level semantic information that cannot be discovered without structural markup. Many of them feel they can do without images, even though they cannot sufficiently encode important features of handwritten documents. There is an urgent need to bring together these complementary (or diametrically opposed) research agendas of computer science and the humanities.

Over the past ten years, several image-based editorial projects at the University of Kentucky have attempted to combine these approaches, with varying degrees of success. Using web technology and Java programming, the image-based scholarly edition of the Electronic Beowulf integrated many hundreds of images with accompanying text and glossaries encoded in SGML. The Digital Atheneum: new techniques for restoring, accessing, and editing humanities collections, a DLI2 project, experimented with methods for automatically linking text and image. The Electronic Boethius, an NEH Collaborative Research project, has begun to take advantage of the ways that XML can help achieve these goals. We have begun developing tools that explicitly link text and images and render both searchable. Under the auspices of the ARCHway Project, described above, we are incorporating these tools and others under development into a common architecture supported by a database specifically designed to facilitate efficient searching of image-based, document-centric electronic editions.
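
As a rough illustration of what explicitly linking text and images can mean in practice, the following Python sketch searches a small XML fragment in which each transcribed line carries the coordinates of its region on a folio image. It is not the ARCHway or EPT encoding: the element and attribute names (folio, line, image, x, y, w, h), the file name, and the sample text are all hypothetical, chosen only to show how a text search can return image regions.

```python
import xml.etree.ElementTree as ET

# Hypothetical image-integrated markup: every <line> records the bounding
# box (in pixels) of the region it occupies on the folio image.
SAMPLE = """
<folio image="folio_001r.jpg">
  <line n="1" x="120" y="80" w="900" h="60">Hwaet we Gardena</line>
  <line n="2" x="118" y="150" w="905" h="62">in geardagum</line>
</folio>
"""


def find_regions(xml_text, term):
    """Return (image, line number, bounding box) for each line containing term."""
    root = ET.fromstring(xml_text)
    image = root.get("image")
    hits = []
    for line in root.iter("line"):
        # Case-insensitive substring match on the transcribed text.
        if term.lower() in (line.text or "").lower():
            box = tuple(int(line.get(key)) for key in ("x", "y", "w", "h"))
            hits.append((image, line.get("n"), box))
    return hits


hits = find_regions(SAMPLE, "geardagum")
print(hits)
```

A viewer could use each returned bounding box to highlight the matching region of the manuscript image next to the transcription.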

Management of Image-based, Document-Centric, XML

The key research challenge facing the ARCHway project from the Computer Science perspective is the problem of managing all the information that comprises an electronic edition in a seamless, flexible, and efficient manner. While XML databases have been an object of extensive study in recent years, and various mechanisms for storing and querying XML have been proposed, this work is most directly applicable to data-centric XML. When dealing with document-centric XML, the storage and querying facility must be able to support the increased complexity of the markup, in particular the concurrent XML hierarchies in the encoding.

This problem has recently been at the center of attention of a number of researchers. The TEI Guidelines contain a number of suggestions for incorporating concurrent hierarchies into single DTDs, including the use of milestone (empty) elements and tag fragmentation. Durusau and O'Donnell proposed a way of incorporating concurrent hierarchies in XML encodings using complex XPath expressions and, more recently, have developed the concept of Just-In-Time-Trees (JITTs) for parsing XML with concurrent hierarchies. Sperberg-McQueen and Huitfeldt have described a data structure, GODDAG, that extends the DOM tree to concurrent XML hierarchies. Most of the approaches that attempt to resolve the overlaps in XML encoding caused by concurrent hierarchies place the burden of decision-making and of maintaining the XML encoding and its DTD on the shoulders of the human editor. They may also result in unreadable and hard-to-maintain DTDs or XML Schemas. The approach to managing XML encodings in ARCHway is to capitalize on existing achievements in XML storage and querying, while generalizing the solutions to the case of concurrent hierarchies.
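
To make the overlap problem concrete, the following Python sketch represents two concurrent hierarchies (manuscript lines and words) as character spans over the same text, detects the pairs that cannot nest in a single XML tree, and then produces a well-formed encoding by reducing the line hierarchy to empty milestone elements, in the spirit of the TEI suggestion mentioned above. It is an illustrative sketch only, not the ARCHway framework: the sample text, the span positions, and the tag names (text, w, lb) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample: one text, two hierarchies given as (tag, start, end).
TEXT = "Hwaet we Gardena in geardagum"
LINES = [("line", 0, 12), ("line", 12, 29)]          # physical layout
WORDS = [("w", 0, 5), ("w", 6, 8), ("w", 9, 16),     # linguistic units
         ("w", 17, 19), ("w", 20, 29)]


def properly_overlap(a, b):
    """True if the spans intersect but neither contains the other."""
    (_, s1, e1), (_, s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


# Pairs that cannot both be ordinary elements in one XML tree.
conflicts = [(ln, w) for ln in LINES for w in WORDS if properly_overlap(ln, w)]


def milestone_encode(text, words, lines):
    """Keep words as elements; mark each line start with an empty <lb/>."""
    line_starts = {s for _, s, _ in lines}
    word_starts = {s for _, s, _ in words}
    word_ends = {e for _, _, e in words}
    out = ["<text>"]
    for i, ch in enumerate(text):
        if i in word_ends:
            out.append("</w>")
        if i in line_starts:
            out.append("<lb/>")
        if i in word_starts:
            out.append("<w>")
        out.append(escape(ch))
    if len(text) in word_ends:
        out.append("</w>")
    out.append("</text>")
    return "".join(out)


encoded = milestone_encode(TEXT, WORDS, LINES)
root = ET.fromstring(encoded)   # parsing succeeds, so the result is well formed
print(conflicts)
print(encoded)
```

Here the word "Gardena" straddles the boundary between the two manuscript lines, so it conflicts with both of them. The milestone encoding resolves the conflict only by giving up the line hierarchy as a tree: the lines survive as positions rather than as elements that can be queried directly, which is the kind of trade-off that motivates dedicated support for concurrent hierarchies.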

Area References

S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener, The Lorel Query Language for Semistructured Data, International Journal on Digital Libraries, 1(1):68-88, April 1997.

I. Beavan et al., "Text and Illustration: The Digitisation of a Medieval Manuscript," Computers and the Humanities 31:1 (1997), 61-71.

A. Berglund, S. Boag, D. Chamberlin, M. F. Fernandez, M. Kay, J. Robie, J. Simeon. XML Path Language (XPath) 2.0, W3C Working Draft, 02 May 2003

S. Boag, M. F. Fernandez, D. Florescu, J. Robie, J. Simeon (Eds.). XQuery 1.0: An XML Query Language, W3C Working Draft, 16 August 2002

A. Bonifati, S. Ceri. Comparative Analysis of Five XML Query Languages, SIGMOD Record, Vol 29, No. 1, pp. 63-67, March 2000.

A. B. Chaudhri, A. Rashid, R. Zicari. XML Data Management: Native XML and XML-Enabled Database Systems, Addison-Wesley, 2003.

A. Deutsch, M. Fernandez, D. Suciu. Storing Semi-structured Data Using STORED, Proceedings, ACM SIGMOD, 1999.

P. Durusau, M.B. O'Donnell. Concurrent Markup for XML Documents, Proc. XML Europe 2002.

P. Durusau, M.B. O'Donnell. Just-In-Time-Trees (JITTs): Next Step in the Evolution of Markup?, Proc. Extreme Markup Languages, 2002.

D. Florescu, D. Kossmann, A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database, INRIA Technical Report #3680, 1999

C. Huitfeldt, C. M. Sperberg-McQueen. TexMECS: An Experimental Markup Language for Complex Documents, February 2001.

IBM's Query By Image Content. QBIC Home Page.

C.-Ch. Kanne, G. Moerkotte. Efficient Storage of XML data, Proceedings, ICDE 2000

M. Keeler, "The Place of Images in a World of Text," Computers and the Humanities 36:1 (2002), 75-93.

K. Kiernan. "Digital Facsimiles in Editing: Some Guidelines for Editors of Image-based Scholarly Editions," forthcoming in Electronic Textual Editing, eds. John Unsworth, Katherine O'Brien O'Keeffe, and Lou Burnard.

K. Kiernan, W. Seales, and J. Griffioen. "The Reappearances of St. Basil the Great in British Library MS Cotton Otho B. x," Computers and the Humanities 36:1 (2002), 7-26.

E. Lecolinet, Laurent Robert, and Francois Role, "Text-image Coupling for Editing Literary Sources," Computers and the Humanities 36:1 (2002), 49-73.

P. Robinson, "The One Text and the Many Texts," Literary and Linguistic Computing 15:1 (2000), 5-14.

P. Robinson, "Ma(r)king the Electronic Text: How, Why, and for Whom?" in Joe Bray et al. Ma(r)king the Text: The Presentation of Meaning on the Literary Page. Ashgate: Aldershot, England, 309-28.

P. Robinson, "Redefining Critical Editions," in George P. Landow, ed. The Digital Word: Text-Based Computing in the Humanities. MIT Press: Cambridge, MA, 271-291.

W. Seales, J. Griffioen, K. Kiernan, et al. "The Digital Atheneum: New Technologies for Restoring and Preserving Old Documents" Feature article in Computers in Libraries 20:2, February 2000.

J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, J. F. Naughton. Relational Databases for Querying XML Documents: Limitations and Opportunities, Proceedings, VLDB 1999, pp. 302-314.

C. M. Sperberg-McQueen, L. Burnard (Eds.). Guidelines for Text Encoding and Interchange (P4), The TEI Consortium, 2001.

C. M. Sperberg-McQueen and C. Huitfeldt, GODDAG: A Data Structure for Overlapping Hierarchies, Proc. Principles of Digital Document Processing, Munich, September 2000.

F. Tian, D. J. DeWitt, J. Chen, C. Zhang. The Design and Performance Evaluation of Alternative XML Storage Strategies, SIGMOD Record, Vol 31, No. 1, March 2002.

J. Wang. SIMPLIcity: Content Based Image Retrieval / Search, 2003.

Potential Related Projects

UK projects

  1. The Digital Atheneum: new techniques for restoring, accessing, and editing humanities collections
  2. The Electronic Beowulf
  3. The Electronic Boethius: Alfred the Great's Old English Translation of Boethius's Consolation of Philosophy

External projects

  1. The Canterbury Tales Project (Peter Robinson, De Montfort University, United Kingdom)
  2. The Rossetti Archive (Jerome McGann, University of Virginia)
  3. Cuneiform Digital Library Initiative (Robert Englund, UCLA)
  4. The Charrette Project (Karl Uitti, Princeton University)
  5. Princeton Dante Project (Robert Hollander, Princeton University)
  6. The William Blake Archive (Morris Eaves, University of Rochester)

Project Websites

The ARCHway Project

This website briefly describes the project for building an architecture for research in computing for humanities through collaborative research, teaching, and learning. It also contains a list of publications, presentations, and student initiatives, including graduate degrees, certificates, and joint projects.

Research in Computing for Humanities

This website collects all of the current projects of our research group, including Electronic Beowulf, Digital Atheneum, Electronic Boethius, and ARCHway.

Illustrations

Image-based Electronic Editions

Please see the relevant links to publications and presentations for Electronic Beowulf, Digital Atheneum, and Electronic Boethius websites. The complete Guide to the Electronic Beowulf provides an overview of images and texts.

Online Software

None of the software under development is yet available online. The Electronic Beowulf is available only on two CDs published by British Library Publications and University of Michigan Press.

Online Data

See comments under Online Software.

Other Resources

N.A.