The Story of the Thesaurus

The Historical Thesaurus of English has a long and varied history, even if the story of its over 50 years of development is not quite as long as that of its principal ‘parent’, the Oxford English Dictionary.

In 1964, Michael Samuels, then Professor of English Language at the University of Glasgow, announced at a lecture given to the Philological Society that his department intended to undertake production of a historical thesaurus of English.[1] The original intention was that the work would be carried out by members of staff and postgraduate students, with individuals making contributions to both data collection and the development of the theoretical framework. Data collection came first, and each member of the team started to transcribe information from a volume of the first edition of the OED, using paper slips to record a word sense, its part of speech, its dates of recorded use, and any information contained in OED labels, such as ‘figurative’ or ‘Philosophy’, which might guide classification in future. Although the intention was never simply to produce a historical version of Roget’s Thesaurus, its categories were used as a preliminary filing system; as the principle of classification primarily by semantic field developed, Roget categories were virtually abandoned.

The members of the department who embarked on the original work included Samuels, Leslie Blakely, Leslie W. Collier, John Farish, James Muir, and Jane Roberts; Roberts primarily undertook to supplement the OED’s Old English materials. The first full-time postgraduate student was Irené Wotherspoon, who completed a thesis comparing a concrete area of lexis, the Body and its Parts, with the abstract field of Mental Pain.[2] The scale of the project was by then becoming apparent, and in 1969 a successful application for funding was made to the Leverhulme Trust, leading to the employment of Wotherspoon and Christian Kay as Research Assistants, mainly involved in collecting data. At the same time, substantial contributions to the archive of slips were made by volunteers, notably Frida Swanson, who made considerable inroads into the letter S; Frissy Peden; and Angus Somerville, who worked both at his home base at Brock University, Canada, and during research leaves spent in Glasgow. Data collection was also an integral part of the training of early postgraduate students, such as Elizabeth Donaldson; Ann Mackay Miller, who later became a Modern Humanities Research Association fellow on the project, working on Time and Change; and Freda Thornton, who was both a Research Assistant and a PhD student. Thomas Chase continued to work for the project after completing his Glasgow PhD and returning to the University of Regina; since he was working on Religion while Thornton tackled Good and Evil, there was considerable interchange of data and ideas between the two while they were doctoral students in Glasgow. Theses were also completed in London under Roberts’ supervision, for example by Julie Coleman and Louise Sylvester. Titles of many of these completed PhD theses, or works based on them, can be found in the Bibliography.

Although by now firmly established, the project faced a series of intellectual, financial, and domestic challenges in the years that followed. The most dramatic of the domestic challenges was a fire in 1978, when the archive, by now amounting to many thousands of slips, survived only because it was housed in metal filing drawers within metal filing cabinets. Thereafter, the archive was microfilmed and new slips were completed in triplicate, with copies being stored at King’s College London, where Roberts now had a lectureship, and in the Glasgow University Archive. When the Department of English Language moved to its current home in 1984, a former kitchen was converted into a fire-proof archive. Since great importance was attached to the completeness of the archive, both that move and the two temporary moves which preceded it placed a considerable strain on the team’s organizational powers.

Fund-raising remained a perennial problem, producing many tense situations while people waited to hear whether a grant application had been successful and their jobs would continue. We became adept at dividing the project into manageable chunks which could be completed within the term of a grant. We practised economies in our use of resources: one colleague, perhaps with excessive zeal, worked out how many pages of the OED could be covered by a slip-maker with a single pencil (answer: 130). Renewal of the Leverhulme grant and a series of annual grants from the British Academy, plus funding from the University of Glasgow, enabled the Research Assistant posts to be maintained at various levels.

When Kay became a lecturer in the department in 1979, her place as Research Assistant was taken by Freda Thornton. Other new Research Assistants and Research Associates over the years included Lorna Gilmour, Ann Gow, Lesley Haughton, Cerwyss O’Hare, Liz Reay, Judith Wood, Marc Alexander, and Fraser Dallachy. Following Samuels’ retirement from his chair in 1989, Kay took over as director of the project.

From 1981 to 1988, a new source of funding opened up, in the form of government-sponsored programmes for people learning new skills of digital data entry and editing in return for a stint of work on the project, starting with three trainees and peaking at nineteen. This development necessitated changes in the way the project was managed; a new stage of pre-classification was introduced, where trainees prepared sections of classification for future work by more experienced editors. It also saw the beginning of bulk input of data, and coincided with one of the major developments of the 1980s, the increased use of computers for storage and manipulation of the data.

The subject of computing had first come up in 1981 during talks with Oxford University Press about eventual publication of the project. It was made quite clear then that anything of such size and novelty could be printed only if it was handed over in electronic form, and the subsequent development of the Historical Thesaurus database took place with that in mind. Following discussions with Glasgow University Computing Service, a database of 29 fields was designed and implemented by Alasdair Forsythe.[3] Already in the late 1970s, Roberts and colleagues at the King’s College London Computing Centre had taken the first steps in computerizing the Old English materials, in order to provide a relatively small test corpus as a pilot study for the main project.

Forsythe’s program proved robust throughout the entire period of data entry, but the rapidly moving pace of technology dictated further developments. Storage on the Glasgow University mainframe computer necessitated several changes of database, and the development of web technology opened up new possibilities for displaying and searching data. From the mid-1980s, work on developing versions of the database and data retrieval routines was carried out by Flora Edmonds, Joyce Farmer, Ann Miller, and Irené Wotherspoon.[4] Edmonds continued in the role of database officer alongside expanded duties for the Department, with assistance from Jean Anderson and Alexander. In 2012 Brian Aitken arrived at Glasgow as Digital Humanities Research Officer for the School of Critical Studies, and worked on redeveloping the website and database for an increasingly web-centric world. Administrative support was provided by Ian Hamilton. It should be said that the role of the computer for the first edition was largely restricted to data entry, storage, and retrieval, although in the final stages of that edition routines developed in-house and by Oxford University Press proved useful in identifying and correcting certain inconsistencies in the data. For the basic work of lexicography – extracting meanings and organizing them into categories – human input, working with paper slips, was essential. The second edition, by contrast, is entirely digital, which saves on paper slips but introduces its own challenges.

Basic slip-making from the first edition of the OED could have been completed by 1980, but before that a decision had been taken to include material from the Supplements to the OED published in 1972-86 and, later, from the second edition of 1989 and its Additions Series of 1993-97,[5] thereby enriching the project but also slowing it down. After the final Additions volume, the team decided to call a temporary halt to data collection for the first edition; following a gap to finish that edition, the Thesaurus is again being brought into line with the in-progress third edition of the OED, helped by colleagues at Oxford University Press.

Making a thesaurus is an endlessly circular process. From the late 1970s onwards, we began to classify the data, starting with the more obviously discrete conceptual fields, such as Music and Food. Classification thus replaced slip-making as the team’s principal activity; major grants to support both classification and the development of the database were received from the Leverhulme Trust, the Carnegie Trust for the Universities of Scotland, and the Arts and Humanities Research Board (now the Arts and Humanities Research Council). As the operation developed and the categories became less clear-cut, there was a good deal of movement of data between categories, with the result that sections completed in the 1980s built up a considerable backlog of material to be added. Nor was this necessarily the end of things, since the person revising a particular category might decide that some slips had been misdirected to it and send them on somewhere else. Slips from OED2 also had to be added, often requiring matching with OED1 senses. We could not therefore claim that the first edition of the project was finished until the last slip was slotted into place in July 2008. Because of the complexity of the various stages in the editorial process, we have not, except in the case of published theses, attributed sections of classification to individual lexicographers: all sections, including thesis material, have been worked on by two, or usually more, people, in the years since classification became the main focus of activity.

Following that final slip’s insertion into the database in its correct place, only proofreading and marketing work was left to do, and the team was delighted to see the first edition published in October 2009, to enthusiastic reviews and with much publicity. The presence of an advance copy in the Department of English Language throughout August attracted many colleagues to come and finally hold in their hands the product of 44 years’ labour.[6] A launch event was held at Glasgow on 22 October, attended by a hundred academics, friends, supporters, and team members past and present. Michael Samuels made a keynote address at this event, seeing at last his major project come to fruition. In the ensuing months, the print edition sold far more copies than expected, and in November the gratifying news came that the first printing had sold out. A second printing was rushed through in time for Christmas, followed by yet more.

A few months after publication, we were able to announce the Historical Thesaurus Postgraduate Studentships at Glasgow. Funded entirely from the print edition’s royalties, these studentships for new research postgraduates working on any aspect of the English language have enabled a significant number of talented young scholars to begin their careers.

Research could begin on the whole of the Thesaurus’ database now that it was complete, and major funded projects using that data, such as Mapping Metaphor with the Historical Thesaurus, the Enroller repository, Parliamentary Discourse, and the SAMUELS semantic tagger (named, of course, in tribute to Samuels, who had sadly passed away by this point), took up much of the time of the remaining project team. Kay had retired from her chair in 2005 in order to focus on seeing the project completed, and in 2014, on the fiftieth anniversary of the project’s birth, passed the directorship of the project to Alexander, who had been her deputy since 2008. Discussions with Oxford University Press continued, and an arrangement was finalised in 2014 for the exchange of updated Thesaurus data (by then in its fourth major editorial revision from the 2008 first edition data) for data from the third edition of the OED. Plans for the second edition were then put in place, with former Research Associate Dallachy joining Kay as deputy director, and the exacting – but exciting – process of editing and classifying lexical data begun once more.

All in all, Samuels showed a certain prescience when he wrote in 1972:

The production of such a thesaurus is arduous. Every attested meaning, past and present, must be semantically analysed and classified, and this can be achieved only by conventional methods, not by computer. It is at present being attempted for English only, and, from experience so far gained, will be a lengthy task.[7]

[1] See M. L. Samuels, “The Role of Functional Selection in the History of English”. Transactions of the Philological Society, 1965, 15-40.

[2] I. A. W. Wotherspoon, “A Notional Classification of Two Parts of English Lexis”. M. Litt. Thesis, University of Glasgow, 1969.

[3] See C. J. Kay & T. J. P. Chase, “Constructing a Thesaurus Database”. Literary and Linguistic Computing, 2, 3, 1987, 161-163.

[4] See Irené Wotherspoon, “Historical Thesaurus Database Using Ingres”. Literary and Linguistic Computing, 7, 4, 1992, 218-225.

[5] The Oxford English Dictionary. 1884-1933, ed. by Sir James A. H. Murray, Henry Bradley, Sir William A. Craigie & Charles T. Onions; Supplement, 1972-1986, ed. by Robert W. Burchfield; 2nd edn, 1989, ed. by John A. Simpson & Edmund S. C. Weiner; Additions Series, 1993-1997, ed. by John A. Simpson, Edmund S. C. Weiner, & Michael Proffitt; 3rd edn (in progress) OED Online, March 2000- , ed. by John A. Simpson. Oxford: Oxford University Press.

[6] For more detail, see C. J. Kay and M. Alexander, “Life After the Historical Thesaurus”. Dictionaries 31, 2010, 107-112.

[7] Linguistic Evolution, 180.