Stats and Figures

The Historical Thesaurus contains 793,742 word forms arranged into 225,131 semantic categories. (By comparison, the second edition of the OED, one of our primary sources, defines 615,100 word forms.) This gives an average of 3.5 word forms used to describe each concept across the history of English.

The largest categories in the database are:

The most commonly occurring word forms across the history of English are:

set (345 occurrences), run (302), strike (256), fall (206), cast (187), round (179), turn (174), point (169), slip (165), pass (160), shoot (159), take (158), show (157), stand (157), stock (153), up (151), stop (146), work (145), cut (145), light (142), pitch (141), roll (141)

Overall, our electronic database contains over 1 million rows and 25,376,610 separate pieces of data.

The Thesaurus project formally began on 15 January 1965 with an address to the Philological Society in London, in which Professor Michael Samuels announced that he and his colleagues at Glasgow would undertake the work. Production of version 1 of the Thesaurus ended at the final launch party on 22 October 2009. This first stage of the project therefore took exactly 44 years, 9 months and 1 week (16,351 days). The total cost of the Thesaurus was £1.1 million in grants (approximately £2.2m/$3.4m in 2010 terms when adjusted for inflation), in addition to a good deal of uncosted academic time: a bargain at a little over £1 per word and around 340 words a week! Overall, the Thesaurus is the work of over 230 people and took approximately 320,000 person-hours to complete, the equivalent of 176 years of solid work for one person.
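The derived figures on this page can be cross-checked with a little arithmetic. The following is a minimal sketch in Python that uses only the numbers quoted above. The variable names are illustrative, and the 35-hour, 52-week working year used to convert person-hours into years is an assumption made here, not a figure stated in the text.

```python
from datetime import date

# Figures quoted in the text.
WORD_FORMS = 793_742
CATEGORIES = 225_131
START = date(1965, 1, 15)   # address to the Philological Society
END = date(2009, 10, 22)    # final launch party
COST_GBP = 1_100_000        # grants, not adjusted for inflation
PERSON_HOURS = 320_000

# Assumption (not from the text): one year of "solid work" is 35 hours x 52 weeks.
HOURS_PER_WORK_YEAR = 35 * 52

days = (END - START).days                 # calendar days between the two dates
weeks = days / 7

forms_per_category = WORD_FORMS / CATEGORIES
cost_per_word = COST_GBP / WORD_FORMS     # pounds per word form
words_per_week = WORD_FORMS / weeks
person_years = PERSON_HOURS / HOURS_PER_WORK_YEAR

print(f"Duration: {days:,} days")
print(f"Word forms per category: {forms_per_category:.1f}")
print(f"Cost per word form: £{cost_per_word:.2f}")
print(f"Word forms per week: {words_per_week:.0f}")
print(f"Person-years of work: {person_years:.0f}")
```

Under these assumptions, the day count, the average number of word forms per category, the weekly rate and the person-year figure agree with those quoted above after rounding.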