Lexicalization Sparklines

Introduction to the sparklines tool

The purpose of this sparkline tool is to give a context-free indication of a category’s development over time, and so to allow a rapid visual assessment of the ‘shape’ of its growth. Sparklines make it possible to see, for example, which categories remain stable for decades before experiencing sudden growth, which grow steadily, and which decline dramatically in size. Sparkline charts are scaled to a standard size rather than plotted against absolute y-axis values (i.e. the number of word senses); this means that the development of a category with only a handful of word senses is instantly comparable with that of a category containing hundreds of senses. Viewing many sparklines together can help a user identify ‘normal’ or ‘abnormal’ growth in categories, or see whether categories in a given area of the Thesaurus hierarchy share a pattern of development; such comparisons may be a starting point for further investigation of the relevant semantic fields and their histories. The sparkline visualization tool uses the human-scale thematic category set.
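As a rough illustration of scaling to a standard size, the sketch below rescales a category's per-decade counts so that its largest decade always reaches the same height. The function name, the default height, and the choice of a zero baseline are assumptions made for this example; the page does not say exactly how the tool scales its charts.

```python
def scale_to_sparkline(counts, height=20.0):
    """Rescale per-decade sense counts so the largest decade equals `height`.

    Assumption: the baseline is zero, so only the maximum is used for scaling.
    """
    largest = max(counts)
    if largest == 0:
        # An empty category has no shape to draw.
        return [0.0 for _ in counts]
    return [count / largest * height for count in counts]


# A small and a large category with the same pattern of growth.
small = scale_to_sparkline([1, 2, 4, 8])
large = scale_to_sparkline([100, 200, 400, 800])
print(small)           # [2.5, 5.0, 10.0, 20.0]
print(small == large)  # True
```

Because both series are divided by their own maximum, the two categories produce the same shape even though one is a hundred times larger than the other.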

Using the tools on this site, it is possible to pre-select features of category development such as rapid rises or sharp declines in size. The user defines the values of interest for a given set of parameters, and the tool returns sparklines for all categories which display these features.

Sparklines should be interpreted with caution, as the processes which have shaped the lexicalization of each semantic field are complex and specific to that field; these visualizations are intended as an entry point to deeper investigation.

Main parameters

Time period selection

A double-ended slider allows selection of the time period in which a user is interested. Each sparkline returned still shows the full timescale covered by Thesaurus data, but only categories in which the requested features (peaks, plateaus, etc.) occur within the specified range of dates are returned. Old English words are excluded because they cannot be assigned to a decade and would therefore create an artificial spike at the start of most sparklines. The earliest available decade is the ‘1010s’, whilst the ‘2000s’ at the upper end includes all word senses considered to be currently active. Active senses are defined as those for which citation evidence attests that the sense was in use during the time period in question; as well as excluding senses which were not coined within the given period, a count of active senses excludes any which fell out of use before that period began. As well as using the sliders, start and end dates may be typed directly into the boxes, but they must be expressed as decades (e.g. 1470s rather than 1476 or 1470) or the search will return zero results.
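The decade format and the idea of an active sense can be sketched as follows. This is a minimal illustration, not the site's own code: the function names and the representation of a sense as a (first year, last year) pair are hypothetical, and the sketch assumes that a sense counts as active in a decade if it had been coined by the end of that decade and had not fallen out of use before the decade began.

```python
import re


def parse_decade(label):
    """Return the first year of a decade label such as '1470s', else None."""
    if re.fullmatch(r"\d{3}0s", label) is None:
        return None  # e.g. '1476' or '1470' are not decade labels
    return int(label[:-1])


def active_count(senses, decade_start):
    """Count senses in use at some point during the given decade.

    Each sense is a (first_year, last_year) pair (a hypothetical layout);
    last_year is None for a sense that is still current.
    """
    decade_end = decade_start + 9
    count = 0
    for first_year, last_year in senses:
        coined_in_time = first_year <= decade_end
        still_in_use = last_year is None or last_year >= decade_start
        if coined_in_time and still_in_use:
            count += 1
    return count


senses = [(1450, 1475), (1480, None), (1300, 1460), (1472, 1472)]
print(parse_decade("1470s"))                       # 1470
print(parse_decade("1476"))                        # None
print(active_count(senses, parse_decade("1470s")))  # 2
```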

Average decade size range

This setting allows users to select an approximate size for categories, excluding any which are too small and/or too large for their interests. The average size of a category is a mean calculated from a count of the active word senses in that category per decade. As well as using the slider, maximum and minimum values may be typed directly into the appropriate boxes.

Minimum size of largest decade

This setting can be used to exclude categories which never obtain a large enough size to be interesting to the user. Rather than an average size, this setting uses the maximum size a category reaches, based on counts of active senses per decade.
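The two size settings can be pictured as a simple filter over a category's per-decade counts of active senses. The sketch below is illustrative only; the function names and parameter names are hypothetical and are not taken from the tool.

```python
from statistics import mean


def size_summary(counts):
    """Return (average decade size, size of largest decade) for a category."""
    return mean(counts), max(counts)


def passes_size_filters(counts, min_avg, max_avg, min_largest):
    """Check a category against the average-size range and the minimum peak size."""
    average, largest = size_summary(counts)
    return min_avg <= average <= max_avg and largest >= min_largest


# Average size between ten and a hundred, largest decade at least twenty.
print(passes_size_filters([5, 10, 15, 30], 10, 100, 20))  # True
print(passes_size_filters([2, 3, 4], 10, 100, 20))        # False
```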

As an example of the use of these criteria, a user may want to search for instances of ‘trauma fall’ which occur in the period between 1700 CE and 2000 CE, in categories with an average size of between ten and a hundred senses which reach a peak of at least twenty senses. If the desired fall has to be at least 20% of the category’s average size, the tool returns 1277 categories matching these criteria.

Extra parameters may be set for peaks, plateaus, and trauma rise or fall:

  • Peaks: A peak occurs when a category reaches its maximum size and then begins to decline. Peak identification is less reliable in the 20th century as its accuracy depends on the date at which the relevant dictionary headwords were revised. Categories which reach their highest values in the 20th century are, therefore, automatically discounted. The loss of a few word senses from small categories can create a misleading impression of a peak. Identification of a peak is therefore more reliable when the standard deviation of the category is greater than thirty and when there is a decline of ten senses or more between the peak and the ‘2000s’ decade. Users may experiment with their own values for the minimum standard deviation and percentage decrease in category size from the peak to the present day.
  • Plateaus: A plateau is deemed to occur when the active sense count varies very little from decade to decade. Plateaus are identified by finding the modal value of a category’s size and counting the number of decades in which the mode, or something very similar to the mode, appears. Although it is possible that a category could vary wildly in size but still return to the mode value repeatedly, the sparkline graph should reveal whether such behaviour is present. For a plateau to be identified, it is suggested that the mode value should occur more than five times, and that values close to the mode (i.e. within 5% of the modal value) should occur thirty times or more. These values are open to refinement and users of the website may wish to try other possibilities.
  • Trauma rise and trauma fall: Sudden rapid increase or decrease in category size may be the result of traumatic incidents in the history of the language and culture of its speakers, so that a concept either gains great cultural importance or becomes taboo or obsolescent in a short period of time. As a guide, if a category containing more than ten senses gains more than 5% of its average size within a decade, this could be seen as a traumatic rise. Traumatic falls are similarly defined although, as a guideline, a category experiencing a traumatic fall should contain an average of twenty or more senses and should lose more than 30% of the average size of its contents in its largest inter-decade fall.
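The three kinds of feature described above can be sketched in code using the suggested guide values as defaults. This is an illustration under stated assumptions rather than the tool's actual method: all function and parameter names are hypothetical, the standard deviation is taken as a population standard deviation, the first mode is used when several values tie, the count of values near the mode includes the mode itself, and ‘containing more than ten senses’ is read as an average size above ten.

```python
from statistics import mean, multimode, pstdev


def find_peak(counts, first_decade=1010, min_stdev=30.0, min_drop=10):
    """Return the index of the peak decade, or None if no reliable peak exists."""
    peak_index = counts.index(max(counts))
    if first_decade + 10 * peak_index >= 1900:
        return None  # peaks in the 20th century are discounted
    drop_to_present = counts[peak_index] - counts[-1]
    if pstdev(counts) <= min_stdev or drop_to_present < min_drop:
        return None
    return peak_index


def has_plateau(counts, min_mode_freq=6, min_near_freq=30, tolerance=0.05):
    """Check how often the mode, and values within 5% of it, occur."""
    mode = multimode(counts)[0]  # assumption: take the first mode on a tie
    mode_freq = counts.count(mode)
    near_freq = sum(1 for c in counts if abs(c - mode) <= tolerance * mode)
    return mode_freq >= min_mode_freq and near_freq >= min_near_freq


def largest_change_percent(counts, rise=True):
    """Largest decade-to-decade change as a percentage of the average size."""
    changes = [later - earlier for earlier, later in zip(counts, counts[1:])]
    change = max(changes) if rise else min(changes)
    return change / mean(counts) * 100


def is_trauma_rise(counts, min_avg=10, min_percent=5):
    """Largest rise exceeds min_percent of the average size."""
    if len(counts) < 2 or mean(counts) <= min_avg:
        return False
    return largest_change_percent(counts, rise=True) > min_percent


def is_trauma_fall(counts, min_avg=20, min_percent=30):
    """Largest fall exceeds min_percent of the average size."""
    if len(counts) < 2 or mean(counts) < min_avg:
        return False
    return -largest_change_percent(counts, rise=False) > min_percent


print(find_peak([10, 50, 120, 80, 40]))    # 2
print(has_plateau([20] * 30 + [40, 60]))   # True
print(is_trauma_rise([20, 21, 40, 41]))    # True
print(is_trauma_fall([50, 52, 20, 22]))    # True
```

All thresholds are exposed as parameters because the guide values are offered as starting points that users may wish to vary.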

Explore Sparklines

Filter the Sparklines using the following criteria:

1. Select the period you are interested in (start and end decade).

The 'from' and 'to' values in the rest of the form will change depending on the period you have selected.

2. Select the average decade size range you wish to include (minimum and maximum).

3. Select the minimum size of largest decade:

4. Select the feature you're interested in:


(only categories that have a peak in the selected period will be returned)

Peak decade (earliest and latest):

Options are limited to your selected period.

Minimum percentage difference between largest value and end:
between 0 and -100

Minimum standard deviation:
values are from 0.2 to 566.8 in your selected period


Minimum frequency of mode:
values are from 1 to 70 in your selected period

Minimum mode:
values are from 0 to 81 in your selected period

Minimum frequency of 5% either way from mode:
values are from 0 to 50 in your selected period


Minimum size of largest decade-to-decade rise:
values are from 1 to 323 in your selected period

Minimum percentage rise of average category size:
values are from 9 to 3333 in your selected period


Minimum percentage fall of average category size:
values are from -3333 to 0 in your selected period