Hapax legomenon

From Wikipedia, the free encyclopedia
Rank-frequency plot for words in the novel Moby-Dick. About 44% of the distinct words in this novel, such as "matrimonial", occur only once, and so are hapax legomena (red). About 17%, such as "dexterity", appear twice (so-called dis legomena, in blue). Zipf's law predicts that the words in this plot should approximately fit a straight line when both axes are logarithmic.

A hapax legomenon is a word that occurs only once in a corpus of text. The plural is either hapax legomena or hapaxes. The term comes from Ancient Greek and means "(something) said only once".

In this context, a word that occurs twice is called a dis legomenon (/ˈdɪs/), one that occurs three times a tris legomenon (/ˈtrɪs/), and one that occurs four times a tetrakis legomenon (/ˈtɛtrəkɨs/).
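
The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names tokenize and legomena are made up for this example, and the rule for splitting text into words (lowercase runs of letters and apostrophes) is an assumption. Real corpus studies use more careful rules, and the results depend on them.

```python
import re
from collections import Counter

# Names from the article for words that occur 1, 2, 3 or 4 times.
NAMES = {
    1: "hapax legomenon",
    2: "dis legomenon",
    3: "tris legomenon",
    4: "tetrakis legomenon",
}

def tokenize(text):
    # Assumption: a "word" is a lowercase run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def legomena(text):
    """Group the words of `text` by how often they occur (1 to 4 times)."""
    counts = Counter(tokenize(text))
    groups = {n: [] for n in NAMES}
    for word, n in counts.items():
        if n in groups:
            groups[n].append(word)
    # Sort each group so the result does not depend on word order in the text.
    return {n: sorted(words) for n, words in groups.items()}

if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw a bird and and and"
    for n, words in legomena(sample).items():
        print(NAMES[n], words)
```

In the sample sentence, "a", "bird" and "cat" are hapax legomena, but only within that sentence. The same words would not be hapax legomena in a larger corpus.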

Hapax legomena are quite common, as predicted by Zipf's law,[1] which states that the frequency of any word in a work (corpus) is approximately inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words (counting by type, that is, counting each distinct word once) are hapax legomena, and another 10% to 15% are dis legomena.[2] In the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus.[3]
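
The percentages above count by type, so each distinct word is counted once, however often it occurs. The short Python sketch below shows one way such a share and a rank-frequency table could be computed from a list of words. The function names type_share and rank_frequency are made up for this example, and the word list is assumed to be split into tokens already.

```python
from collections import Counter

def type_share(tokens, n):
    """Share of distinct words (types) that occur exactly n times."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    matching = sum(1 for c in counts.values() if c == n)
    return matching / len(counts)

def rank_frequency(tokens):
    """Return (rank, word, frequency) rows, most frequent word first.

    Ties are broken alphabetically so the table is reproducible.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    tokens = "a a a a b b c d".split()
    print(type_share(tokens, 1))   # share of types that are hapax legomena
    print(type_share(tokens, 2))   # share of types that are dis legomena
    for row in rank_frequency(tokens):
        print(row)
```

A toy list like this one is far too small to test Zipf's law. On a large corpus, the rows from rank_frequency are what a rank-frequency plot would show.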

Note that hapax legomenon refers only to how often a word appears in a body of text. It says nothing about the word's origin or about how often it is used in speech. For this reason, it is different from a nonce word, which may never be recorded, or may find currency and be widely recorded, or may appear several times in the work that coins it.

References

  1. Paul Baker, Andrew Hardie, and Tony McEnery, A Glossary of Corpus Linguistics, Edinburgh University Press, 2006, page 81, ISBN 0-7486-2018-4.
  2. András Kornai, Mathematical Linguistics, Springer, 2008, page 72, ISBN 1-84628-985-8.
  3. Kirsten Malmkjær, The Linguistics Encyclopedia, 2nd ed., Routledge, 2002, page 87, ISBN 0-415-22210-9.