Kernerman Dictionary News • Number 13 • June 2005

If dictionaries are free, who will buy them?



If dictionaries are free, who will buy them? The question looms over publishing houses like a slow-motion tsunami. Dictionaries are now free, on the web and bundled with Microsoft Office and other products, so where is the publishers’ income stream going to come from?


Articles in the last four editions of this newsletter have addressed the title question. Charles Levine [KDN9, 2001] opened the debate in optimistic mode, seeing signs of growth in English-language lexicography despite the web. Joseph Esposito’s response [KDN10, 2002] offered a “grim vision”:


“… when I complained about Microsoft bundling a spell checker, with its limited dictionary, into Word ages ago, the techies I knew all laughed at me. Now that most of them have burned through their venture capital after Microsoft ‘integrated’ the gist of their products into Windows, we all cry into our lattes together.”

Then, following Levine’s still-optimistic response [KDN11, 2003], the villain of the piece bravely entered the fray. Microsoft’s Julian Parish [KDN12, 2004] argued that Microsoft saw publishers as partners, not competitors, and that there were many new e-opportunities for them, for example publishing add-ons to Microsoft products.


History: a dictionary in every household

In the twentieth century, a number of European and North American publishers occupied the fertile coastal strip of “a dictionary in every household”. Dependable as the cycle of one generation growing up and handing over to the next, it was a large and enviable market, in harmony with the grand and noble agenda of universal education. To be sure, the coastal strip was sometimes crowded with competitors, but the soil was good: there were always more households to buy dictionaries. They don’t need to buy them any more.


There is no use lamenting the lost market. How quickly it disappears may vary; as Esposito notes and Levine confirms:

“In the absence of growth, the old business will be strained for capital, which will beget smaller investments, which will in turn hasten the decline. In the short term, this will redound to the benefit of market leaders, such as Merriam-Webster and Oxford University Press…”

But disappear it will.


The market that is collapsing is the monolingual, emblematic “dictionary-at-home” market, whose role has always been complex: status symbol, and aid for spelling, Scrabble and, sometimes, schoolwork. Other markets, notably the booming EFL market and the bilingual dictionaries people need for travel and language learning, follow different trajectories.


The future

For the standard monolingual dictionary, long the publisher’s centerpiece, what remains once we leave that lush dictionary-in-every-household coastal strip?


The key lies in quality. Most free dictionaries are not very good. Most people don’t care: a dictionary is a dictionary is a dictionary, good or bad, and one is plenty. Some free ones are even quite good; Esposito and Levine note the quality of the Encarta dictionary, possibly the first of a new breed of market-swamping, “good-enough” dictionaries.


But the minority for whom language is their trade, such as translators and academics, do care. Their numbers are tiny compared to the golden age but, in this respect, dictionary publishing is undergoing the same transformation as many other markets with the advent of the internet: the market fractures, and where there were a small number of products selling to millions, there are now millions of products, each selling far smaller numbers, to billions. The upside is that customers can be found all over the globe and, once found, they are the right customers for the product and so are likely to be willing to spend more.


The nice thing about this is that making good dictionaries, as opposed to bad ones, is what every lexicographer wants to do. There is usually tension between lexicographer and publisher – better vs cheaper – and the change in the market gives more weight to the lexicographer’s case. While Esposito despairs at the traditional publishers being left “to focus on the scraps Microsoft leaves on the floor”, we note that the market for the most accurate, the most consistent and the most current account of a language (or source-target pair) is far more than a scrap.


Of course, language professionals will be online. Lexicographically, this is exciting as it means the dictionary can be far better than any that went before: it is not constrained by space, and we can open our vision to the dictionary as an object integrated with the underlying corpus resources (as in Word Sketches¹). But that is a different topic: here, our concern is income streams.


Many language professionals are associated with universities and libraries. These institutions are traditional customers for dictionaries, have substantial budgets and, with physical space ever at a premium, are often enthusiastic about services that make no extra demands on space or personnel.


For example, Oxford Reference Online² is an online subscription service, sold almost exclusively as a site licence to institutions, that brings together a wide range of Oxford University Press’s reference materials. It is very successful, and extensions focusing on language resources are planned. Of course, OUP has a wonderful brand and so many titles that it can offer a very broad, one-stop-shop service, which is attractive to libraries. Other publishers will probably need to form consortia, each branded under the best-known name in its market. It is a route out of the path of the tsunami.


Dictionaries for computers?

All of the above is about dictionaries for people to use. Esposito, writing in 2002, says:

“The real game for Microsoft is using lexical databases within computer algorithms, as in natural-language processing.”


Parish, too, stresses that Microsoft is an energetic customer for dictionaries for NLP (natural language processing, also known as Language Technology or Computational Linguistics). As an NLP researcher, I am less sanguine here. To be sure, most NLP applications need dictionaries as inputs, and in the short term most of these will probably be derived from dictionaries as we know them, where good ones are available at reasonable cost. But consider, for example, Prinsloo and de Schryver’s spellcheckers for African languages³: their wordlists are corpus-derived.
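To make the idea of a corpus-derived wordlist concrete, here is a minimal sketch, assuming the simplest possible approach: count word forms in a corpus, keep those seen at least `min_count` times, and flag any token missing from the list as a candidate non-word error. This is not Prinsloo and de Schryver’s actual procedure; the function names, the tokenizer and the `min_count` threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, allowing internal apostrophes or hyphens.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def build_wordlist(corpus: list[str], min_count: int = 2) -> set[str]:
    """Derive a wordlist from a corpus, keeping word forms seen at least min_count times.

    The frequency threshold is an assumed heuristic for filtering out
    typos that occur in the corpus itself.
    """
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    return {word for word, n in counts.items() if n >= min_count}


def flag_non_words(text: str, wordlist: set[str]) -> list[str]:
    """Return the tokens in text that are absent from the wordlist."""
    return [tok for tok in tokenize(text) if tok not in wordlist]


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the cat ran", "a dog sat on the mat", "teh cat"]
    wordlist = build_wordlist(corpus, min_count=2)
    print(sorted(wordlist))  # ['cat', 'mat', 'on', 'sat', 'the']
    print(flag_non_words("Teh cat sat on the mat.", wordlist))  # ['teh']
```

No hand-crafted dictionary is involved at any step: the only lexical input is the corpus. The same toy example also shows the method’s weakness, since a genuine but rare word such as “dog” falls below the threshold. Checking such cases is exactly the post-editing work discussed below.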


Across NLP, researchers are finding ways of solving problems using corpora. High-quality, well-structured, hand-crafted resources still support some technologies that corpus-derived resources do not, but the list of such technologies is shrinking. Three years ago, Esposito’s remarks looked right. NLP has since changed, and while it may often be a short-term convenience for Microsoft and others to take publishers’ resources, this is not a long-term income stream. Post-editing corpus-derived resources is a job that will need doing for some time yet, but it is less than a glorious future for the grand old names of dictionary publishing.




3  D.J. Prinsloo and G-M de Schryver. 2003. Non-word error detection in current South African spellcheckers. Southern African Linguistics and Applied Language Studies 21(4) (Special issue on ‘Human Language Technology in South Africa: Resources and Applications’): 307–326.


[For a version of this article with responses from Esposito and Levine, visit ]