Word frequency

Jennifer mentioned something a few days ago which made me reflect more carefully on the work that I am doing. I quote her: ‘…exercises that have a morphological focus, like the opposites crosswords, can sometimes throw emphasis onto the less useful one of the pair, or an opposite that is not used in the same way in academic English as its counterpart. For me, an example would be the opposites ‘involved’ (which is a low lexicality word, which I have found to be used mainly as a listing device, e.g. for stages in a process), whereas ‘uninvolved’ is rather more specialised and unusual (what is its frequency in academic texts compared with ‘involve/d’?) I think these considerations are important where we are developing AWL-based exercises.’

Since then I have been considering frequency more carefully, alongside the general joys of morphology. I am devising a prefixes exercise and had included a number of words with ‘in-’. It occurred to me to run them through Google, just to get a very rough estimate of their frequency. I was surprised by the results. Now, I am aware that Google is a blunt instrument for gauging frequency, but it does give us some idea, especially when the differences are huge (and, as you will see, they are). Obviously, the texts are a mixed bunch, so it doesn’t tell us much about academic texts, but, all other things being equal, this has helped me choose which two words out of ten to leave out of my exercise. The words I checked were: inappropriate, inconclusive, insufficient, invalid, invariable, insecure, inconsistent, indistinct, inaccessible, inadequate.

I could not have guessed the different levels of frequency in Google. Here they are, in millions of pages for each of the words, from most to least frequent: inappropriate 105; invalid 60; insufficient 26; inadequate 22; inconsistent 17; insecure 8; inaccessible 7; inconclusive 3; invariable 2; indistinct 1.5. I have rounded the figures up or down.

Interestingly, in Google Scholar the ranking was not exactly maintained and, of course, the numbers are much smaller given the smaller collection of texts: insufficient 2,450,000; inadequate 2,310,000; inappropriate 1,370,000; inconsistent 857,000; invalid 704,000; inaccessible 474,000; inconclusive 343,000; invariable 323,000; insecure 259,000; indistinct 95,000.
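For anyone who would like to see the two rankings side by side, here is a minimal Python sketch. It only uses the figures quoted above; the names `google_millions`, `scholar_hits`, `rank_words` and `rank_shifts` are my own labels for illustration and have nothing to do with any Google tool. It is a rough comparison of rank positions, not a proper corpus analysis.

```python
# Page counts copied from the figures quoted in this post.
# Google figures are in millions of pages (rounded); Scholar figures are raw hits.
google_millions = {
    "inappropriate": 105, "invalid": 60, "insufficient": 26, "inadequate": 22,
    "inconsistent": 17, "insecure": 8, "inaccessible": 7, "inconclusive": 3,
    "invariable": 2, "indistinct": 1.5,
}
scholar_hits = {
    "insufficient": 2_450_000, "inadequate": 2_310_000,
    "inappropriate": 1_370_000, "inconsistent": 857_000, "invalid": 704_000,
    "inaccessible": 474_000, "inconclusive": 343_000, "invariable": 323_000,
    "insecure": 259_000, "indistinct": 95_000,
}


def rank_words(counts):
    """Return {word: rank}, where rank 1 is the most frequent word."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {word: position for position, word in enumerate(ordered, start=1)}


def rank_shifts(general, academic):
    """Return {word: general rank minus academic rank}.

    A positive number means the word sits higher in the academic list
    than in the general one; a negative number means it sits lower.
    """
    general_ranks = rank_words(general)
    academic_ranks = rank_words(academic)
    return {word: general_ranks[word] - academic_ranks[word]
            for word in general_ranks}


# Print each word with its rank in both lists and the shift between them.
google_ranks = rank_words(google_millions)
scholar_ranks = rank_words(scholar_hits)
for word, shift in rank_shifts(google_millions, scholar_hits).items():
    print(f"{word:15} Google {google_ranks[word]:2}  "
          f"Scholar {scholar_ranks[word]:2}  shift {shift:+d}")
```

On these figures, ‘insecure’ drops three places in Google Scholar, while ‘indistinct’ stays last in both lists.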

So, this helps me to choose which words to include in exercises. Obviously, the starting point is the AWL and frequent academic words, but a word’s frequency in general texts is also a factor to consider. Anyway, I no longer have any doubts that ‘indistinct’ and ‘invariable’ are the words to remove from the exercise!

