Standardised testing is perhaps the only objective method of measuring human ability that is feasible with the contemporary level of technology.
However, preparation for exams wastes a substantial amount of students’ time every year.
Also, as the number of questions is not very high (~100 questions), the confidence interval is very wide. With 100 questions, a confidence level of 95% and a score of 50%, the confidence interval is ±9.8%p. However, if the number of questions could be increased to 500, 1000, 1500 or 2000, then at the same 95% confidence level and 50% score the confidence intervals would be ±4.4%p, ±3.1%p, ±2.5%p and ±2.2%p, respectively.
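These figures are consistent with the usual normal-approximation margin of error for a proportion, z × sqrt(p(1 − p) / n), with z = 1.96 for a 95% confidence level. A minimal Python sketch, assuming that this is the formula intended (the helper name `margin_of_error` is only illustrative):

```python
import math

def margin_of_error(n_questions, score=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval
    for a proportion `score` measured over `n_questions` questions."""
    return z * math.sqrt(score * (1.0 - score) / n_questions)

# Reproduce the figures quoted above (score 50%, confidence level 95%).
for n in (100, 500, 1000, 1500, 2000):
    print(n, f"±{margin_of_error(n) * 100:.1f}%p")
```

A score of 50% is the worst case: for any other score the interval is narrower.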
As most dictionaries nowadays are edited and stored in an electronic retrieval system, the entire dictionary can be utilised for a computer-based randomised vocabulary test.
A word selected at random by the computer from all 171,476 words in the dictionary can be shown to the examinee, together with a number of definitions as possible choices. Only one choice is the correct answer. To discourage guessing the meaning of a word without sufficient knowledge of it, an incorrect answer might be penalised, with an option to skip the question.
(The test could also be conducted the other way round, by showing the definition of a word and giving multiple words as possible choices. This method measures the examinee’s active vocabulary, while the former method measures the examinee’s passive vocabulary.)
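The question format described above could be sketched roughly as follows. Everything here is illustrative: the toy lexicon stands in for real dictionary data, the function names are hypothetical, and the size of the penalty, 1/(k − 1) for k choices so that blind guessing averages zero, is an assumption rather than something fixed by the proposal.

```python
import random

# Toy stand-in for the dictionary data: headword -> definition.
TOY_LEXICON = {
    "arid": "extremely dry",
    "brisk": "quick and energetic",
    "candid": "frank and open",
    "dormant": "temporarily inactive",
    "elated": "very happy",
}

def make_question(lexicon, rng, n_choices=4):
    """Pick random headwords; the first is the target, the rest are distractors.

    Returns the target word, the shuffled definitions, and the index of
    the correct definition.
    """
    headwords = rng.sample(list(lexicon), n_choices)
    target = headwords[0]
    choices = [lexicon[w] for w in headwords]
    rng.shuffle(choices)
    return target, choices, choices.index(lexicon[target])

def score_answer(chosen, correct_index, n_choices):
    """1 point if correct, 0 if skipped (chosen is None), a penalty if wrong."""
    if chosen is None:
        return 0.0
    if chosen == correct_index:
        return 1.0
    # Assumed penalty: random guessing then has an expected score of zero.
    return -1.0 / (n_choices - 1)

rng = random.Random(0)  # seeded only to make the sketch repeatable
target, choices, correct_index = make_question(TOY_LEXICON, rng)
print(target, choices)
```

For the active-vocabulary variant, the same sketch applies with the roles swapped: show one definition and offer several headwords as choices.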
If the examinee is given 3.6 seconds per question, a 1000-question exam can be conducted within 1 hour. The result will estimate the size of the examinee’s vocabulary with a confidence interval of at most ±3.1%p at the 95% confidence level.
If the 171,476-word (contemporary English) lexicon of the Oxford English Dictionary, 2nd Edition is used, the size of the examinee’s lexicon can be estimated by multiplying the ratio score by 171,476. For example, if the examinee gets 500 answers right out of 1000 questions, the percent score will be 50%, and the estimated size of the lexicon is 85,738.
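As a rough illustration, the estimate and its 95% interval can be expressed in words rather than percentage points. This is a sketch under stated assumptions: it uses the raw proportion of correct answers (no guessing penalty) and the normal approximation, and `estimate_lexicon` is a hypothetical helper name.

```python
import math

OED2_LEXICON_SIZE = 171_476  # figure quoted in the text

def estimate_lexicon(correct, total, lexicon_size=OED2_LEXICON_SIZE, z=1.96):
    """Return (estimate, low, high) for the examinee's lexicon size in words."""
    p = correct / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    low = max(0.0, p - half_width)   # clamp the interval to [0, 1]
    high = min(1.0, p + half_width)
    return (round(p * lexicon_size),
            round(low * lexicon_size),
            round(high * lexicon_size))

estimate, low, high = estimate_lexicon(500, 1000)
print(estimate, low, high)
```

With 500 correct answers out of 1000, the point estimate is 85,738 words, and ±3.1%p of 171,476 corresponds to an interval of roughly ±5,300 words around it.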
Estimation of the size of the lexicon
Standardised college readiness test (for both undergraduate and graduate programmes)
Intelligence test (the percentile score can be converted to IQ with the 68–95–99.7 rule; a percentile-based (lexical) verbal IQ can be calculated from the examinee’s percentile score among native speakers of the same or similar chronological age)
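The percentile-to-IQ conversion mentioned in the last item can be sketched with the inverse normal distribution, of which the 68–95–99.7 rule gives the familiar anchor points (about the 84th percentile at one standard deviation above the mean, about the 97.7th at two). The mean of 100 and standard deviation of 15 are assumptions based on the common IQ convention, not values given in this article, and `percentile_to_iq` is a hypothetical name.

```python
from statistics import NormalDist

def percentile_to_iq(percentile, mean=100.0, sd=15.0):
    """Map a percentile rank (strictly between 0 and 100) to an IQ-style score."""
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must be strictly between 0 and 100")
    # inv_cdf gives the score below which the given share of people fall.
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile / 100.0)

for pct in (50.0, 84.1, 97.7):
    print(pct, round(percentile_to_iq(pct)))
```

The percentile itself would have to come from a norming sample of native speakers of the same or similar age, which this sketch does not provide.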
What are its advantages?
It is confidentiality-neutral. The exam material need not be confidential, as the entire dictionary can be used and the computer randomly selects questions for each examinee. As a result, the test can be practised over and over again, and the examinee can take the exam at any date and time of her convenience, and perhaps even anywhere, so long as she is invigilated (which can also be done by electronic means such as webcams).
It is hard to prepare specifically for the exam without actually learning much. While the test may well motivate students to study only vocabulary in order to be accepted to elite universities, lexical resource is a very important aspect of one’s college readiness. It is unlikely for a language user to have an excellent lexicon without comparable proficiency in all four components of language use (reading, writing, listening, speaking). Also, the size of one’s lexicon is an excellent predictor of one’s reading habits and intellectual development.
Seeking developers
Please contact me (email@example.com) if you are a developer and are interested in bringing this to reality.
First, I need you to use public domain dictionary data. (http://www.gutenberg.org/cache/epub/673/pg673.txt)
Second, I need you to use a Wikipedia dump for word frequency and word usage data. (https://dumps.wikimedia.org/enwiki/20180120/)
Third, I need you to create a randomised vocabulary test (like flashcards, like Memrise), as described in my article on my blog (http://jiwoonhwang.org/vocabularytest/)
Fourth, I need you to create a vocabulary flashcard learning system. While it should be similar to Memrise, it should adopt the frequency data from the Wikipedia dump.
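To give a rough idea of the frequency step, here is a small Python sketch that counts word occurrences and orders dictionary headwords from most to least frequent for study. It assumes the dump has already been converted to plain text lines by some other tool (the raw dump is XML with wiki markup), and the function names are hypothetical.

```python
import re
from collections import Counter

# Lower-case words, optionally with one internal apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

def word_frequencies(lines):
    """Count how often each word occurs in an iterable of plain-text lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts

def rank_for_study(headwords, counts):
    """Order headwords by descending frequency, ties broken alphabetically."""
    return sorted(headwords, key=lambda w: (-counts[w.lower()], w))

sample = ["The cat saw the dog.", "The dog ran."]
freq = word_frequencies(sample)
print(rank_for_study(["ran", "the", "dog", "zebra"], freq))
```

Headwords that never occur in the text get a count of zero and end up at the end of the study order.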