This is a similar problem to looking up the spelling of a word in a dictionary
when you can't spell it in the first place.
By reducing the database text, and the search words later used, to a very
low-level component that still represents each word, it is possible to perform
a database search which overcomes the vagaries of spelling, especially the
subtle differences between British English and American English.
This is the basis of SimText (short for Simple Text) reduction.
How does SimText reduction work?
The process of SimText reduction is very simple: all words in the
database to be searched, and all words later used to perform the search, are
reduced to their lowest-level component parts.
Once the database text and search words have both been reduced to their
low-level component parts, a database search is more likely to reveal a
possible match between the two, despite any errors or ambiguity in the words
used in either the database or the search.
We will demonstrate the process of SimText reduction by taking the
case of a phrase that may appear in a database, "Many magic Dwarves phone
home on Halloween", and show how the phrase is compressed using SimText
reduction.
Performing SimText reduction is a simple case of following a small
set of rules ...
- Start with the text to reduce -
Many magic Dwarves phone home on Halloween.
- Turn all plurals (of words with more than three letters) into
singulars -
Many magic Dwarf phone home on Halloween.
- Rationalise phonetic word parts (i.e. PH becomes F) -
Many magic Dwarf fone home on Halloween.
- Remove all adjacent, duplicate letters from each word -
Many magic Dwarf fone home on Halowen.
- Remove all vowels -
Mny mgc Dwrf fn hm n Hlwn.
- If any reduced word is only one or two letters long, use the original
word -
Mny mgc Dwrf phone home on Hlwn.
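The rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original SimText implementation: the function names (singularise, reduce_word, reduce_text) are hypothetical, the plural handling is an assumed pair of suffix rules ("ves" becomes "f", otherwise a single trailing "s" is dropped), and PH to F is the only phonetic rule the text names, so it is the only one applied.

```python
import re

def singularise(word):
    """Crude plural stripping. These suffix rules are an assumption;
    the article does not say how plurals are detected."""
    if len(word) <= 3:
        return word                      # short words are left alone
    lower = word.lower()
    if lower.endswith("ves"):
        return word[:-3] + "f"           # Dwarves -> Dwarf
    if lower.endswith("s") and not lower.endswith("ss"):
        return word[:-1]                 # Dwarfs -> Dwarf
    return word

def reduce_word(word):
    """Apply the reduction rules to a single word, in the order listed."""
    reduced = singularise(word)
    # Rationalise phonetic word parts (only PH -> F is given in the text).
    reduced = reduced.replace("ph", "f").replace("Ph", "F").replace("PH", "F")
    # Remove adjacent duplicate letters: Halloween -> Halowen.
    reduced = re.sub(r"([A-Za-z])\1+", r"\1", reduced, flags=re.IGNORECASE)
    # Remove all vowels (y is kept, as in Many -> Mny).
    reduced = re.sub(r"[aeiou]", "", reduced, flags=re.IGNORECASE)
    # Fall back to the original word if only one or two letters remain.
    return reduced if len(reduced) > 2 else word

def reduce_text(text):
    """Reduce every alphabetic word, leaving punctuation and spacing intact."""
    return re.sub(r"[A-Za-z]+", lambda m: reduce_word(m.group(0)), text)

print(reduce_text("Many magic Dwarves phone home on Halloween."))
# -> Mny mgc Dwrf phone home on Hlwn.
```

Under these assumptions both "Dwarves" and "Dwarfs" reduce to "Dwrf". Real plural detection would need a richer rule set than the two suffix checks used here.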
Once we have reduced our original text we can do the same to any words that
are used for searching it. This means that we search for the SimText
reduced words within the SimText reduced text, so a search for
Dwarfs (Dwrf after reduction) will still find a match in the
example we have followed, even though the spelling does not match the
original text.
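One way to picture the search step is an index keyed by reduced words, with the same reducer applied to the query. The sketch below is hypothetical throughout: build_index, search and simple_reducer are illustrative names, and simple_reducer is a deliberately cut-down stand-in that lower-cases the word (an assumption, to make matching case-insensitive) and applies only the duplicate-letter, vowel and short-word rules. Matching "Dwarfs" against "Dwarves" would additionally need the plural rule.

```python
import re

def simple_reducer(word):
    """Stand-in reducer: lower-case, collapse doubled letters, drop vowels,
    and fall back to the (lower-cased) word if too little is left."""
    collapsed = re.sub(r"(.)\1+", r"\1", word.lower())
    reduced = re.sub(r"[aeiou]", "", collapsed)
    return reduced if len(reduced) > 2 else word.lower()

def build_index(text, reducer=simple_reducer):
    """Map each reduced word to the word positions where it occurs."""
    index = {}
    for position, match in enumerate(re.finditer(r"[A-Za-z]+", text)):
        index.setdefault(reducer(match.group(0)), []).append(position)
    return index

def search(index, query, reducer=simple_reducer):
    """Reduce the query the same way as the text, then look it up."""
    return index.get(reducer(query), [])

index = build_index("Many magic Dwarves phone home on Halloween")
print(search(index, "Haloween"))   # -> [6]
print(search(index, "banana"))     # -> []
```

Passing the reducer in as a parameter keeps the index and the query in step: whichever rule set is used to reduce the database text is also used on the search words.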
An example of SimText translation
An interesting aspect of the English language is that there is an incredible
amount of redundancy built into it. This means that it is quite easy to
understand phrases and sentences even when the words have been corrupted or
are simply missing.
The redundancy in English is so great that it is entirely possible to
understand complete passages after SimText reduction, especially
if the punctuation is left in, as is shown by the SimText reduced
example below.
There is, obviously, a semantic aspect to the English language and it is
certainly true that SimText reductions of literary passages are much
easier to understand than technical ones. There will always be some
reduced words that are difficult to resolve; however, it is an
interesting phenomenon of English nonetheless ...
Hint: Try reading the SimText reduced
passage with a broad Scottish accent. This trick also works well with formal
phonetic translation of texts - Psychology students should make a note of
this fact now (and send money when this revelation proves useful).
The SimText reduced passage
Rtrd tlr Hrld Snby had been wrng a hrng aid for twnty year
but it nvr smd to have done him mch good. Hrld, aged svnty four,
dscvrd the rsn when he went to Leed Hsptl for a rtn
chckp ... and was told that he had been wrng it in the wrng ear.
Hrld said, "It pr that thr was a mxp when it was frst ftd.
The aid was mldd to fit my lft ear nstd of my rght one. I lwy
thght it was prty sls."
The original passage
Retired tailor Harold Senby had been wearing a hearing aid for twenty years
but it never seemed to have done him much good. Harold, aged seventy four,
discovered the reason when he went to Leeds Hospital for a routine
check-up ... and was told that he had been wearing it in the wrong ear.
Harold said, "It appears that there was a mix-up when it was first fitted.
The aid was moulded to fit my left ear instead of my right one. I always
thought it was pretty useless."
The World's Greatest Mistakes, Nigel Blundell