SimText Reduction




One of the difficulties in searching a computer database is that the user must know what they are looking for. When looking for a specific word, they must know how to spell it and, not only that, how it is spelled in the database: does the database contain a reference to Colour or Color, Aluminium or Aluminum?

This is a similar problem to looking up the spelling of a word in a dictionary when you can't spell it in the first place.

By reducing both the database text and the search words later used to a very low-level form that still represents each word, it is possible to perform a database search which overcomes the vagaries of spelling, especially the subtle differences between British English and American English.

This is the basis of SimText ( short for Simple Text ) reduction.


How does SimText Reduction work?

The process of SimText reduction is very simple: all words in the database to be searched, and all words later used to perform the search, are reduced to their lowest-level component parts.

Once the database text and the search words have both been reduced to their low-level component parts, a database search is more likely to find a match between the two, despite any errors or ambiguity in the words used in either the database or the search.

We will demonstrate the process of SimText reduction by taking the case of a phrase that may appear in a database, "Many magic Dwarves phone home on Halloween", and show how the phrase is compressed using SimText reduction.

Performing SimText reduction is a simple case of following a small set of rules ...

  • Start with the text to reduce - Many magic Dwarves phone home on Halloween.

  • Turn all plurals ( of words with more than three letters ) into singulars - Many magic Dwarf phone home on Halloween.

  • Rationalise phonetic word parts ( i.e. PH becomes F ) - Many magic Dwarf fone home on Halloween.

  • Remove all adjacent, duplicate letters from each word - Many magic Dwarf fone home on Halowen.

  • Remove all vowels - Mny mgc Dwrf fn hm n Hlwn.

  • If any reduced word is only one or two letters long, use the original word - Mny mgc Dwrf phone home on Hlwn.
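The rules above can be sketched in Python roughly as follows. This is only a minimal illustration, not the original SimText implementation: the function names ( singularise, reduce_word, reduce_text ) are hypothetical, and because the rules do not say how plurals are detected, the sketch assumes a crude heuristic in which a trailing "ves" becomes "f" and any other trailing "s" is dropped. Y is not treated as a vowel, in line with Many becoming Mny.

```python
import re

VOWELS = re.compile(r"[aeiou]", re.IGNORECASE)
DOUBLED = re.compile(r"([a-z])\1+", re.IGNORECASE)  # runs of one repeated letter
PH = re.compile(r"ph", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z]+")


def singularise(word):
    """Crude plural stripping. ASSUMPTION: the rules do not say how
    plurals are recognised, so this only handles '-ves' and '-s'."""
    if len(word) <= 3:  # only words with more than three letters
        return word
    lower = word.lower()
    if lower.endswith("ves"):
        return word[:-3] + "f"  # Dwarves -> Dwarf
    if lower.endswith("s"):
        return word[:-1]  # Dwarfs -> Dwarf
    return word


def reduce_word(word):
    """Apply the reduction rules to a single word, in the listed order."""
    reduced = singularise(word)
    # PH becomes F, keeping the case of the first letter.
    reduced = PH.sub(lambda m: "F" if m.group()[0].isupper() else "f", reduced)
    # Collapse adjacent duplicate letters: Halloween -> Halowen.
    reduced = DOUBLED.sub(r"\1", reduced)
    # Remove all vowels: Halowen -> Hlwn.
    reduced = VOWELS.sub("", reduced)
    # A result of one or two letters is too ambiguous: keep the original.
    return reduced if len(reduced) > 2 else word


def reduce_text(text):
    """Reduce every alphabetic word, leaving punctuation and spacing alone."""
    return WORD.sub(lambda m: reduce_word(m.group()), text)


print(reduce_text("Many magic Dwarves phone home on Halloween."))
# Mny mgc Dwrf phone home on Hlwn.
print(reduce_word("Dwarfs"))
# Dwrf
```

A fuller plural rule ( for example "ies" to "y" ) would slot into singularise without changing the other steps.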

Once we have reduced our original text we can do the same to any words that are used for searching it. This means that we search for the SimText reduced words within the SimText reduced text, so a search for Dwarfs ( Dwrf after reduction ) will still find a match in the example we have followed, even though the spelling differs from the Dwarves in the text.
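As a rough sketch of how such a search might be wired up, the Python below builds a small index keyed on reduced words and reduces the query in the same way before looking it up. Everything here is an assumption made for illustration: the names ( simple_reduce, build_index, search ), the use of an in-memory dictionary as the "database", and the lower-casing of words. To keep it short, simple_reduce is a deliberately cut-down stand-in that applies only the duplicate-letter, vowel and short-word rules; any fuller reducer can be passed in its place.

```python
import re
from collections import defaultdict


def simple_reduce(word):
    """Stand-in reducer (ASSUMPTION): lower-case the word, drop adjacent
    duplicate letters, then vowels; keep very short results unreduced."""
    lowered = word.lower()
    reduced = re.sub(r"([a-z])\1+", r"\1", lowered)
    reduced = re.sub(r"[aeiou]", "", reduced)
    return reduced if len(reduced) > 2 else lowered


def build_index(documents, reduce=simple_reduce):
    """Map each reduced word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[A-Za-z]+", text):
            index[reduce(word)].add(doc_id)
    return index


def search(index, query, reduce=simple_reduce):
    """Return the ids of documents containing every reduced query word."""
    words = re.findall(r"[A-Za-z]+", query)
    hits = [index.get(reduce(w), set()) for w in words]
    return set.intersection(*hits) if hits else set()


# Hypothetical two-record "database" using British spellings.
documents = {
    1: "The colour of aluminium foil",
    2: "A paper about steel",
}
index = build_index(documents)

# American spellings still find the British text.
print(search(index, "color aluminum"))  # {1}
```

The trade-off is that unrelated words can collapse to the same reduced form ( with this reducer, paper, piper and pepper all become ppr ), so a reduced search favours finding possible matches over excluding false ones.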


An example of SimText translation

An interesting aspect of the English language is that there is an incredible amount of redundancy built into it. This means that it is quite easy to understand phrases and sentences even when the words have been corrupted or are simply missing.

The redundancy in English is so great that it is entirely possible to understand complete passages after SimText reduction, especially if the punctuation is left in, as is shown by the SimText reduced example below.

There is, obviously, a semantic aspect to the English language, and it is certainly true that SimText reductions of literary passages are much easier to understand than technical ones. There will always be some reduced words that are difficult to resolve; however, it is an interesting phenomenon of English nonetheless ...

Hint: Try reading the SimText reduced passage with a broad Scottish accent. This trick also works well with formal phonetic translation of texts - Psychology students should make a note of this fact now ( and send money when this revelation proves useful ).

The SimText reduced passage

Rtrd tlr Hrld Snby had been wrng a hrng aid for twnty year but it nvr smd to have done him mch good. Hrld, aged svnty four, dscvrd the rsn when he went to Leed Hsptl for a rtn chckp ... and was told that he had been wrng it in the wrng ear.

Hrld said, "It pr that thr was a mxp when it was frst ftd. The aid was mldd to fit my lft ear nstd of my rght one. I lwy thght it was prty sls."

The original passage

    Before revealing the original passage, have a good read of the SimText reduced text above; I reckon it should be possible to determine over 95% of the original words correctly if you put your mind to it.








First published sometime before Tuesday the 16th of November, 1999
Last upload was on Tuesday the 23rd of September, 2003 at 19:19:55