
   
   [5][Next] [6][Up] [7][Previous] [8][Contents]
   Next: [9]6. How Aspell Works Up: [10]Aspell .27.2 alpha A Previous:
   [11]4. Library Interface   [12]Contents
   Subsections
     * [13]5.1 At Run Time
          + [14]5.1.1 Format of the language data files
               o [15]5.1.1.1 The Case Block
               o [16]5.1.1.2 The Vowel Block
               o [17]5.1.1.3 The Other Characters Block
          + [18]5.1.2 A complete example
          + [19]5.1.3 Finishing Up
     * [20]5.2 At Compile Time
          + [21]5.2.1 Getting Started
          + [22]5.2.2 The SC_Language Class
               o [23]5.2.2.1 Synopsis
               o [24]5.2.2.2 The protected members
                    # [25]5.2.2.2.1 const char *name_
                    # [26]5.2.2.2.2 const char *to_lower_
                    # [27]5.2.2.2.3 const char *to_upper_
                    # [28]5.2.2.2.4 const char *is_alpha_
                    # [29]5.2.2.2.5 const char *soundslike_chars_
               o [30]5.2.2.3 Virtual Destructor
               o [31]5.2.2.4 Virtual Public Members
                    # [32]5.2.2.4.1 string to_soundslike(const
                      const_string &word) const
                    # [33]5.2.2.4.2 string to_phoneme(const const_string
                      &word) const
                    # [34]5.2.2.4.3 bool have_phoneme() const
                    # [35]5.2.2.4.4 int case_pattern (const const_string
                      &word) const
                    # [36]5.2.2.4.5 string fix_case (int pattern, const
                      const_string &word) const
                    # [37]5.2.2.4.6 bool trim_n_try (const aspell &sc,
                      const const_string &word) const
               o [38]5.2.2.5 Private Members
               o [39]5.2.2.6 Public Members
     _________________________________________________________________
   
                           5. International Support
                                       
   Note: Aspell International Support is about to undergo a major
   rewrite. Please see
   [40]http://metalab.unc.edu/kevina/aspell/international/ for more
   information. The information presented here will soon be outdated.
   
   Even though Aspell is designed around the English language, it will
   do OK with other languages provided that the language does not have
   an extremely large dictionary (say, over a megabyte or two in size) or
   a lot of affixation (to the point where affix compression would
   shrink the dictionary by over 50%). If the language has a large
   dictionary or a lot of affixation, Aspell will still work, but it will
   take up a lot of space due to the way Aspell indexes the words (see
   [41]6) and the fact that Aspell currently lacks any sort of affix
   compression (see [42]B.7.1).
   
   Support for other languages can be added either at run time, through
   a language data file, or at compile time.
   
                                5.1 At Run Time
                                       
   Languages can be added at will through the use of a language data
   file. The file must be in the same directory as the word list(s), and
   it must be named <language>.lang, where <language> is the name of the
   language you are adding support for.
   
5.1.1 Format of the language data files

   The data file consists of three blocks of information enclosed in
   braces. Any information outside of the braces is ignored. The white
   space before and after each brace is mandatory.
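   Because the white space around each brace is mandatory, the blocks can
   be read with a very small tokenizer that treats "{" and "}" as
   stand-alone words. The following is a minimal C++ sketch under that
   assumption, not Aspell's actual parser; the function name
   parse_lang_blocks is hypothetical.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split a .lang file into its brace-delimited blocks.
// "{" and "}" are whitespace-separated tokens; words outside any block
// (such as "Case Block") are ignored.
std::vector<std::vector<std::string>> parse_lang_blocks(const std::string &text) {
  std::istringstream in(text);
  std::vector<std::vector<std::string>> blocks;
  std::vector<std::string> current;
  bool inside = false;
  std::string tok;
  while (in >> tok) {
    if (tok == "{") {
      if (inside) throw std::runtime_error("nested '{'");
      inside = true;
      current.clear();
    } else if (tok == "}") {
      if (!inside) throw std::runtime_error("unmatched '}'");
      inside = false;
      blocks.push_back(current);
    } else if (inside) {
      current.push_back(tok);  // one entry of the current block
    }
  }
  if (inside) throw std::runtime_error("missing '}'");
  if (blocks.size() != 3) throw std::runtime_error("expected exactly three blocks");
  return blocks;
}
```

   Applied to the complete english.lang file shown in 5.1.2, this sketch
   returns the case pairs, the vowels, and the apostrophe as three
   separate lists. Note that "{aA" (no space) is not recognized as an
   opening brace, which is one way the mandatory-whitespace rule shows up.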
   
  5.1.1.1 The Case Block
  
   The first block of information contains the upper- to lower-case
   mapping. It consists of lower/upper case pairs of letters separated by
   white space. For example, here is the case mapping for English:
   
     { aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV
     wW xX yY zZ }
     
   If a character is the same in both upper and lower case, then repeat
   it twice, such as ``kk''. Failure to do so will result in an error.
   Also, as noted above, the white space before and after the braces
   ({ }) is mandatory.
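   As an illustration of what the case block encodes, the following
   minimal C++ sketch turns the list of pairs into the two 256-entry
   tables described in 5.2.2.2.2 and 5.2.2.2.3. The struct and function
   names are hypothetical; rejecting any token that is not exactly two
   characters long mirrors the "repeat it twice" rule above.

```cpp
#include <array>
#include <stdexcept>
#include <string>
#include <vector>

struct CaseTables {
  std::array<char, 256> to_lower;
  std::array<char, 256> to_upper;
};

// Hypothetical builder: each token is "<lower><upper>", e.g. "aA" or "kk".
CaseTables build_case_tables(const std::vector<std::string> &pairs) {
  CaseTables t;
  for (int i = 0; i < 256; ++i) {  // characters not listed map to themselves
    t.to_lower[i] = static_cast<char>(i);
    t.to_upper[i] = static_cast<char>(i);
  }
  for (const std::string &p : pairs) {
    if (p.size() != 2)
      throw std::runtime_error("case pair must have two characters: " + p);
    unsigned char lower = static_cast<unsigned char>(p[0]);
    unsigned char upper = static_cast<unsigned char>(p[1]);
    t.to_upper[lower] = p[1];
    t.to_lower[upper] = p[0];
  }
  return t;
}
```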
   
  5.1.1.2 The Vowel Block
  
   The second block of information contains a list of the vowel or
   vowel-like characters in lower case. For example, the second block for
   English would be:
   
     { a e i o u y }
     
  5.1.1.3 The Other Characters Block
  
   The last block of information contains a list of other characters
   which are not part of the alphabet but can nevertheless appear within
   a valid word. The final block for English would be:
   
     { ' }
     
5.1.2 A complete example

   For your reference, here is what the complete english.lang file looks
   like:
   
     Language File for english
     
     Case Block { aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR
     sS tT uU vV wW xX yY zZ }
     
     Vowel Block { a e i o u y }
     
     Other Characters Block { ' }
     
5.1.3 Finishing Up

   Once you have created the data file, you need to pass the dictionary
   through ``aspell master'' to properly prepare it for the new language.
   Then make sure the word list and the language data file are in the
   same directory.
   
   Once you have used the new language for a while, please consider
   sending me a copy of the data file so that I can include it in future
   versions.
   
                              5.2 At Compile Time
                                       
   More complete support for a language can be added by writing some code
   and recompiling the source files. In order to do this you should have
   the latest versions of automake, autoconf, and libtool installed, as
   the Makefile will need to be recreated.
   

5.2.1 Getting Started

   The easiest way to get started is to write the language data file
   first and use the aspell utility to create most of the code for you.
   The usage is:
   
     aspell lang [<path>]<lang>
     
   where <path> is the optional fully qualified directory name of the
   location of the language data file and <lang> is the name of the
   language. There should be no space between <path> and <lang>.
   
   This will create two files, asl_<lang>.hh and asl_<lang>.cc, containing
   all the code you need to compile in support for the language. However,
   in order to get Aspell to recognize the new language, you need to
   modify the file language.cc in two places: you need to include the
   asl_<lang>.hh file, and you need to add the language to the lookup
   static variable. The line to add should look like this:
   
     lookup_pair("<lang>", new_SC_<Lang>)
     
   where <Lang> is <lang> with the first letter capitalized. For example,
   to add support for French you would write:
   
     lookup_pair("french", new_SC_French)
     
   This line can go anywhere in the table; however, I recommend that you
   add it after the last entry. Just be sure that the list still has all
   the necessary commas.
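   To see how such a table is typically used, and why the commas matter,
   here is a self-contained C++ model of it. Everything here is an
   assumption for illustration, not the actual contents of language.cc:
   SC_Language is reduced to its name_ member, lookup_pair is assumed to
   hold a name plus a factory function, and new_language and name() are
   hypothetical helpers.

```cpp
#include <cstring>

// Trimmed stand-in for SC_Language: only the name_ member is modeled.
class SC_Language {
 protected:
  const char *name_ = "";
  SC_Language() {}

 public:
  virtual ~SC_Language() {}
  const char *name() const { return name_; }  // hypothetical accessor
};

class SC_English : public SC_Language {
 public:
  SC_English() { name_ = "english"; }
};
class SC_French : public SC_Language {
 public:
  SC_French() { name_ = "french"; }
};

SC_Language *new_SC_English() { return new SC_English; }
SC_Language *new_SC_French() { return new SC_French; }

// Assumed shape of a table entry: a language name and a factory function.
struct lookup_pair {
  const char *name;
  SC_Language *(*make)();
  lookup_pair(const char *n, SC_Language *(*m)()) : name(n), make(m) {}
};

static const lookup_pair lookup[] = {
    lookup_pair("english", new_SC_English),  // this comma is required once
    lookup_pair("french", new_SC_French)     // a new last entry is added
};

// Returns a newly allocated language object, or nullptr for an unknown name.
SC_Language *new_language(const char *name) {
  for (const lookup_pair &p : lookup)
    if (std::strcmp(p.name, name) == 0) return p.make();
  return nullptr;
}
```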
   
   Finally, you need to add the file asl_<lang>.cc to the end of the
   libspell_la_SOURCES variable in Makefile.am and then type make. All
   the necessary files will be recreated automatically, provided that you
   have the proper tools installed.
   
   Once you have successfully used the compiled-in language, you can start
   experimenting with fine-tuning it by overriding virtual methods in the
   SC_Language class.
   
5.2.2 The SC_Language Class

   The SC_Language class is the base class for language support; all
   language classes must be derived from it.
   
  5.2.2.1 Synopsis
  
      class SC_Language {
      protected:
        enum CasePattern {all_lower, first_upper, all_upper};
        static const char consonant = 1, vowel = 2, special = 3;

        const char *name_;
        const char *to_lower_;
        const char *to_upper_;
        const char *is_alpha_;
        const char *soundslike_chars_;

        SC_Language() {}

      public:
        virtual ~SC_Language() {}

        virtual string to_soundslike(const const_string &word) const;

        virtual int case_pattern (const const_string &word) const;
        virtual string fix_case (int pattern, const const_string &word) const;
        virtual bool trim_n_try (const aspell &sc, const const_string &word) const;
        virtual bool have_phoneme() const;
        virtual string to_phoneme(const const_string &word) const;

      private:
        // declared but never defined, so copying is disabled
        SC_Language(const SC_Language &);
        SC_Language &operator= (const SC_Language &);

      public:
        char to_upper(const char c) const;
        char to_lower(const char c) const;
        bool is_upper(const char c) const;
        bool is_lower(const char c) const;
        bool is_special(const char c) const;
        // other irrelevant non-virtual methods
      };
          
  5.2.2.2 The protected members
  
   All of the protected data members must be given a value by the derived
   class, as the public methods rely on them. The ``aspell lang'' utility
   will take care of this for you, so in most cases you don't need to
   worry about them.
   
    5.2.2.2.1 const char *name_
    
   This data member needs to point to a null-terminated string
   containing the name of the current language.
   
    5.2.2.2.2 const char *to_lower_
    
   This data member needs to point to a 256-character array which maps
   upper-case characters to lower case. A static_cast<unsigned char> is
   performed on the character before it is looked up, so that a signed
   value of -1 becomes 255. If the character c is an upper-case
   character, then to_lower_[static_cast<unsigned char>(c)] needs to
   contain c in lower case. If c is not in upper case, then it needs to
   contain c itself.
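   The cast matters because on platforms where char is signed, a
   character such as 0xC9 (capital E with acute accent in ISO-8859-1) is
   stored as a negative value and would otherwise index outside the
   array. The following C++ sketch builds such a table for ISO-8859-1
   (Latin-1) and looks characters up the same way; the choice of Latin-1
   and both function names are illustrative assumptions.

```cpp
#include <array>

// Builds a 256-entry to_lower_ table for ISO-8859-1 (Latin-1): ASCII A-Z
// and the accented capitals 0xC0-0xDE (except 0xD7, the multiplication
// sign) map to the character 0x20 higher; everything else maps to itself.
std::array<char, 256> latin1_to_lower_table() {
  std::array<char, 256> t;
  for (int i = 0; i < 256; ++i) {
    bool upper = (i >= 'A' && i <= 'Z') || (i >= 0xC0 && i <= 0xDE && i != 0xD7);
    t[i] = static_cast<char>(upper ? i + 0x20 : i);
  }
  return t;
}

// Mirrors how the table is consulted: cast to unsigned char first, so a
// negative char such as '\xC9' indexes entry 201 rather than -55.
char to_lower(const std::array<char, 256> &to_lower_, char c) {
  return to_lower_[static_cast<unsigned char>(c)];
}
```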
   
    5.2.2.2.3 const char *to_upper_
    
   The same as to_lower_, but it maps lower-case characters to upper case.
   
    5.2.2.2.4 const char *is_alpha_
    
   Similar to to_lower_ and to_upper_, except that
   is_alpha_[static_cast<unsigned char>(c)] needs to be false (0) if c is
   a non-word character and true (anything but 0) otherwise.
   
   In addition, if the to_soundslike method is not overridden, the entry
   for c needs to be SC_Language::consonant if c is a consonant and
   SC_Language::vowel if c is a vowel. If the trim_n_try method is not
   overridden, the entry for c needs to be SC_Language::special if c is a
   non-alphabetic character that can appear as part of a word, such as
   the apostrophe (') in English.
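   The three blocks of a language data file map naturally onto this
   encoding. The following C++ sketch fills in an is_alpha_-style table
   from them, using the constant values from the synopsis. It assumes the
   case pairs have already been validated as two-character tokens, and
   that the upper-case form of a listed vowel is also a vowel; the
   function build_is_alpha is hypothetical.

```cpp
#include <array>
#include <string>
#include <vector>

// Values as declared in the SC_Language synopsis.
const char consonant = 1, vowel = 2, special = 3;

// Hypothetical builder: 0 marks a non-word character.
std::array<char, 256> build_is_alpha(const std::vector<std::string> &case_pairs,
                                     const std::string &vowels,
                                     const std::string &others) {
  std::array<char, 256> t{};  // value-initialized: every entry starts at 0
  for (const std::string &p : case_pairs) {
    // A letter is a vowel if its lower-case form is in the vowel block;
    // both the lower- and upper-case forms get the same classification.
    char kind = vowels.find(p[0]) != std::string::npos ? vowel : consonant;
    for (char c : p) t[static_cast<unsigned char>(c)] = kind;
  }
  for (char c : others) t[static_cast<unsigned char>(c)] = special;
  return t;
}
```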
   
    5.2.2.2.5 const char *soundslike_chars_
    
   This data member needs to point to a null-terminated string containing
   all of the characters that can appear in a to_soundslike string. If
   the to_soundslike method is not overridden, this will be all the
   lower-case consonants.
   
  5.2.2.3 Virtual Destructor
  
   The destructor must be overridden if your class uses any dynamically
   allocated memory, as the SC_Language class destructor does not delete
   anything.
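   For example, a derived class that allocates its own table with new[]
   has to release it in its own destructor. In this C++ sketch the
   SC_Language base is a trimmed stand-in (only to_lower_ and to_lower),
   and SC_Example is hypothetical. The base disables copying, as the real
   class does, which is what keeps the delete[] from running twice on the
   same table.

```cpp
// Trimmed stand-in for SC_Language: only what this example needs.
class SC_Language {
 protected:
  const char *to_lower_ = nullptr;
  SC_Language() {}

 public:
  virtual ~SC_Language() {}  // deletes nothing, like the real base class
  SC_Language(const SC_Language &) = delete;             // copying disabled,
  SC_Language &operator=(const SC_Language &) = delete;  // as in the real class
  char to_lower(char c) const { return to_lower_[static_cast<unsigned char>(c)]; }
};

// Hypothetical language whose to_lower_ table lives on the heap.
class SC_Example : public SC_Language {
 public:
  SC_Example() {
    char *table = new char[256];
    for (int i = 0; i < 256; ++i)
      table[i] = static_cast<char>(i >= 'A' && i <= 'Z' ? i + ('a' - 'A') : i);
    to_lower_ = table;
  }
  // Required: the base destructor will not free the table.
  ~SC_Example() override { delete[] to_lower_; }
};
```

   Deleting through a SC_Language pointer still reaches ~SC_Example
   because the base destructor is virtual.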
   
  5.2.2.4 Virtual Public Members
  
   These methods only have to be overridden if you are unhappy with the
   job they do. const_string is a very limited version of the string
   class. It has an iterator and can be used like a random access
   container; however, it doesn't have any of the fancy string methods
   such as find and substr.
   
    5.2.2.4.1 string to_soundslike(const const_string &word) const
    
   This method needs to return a string which represents what the word
   roughly sounds like.
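   As a deliberately crude illustration only, and not Aspell's actual
   algorithm, the C++ sketch below lower-cases a word and keeps just its
   consonants, so its output uses the same alphabet that 5.2.2.2.5
   describes for the default soundslike_chars_. The function name
   crude_soundslike and the default vowel set are assumptions, and the
   sketch is ASCII-only because it relies on <cctype>.

```cpp
#include <cctype>
#include <string>

// Crude illustration: lower-case the word and keep only its consonants.
std::string crude_soundslike(const std::string &word,
                             const std::string &vowels = "aeiouy") {
  std::string out;
  for (char ch : word) {
    unsigned char u = static_cast<unsigned char>(ch);
    if (!std::isalpha(u)) continue;  // drop specials such as '
    char lower = static_cast<char>(std::tolower(u));
    if (vowels.find(lower) == std::string::npos) out += lower;
  }
  return out;
}
```

   With this sketch "their" and "there" both reduce to "thr", which is
   the kind of collision a soundslike key is meant to produce.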
   
    5.2.2.4.2 string to_phoneme(const const_string &word) const
    
   This method needs to return a string which represents the phoneme for
   the word.
   
    5.2.2.4.3 bool have_phoneme() const
    
   Needs to return true if the to_phoneme method is overridden.
   
    5.2.2.4.4 int case_pattern (const const_string &word) const
    
   This method needs to study the string and return an integer which
   represents the case pattern (such as all upper case, first letter
   upper case, etc.).
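   A minimal C++ sketch of such a method, returning the CasePattern
   values from the synopsis. It works on std::string and is ASCII-only
   (via <cctype>); treating a lone capital such as "I" as first_upper,
   and mixed forms such as "hELLO" as all_lower, are simplifying
   assumptions rather than documented Aspell behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Same enumerators as SC_Language::CasePattern in the synopsis.
enum CasePattern { all_lower, first_upper, all_upper };

// Classify a word by the case of its letters, ignoring non-letters.
int case_pattern(const std::string &word) {
  std::size_t letters = 0, uppers = 0;
  bool first_is_upper = false;
  for (char ch : word) {
    unsigned char u = static_cast<unsigned char>(ch);
    if (!std::isalpha(u)) continue;
    bool up = std::isupper(u) != 0;
    if (letters == 0) first_is_upper = up;
    ++letters;
    if (up) ++uppers;
  }
  if (letters > 1 && uppers == letters) return all_upper;
  if (first_is_upper) return first_upper;  // also covers a lone capital
  return all_lower;                        // mixed forms fall here too
}
```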
   
    5.2.2.4.5 string fix_case (int pattern, const const_string &word) const
    
   This method needs to change the case of word so that it follows the
   case pattern given by pattern, and return the new word.
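   A matching C++ sketch, again ASCII-only and working on std::string
   rather than const_string. It assumes the same three CasePattern
   values; lower-casing the rest of the word for first_upper is a
   simplification that would turn "McDonald" into "Mcdonald".

```cpp
#include <cctype>
#include <string>

enum CasePattern { all_lower, first_upper, all_upper };

// Rewrite the letters of word to follow pattern; non-letters are kept.
std::string fix_case(int pattern, const std::string &word) {
  std::string out = word;
  bool first_letter = true;
  for (char &ch : out) {
    unsigned char u = static_cast<unsigned char>(ch);
    if (!std::isalpha(u)) continue;  // leave ' and other specials alone
    bool make_upper = pattern == all_upper || (pattern == first_upper && first_letter);
    ch = static_cast<char>(make_upper ? std::toupper(u) : std::tolower(u));
    first_letter = false;
  }
  return out;
}
```

   Combined with case_pattern, this lets a suggestion such as "hello" be
   offered as "Hello" when the misspelled word was capitalized.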
   
    5.2.2.4.6 bool trim_n_try (const aspell &sc, const const_string &word) const
    
   This method should try to trim special characters (such as the
   apostrophe in English) from the word and then see if it is a valid
   word. If it can find a valid word by trimming, it should return true;
   otherwise it should return false.
   
   To avoid infinite recursion, this method should not call aspell::check,
   as aspell::check calls this method. Use aspell::check_notrim instead.
   (aspell::check_raw should not be used, as it does not try to change
   the case of the word, so 'Do' would come back false.)
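   A minimal C++ sketch of the trimming logic. Since the exact signature
   of aspell::check_notrim is not given here, the sketch takes any
   callable that checks a word without trimming; CheckNoTrim is a
   hypothetical stand-in for it, and the default set of special
   characters (just the apostrophe) is an assumption. The sketch only
   calls that non-trimming check, so it cannot recurse into itself.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for aspell::check_notrim.
using CheckNoTrim = std::function<bool(const std::string &)>;

// Strip leading and trailing special characters and re-check the rest.
bool trim_n_try(const CheckNoTrim &check_notrim, const std::string &word,
                const std::string &specials = "'") {
  std::string::size_type begin = word.find_first_not_of(specials);
  if (begin == std::string::npos) return false;  // empty or all specials
  std::string::size_type end = word.find_last_not_of(specials);
  if (begin == 0 && end == word.size() - 1) return false;  // nothing to trim
  return check_notrim(word.substr(begin, end - begin + 1));
}
```

   Returning false when nothing was trimmed assumes the caller has
   already checked the untrimmed word.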
   
  5.2.2.5 Private Members
  
   Both the copy constructor and the assignment operator are private so
   that you don't have to worry about copies being made.
   
  5.2.2.6 Public Members
     _________________________________________________________________
   
   
   
    1999-03-01

References

   1. file://localhost/home/kevina/devel/aspell/manual/man-html/6_How.html
   2. file://localhost/home/kevina/devel/aspell/manual/man-html/4_Library.html
   3. file://localhost/home/kevina/devel/aspell/manual/man-html/manual.html
   4. file://localhost/home/kevina/devel/aspell/manual/man-html/6_How.html
   5. file://localhost/home/kevina/devel/aspell/manual/man-html/6_How.html
   6. file://localhost/home/kevina/devel/aspell/manual/man-html/manual.html
   7. file://localhost/home/kevina/devel/aspell/manual/man-html/4_Library.html
   8. file://localhost/home/kevina/devel/aspell/manual/man-html/Contents.html
   9. file://localhost/home/kevina/devel/aspell/manual/man-html/6_How.html
  10. file://localhost/home/kevina/devel/aspell/manual/man-html/manual.html
  11. file://localhost/home/kevina/devel/aspell/manual/man-html/4_Library.html
  12. file://localhost/home/kevina/devel/aspell/manual/man-html/Contents.html
  13. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00610000000000000000
  14. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00611000000000000000
  15. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00611100000000000000
  16. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00611200000000000000
  17. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00611300000000000000
  18. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00612000000000000000
  19. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00613000000000000000
  20. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00620000000000000000
  21. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00621000000000000000
  22. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622000000000000000
  23. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622100000000000000
  24. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622200000000000000
  25. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622210000000000000
  26. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622220000000000000
  27. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622230000000000000
  28. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622240000000000000
  29. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622250000000000000
  30. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622300000000000000
  31. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622400000000000000
  32. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622410000000000000
  33. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622420000000000000
  34. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622430000000000000
  35. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622440000000000000
  36. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622450000000000000
  37. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622460000000000000
  38. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622500000000000000
  39. file://localhost/home/kevina/devel/aspell/manual/man-html/5_International.html#SECTION00622600000000000000
  40. http://metalab.unc.edu/kevina/aspell/international/
  41. file://localhost/home/kevina/devel/aspell/manual/man-html/6_How.html#works
  42. file://localhost/home/kevina/devel/aspell/manual/man-html/B_Do.html#affixcomp
  43. file://localhost/home/kevina/devel/aspell/manual/man-html/6_How.html
  44. file://localhost/home/kevina/devel/aspell/manual/man-html/manual.html
  45. file://localhost/home/kevina/devel/aspell/manual/man-html/4_Library.html
  46. file://localhost/home/kevina/devel/aspell/manual/man-html/Contents.html
  47. file://localhost/home/kevina/devel/aspell/manual/man-html/6_How.html
  48. file://localhost/home/kevina/devel/aspell/manual/man-html/manual.html
  49. file://localhost/home/kevina/devel/aspell/manual/man-html/4_Library.html
  50. file://localhost/home/kevina/devel/aspell/manual/man-html/Contents.html
