Google Summer of Code 2012 Project: Disha

New Visual Keyboard For Bengali

Project Background:

Currently, the most popular keyboard layouts for Indic scripts such as Bengali use a non-visual style of typing. What exactly is a “Non-Visual Style of Typing”? Simply put, the sequence in which the characters are typed into the system is not the sequence in which they are displayed.
This can be illustrated with a simple example: an input combination of the Bengali consonant Ka (ক) and the dependent vowel sign E (ে), as follows:-
Typing sequence: ক+ে
Display sequence: কে
This non-visual style follows a uniform method in which characters are typed according to their type (i.e. consonants, independent / dependent vowels, special characters, conjunct characters), each of which is governed by a specific set of rules.
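
The storage order in the example above can be inspected directly. The following is a minimal Python sketch (not part of the project itself) that uses the standard `unicodedata` module to list the code points of the typed sequence. It illustrates that the consonant is stored first even though the vowel sign is drawn to its left; the helper name `describe` is purely illustrative.

```python
import unicodedata

# The syllable "ke" in typing (logical) order: consonant first, then vowel sign.
syllable = "\u0995\u09C7"

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code, name in describe(syllable):
    print(code, name)
```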

Problem with the current system:

Even though the existing non-visual style of writing is quite prevalent, it poses a major learning challenge for new users, who are usually more accustomed to the conventional visual way of writing.
How so?
Returning to the example above, the most common problem is that inexperienced users usually try to type the consonant-vowel combination in the following way:-
ে+ক
thus ending up with the following display:-
েক
which is not how it should be.

This project is thus aimed at creating a Visual Typing Method for complex scripts like Bengali.

Project Implementation Logic:

The example in the previous section outlines just one of the cases this project has to handle. Several more cases need to be implemented on top of the existing system to create such a Visual Layout. This project is primarily concerned with dependent vowels and split vowels, which require the rendering of pre-base matras for base consonants.
The main implementation focus of this project is listed below:-

Case 1:
BENGALI VOWEL SIGN E (ে) [Unicode: 0x09C7]: In this case the input system should be able to process an input combination in which the dependent vowel is input first, followed by the base consonant; i.e. if the base consonant is ক, the input system should process the input combination ে+ক as কে. The input system first takes ে as input, stores it in its buffer and waits for the following input. If the following input is a consonant, like ক in the current example, it renders ে as the pre-base matra and ক as the base consonant.

Case 2:
BENGALI VOWEL SIGN AI (ৈ) [Unicode: 0x09C8]: This case behaves like the previous one; the only difference is the input vowel, which is Oikar in this case. The input sequence is thus ৈ+ক and the display sequence is কৈ.

Case 3:
BENGALI VOWEL SIGN I (ি) [Unicode: 0x09BF]: This case is also similar to the previous two: the dependent vowel ি is input first, followed by the consonant, as ি+ক, and the result is displayed as কি, where ি again becomes the pre-base matra followed by the base consonant ক.
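
Cases 1 to 3 share the same buffering idea, which can be sketched in Python as below. This is only an illustration, not the m17n implementation: the function names are hypothetical, the consonant test (KA, U+0995, through HA, U+09B9) is a simplifying assumption, and a buffered vowel that is not followed by a consonant is simply passed through unchanged.

```python
# Pre-base vowel signs from Cases 1-3: I, E, AI.
PRE_BASE_VOWELS = {"\u09BF", "\u09C7", "\u09C8"}

def is_consonant(ch):
    # Assumption for this sketch: consonants are KA (U+0995) through HA (U+09B9).
    return "\u0995" <= ch <= "\u09B9"

def visual_to_logical(typed):
    """Reorder 'vowel sign, consonant' (visual order) into 'consonant, vowel sign'."""
    out = []
    pending = ""  # buffered pre-base vowel sign, if any
    for ch in typed:
        if pending and is_consonant(ch):
            out.append(ch + pending)  # base consonant first, then the matra
            pending = ""
            continue
        out.append(pending)  # flush an unused buffered vowel (may be empty)
        pending = ""
        if ch in PRE_BASE_VOWELS:
            pending = ch  # wait for the next input
        else:
            out.append(ch)
    out.append(pending)
    return "".join(out)

print(visual_to_logical("\u09C7\u0995"))  # typed as E then KA
```

For the three inputs ে+ক, ৈ+ক and ি+ক, the function returns the consonant followed by the vowel sign, which is the order a rendering engine expects.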

Case 4:
BENGALI VOWEL SIGN O (ো) [Unicode: 0x09CB]: This case implements split vowels, wherein one part of the vowel sign is input before the base consonant, while the other part may be input after the consonant. This can be explained by the following example:-
Input combination: ে+ক+া
Display combination: কো
This input sequence can be handled by the input system in two parts. In the first part, the input system stores the input vowel ে in the buffer. If the next character is a consonant such as ক, it renders the combination as কে and stores this in the buffer again. In the second part, if the next input character is the vowel া, it renders the combination as কো and commits it as the output; otherwise, the input system commits the output as কে and re-initializes the input state to handle the next character.

Case 5:
BENGALI VOWEL SIGN AU (ৌ) [Unicode: 0x09CC]: This case behaves like the previous one; the only difference lies in the second part of the implementation: if the next input is the AU length mark (ৗ) [Unicode: 0x09D7], the system renders the output as কৌ.
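
The joining step in Cases 4 and 5 can be checked with Unicode normalization. The sketch below assumes the two halves are combined through canonical composition (NFC), under which U+09C7 followed by U+09BE composes to U+09CB, and U+09C7 followed by U+09D7 composes to U+09CC. The project may join the parts differently inside the input method, so this is an illustration rather than a description of its mechanism; the name `compose` is hypothetical.

```python
import unicodedata

KA = "\u0995"
E = "\u09C7"        # first half of both split vowels
AA = "\u09BE"       # second half of vowel sign O
AU_MARK = "\u09D7"  # second half of vowel sign AU

def compose(base, first_part, second_part):
    """Join a base consonant with both halves of a split vowel using NFC."""
    return unicodedata.normalize("NFC", base + first_part + second_part)

ko = compose(KA, E, AA)
kau = compose(KA, E, AU_MARK)
print([f"U+{ord(c):04X}" for c in ko])
print([f"U+{ord(c):04X}" for c in kau])
```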

The five cases above are the main priority for the implementation of the Visual Keyboard Layout. The other dependent vowels in the Bengali language are mostly post-base, above-base or below-base matras, which are already handled by the present input layouts; hence they need not be reworked.

Project Progress:

The new input system is based on the bn-probhat keyboard input layout, as it provides the most comprehensive key mapping for the Bengali language. The essential key mappings in the visual keyboard layout have been kept the same as the Probhat input system based on the m17n database.
For the actual keyboard layout implementation, some research was done on the specifics of the m17n library and database.

The actual implementation of the Project can be done in two ways:-

  1. Combination Based Input System: In this method the combinations for various input sequences for all three pre-base matras can be defined in the mim file itself.
  2. Condition Based Input System: This method uses a condition-based approach, whereby the conditional logic that implements the algorithm is defined in the mim file itself.

Pros and Cons of Combination-Based Implementation Method:

Pros:-
1. Since this is a direct input mapping rather than a logical implementation, I suspect that performance-wise this layout may be faster than the condition-based one.
2. As this is a combination of one-to-one and two-to-one mapping, this is also a fairly simple implementation to understand.
3. Inherently takes care of a lot of constraint checking, as this kind of input mapping directly overrides any possible conflicts.

Cons:-
1. The source code for the input layout tends to be redundant and lengthy.
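
To make this trade-off concrete, the sketch below generates a combination table in Python. It is not the mim file itself: the table layout, the names `COMBINATIONS` and `lookup`, and the choice of consonants (the assigned code points from U+0995 to U+09B9) are assumptions made for illustration. Each pre-base matra needs one entry per consonant, which is why a hand-written mapping of this kind grows long.

```python
import unicodedata

# The three pre-base matras: vowel signs I, E, AI.
PRE_BASE_VOWELS = ["\u09BF", "\u09C7", "\u09C8"]

# Assumption: consonants are the assigned code points from KA to HA.
CONSONANTS = [
    chr(cp) for cp in range(0x0995, 0x09BA)
    if unicodedata.name(chr(cp), "")  # skip unassigned code points
]

# One explicit entry per (vowel, consonant) pair, as a combination-based
# layout would spell out: typed order -> stored order.
COMBINATIONS = {
    vowel + consonant: consonant + vowel
    for vowel in PRE_BASE_VOWELS
    for consonant in CONSONANTS
}

def lookup(typed_pair):
    """Return the stored form for a typed pair, or the pair itself if unmapped."""
    return COMBINATIONS.get(typed_pair, typed_pair)

print(len(COMBINATIONS), "entries for", len(PRE_BASE_VOWELS), "matras")
```

The lookup itself is a single dictionary access with no branching, which matches the intuition behind the first two points under Pros.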

Implementation Logic for Condition-Based Method:

The actual work on the input system consists of two essential parts:-

Part 1:

The creation of a new MIM file defining the basic one to one mapping of individual keyboard inputs to individual characters of the Bengali Language. This mapping has been done in accordance with the existing mapping present in the bn-probhat layout for the m17n database.

Part 2:

Defining new logical rules in the form of conditions in the new input system. These logical rules have been narrowed down to the implementation of the five individual cases as listed above.

Pseudo-Code for Conditional Approach:

After some experimentation and research, the base logic required for the keyboard layout to work in the desired way was worked out as follows. The pseudo-code below was written with an m17n database implementation in mind:-

  1. The input system reads one character at a time.
  2. Check if the input character is either ি [Unicode: 0x09BF], ে [Unicode: 0x09C7] or ৈ [Unicode: 0x09C8]
  3. If the condition in step 2 is true, go to step 4, else carry on with normal rendering and commit.
  4. If input character is ি [Unicode: 0x09BF], perform the following steps:-
    1. Read the next input character and store it in temporary variable, say c.
    2. Check if it is a consonant, i.e.: (c > 0x0994) & (c < 0x09C0). If true go to sub-step 3.
    3. Check for consonant rule exceptions, i.e.: (c != 0x0999) & (c != 0x099E). If true go to sub-step 4.
    4. Join 0x09BF and c.
    5. Commit.
  5. If input character is ৈ [Unicode: 0x09C8], perform the following steps:-
    1. Read the next input character and store it in temporary variable, say c.
    2. Check if it is a consonant, i.e.: (c > 0x0994) & (c < 0x09C0). If true go to sub-step 3.
    3. Check for consonant rule exceptions, i.e.: (c != 0x0999) & (c != 0x099E). If true go to sub-step 4.
    4. Join 0x09C8 and c.
    5. Commit.
  6. If input character is ে [Unicode: 0x09C7], perform the following steps:-
    1. Read the next input character and store it in a temporary variable, say c1.
    2. Check if it is a consonant, i.e.: (c1 > 0x0994) & (c1 < 0x09C0). If true go to sub-step 3.
    3. Check for consonant rule exceptions, i.e.: (c1 != 0x0999) & (c1 != 0x099E). If true go to sub-step 4.
    4. Join 0x09C7 and c1 and store the combined characters into c2.
    5. Read the next input character and store it in temporary variable, say c3.
    6. Check if (c3 = 0x09BE) | (c3 = 0x09D7). If true, go to sub-step 7; else commit and pass c3 to the initial state.
    7. Join c2 and c3.
    8. Commit.
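
The steps above can be sketched in Python as follows. This is a rough translation for experimentation, not the m17n implementation, and it rests on a few assumptions that the pseudo-code leaves open: "join" is taken to mean storing the consonant first and the vowel sign after it, a vowel sign that is not followed by an allowed consonant is committed unchanged, and the names `is_allowed_consonant` and `process` are hypothetical. The range check is copied as written; note that it also admits U+09BC to U+09BF (nukta, avagraha and the vowel signs AA and I), so a tighter upper bound may be preferable in a real layout.

```python
PRE_BASE = {0x09BF, 0x09C7, 0x09C8}  # vowel signs I, E, AI (step 2)
SPLIT_FIRST = 0x09C7                 # E may start a split vowel (step 6)
SPLIT_SECOND = {0x09BE, 0x09D7}      # AA or AU length mark (sub-step 6.6)
EXCEPTIONS = {0x0999, 0x099E}        # consonants excluded by the pseudo-code

def is_allowed_consonant(ch):
    """Range and exception checks exactly as written in the pseudo-code."""
    c = ord(ch)
    return 0x0994 < c < 0x09C0 and c not in EXCEPTIONS

def process(typed):
    """Convert visually typed input into stored (logical) order."""
    committed = []
    i, n = 0, len(typed)
    while i < n:
        ch = typed[i]
        nxt = typed[i + 1] if i + 1 < n else ""
        if ord(ch) in PRE_BASE and nxt and is_allowed_consonant(nxt):
            syllable = nxt + ch  # assumption: consonant first, then matra
            i += 2
            # Step 6 only: E may be completed by AA or the AU length mark.
            if ord(ch) == SPLIT_FIRST and i < n and ord(typed[i]) in SPLIT_SECOND:
                syllable += typed[i]
                i += 1
            committed.append(syllable)
        else:
            committed.append(ch)  # normal rendering and commit (step 3)
            i += 1
    return "".join(committed)

print(process("\u09C7\u0995\u09BE"))  # typed as E, KA, AA
```

A lookahead over the whole string stands in for the buffer here; an input method would instead hold the pending characters in its state until the next key arrives.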
