International IME Design Specification

Drawing number: 2205,201/DS
Issue: 2
AMR: 5108
Status: Released
Author: Kevin Bracey
Date: 22nd September 2005

Contents

  1. Introduction
  2. Programmer's Interface
  3. Japanese IME Notes
  4. Product Organisation
  5. References
  6. Glossary
  7. History

1. Introduction

1.1 Overview

This document describes the programmer interface to the Japanese IME. The interface is designed to be generic enough to allow an application that supports the interface to also drive Korean, Chinese and similar IMEs.

The interface describes only the IME back-end. For general use with all applications, a front-end will be required to make it easier for applications to use it, and to handle tasks that are IME-unaware.

1.2 Outstanding issues

None.

2. Programmer's Interface

2.1 Introduction

All IME back-ends will function exclusively in Unicode/UTF-8. They are not required to understand RISC OS 8-bit alphabets. If an application wishes to drive the IME back-end when the system alphabet isn't UTF-8, it will need to convert incoming keypresses to the UCS (using Service_International 8), and handle the UTF-8 output of the IME appropriately. All currently planned IMEs (Japanese, Korean, Chinese) will require UTF-8 to function anyway.

IMEs will be controlled via the InternationalIME module, which functions as an SWI dispatcher. Applications use its IME_ SWIs, and it passes the calls on to the currently selected IME.

There is no registration of IMEs - the territory for a given country is expected to know about which IME to use, and the InternationalIME module uses this information to dispatch the SWIs.

Each IME should have its own SWI chunk of 64 SWIs. The first 32 SWIs are the same common set in every IME; the last 32 SWIs in the chunk are free for IME-specific use.

2.2 Territory Manager SWIs

Territory_IME
(SWI &43062)

Returns the SWI chunk of the IME that should be used for the given territory
On entry
R0 = territory number, or -1 to use current territory
On exit
R0 = SWI chunk, or 0 if no IME for this territory.
Use
This new territory entry point allows a territory to specify its IME. The InternationalIME module will use this to select the correct default IME for a territory.

2.3 InternationalIME SWIs

IME_SelectIME
(SWI &524E0)

Select an IME
On entry
R0 = territory number, or -1 to use current territory, or 0 to disable, or >= &100 to select by a specific SWI chunk.
On exit
R0 = SWI chunk of IME selected, or 0 for none
Use
This may be used by a front-end control application to select a new IME. If the IME is changed, it should also issue Message_IMEChanged to inform applications.
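
The way R0 is interpreted on entry can be sketched as below. This is only an illustration of the rules above, not the module's actual code: resolve_ime and fake_territory_ime are hypothetical names, the latter standing in for a call to Territory_IME, and the territory number 32 used for the Japanese territory is an assumption.

```python
def resolve_ime(r0, current_territory, territory_ime):
    """Return the SWI chunk to select, or 0 for none."""
    if r0 == 0:
        return 0                                  # IME disabled
    if r0 == -1:
        return territory_ime(current_territory)   # use current territory
    if r0 < -1:
        raise ValueError("R0 value not defined by the specification")
    if r0 >= 0x100:
        return r0                                 # explicit SWI chunk
    return territory_ime(r0)                      # territory number


def fake_territory_ime(territory):
    """Hypothetical stand-in for Territory_IME (0 means no IME)."""
    return {32: 0x52500}.get(territory, 0)        # 32: assumed Japanese territory


print(hex(resolve_ime(-1, 32, fake_territory_ime)))
```

A front-end that sees the selected chunk change would then broadcast Message_IMEChanged, as described in section 2.4.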

2.4 Wimp messages

Message_IMEChanged (&524C0)

This message should be broadcast by any program that changes the current IME selection. Any application receiving this message that is currently displaying a composition string or candidate list should remove them.

Message_DeviceClaim (11)

See PRM pages 3-247 to 3-249. Because there is only one IME in the system, applications need to negotiate their use of it. If one application is about to start feeding data into the IME, anything else with a composition string active must be told. Therefore, if you are not the current claimant of the IME, you must issue Message_DeviceClaim, with major device number &1015, minor device number 0.

On receiving this, the current claimant should call IME_Cancel to ensure that the IME knows that that session is finished. It is not normally acceptable to respond with Message_DeviceInUse, as this will render the user unable to use the IME in the other application.
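
As a rough sketch, the claim message could be laid out as follows. It assumes the usual Wimp message header (size, sender, my_ref, your_ref, action, with sender and my_ref filled in by the Wimp) and the Message_DeviceClaim layout of major device, minor device and a zero-terminated information string, as described in the PRM pages cited above. The function name and the "IME" information string are hypothetical.

```python
import struct

MESSAGE_DEVICE_CLAIM = 11
IME_MAJOR_DEVICE = 0x1015
IME_MINOR_DEVICE = 0


def build_device_claim(info=b"IME"):
    """Build the block to broadcast before starting to use the IME."""
    text = info + b"\0"
    text += b"\0" * (-len(text) % 4)      # pad to a whole number of words
    size = 20 + 8 + len(text)             # header + device numbers + string
    header = struct.pack("<5I", size, 0, 0, 0, MESSAGE_DEVICE_CLAIM)
    body = struct.pack("<2I", IME_MAJOR_DEVICE, IME_MINOR_DEVICE)
    return header + body + text


block = build_device_claim()
print(len(block), hex(struct.unpack_from("<I", block, 20)[0]))
```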

2.5 IME module SWIs

The following SWIs are provided by individual IME modules. The InternationalIME module calls these SWIs by calling the equivalent SWI in the IME's SWI chunk.
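
The mapping from a generic IME_ SWI to its equivalent in the selected IME's chunk amounts to keeping the offset and swapping the chunk base. A minimal sketch, using the SWI numbers given in this document (the function name is hypothetical, and X-bit handling is omitted):

```python
IME_CHUNK = 0x524C0        # IME_ProcessInput, first SWI of the IME_ chunk
JAPANIME_CHUNK = 0x52500   # JapanIME chunk, from section 3
COMMON_SWIS = 32           # SWIs that every IME provides


def equivalent_swi(ime_swi, target_chunk):
    """Map a generic IME_ SWI number onto the selected IME's chunk."""
    offset = ime_swi - IME_CHUNK
    if not 0 <= offset < COMMON_SWIS:
        raise ValueError(f"&{ime_swi:X} is not one of the common IME SWIs")
    return target_chunk + offset


IME_GET_CANDIDATE_LIST_ENTRY = 0x524C3
print(f"&{equivalent_swi(IME_GET_CANDIDATE_LIST_ENTRY, JAPANIME_CHUNK):X}")
```

IME_SelectIME (&524E0) lies outside the common set, so it is handled by the InternationalIME module itself rather than passed on.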

IME_ProcessInput
(SWI &524C0)

Ask the IME to handle an incoming keypress
On entry
R0 = flags
On exit
R0 = flags
R1 -> output string, in UTF-8, 0 terminated (0 if none)
R2 -> current display string, in UTF-8, 0 terminated (0 if none)
R3 -> current attribute array (1 byte per character of R2)
R4 = caret position (characters into R2), -1 to hide caret.
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
An IME-aware application should pass every keypress received via the Wimp Key_Pressed event to this SWI. If an error is returned, the application should process the key normally.

Any IME-aware application will need to be character-set aware. It should assume that text files, Wimp display fields, keyboard input, etc, are in the current alphabet (as read by OS_Byte 70). Applications that use the IME will have to convert text to and from UCS/UTF-8 to pass to the IME, if the current alphabet isn't UTF-8. This might be the case, for example, on a UK machine using Latin1, but with a user typing Japanese into a Unicode-capable browser.

If the system alphabet is UTF-8, you may receive multiple Key_Pressed events per keypress. For example a pound sign will arrive as &C2 &A3. F1 would arrive as &181. You may pass these codes directly to IME_ProcessInput. Alternatively, you may wish to compose them back into UCS codes. Pound would then be &000000A3. Codes &100-&1FF representing function keys should be turned into codes &80000000-&800000FF, outside of the UCS range. Thus F1 would be &80000081.
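
Composing the incoming codes back into UCS codes might look like the sketch below. KeyComposer is a hypothetical helper, not part of the interface. It handles UTF-8 sequences of up to four bytes only and silently drops malformed input, which is a simplification.

```python
class KeyComposer:
    """Reassembles Key_Pressed codes into single composed key codes."""

    def __init__(self):
        self.pending = 0   # code point assembled so far
        self.needed = 0    # continuation bytes still expected

    def feed(self, code):
        """Return a composed code, or None if nothing is complete yet."""
        if 0x100 <= code <= 0x1FF:                 # function key
            self.needed = 0
            return 0x80000000 | (code - 0x100)
        if self.needed and 0x80 <= code <= 0xBF:   # continuation byte
            self.pending = (self.pending << 6) | (code & 0x3F)
            self.needed -= 1
            return None if self.needed else self.pending
        self.needed = 0                            # abandon any partial sequence
        if code < 0x80:
            return code                            # plain single-byte character
        for lead_min, lead_max, mask, extra in (
            (0xC0, 0xDF, 0x1F, 1),
            (0xE0, 0xEF, 0x0F, 2),
            (0xF0, 0xF7, 0x07, 3),
        ):
            if lead_min <= code <= lead_max:       # start of a new sequence
                self.pending = code & mask
                self.needed = extra
                return None
        return None        # stray continuation or unsupported lead byte


composer = KeyComposer()
for key in (0xC2, 0xA3, 0x181):                    # pound sign, then F1
    composed = composer.feed(key)
    if composed is not None:
        print(f"&{composed:08X}")
```

Fed the pound sign bytes followed by F1, this yields &000000A3 and &80000081, matching the values quoted above.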

With each keypress, the IME may request that you do any or all of the following four things in order, by altering the R0 bits on exit:

  1. Insert a string at your current cursor position.
  2. Process the keypress normally (eg act on a function key, or insert the letter at the cursor position).
  3. Display a composition string at your current cursor position, inserting the caret at a fixed point inside this string.
  4. Show a candidate list.

If you are unable to handle output (if the caret is invisible, say), you should make this call with bit 2 of R0 set - this will stop the IME producing any output. You must still pass keypresses into the IME so it can pick up on hotkeys.

Here are some specific examples. Let's say that your text editing area contains:

Hello |mum
with the caret position marked by the "|". If the Japanese IME is active, and in Hiragana mode, the following keypresses will have the following effects.

Key    New display       R0    R1      R2        R3     R4
k      Hello k|mum       0111  0       "k"       1      1
a      Hello か|mum      0111  0       "か"      1      1
n      Hello かn|mum     0111  0       "かn"     1,1    2
j      Hello かんj|mum   0111  0       "かんj"   1,1,1  3
i      Hello かんじ|mum  0111  0       "かんじ"  1,1,1  3
変換   Hello 漢字mum     0111  0       "漢字"    3,3    -1
Enter  Hello 漢字|mum    1011  "漢字"  0         0      0

Sometimes a call may result both in output, together with a new display string. For example, in a Korean IME:

Key  New display  R0    R1    R2    R3  R4
                  0111  0     "ㄱ"  3   -1
                  0111  0     "가"  3   -1
                  0111  0     "간"  3   -1
                  1111  "가"  "나"  3   -1

The styles of the display string are specified by the array pointed to by R3. This contains one byte per character (so a six-byte display string consisting of 2 Japanese characters would only have a 2-byte attribute array). Currently defined styles are:

The caret may or may not need to be shown in the middle of the display string; this is determined by the value of R4 on exit.
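
To illustrate how R2, R3 and R4 fit together, here is a small sketch that places a display string at the cursor and marks the caret with "|". The render function and its argument names are hypothetical. It also checks that the attribute array has one byte per character rather than per UTF-8 byte.

```python
def render(before, after, display_utf8, attrs, caret):
    """Show the display string (R2) at the cursor; '|' marks the caret (R4)."""
    display = display_utf8.decode("utf-8") if display_utf8 else ""
    if len(attrs) != len(display):
        raise ValueError("attribute array must have one byte per character")
    if caret < 0:
        return before + display + after            # caret hidden
    return before + display[:caret] + "|" + display[caret:] + after


kanji = "漢字".encode("utf-8")                      # 6 bytes, 2 characters
print(render("Hello ", "mum", "かn".encode("utf-8"), bytes([1, 1]), 2))
print(render("Hello ", "mum", kanji, bytes([3, 3]), -1))
```

The two calls correspond to the "n" and 変換 keypresses in the Japanese example.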

If bit 4 of R0 is set, the IME is requesting that you show a candidate list. Details of what should be shown in the list should be obtained using IME_GetCandidateListInfo.

Bits 8-11 of R0 can be used to optimise redraws.

IME_Cancel
(SWI &524C1)

Tell the IME to cancel any current composition
On entry
R0 = flags (should be 0)
On exit
R0-R4 as per IME_ProcessInput, except there can be no display string, and there is no key to claim or pass through
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
On receipt of a Message_DeviceClaim (see above), or if the user moves the cursor using the mouse, or closes the input window, etc, you should call IME_Cancel to terminate any current composition. Otherwise, when the user starts typing in the new location, the previous display string will unexpectedly reappear at the caret position.

You don't need to worry about caret movement via the cursor keys, as this will be spotted and dealt with appropriately by the IME. For example, the IME might claim left and right cursor keys to move within the display string, but on receipt of a down cursor key output the current display string and then tell the application to process that cursor key normally.

Depending on the reason for this call, you may or may not want to accept an output string. A Korean IME, for example, after the "가나" output shown above, would attempt to output the final "나" at this stage. You might accept that if the caret was just about to move because of a mouse click or a Message_DeviceClaim, but not if the window was closing.
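
One way an application might apply this policy is sketched below. The CancelReason values and the function name are hypothetical. The only rule taken from the text is that output is accepted when the caret is about to move or the device is claimed, but not when the window is closing.

```python
from enum import Enum


class CancelReason(Enum):
    CARET_MOVED = 1       # e.g. mouse click elsewhere in the text
    DEVICE_CLAIMED = 2    # Message_DeviceClaim received
    WINDOW_CLOSING = 3


def text_to_insert_on_cancel(reason, output_utf8):
    """Decide what to insert from the R1 output of IME_Cancel, if anything."""
    if output_utf8 is None or reason is CancelReason.WINDOW_CLOSING:
        return ""
    return output_utf8.decode("utf-8")


print(text_to_insert_on_cancel(CancelReason.CARET_MOVED, "나".encode("utf-8")))
```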

IME_GetCandidateListInfo
(SWI &524C2)

Find out what to display in the candidate list
On entry
R0 = flags (should be 0)
On exit
R0 = flags
R1 -> title for list
R2 = total candidates
R3 = maximum candidates per page
R4 = candidates on this page
R5 = number of first candidate on this page (1..R2)
R6 = entry to highlight (1..R4, 0 if none)
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
During composition, the IME may wish you to bring up a candidate list. In the Japanese example earlier, a second press of the 変換 key would bring up another candidate, and a third would bring up another candidate together with a list showing another 6 possibilities, for quick access.

When a candidate list is up, the main window should keep the input focus and pass keys through to the IME as usual. The IME will inform the application of any changes to the candidate list. IMEs are designed such that the same keys make sense whether or not the list is being displayed. A fourth press of 変換 would change the candidate again, for example, and move the list highlight down to the fourth item.

A candidate list normally appears as a vertical list, near the caret position. It is designed for keyboard control only - switching between the keyboard and mouse is not convenient.

It should be R3 entries high, with the entries visibly numbered 1-R3. Entries 1-R4 should be filled in with the return from IME_GetCandidateListInfo. Entry R6 should be highlighted.
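
The layout rule can be sketched as follows. The function name, the ">" highlight marker, the title and the candidate texts are all illustrative. In a real application the entries would come from IME_GetCandidateListEntry, one call per entry on the page.

```python
def layout_candidates(title, per_page, entries, highlight):
    """title = R1, per_page = R3, len(entries) = R4, highlight = R6."""
    lines = [title]
    for n in range(1, per_page + 1):
        text = entries[n - 1] if n <= len(entries) else ""   # blank past R4
        marker = ">" if n == highlight else " "
        lines.append(f"{marker}{n} {text}".rstrip())
    return lines


for line in layout_candidates("Kanji", 3, ["漢字", "感じ"], 2):
    print(line)
```

The list is always R3 rows high, so a short final page leaves its trailing rows numbered but empty.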

Related SWIs
IME_GetCandidateListEntry

IME_GetCandidateListEntry
(SWI &524C3)

Return the text for an individual candidate list entry
On entry
R0 = flags (should be 0)
R1 = entry number on current page (see IME_GetCandidateListInfo)
On exit
R1 -> UTF-8 text for entry
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
After calling IME_GetCandidateListInfo, you should call this SWI to find the text for each entry.
Related SWIs
IME_GetCandidateListInfo

IME_Configure
(SWI &524C4)

Configure various aspects of the IME's behaviour.
On entry
R0 = reason code
Other registers dependent on reason code
On exit
R0 preserved
Other registers dependent on reason code
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
This call allows you to configure various aspects of the IME's behaviour. If an IME composition is in progress, some of these calls may return an error. To be on the safe side, call IME_Cancel first.

IME_Configure 0
(SWI &524C4)

Select a new dictionary
On entry
R0 = 0
R1 -> filename
On exit
All registers preserved
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
This call allows you to select a new dictionary for the IME. The format of the dictionary is IME-specific. This call should only be made by a front-end IME control application. If the dictionary is in ResourceFS, the IME should make an effort to access it directly rather than loading it into RAM.

IME_Configure 1
(SWI &524C4)

Adjust or read the IME's status flags
On entry
R0 = 1
R1 = flags EOR value
R2 = flags AND value
On exit
R0 preserved
R1 = old flags value
All other registers preserved
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
The IME's flags are set by this call: New value = (Old Value AND R2) EOR R1. Most flags are IME-specific, but different IMEs should try to share as many bits as possible. Many settings will be altered automatically by special keypresses.

The IME flag bits are as follows:

Bit  Meaning
0    IME enabled. If disabled, all calls to IME_ProcessInput will say to process the key normally (except that the IME may claim the keypress that turns the IME back on).
1    Kana mode (roman keypresses composed into Kana). May be meaningless on non-Japanese IMEs.
2    0=Hiragana, 1=Katakana. Should be 0 if bit 1 is clear. Again, probably Japanese specific.
3    0=Halfwidth, 1=Fullwidth. Some territories have the concept of "fullwidth" characters.
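
As a worked illustration of the formula New value = (Old value AND R2) EOR R1, the sketch below builds the (R1, R2) pairs needed to set, clear, toggle or simply read flag bits. The constant and function names are hypothetical; only the formula and the bit assignments come from this document.

```python
IME_ENABLED = 1 << 0
KANA_MODE = 1 << 1
KATAKANA = 1 << 2
FULLWIDTH = 1 << 3
ALL = 0xFFFFFFFF


def update_flags(old, eor, and_mask):
    """What the IME computes: (old AND R2) EOR R1, kept to 32 bits."""
    return ((old & and_mask) ^ eor) & ALL


def args_set(bits):
    return bits, ALL & ~bits      # (R1, R2): clear the bits, then EOR them on


def args_clear(bits):
    return 0, ALL & ~bits         # (R1, R2): mask the bits out


def args_toggle(bits):
    return bits, ALL              # (R1, R2): keep everything, flip the bits


ARGS_READ = (0, ALL)              # (R1, R2): leaves the flags unchanged

flags = IME_ENABLED | KANA_MODE
flags = update_flags(flags, *args_set(KATAKANA))
print(bin(flags))
```

Clearing Kana mode should clear the Katakana bit at the same time, since bit 2 is meant to be 0 whenever bit 1 is clear; args_clear(KANA_MODE | KATAKANA) does both in one call.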

Related SWIs
None

3. Japanese IME Notes

The Japanese IME will be provided by the JapanIME module. This has SWI chunk &52500 with SWI name prefix "JapanIME_".

The IME engine works internally in Shift-JIS encoding. As a result, the RISC OS port will be required to translate output text to UTF-8. It further expects keypresses to be from the single-byte Shift-JIS ranges (the JIS X 0201 Roman and Katakana ranges), and has ideas about which kana keys are on which Latin keys. This must be worked around, as the keyboard driver will be passing up Unicode codes.

This translation imposes some limits on our implementation. With the IME disabled, the user will be able to press the £ on the Kana layer, and it will appear. Unfortunately, the IME will not understand this as a possible keypress, so it will be forced to ignore it or pass it through (I suggest that it be passed through if no composition is in progress, and ignored otherwise). This is unfortunate, but Japanese users will not be expecting that key to work anyway, for exactly that reason.
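
The two translations can be sketched as below, purely as an illustration. Both function names are hypothetical, and the sketch assumes that kana keys arrive from the keyboard driver as Unicode halfwidth katakana (U+FF61 to U+FF9F), which this document does not state. Returning None stands for "the engine cannot take this key", as with the £ case above.

```python
def sjis_to_utf8(engine_output):
    """Translate Shift-JIS text from the engine into UTF-8 for the caller."""
    return engine_output.decode("shift_jis").encode("utf-8")


def unicode_key_to_engine_byte(ucs):
    """Map a Unicode key code to a single-byte JIS X 0201 code, or None."""
    if ucs == 0x00A5:
        return 0x5C                    # yen sign sits at &5C in JIS X 0201
    if ucs == 0x203E:
        return 0x7E                    # overline sits at &7E in JIS X 0201
    if ucs <= 0x7F and ucs not in (0x5C, 0x7E):
        return ucs                     # remainder of the Roman range
    if 0xFF61 <= ucs <= 0xFF9F:
        return ucs - 0xFF61 + 0xA1     # halfwidth katakana range &A1-&DF
    return None                        # not representable as a single byte


print(unicode_key_to_engine_byte(0x00A3))                    # pound sign
print(sjis_to_utf8("漢字".encode("shift_jis")).decode("utf-8"))
```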

The IME must interact with the keyboard driver to ensure that, say, Kana Lock is turned off when the IME is placed into Roman mode.

4. Product Organisation

The Japanese IME back-end will be provided by a new module, JapanIME.

The IME dispatch mechanism will be provided by a new InternationalIME module.

The Japanese territory module will specify JapanIME as its default IME.

The IME dictionary, stored outside the main 8M of ROM, will be registered in ResourceFS by the ExtraResources module.

5. References

RISC OS 3 PRM, chapters 70 and 71 (International module and Territory module).

6. Glossary

FEP Front End Processor - another name for IME (q.v.)
Kanji The Japanese ideographic characters
Hiragana The Japanese phonetic alphabet used for Japanese words
IME Input Method Engine
Kana Katakana or Hiragana
Katakana The Japanese phonetic alphabet used for foreign words, or for emphasis
Romaji The Japanese name for the Latin alphabet
UTF-8 UCS Transformation Format 8 - the standard multibyte encoding of UCS data.

7. History

Issue  Date         Author        Description
A      06-Jul-1998  Kevin Bracey  First draft
B      18-Jul-1998  Kevin Bracey  Updated with James Byrne and Bob Pollard's comments
C      25-Jul-1998  Kevin Bracey  More from Bob. Device claim protocol added
1      14-Sep-1998  Kevin Bracey  Updated after review
2      22-Sep-2005  Tematic       Edited for release