International IME Design Specification
| Drawing number: | 2205,201/DS |
| Issue: | 2 |
| AMR: | 5108 |
| Status: | Released |
| Author: | Kevin Bracey |
| Date: | 22nd September 2005 |
Contents
- Introduction
- Programmer's Interface
- Japanese IME Notes
- Product Organisation
- References
- Glossary
1.1 Overview
This document describes the programmer interface to the Japanese IME. The
interface is designed to be generic enough to allow an application that
supports the interface to also drive Korean, Chinese and similar IMEs.
The interface describes only the IME back-end. For general use with
all applications, a front-end will be required to make it easier for
applications to use it, and to handle tasks that are IME-unaware.
None.
2.1 Introduction
All IME back-ends will function exclusively in Unicode/UTF-8. They are not
required to understand RISC OS 8-bit alphabets. If an application wishes to
drive the IME back-end when the system alphabet isn't UTF-8, it will need to
convert incoming keypresses to the UCS (using Service_International 8), and
handle the UTF-8 output of the IME appropriately. All currently planned IMEs
(Japanese, Korean, Chinese) will require UTF-8 to function anyway.
IMEs will be controlled via the InternationalIME module, which
functions as an SWI dispatcher. Applications use its IME_ SWIs, and it passes
the calls on to the currently selected IME.
There is no registration of IMEs - the territory for a given country is
expected to know about which IME to use, and the InternationalIME module
uses this information to dispatch the SWIs.
Each IME should have its own SWI chunk, providing the same set of 32 SWIs. The
last 32 SWIs in the chunk are free for IME-specific use.
2.2 Territory Manager SWIs
Territory_IME
(SWI &43062)
Returns the SWI chunk of the IME that should be used for the given territory.
On entry
R0 = territory number, or -1 to use current territory
On exit
R0 = SWI chunk, or 0 if no IME for this territory.
Use
This new territory entry point allows a territory to specify its IME.
The InternationalIME module will use this to select the correct default IME
for a territory.
2.3 InternationalIME SWIs
IME_SelectIME
(SWI &524E0)
Select an IME
On entry
R0 = territory number, or -1 to use current territory, or 0 to disable, or >= &100 to select by a specific SWI chunk.
On exit
R0 = SWI chunk of IME selected, or 0 for none
Use
This may be used by a front-end control application to select a new IME.
If the IME is changed, it should also issue Message_IMEChanged to inform
applications.
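The different meanings of R0 on entry to IME_SelectIME can be summarised in a short sketch. This is only an illustration of the rules stated above, not the module's implementation; the function name and the returned labels are hypothetical.

```python
CURRENT_TERRITORY = -1    # R0 = -1: use the current territory
DISABLE_IME = 0           # R0 = 0: disable the IME
FIRST_SWI_CHUNK = 0x100   # R0 >= &100: R0 is a SWI chunk number


def classify_select_argument(r0):
    """Return a (kind, value) pair describing an IME_SelectIME R0 value."""
    if r0 == CURRENT_TERRITORY:
        return ("current_territory", None)
    if r0 == DISABLE_IME:
        return ("disable", None)
    if r0 >= FIRST_SWI_CHUNK:
        return ("swi_chunk", r0)
    if r0 > 0:
        return ("territory", r0)
    raise ValueError("R0 value has no defined meaning")
```

A front-end that has just changed the selection would then broadcast Message_IMEChanged, as described above.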
2.4 Wimp messages
Message_IMEChanged (&524C0)
This message should be broadcast by any program that changes the current
IME selection. Any application receiving this message that is currently
displaying a composition string or candidate list should remove them.
Message_DeviceClaim (11)
See PRM pages 3-247 to 3-249. Because there is only one IME in the system,
applications need to negotiate their use of it. If one application is
about to start feeding data into the IME, anything else with a composition
string active must be told. Therefore, if you are not the current claimant
of the IME, you must issue Message_DeviceClaim, with major device number
&1015, minor device number 0.
On receiving this, the current claimant should call IME_Cancel to ensure
that the IME knows that that session is finished. It is not normally
acceptable to respond with Message_DeviceInUse, as this will render the
user unable to use the IME in the other application.
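As a rough illustration, the claim message could be assembled as below. The field offsets are an assumption based on the usual Wimp message layout (a 20-byte header followed by major device, minor device and an information string) and should be checked against the PRM pages cited above. The function name and the information string are hypothetical.

```python
import struct

MESSAGE_DEVICE_CLAIM = 11
IME_MAJOR_DEVICE = 0x1015   # major device number given in this document
IME_MINOR_DEVICE = 0        # minor device number given in this document


def build_device_claim(info=b"IME"):
    """Build a Message_DeviceClaim block for the IME (assumed layout)."""
    text = info + b"\0"
    text += b"\0" * (-len(text) % 4)      # pad to a whole number of words
    size = 20 + 8 + len(text)
    # Header: size, sender, my_ref, your_ref, action code.
    # Sender and my_ref are left as 0 for the Wimp to fill in.
    header = struct.pack("<5I", size, 0, 0, 0, MESSAGE_DEVICE_CLAIM)
    body = struct.pack("<2I", IME_MAJOR_DEVICE, IME_MINOR_DEVICE)
    return header + body + text
```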
2.5 IME module SWIs
The following SWIs are provided by individual IME modules. The
InternationalIME module calls these SWIs by calling the equivalent SWI in
the IME's SWI chunk.
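The dispatch rule amounts to keeping the SWI's offset within the chunk and changing the chunk base. A minimal sketch follows, using the InternationalIME SWI numbers listed in this document and the JapanIME chunk given in the Japanese IME notes. The function name is hypothetical.

```python
INTERNATIONAL_IME_CHUNK = 0x524C0   # base of the generic IME_ SWIs
JAPAN_IME_CHUNK = 0x52500           # JapanIME chunk, from the Japanese IME notes
SHARED_SWIS = 32                    # SWIs that every IME provides


def redirect_swi(swi_number, ime_chunk):
    """Map a generic IME_ SWI number to the same offset in an IME's chunk."""
    offset = swi_number - INTERNATIONAL_IME_CHUNK
    if not 0 <= offset < SHARED_SWIS:
        raise ValueError("not one of the 32 shared IME SWIs")
    return ime_chunk + offset
```

For example, IME_GetCandidateListEntry (&524C3) would be passed on as &52503 when JapanIME is selected.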
IME_ProcessInput
(SWI &524C0)
Ask the IME to handle an incoming keypress
On entry
R0 = flags
bit 0 set => input value is a UCS character code; else it's a UTF-8 byte or Wimp function key code (see below)
bit 1 set => input value is a candidate list selection; else it's a key
bit 2 set => unable to handle any output/display at this point
R1 = input value (interpretation depends on the flags in R0)
On exit
R0 = flags
bit 0 set => key claimed; else process this key/byte normally
bit 1 reserved
bit 2 set => IME session active - display string to be shown (R2-4 valid)
bit 3 set => Output produced by this keypress (R1 valid)
bit 4 set => IME candidate list to be shown
bit 8 set => display string has changed
bit 9 set => attribute array has changed
bit 10 set => caret position has changed
bit 11 set => candidate list has changed
R1 -> output string, in UTF-8, 0 terminated (0 if none)
R2 -> current display string, in UTF-8, 0 terminated (0 if none)
R3 -> current attribute array (1 byte per character of R2)
R4 = caret position (characters into R2), -1 to hide caret.
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
An IME aware application should pass every keypress received via the Wimp
Key_Pressed event to this SWI. If an error is returned, the application
should process the key normally.
Any IME aware application will need to be character-set aware. It should assume
that text files, Wimp display fields, keyboard input, etc, are in the
current alphabet (as read by OS_Byte 71). Applications that use the IME
will have to convert text to and from UCS/UTF-8 to pass to the IME, if the
current alphabet isn't UTF-8. This might be the case, for example, on a UK
machine using Latin1, but with a user typing Japanese into a Unicode-capable
browser.
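The conversion in both directions might look like the sketch below. It uses Python's built-in latin-1 codec purely as a stand-in: a real application would take its table from the system (for example via Service_International 8), and the RISC OS Latin1 alphabet differs from ISO 8859-1 in the &80-&9F range. Both function names are hypothetical.

```python
def key_byte_to_ucs(key_byte, alphabet="latin-1"):
    """Convert one keypress byte in the current alphabet to a UCS code."""
    return ord(bytes([key_byte]).decode(alphabet))


def ime_output_to_text(utf8_output):
    """Decode the IME's UTF-8 output string into characters."""
    return utf8_output.decode("utf-8")
```

The UCS code would be passed to IME_ProcessInput with bit 0 of R0 set. The decoded output is then re-encoded or rendered however the application handles non-alphabet text.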
If the system alphabet is UTF-8, you may receive multiple Key_Pressed
events per keypress. For example a pound sign will arrive as &C2 &A3.
F1 would arrive as &181. You may pass these codes directly to
IME_ProcessInput. Alternatively, you may wish to compose them back into
UCS codes. Pound would then be &000000A3. Codes &100-&1FF
representing function keys should be turned into codes
&80000000-&800000FF, outside of the UCS range. Thus F1 would be
&80000081.
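Composing Key_Pressed codes back into UCS codes could be done along these lines. This is a sketch only: the class name is hypothetical, and Python's incremental UTF-8 decoder stands in for whatever decoder the application actually uses.

```python
import codecs

FUNCTION_KEY_BASE = 0x80000000   # function keys are moved outside the UCS range


class KeyComposer:
    """Compose Key_Pressed codes into UCS codes or function-key codes."""

    def __init__(self):
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def feed(self, code):
        """Return a composed code, or None if more UTF-8 bytes are needed."""
        if 0x100 <= code <= 0x1FF:
            # Function key: drop any partial sequence and remap the code.
            self._decoder.reset()
            return FUNCTION_KEY_BASE + (code - 0x100)
        text = self._decoder.decode(bytes([code]))
        return ord(text) if text else None
```

Feeding &C2 then &A3 yields &A3 on the second call, and feeding &181 yields &80000081, matching the examples above.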
With each keypress, the IME may request that you do any or all of the
following four things in order, by altering the R0 bits on exit:
- Insert a string at your current cursor position.
- Process the keypress normally (eg act on a function key, or insert the
letter at the cursor position).
- Display a composition string at your current cursor position, inserting
the caret at a fixed point inside this string.
- Show a candidate list.
If you are unable to handle output (if the caret is invisible, say), you
should make this call with bit 2 of R0 set - this will stop the IME producing any
output. You must still pass keypresses into the IME so it can pick up on
hotkeys.
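The exit flags described above can be turned into an ordered list of actions roughly as follows. The function names and action labels are hypothetical; only the bit assignments come from this document.

```python
KEY_CLAIMED = 1 << 0        # clear means: process the key normally
SHOW_DISPLAY = 1 << 2       # display string to be shown (R2-R4 valid)
OUTPUT_PRODUCED = 1 << 3    # output string produced (R1 valid)
SHOW_CANDIDATES = 1 << 4    # candidate list to be shown

CHANGED = {
    8: "display string",
    9: "attribute array",
    10: "caret position",
    11: "candidate list",
}


def actions_for(r0):
    """List the requested actions in the order they should be carried out."""
    actions = []
    if r0 & OUTPUT_PRODUCED:
        actions.append("insert output string")
    if not r0 & KEY_CLAIMED:
        actions.append("process key normally")
    if r0 & SHOW_DISPLAY:
        actions.append("show display string")
    if r0 & SHOW_CANDIDATES:
        actions.append("show candidate list")
    return actions


def changed_items(r0):
    """List what bits 8-11 report as changed, to limit redrawing."""
    return [name for bit, name in CHANGED.items() if r0 & (1 << bit)]
```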
Here are some specific examples. Let's say that your text editing area contains:
Hello |mum
with the caret position marked by the "|".
If the Japanese IME is active,
and in Hiragana mode, the following keypresses will have the following effects.
| Key | New display | R0 | R1 | R2 | R3 | R4 |
| k | Hello k|mum | 0111 | 0 | "k" | 1 | 1 |
| a | Hello か|mum | 0111 | 0 | "か" | 1 | 1 |
| n | Hello かn|mum | 0111 | 0 | "かn" | 1,1 | 2 |
| j | Hello かんj|mum | 0111 | 0 | "かんj" | 1,1,1 | 3 |
| i | Hello かんじ|mum | 0111 | 0 | "かんじ" | 1,1,1 | 3 |
| 変換 | Hello 漢字mum | 0111 | 0 | "漢字" | 3,3 | -1 |
| Enter | Hello 漢字|mum | 1011 | "漢字" | 0 | 0 | 0 |
Sometimes a call may produce both output and a new display string.
For example, in a Korean IME:
| Key | New display | R0 | R1 | R2 | R3 | R4 |
| ㄱ | ㄱ | 0111 | 0 | "ㄱ" | 3 | -1 |
| ㅏ | 가 | 0111 | 0 | "가" | 3 | -1 |
| ㄴ | 간 | 0111 | 0 | "간" | 3 | -1 |
| ㅏ | 가나 | 1111 | "가" | "나" | 3 | -1 |
The styles of the display string are specified by the array pointed to by
R3. This contains one byte per character (so a six-byte display string
consisting of 2 Japanese characters would only have a 2-byte attribute
array). Currently defined styles are:
0 = normal
1 = dotted/grey underline
2 = solid underline
3 = highlighted
The caret may or may not need to be shown in the middle of the display string;
this is determined by the value of R4 on exit.
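Because the attribute array is indexed by character rather than by byte, the display string has to be decoded before the styles are applied. A minimal sketch follows, with a hypothetical function name and the style names taken from the list above.

```python
STYLES = {
    0: "normal",
    1: "dotted/grey underline",
    2: "solid underline",
    3: "highlighted",
}


def describe_display(display_utf8, attributes, caret):
    """Pair each character of the display string with its style.

    Returns (styled, caret_index); caret_index is None when R4 is -1.
    """
    chars = display_utf8.decode("utf-8")
    if len(attributes) != len(chars):
        raise ValueError("need one attribute byte per character")
    styled = [(ch, STYLES[attr]) for ch, attr in zip(chars, attributes)]
    caret_index = None if caret == -1 else caret
    return styled, caret_index
```

For the 変換 row of the Japanese example, the six-byte string "漢字" pairs with the two-byte array 3,3 and the caret is hidden.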
If bit 4 of R0 is set, the IME is requesting that you show a candidate list.
Details of what should be shown in the list should be obtained using
IME_GetCandidateListInfo.
Bits 8-11 of R0 can be used to optimise redraws.
IME_Cancel
(SWI &524C1)
Tell the IME to cancel any current composition
On entry
R0 = flags (should be 0)
On exit
R0-R4 as per IME_ProcessInput, except there can be no display string, and
there is no key to claim or pass through
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
On receipt of a Message_DeviceClaim (see above), or if the user moves the
cursor using the mouse, or closes the input window, etc, you should call
IME_Cancel to terminate any current composition. Otherwise, when the user
starts typing in the new location, the previous display string will suddenly
appear at the caret position unexpectedly.
You don't need to worry about caret movement via the cursor keys, as this
will be spotted and dealt with appropriately by the IME. For example, the IME
might claim left and right cursor keys to move within the display string, but
on receipt of a down cursor key output the current display string and then tell
the application to process that cursor key normally.
Depending on the reason for this call, you may or may not want to accept
an output string. A Korean IME, for example, after the
"가나" output shown above, would attempt to output the final
"나" at this stage. You might accept that if the caret was just
about to move because of a mouse click or a Message_DeviceClaim,
but not if the window was closing.
IME_GetCandidateListInfo
(SWI &524C2)
Find out what to display in the candidate list
On entry
R0 = flags (should be 0)
On exit
R0 = flags
R1 -> title for list
R2 = total candidates
R3 = maximum candidates per page
R4 = candidates on this page
R5 = number of first candidate on this page (1..R2)
R6 = entry to highlight (1..R4, 0 if none)
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
During composition, the IME may wish you to bring up a candidate list. In
the Japanese example earlier, a second press of the 変換 key
would bring up another candidate, and a third would bring up another candidate
together with a list showing another 6 possibilities, for quick access.
When a candidate list is up, the main window should keep the input focus
and pass keys through to the IME as usual. The IME will inform the application
of any changes to the candidate list. IMEs are designed such that
the same keys make sense whether or not the list is being displayed. A fourth
press of 変換 would change the candidate again, for example, and
move the list highlight down to the fourth item.
A candidate list normally appears as a vertical list, near the caret position.
It is designed for keyboard control only - switching between the keyboard
and mouse is not convenient.
It should be R3 entries high, with the entries visibly numbered 1-R3.
Entries 1-R4 should be filled in with the return from IME_GetCandidateListInfo.
Entry R6 should be highlighted.
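The layout rules above can be sketched as follows. The function name is hypothetical, and get_entry is a hypothetical callback standing in for a call to IME_GetCandidateListEntry with the given entry number.

```python
def layout_candidate_list(per_page, on_page, highlight, get_entry):
    """Build the rows of a candidate list window.

    per_page  = R3, on_page = R4, highlight = R6 (0 if none).
    Returns per_page rows of (label, text, highlighted).
    """
    rows = []
    for n in range(1, per_page + 1):
        # Only entries 1..R4 have text; the rest stay blank but numbered.
        text = get_entry(n) if n <= on_page else ""
        rows.append((str(n), text, n == highlight))
    return rows
```

Rows beyond R4 are kept so that the list is always R3 entries high, as required above.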
Related SWIs
IME_GetCandidateListEntry
IME_GetCandidateListEntry
(SWI &524C3)
Return the text for an individual candidate list entry
On entry
R0 = flags (should be 0)
R1 = entry number on current page (see IME_GetCandidateListInfo)
On exit
R1 -> UTF-8 text for entry
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
After calling IME_GetCandidateListInfo, you should call this SWI to
find the text for each entry.
Related SWIs
IME_GetCandidateListInfo
IME_Configure
(SWI &524C4)
Configure various aspects of the IME's behaviour.
On entry
R0 = reason code
Other registers dependent on reason code
On exit
R0 preserved
Other registers dependent on reason code
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
This call allows you to configure various aspects of the IME's behaviour.
If an IME composition is in progress, some of these calls may return an
error. To be on the safe side, call IME_Cancel first.
IME_Configure 0
(SWI &524C4)
Select a new dictionary
On entry
R0 = 0
R1 -> filename
On exit
All registers preserved
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
This call allows you to select a new dictionary for the IME. The format
of the dictionary is IME-specific. This call should only be made by a
front-end IME control application. If the dictionary is in ResourceFS,
the IME should make an effort to access it directly rather than loading
it into RAM.
IME_Configure 1
(SWI &524C4)
Adjust or read the IME's status flags
On entry
R0 = 1
R1 = flags EOR value
R2 = flags AND value
On exit
R0 preserved
R1 = old flags value
All other registers preserved
Interrupts
Interrupts are enabled
Fast interrupts are enabled
Processor Mode
Processor is in SVC mode
Re-entrancy
SWI is not re-entrant
Use
The IME's flags are set by this call: New value = (Old Value AND R2) EOR R1.
Most flags are IME-specific, but different IMEs should try to share as
many bits as possible. Many settings will be altered automatically by special
keypresses.
The IME flag bits are as follows:
| Bit | Meaning |
| 0 | IME enabled. If disabled, all calls to IME_ProcessInput will say to process the key normally (except that the IME may claim the keypress that turns the IME back on). |
| 1 | Kana mode (roman keypresses composed into Kana). May be meaningless on non-Japanese IMEs. |
| 2 | 0=Hiragana, 1=Katakana. Should be 0 if bit 1 is clear. Again, probably Japanese specific. |
| 3 | 0=Halfwidth, 1=Fullwidth. Some territories have the concept of "fullwidth" characters. |
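The EOR/AND update rule allows reading, setting, clearing and toggling flags with the same call. The sketch below models the arithmetic only. The helper names are hypothetical, and each helper returns the (R1, R2) pair to pass to IME_Configure 1.

```python
MASK32 = 0xFFFFFFFF

IME_ENABLED = 1 << 0
KANA_MODE = 1 << 1
KATAKANA = 1 << 2
FULLWIDTH = 1 << 3


def update_flags(old, eor_value, and_value):
    """New value = (Old value AND R2) EOR R1, as a 32-bit word."""
    return ((old & and_value) ^ eor_value) & MASK32


def read_args():
    return (0, MASK32)                 # leaves every flag unchanged


def set_args(bits):
    return (bits, ~bits & MASK32)      # clear the bits, then EOR them on


def clear_args(bits):
    return (0, ~bits & MASK32)         # mask the bits out


def toggle_args(bits):
    return (bits, MASK32)              # keep everything, then invert the bits
```

In every case the old flags value comes back in R1, so read_args() is the way to read the flags without altering them.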
Related SWIs
None
The Japanese IME will be provided by the JapanIME module. This has SWI chunk
&52500 with SWI name prefix "JapanIME_".
The IME engine works internally in Shift-JIS encoding. As a result, the RISC OS
port will be required to translate output text to UTF-8. It further expects
keypresses to be from the single-byte Shift-JIS ranges (the JIS X 0201 Roman
and Katakana ranges), and has ideas about which kana keys are on which Latin
keys. This must be worked around, as the keyboard driver will be passing up
Unicode codes.
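The two translations might be sketched as follows. Python's shift_jis codec stands in for the port's own conversion tables, and both function names are hypothetical. The key mapping assumes kana keys arrive as halfwidth katakana code points (U+FF61 to U+FF9F), which this document does not state. The JIS X 0201 Roman set differs from ASCII at &5C (yen) and &7E (overline), so those are handled separately.

```python
def engine_output_to_utf8(sjis_bytes):
    """Translate Shift-JIS output from the engine into UTF-8."""
    return sjis_bytes.decode("shift_jis").encode("utf-8")


def ucs_to_engine_key(ucs):
    """Map a UCS code to a single-byte JIS X 0201 key, or None if none exists."""
    if ucs == 0x00A5:                  # yen sign
        return 0x5C
    if ucs == 0x203E:                  # overline
        return 0x7E
    if 0x20 <= ucs <= 0x7E and ucs not in (0x5C, 0x7E):
        return ucs                     # shared with ASCII
    if 0xFF61 <= ucs <= 0xFF9F:        # halfwidth katakana block
        return ucs - 0xFF61 + 0xA1
    return None                        # e.g. the pound sign, U+00A3
```

A None result corresponds to a key the engine cannot understand, such as the pound sign.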
This translation imposes some limits on our implementation. With the IME
disabled, the user will be able to press the £ on the Kana layer, and
it will appear. Unfortunately, the IME will not understand this as a possible
keypress, so it will be forced to ignore it or pass it through (I suggest
that it be passed through if no composition is in progress, and ignored
otherwise). This is unfortunate, but Japanese users will not be expecting
that key to work anyway, for exactly that reason...
The IME must interact with the keyboard driver to ensure
that, say, Kana Lock is turned off when the IME is placed into Roman mode.
The Japanese IME back-end will be provided by a new module, JapanIME.
The IME dispatch mechanism will be provided by a new InternationalIME module.
The Japanese territory module will specify JapanIME as its default IME.
The IME dictionary, stored outside the main 8M of ROM, will be registered in
ResourceFS by the ExtraResources module.
RISC OS 3 PRM, chapters 70 and 71 (International module and Territory module).
| Term | Definition |
| FEP | Front End Processor - another name for IME (q.v.) |
| Kanji | The Japanese ideographic characters |
| Hiragana | The Japanese phonetic alphabet used for Japanese words |
| IME | Input Method Engine |
| Kana | Katakana or Hiragana |
| Katakana | The Japanese phonetic alphabet used for foreign words, or for emphasis |
| Romaji | The Japanese name for the Latin alphabet |
| UTF-8 | UCS Transformation Format 8 - the standard multibyte encoding of UCS data. |
| Issue | Date | Author | Description |
| A | 06-Jul-1998 | Kevin Bracey | First draft |
| B | 18-Jul-1998 | Kevin Bracey | Updated with James Byrne and Bob Pollard's comments |
| C | 25-Jul-1998 | Kevin Bracey | More from Bob. Device claim protocol added |
| 1 | 14-Sep-1998 | Kevin Bracey | Updated after review |
| 2 | 22-Sep-2005 | Tematic | Edited for release |