How to Add a Character Set to x3270

x3270 can be configured to use any 8-bit character set. Here's how to add the definition for a new one.

What You Need

A Name

You must pick a name for your character set. We will use the fictitious names west-fredonian and east-fredonian in the examples below.

The IBM Base Character Set, Code Page and CGCSGID

An IBM host uses two 16-bit numbers to describe a character set: the base character set and the code page. These two numbers are combined into a 32-bit number called the CGCSGID. The base character set is in the upper 16 bits, and the code page is in the lower 16 bits. These are usually expressed in hexadecimal. For example, the default x3270 CGCSGID is 0x02b90025.

You will need to configure x3270 with the CGCSGID for the new character set (or just the code page number, if it uses the default base character set). You will also need a copy of the code page table, which is a printed chart or online image showing the glyph displayed for each printable EBCDIC character. This table will allow you to verify that you have correctly configured the character set in x3270.

Many code pages are documented in the 3174 Character Set Reference, IBM document GA27-3831-06.

For this example, we find that IBM hosts use base character set 697 for West Fredonian (the same as the x3270 default), and code page 9991. For East Fredonian, however, they use base character set 4992 (containing some unique symbols used only in East Fredonia) and code page 9992. This becomes CGCSGID 0x13802708.
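
The arithmetic behind these numbers can be checked with a short script. This is only a sketch of the bit layout described above; the function names make_cgcsgid and split_cgcsgid are invented for illustration and are not part of x3270.

```python
def make_cgcsgid(base_charset, code_page):
    """Combine a 16-bit base character set and a 16-bit code page.

    The base character set goes in the upper 16 bits and the code page
    in the lower 16 bits, as described in the text.
    """
    if not (0 <= base_charset <= 0xFFFF and 0 <= code_page <= 0xFFFF):
        raise ValueError("both values must fit in 16 bits")
    return (base_charset << 16) | code_page


def split_cgcsgid(cgcsgid):
    """Return (base character set, code page) for a 32-bit CGCSGID."""
    return cgcsgid >> 16, cgcsgid & 0xFFFF


if __name__ == "__main__":
    # East Fredonian: base character set 4992, code page 9992.
    print(f"0x{make_cgcsgid(4992, 9992):08x}")  # 0x13802708
    # The default x3270 CGCSGID splits into base 697 and code page 37.
    print(split_cgcsgid(0x02B90025))            # (697, 37)
```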

The Standard X11 Font Registry and Encoding

x3270 uses standard X11 fonts in its emulator window. (It can also use its own proprietary fonts, but we will not discuss that here.) The character set implemented by an X11 font is described by two parameters: a registry and an encoding. These are expressed as strings separated by a dash; for example, the default display character set used by x3270 is ISO Latin-1; its registry and encoding are iso8859-1. Another example is the display character set used for Turkish; it's iso8859-9.

There may be multiple registries and encodings for the same character set; this is often an historical artifact of the standards process.  For example, the Thai character set was originally called tis620.2529-0; it was later standardized to iso8859-11. You may configure x3270 to accept either name.

A good reference for non-English font standards is The ISO 8859 Alphabet Soup.

For this example, we learn that the ISO standard font for East Fredonian is Latin-43, with a registry and encoding of iso8859-43. (West Fredonian does not require a special font.)

An X11 Font

If your character set is not based on ISO Latin-1, you will need to find one or more X11 fonts which display your character set. The fonts must have font properties that match the registry and encoding you found above. For example, fonts which implement iso8859-1 have a registry font property value of iso8859 and an encoding font property value of 1.

In this example we search the web and discover that on the East Fredonian X User's Organization webpage, there is a 12-point Latin-43 font, called eastfredonian-12. We install it on our server.

A Translation Table

Here's the fun part. You need a table which translates EBCDIC codes to the display character set. These tables are often available from file translation utilities such as recode. For example, to use recode to obtain a translation table for Russian (IBM code page 880 to the KOI8-R Internet standard), use the command:
recode -h IBM880..KOI8-R
A more difficult situation is when no translation table is available. You will need to construct it by hand, by comparing the EBCDIC code page table to a table of your display character set. You can obtain a table for any X font with the xfd utility.

The format of the translation table is simple.  It is an x3270 resource definition with 256 elements, one for each EBCDIC code. The value of each element is a number, representing the display code to use for the corresponding EBCDIC code. For example, the value of the 194th entry (corresponding to EBCDIC code X'C1') will probably be 0x41, the display code for a capital 'A'.

Here is an example for west-fredonian:

*charset.west-fredonian: #table \n\
      0x20 0x01 0x02 0x03 0x9c 0x09 0x86 0x7f \n\
      0x97 0x8d 0x8e 0x0b 0x0c 0x0d 0x0e 0x0f \n\
      0x10 0x11 0x12 0x13 0x9d 0x85 0x08 0x87 \n\
      0x18 0x19 0x92 0x8f 0x1c 0x1d 0x1e 0x1f \n\
      0x80 0x81 0x82 0x83 0x84 0x0a 0x17 0x1b \n\
      0x88 0x89 0x8a 0x8b 0x8c 0x05 0x06 0x07 \n\
      0x90 0x91 0x16 0x93 0x94 0x95 0x96 0x04 \n\
      0x98 0x99 0x9a 0x9b 0x14 0x15 0x9e 0x1a \n\
      0x20 0xa0 0xa1 0xa2 0xa3 0xa4 0xa5 0xa6 \n\
      0xa7 0xa8 0xd5 0x2e 0x3c 0x28 0x2b 0x7c \n\
      0x26 0xa9 0xa8 0xa9 0xaa 0xab 0xac 0xad \n\
      0xae 0xb1 0x21 0x24 0x2a 0x29 0x3b 0x7e \n\
      0x2d 0x2f 0xaf 0xb0 0xb1 0xb2 0xb3 0xb4 \n\
      0xb5 0xb9 0xcb 0x2c 0x25 0x5f 0x3e 0x3f \n\
      0xdf 0xee 0xb6 0xb7 0xb8 0xb9 0xba 0xbb \n\
      0xbc 0x60 0x3a 0x23 0x40 0x27 0x3d 0x22 \n\
      0xef 0x61 0x62 0x63 0x64 0x65 0x66 0x67 \n\
      0x68 0x69 0xbd 0xbe 0xbf 0xc0 0xc1 0xc2 \n\
      0xfa 0x6a 0x6b 0x6c 0x6d 0x6e 0x6f 0x70 \n\
      0x71 0x72 0xc3 0xc4 0xc5 0xc6 0xc7 0xc8 \n\
      0xfb 0xe5 0x73 0x74 0x75 0x76 0x77 0x78 \n\
      0x79 0x7a 0xc9 0xca 0xcb 0xcc 0xcd 0xce \n\
      0xf0 0xf1 0xf2 0xf3 0xf4 0xf5 0xf6 0xf7 \n\
      0xf8 0xf9 0xcf 0xd0 0xd1 0xd2 0xd3 0xd4 \n\
      0x7b 0x41 0x42 0x43 0x44 0x45 0x46 0x47 \n\
      0x48 0x49 0xe8 0xd5 0xd6 0xd7 0xd8 0xd9 \n\
      0x7d 0x4a 0x4b 0x4c 0x4d 0x4e 0x4f 0x50 \n\
      0x51 0x52 0xda 0xe0 0xe1 0xe2 0xe3 0xe4 \n\
      0x5c 0x9f 0x53 0x54 0x55 0x56 0x57 0x58 \n\
      0x59 0x5a 0xe5 0xe6 0xe7 0xe8 0xe9 0xea \n\
      0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 \n\
      0x38 0x39 0xeb 0xec 0xed 0xfd *0x41 0xff
The #table keyword tells x3270 that the definition is a full 256-element table (other possibilities are covered in the x3270 resource documentation). The \n\ sequence must appear at the end of each line, except for the last; it tells x3270 to continue the entry onto the next line.

If an EBCDIC code does not have a translation, you should specify 0x00 for it in the table. Note also that EBCDIC codes X'00' through X'3F' are ignored.
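
If recode is not at hand, a table in this format can also be generated by script. The sketch below is one possible approach and makes several assumptions: it uses Python's built-in cp037 and latin-1 codecs as stand-ins for the EBCDIC code page and the display character set (Python ships only a handful of EBCDIC codecs, so your code page may not be covered), the resource name example is a placeholder, and the function names are invented here. Codes with no single-byte translation are written as 0x00, following the rule above.

```python
def build_table(ebcdic_codec, display_codec):
    """Return 256 display codes, one per EBCDIC code.

    Codes that cannot be translated to a single display byte get 0x00.
    """
    table = []
    for code in range(256):
        try:
            char = bytes([code]).decode(ebcdic_codec)
            display = char.encode(display_codec)
        except UnicodeError:
            table.append(0x00)
            continue
        table.append(display[0] if len(display) == 1 else 0x00)
    return table


def format_resource(name, table):
    """Format the table as a resource, eight values per line.

    Every line except the last ends with the \\n\\ continuation sequence.
    """
    rows = []
    for start in range(0, 256, 8):
        values = " ".join(f"0x{v:02x}" for v in table[start:start + 8])
        rows.append("      " + values)
    header = f"*charset.{name}: #table \\n\\\n"
    return header + " \\n\\\n".join(rows)


if __name__ == "__main__":
    # "example" is a placeholder character set name.
    print(format_resource("example", build_table("cp037", "latin-1")))
```

The output is only a starting point. It still has to be compared against the IBM code page table, and the control-code region (X'00' through X'3F') will differ from the example above, which does not matter because those codes are ignored.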

Duplicate Translations

Your table may contain duplicate translations, i.e., the same value may appear in multiple locations. However, the last instance of the value to appear is the one that will be used to do the reverse translation -- translating keyboard codes to EBCDIC. This can cause some confusion. For example, the value 0x41 ('A') appears twice in the example above, once for EBCDIC code X'C1', and once for EBCDIC code X'FE'. The last-instance rule says that the display code 0x41 (the 'A' key on your keyboard) will be translated to EBCDIC code X'FE'. This is probably not what you want.

The correction for this problem is the asterisk * that appears before the second 0x41 entry in the example above. The asterisk means that this is a one-way translation, that is, it is used when translating EBCDIC codes for display, but it is not used when translating keyboard input to EBCDIC. Thus the above table would translate a keyboard 0x41 ('A') to EBCDIC X'C1', though it would display both EBCDIC X'C1' and X'FE' as an 'A'.
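
The last-instance rule and the effect of the asterisk can be modeled in a few lines. This is a simplified model of the behavior described above, not x3270's actual code; the function name reverse_map is invented, and skipping 0x00 entries in the reverse direction is an assumption based on 0x00 meaning "no translation".

```python
def reverse_map(entries):
    """Build the display-code-to-EBCDIC map from table entries.

    entries is a sequence of (ebcdic_code, token) pairs in table order,
    where token is a string such as "0x41" or "*0x41".
    """
    rev = {}
    for ebcdic_code, token in entries:
        one_way = token.startswith("*")
        display_code = int(token.lstrip("*"), 16)
        if one_way or display_code == 0x00:
            continue  # display-only entry, or no translation (assumed)
        rev[display_code] = ebcdic_code  # a later entry replaces an earlier one
    return rev


if __name__ == "__main__":
    with_asterisk = [(0xC1, "0x41"), (0xFE, "*0x41")]
    without_asterisk = [(0xC1, "0x41"), (0xFE, "0x41")]
    print(f"X'{reverse_map(with_asterisk)[0x41]:02X}'")     # X'C1'
    print(f"X'{reverse_map(without_asterisk)[0x41]:02X}'")  # X'FE'
```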

The print.chars File

print.chars is an EBCDIC file which contains a grid of all printable EBCDIC characters. This grid has the same layout as the IBM code page tables. When it is uploaded to your host (using the x3270 File Transfer facility), you can display it to verify the character set mappings. Be sure to upload it as a binary file -- it is already in EBCDIC.

This file is available in the Examples/ directory of recent x3270 distributions. It is also available from the x3270 web page.

Steps

  1. Gather the parts listed above.
  2. Upload print.chars to your host. Be sure to transfer it as a binary file (it is already in EBCDIC).
  3. Create a file called .x3270pro in your home directory, if there isn't one there already.
  4. In that file, put the following information (gathered above), or add this to the end of the file:

     *codepage.east-fredonian: 0x13802708
     *displayCharset.east-fredonian: iso8859-43
     *charset.east-fredonian: #table \n\
         0x20 0x01 0x02 ....

     For West Fredonian, the definition would be a bit simpler. The code page does not need to be a full CGCSGID, and the displayCharset need not be specified:

     *codepage.west-fredonian: 9991
     *charset.west-fredonian: #table \n\
         0x20 0x01 0x02 ....

  5. Run x3270, using your display font and character set definition:

     x3270 -efont eastfredonian-12 -charset east-fredonian

     Or, for West Fredonian (which uses Latin-1):

     x3270 -charset west-fredonian

  6. Log on to your host, and display the print.chars file. Compare its appearance to the IBM code page table for your character set.

What Might Go Wrong

Display Font Problems

When x3270 starts, it may complain about your font:
Font does not implement 'iso8859-43'
Assuming ascii-7
This means that the X11 font you used did not have the proper font properties defined. The File->About x3270 pop-up will tell you what properties were actually found. For example:
Display character set: ascii-7 (require iso8859-43, have fso1974-1)
This can be corrected in one of two ways. First, you can add the registry/encoding that the font actually implements to your character set definition.  In our example, this would mean the following change (entries separated by commas mean that any of those listed will be allowed):
*displayCharset.east-fredonian: iso8859-43,fso1974-1
Second, you can correct the font itself by editing the font definition (the .bdf file). To match the example, the .bdf file would need to have the following entries:
CHARSET_REGISTRY "iso8859"
CHARSET_ENCODING "43"
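
Before editing anything, it can help to see what a .bdf file actually declares and whether a given displayCharset value would accept it. The following is a rough sketch: the function names are invented, the parsing handles only the two quoted properties shown above, and the exact-string comparison is an assumption about how the comma-separated list is matched.

```python
import re


def bdf_charset(bdf_text):
    """Return "registry-encoding" from BDF text, or None if either is missing."""
    pattern = re.compile(r'\s*(CHARSET_REGISTRY|CHARSET_ENCODING)\s+"([^"]*)"')
    props = {}
    for line in bdf_text.splitlines():
        match = pattern.match(line)
        if match:
            props[match.group(1)] = match.group(2)
    registry = props.get("CHARSET_REGISTRY")
    encoding = props.get("CHARSET_ENCODING")
    if registry is None or encoding is None:
        return None
    return f"{registry}-{encoding}"


def font_is_accepted(display_charset_value, font_charset):
    """Check a font's charset against a comma-separated displayCharset value."""
    allowed = [name.strip() for name in display_charset_value.split(",")]
    return font_charset in allowed


if __name__ == "__main__":
    sample = 'CHARSET_REGISTRY "fso1974"\nCHARSET_ENCODING "1"\n'
    found = bdf_charset(sample)
    print(found)                                            # fso1974-1
    print(font_is_accepted("iso8859-43", found))            # False
    print(font_is_accepted("iso8859-43,fso1974-1", found))  # True
```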

Wrong Characters Displayed

If the wrong glyph is being displayed for one or two characters, you can correct it by modifying the charset.xxx resource in the character set definition file. (Note that the CMS XEDIT program displays a quote character " for EBCDIC code X'FF', and the CMS TYPE command displays it as a blank, so it can be difficult to tell if this character is correct.)

If the glyphs for the basic characters (A-Z, a-z, 0-9 and some basic punctuation) are correctly displayed, but other characters are not, then your X11 font probably doesn't implement the character set you want. Try xfd to verify that the font really includes the right glyphs in the right order (as described by the relevant standard).

If even the most basic characters are incorrectly displayed, then your translation table is probably wrong. Make sure that some of the common EBCDIC codes are correct, e.g., X'40' is a space (0x20), and X'F0' through X'F9' are the digits 0 through 9 (0x30 through 0x39).
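
These spot checks are easy to automate once the table is available as a list of 256 numbers. The sketch below checks only the codes named in this document (the space, the digits, and the capital 'A' at X'C1' from the earlier example); the function name check_basics is invented for illustration.

```python
def check_basics(table):
    """Return a list of problems found in a 256-element translation table."""
    expected = {0x40: 0x20, 0xC1: 0x41}  # space and capital 'A'
    for digit in range(10):
        expected[0xF0 + digit] = 0x30 + digit  # digits 0 through 9
    problems = []
    for ebcdic_code in sorted(expected):
        want = expected[ebcdic_code]
        got = table[ebcdic_code]
        if got != want:
            problems.append(
                f"X'{ebcdic_code:02X}': expected 0x{want:02x}, got 0x{got:02x}"
            )
    return problems


if __name__ == "__main__":
    # A mostly empty table with the basic codes filled in,
    # and X'F9' deliberately left wrong.
    table = [0x00] * 256
    table[0x40] = 0x20
    table[0xC1] = 0x41
    for digit in range(9):
        table[0xF0 + digit] = 0x30 + digit
    print(check_basics(table))  # ["X'F9': expected 0x39, got 0x00"]
```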

Wrong Characters Transmitted

This is a subtle problem that usually comes from having duplicate entries in the charset.xxx resource. A common symptom is something like this in a host session:
type print chars
DMSOPN026E Invalid character y in fileid type module
Invalid CMS command
That is, the host seems to be confused by ordinary printable characters. The cause of this problem is multiple EBCDIC codes being displayed as the same character (in this example, 'y' (0x79)). The correct EBCDIC code for 'y' is X'A8'. You can verify this by examining the charset.xxx resource and searching for multiple instances of 0x79. Then you can correct the problem by putting an asterisk * in front of all of the 0x79 entries except for the one that corresponds to EBCDIC X'A8'.
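
Searching a 256-entry table by eye is error-prone, so the search and the fix can be scripted. This sketch assumes the resource text has the #table layout shown earlier; the function names are invented here, and the script only rewrites the list of tokens, leaving it to you to paste the result back into your definition.

```python
def parse_table(resource_text):
    """Return the list of value tokens that follow the #table keyword."""
    body = resource_text.split("#table", 1)[1]
    return body.replace("\\n\\", " ").split()


def codes_for(tokens, display_code):
    """Return every EBCDIC code whose entry uses the given display code."""
    return [
        code for code, token in enumerate(tokens)
        if int(token.lstrip("*"), 16) == display_code
    ]


def mark_one_way(tokens, display_code, keep):
    """Put an asterisk on every entry for display_code except the one at keep."""
    fixed = []
    for code, token in enumerate(tokens):
        bare = token.lstrip("*")
        if int(bare, 16) != display_code:
            fixed.append(token)   # unrelated entry, left untouched
        elif code == keep:
            fixed.append(bare)    # the two-way translation
        else:
            fixed.append("*" + bare)
    return fixed


if __name__ == "__main__":
    # A made-up table in which 'y' (0x79) appears at X'A8' and X'E8'.
    tokens = ["0x00"] * 256
    tokens[0xA8] = "0x79"
    tokens[0xE8] = "0x79"
    print([f"X'{c:02X}'" for c in codes_for(tokens, 0x79)])  # ["X'A8'", "X'E8'"]
    fixed = mark_one_way(tokens, 0x79, keep=0xA8)
    print(fixed[0xA8], fixed[0xE8])                          # 0x79 *0x79
```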