BetterCJKSupportSpecification

Differences between revisions 1 and 31 (spanning 30 versions)
Revision 1 as of 2005-10-18 07:30:00
Size: 1212
Editor: 61
Comment:
Revision 31 as of 2005-12-13 10:38:36
Size: 10036
Editor: gnulinux
Comment:
Line 1: Line 1:
## page was renamed from DraftSpecification
Line 3: Line 4:
 * Created: [[Date(2005-10-18T07:30:00Z)]] by Freeflying  * Created: [[Date(2005-10-18T07:30:00Z)]] by ["Freeflying"]
Line 6: Line 7:
 * Contributors: Freeflying  * Contributors: ["Freeflying"], AbelCheung4, ["Atie"], JunKobayashi
Line 18: Line 19:
This project (maybe a package) aims to support CJK better. This project aims at improving CJK support in Ubuntu.
Line 20: Line 22:
We'd like to preconfigure some files when people choose Chinese, Japanese or Korean. These configuration files would:
   * Install a default input method such as scim and start it when the user starts X.
   * Configure fonts.conf for good CJK font display. This needs support from font packages such as ttf-arphic-uming.
   * Let CJK users display the tags of their MP3 files correctly.
As of Breezy, Ubuntu has no default input method, so ordinary CJK users cannot write in their native language on the Ubuntu desktop. In addition, the default configuration of various applications and of the desktop as a whole does not suit Asian users and users from certain countries well. For example, the default desktop font size is simply too small for CJK users, especially Chinese users. To improve the experience for these users, some packages need to be patched, while others may need additional configuration.
Line 25: Line 25:
 * Chulsu installed Ubuntu on his laptop and opened Firefox to visit his favorite Korean web forum. He immediately wondered, first, "why does this page look different from Firefox on Windows on my desktop?" and second, "how do I input Korean to write my reply to the forum?" He searched the Ubuntu Korean wiki and KLDP and asked his questions there. After several days he had learned how to install font packages, how to configure .fonts.conf in his home directory, and how to install and use a Korean input method. Now he is thinking, "why is Linux so much more difficult than Windows? If all of this were installed and configured when I installed Ubuntu, that would be the way to go."
 * Yeonhee loves to listen to her favorite CDs while she works on her writing in OpenOffice. For her one-month trip to Jeju island, she wanted to convert them to MP3, but could not find a conversion tool in the Ubuntu installed on her new laptop. So she converted her favorite songs, with MP3 tags, on Windows, then opened Rhythmbox on her Ubuntu laptop to test them. Now she sees that the song names are not shown correctly in Korean: "how am I going to go on my trip?"
 * Miyoung wanted to try Linux for her class, but she had never used Linux before. Her classmate gave her an Ubuntu CD, so she was happy. But on the way home she worried about installing from the CD onto her desktop, which already had Windows installed, and decided: "OK, I am going to search the Ubuntu site for how to install. It would be great if I could find Korean guides there. I should learn English... does this CD support Korean?...".
Line 27: Line 30:
 * Suggestion: split this spec into several parts, so that each can be tackled one at a time, starting with the most important.
 * Default Input Method for CJK users, FontConfig and Enabling Embolden with patches are the topics we should focus on first, in my opinion.
 * Please add your ideas to help define the scope of the Dapper implementation.
Line 29: Line 35:
 1. Install a default input method such as scim, and start it automatically when the user starts X. In addition, users should be able to keep their own individual settings.
  * Useful links for Korean input methods:
          http://wiki.kldp.org/wiki.php/KoreanInputMethod
  * Scim shall be the default input method: many IM engines are already based on scim, and so far its language support is the best. Even Fedora and Mandriva use it by default.

 1. Better tuning of environment variables for CJK users in language-selector, the installer, etc.
  * [http://bugzilla.ubuntu.com/show_bug.cgi?id=20442 Bug report on language-selector]

 1. Tune the fontconfig settings to achieve better CJK font display (e.g. more solid font outlines, bold type, bitmaps at medium font sizes, etc.). This requires support from font packages such as ttf-arphic-uming/ukai.
  * Use ttf-arphic-uming/ukai by default, since these are the only packages that contain Hong Kong characters at all sizes.
  * Install xfonts-wqy for Simplified Chinese installations; ttf-newsung is not needed since it has already been merged into uming/ukai.
  * On the fontconfig topic, Korean Linux users are discussing which font should replace ttf-baekmuk as the distribution default; currently the favorite is ttf-unfonts, followed by ttf-alee. ["KoreanTeam"] will provide an up-to-date ["BeautifyKoreanFonts"] once a font package has been decided on.
  * In the Japanese case, there is no Japanese font that is both completely free and of high quality. Ubuntu uses the Kochi font, which is DFSG-free, but its quality is inferior to commercial Japanese fonts. This font issue is a barrier to wider use of completely free Linux distributions in Japan. "IPA Font" and "IPA Mona Font" are high-quality, free-for-use Japanese fonts. Unfortunately they are not completely free: the license requires that they be redistributed together with one of the software packages specified by the supplier. At present, the Japanese Team supplies Debian packages of IPA Font and IPA Mona Font for Breezy.

 1. CJK users should be able to display the ID3 tags of their MP3 files correctly. Historically, tagging has been a mess: everybody uses their own legacy encoding for MP3 tags, because there was no support for non-Western languages until very recent versions of the ID3 tag specification.
  * For applications that use GStreamer, setting GST_ID3_TAG_ENCODING can be an interim solution. There is more discussion on ["UTFEightCurrentProblems"].

 1. (?) Allow users to read/write CJK on the console.
  * Or, when this is impossible, change $LANGUAGE to C automatically so that users do not see lots of junk on the console.

 1. Better support for CJK fonts in OpenOffice.org.
  * Firefly has a large patch which makes OOo2 much better for CJK.
          http://opendesktop.org.tw/modules/mydownloads/visit.php?cid=20&lid=18
  * Bold patching for OOo2 is in progress: http://www.openoffice.org/issues/show_bug.cgi?id=18285
 

 1. Configure Firefox to print CJK correctly
  * This can probably be done on a per-language basis, in the Firefox language packs.

 1. Beyond Firefox, printing support for CJK users also affects:
  * printing man pages from X terminals
  * printing from xpdf-korean, groff, a2ps and gnome-u2ps: check for hard-coded font names
  * It should be possible to convert the installation guide to PS/PDF for CJK languages

 1. Enable font emboldening by default for CJK users
  * Debian unstable has freetype 2.1.10 - '''Dapper has this now (2005/11/11); please take care of the next two items below.'''
  * Build xft2, fontconfig, pango and cairo2 with embolden enabled
  * This bug was found in Debian's freetype 2.1.10 package, and of course also affects Dapper's:
       a. Words in a sentence are each displayed slanting upward to the right, like a series of slopes. This happens more often at larger sizes, such as web page headings, than at smaller ones, and more often in Konqueror and Opera than in Firefox. However, a Gentoo user who had compiled xorg-x11-7.0 rc1 showed a much nicer screen. Please see the screenshots at this link: http://bbs.kldp.org/viewtopic.php?t=65304&highlight= You can also compare the rendering quality of Akito's patch (top) and "embolden" (bottom) in this screenshot; the top one is much better. http://bbs.kldp.org/download.php?id=5319
       a. Here is another screenshot showing the embolden rendering problem in Konqueror (3.5.0-0ubuntu1 + fontconfig2.3.2-1.1ubuntu1). http://bbs.kldp.org/viewtopic.php?p=336668#336668
       a. Received this link: http://lists.freebsd.org/pipermail/freebsd-gnome/2005-July/011838.html In the patch, M_Y is redefined, which may solve this problem. (Launchpad #5560)
       a. Fedora's patches (https://www.redhat.com/archives/fedora-cvs-commits/2005-October/msg00281.html) have been tested with Dapper's freetype (2.1.10-1); please take a look at #5560.
Line 36: Line 84:
=== Packages affected ===
input methods:

font packages:
 * As of 2005-12-08, the ttf-arphic-uming/ukai packages have been moved to main, and the original Arphic fonts are obsolete.

freetype:

fontconfig:

firefox:
 * The printing issues are fixed in Firefox 1.5 (cf. https://bugzilla.mozilla.org/show_bug.cgi?id=190031).

openoffice.org ('''done'''):
 * OOo2.1pre (src680-m137) included Firefly's patches - No extra patches required
          As of OOo2.0.0m143-0ubuntu2 this is done, as shown here: http://bbs.kldp.org/viewtopic.php?p=335943#335943

im-switch:
 * Improve automatic configuration; currently only those who know that this package exists and understand its ins and outs can configure input method settings.

language-selector:
 * It should set environment variables such as $LANGUAGE and $LANG according to real-life usage, not just dummy settings. For example, Hong Kong users mostly use the Taiwan translations, but may also have their own; thus the correct setting is {{{LANGUAGE=zh_HK:zh_TW}}}.
 * Add a variable, say $CONSOLE_NOT_LOCALIZED, and define it for each language. In particular, set it to "yes" for all CJK languages, so that bash startup can redefine $LANGUAGE to C on the console (and on the console ONLY!).

rhythmbox:

totem-xine:

mplayer:

scim :

 * scim-hangul: Requested inclusion of the newer scim-hangul 0.2.1, which has fewer bugs. (Launchpad #5534)

gc-common(?) : lines 106 ~ 109 of /usr/share/defoma/scripts/gs.defoma
  if ($c eq 'truetype-cjk') {
      '''# FIXME: need to support the sub font id for the collection.'''
      print FFFF '/', $Id->{0}->[$i], ' << /FileType /TrueType /Path (', $f, ') /SubfontID ', '0', ' /CSI [(', $h[6], ') ', $hh{$h[6]}, "] >> ;\n";
  }
  Line 108 triggered the warning "use of uninitialized value ... /var/lib/defoma/scripts/gs.defoma"

Summary

This project aims at improving CJK support in Ubuntu.

Rationale

As of Breezy, Ubuntu has no default input method, so ordinary CJK users cannot write in their native language on the Ubuntu desktop. In addition, the default configuration of various applications and of the desktop as a whole does not suit Asian users and users from certain countries well. For example, the default desktop font size is simply too small for CJK users, especially Chinese users. To improve the experience for these users, some packages need to be patched, while others may need additional configuration.

Use cases

  • Chulsu installed Ubuntu on his laptop and opened Firefox to visit his favorite Korean web forum. He immediately wondered, first, "why does this page look different from Firefox on Windows on my desktop?" and second, "how do I input Korean to write my reply to the forum?" He searched the Ubuntu Korean wiki and KLDP and asked his questions there. After several days he had learned how to install font packages, how to configure .fonts.conf in his home directory, and how to install and use a Korean input method. Now he is thinking, "why is Linux so much more difficult than Windows? If all of this were installed and configured when I installed Ubuntu, that would be the way to go."
  • Yeonhee loves to listen to her favorite CDs while she works on her writing in OpenOffice. For her one-month trip to Jeju island, she wanted to convert them to MP3, but could not find a conversion tool in the Ubuntu installed on her new laptop. So she converted her favorite songs, with MP3 tags, on Windows, then opened Rhythmbox on her Ubuntu laptop to test them. Now she sees that the song names are not shown correctly in Korean: "how am I going to go on my trip?"

  • Miyoung wanted to try Linux for her class, but she had never used Linux before. Her classmate gave her an Ubuntu CD, so she was happy. But on the way home she worried about installing from the CD onto her desktop, which already had Windows installed, and decided: "OK, I am going to search the Ubuntu site for how to install. It would be great if I could find Korean guides there. I should learn English... does this CD support Korean?...".

Scope

  • Suggestion: split this spec into several parts, so that each can be tackled one at a time, starting with the most important.
  • Default Input Method for CJK users, FontConfig and Enabling Embolden with patches are the topics we should focus on first, in my opinion.

  • Please add your ideas to help define the scope of the Dapper implementation.

Design

  1. Install a default input method such as scim, and start it automatically when the user starts X. In addition, users should be able to keep their own individual settings.
    • Useful links for Korean input methods: http://wiki.kldp.org/wiki.php/KoreanInputMethod
    • Scim shall be the default input method: many IM engines are already based on scim, and so far its language support is the best. Even Fedora and Mandriva use it by default.
  2. Better tuning of environment variables for CJK users in language-selector, the installer, etc.
  3. Tune the fontconfig settings to achieve better CJK font display (e.g. more solid font outlines, bold type, bitmaps at medium font sizes, etc.). This requires support from font packages such as ttf-arphic-uming/ukai.
    • Use ttf-arphic-uming/ukai by default, since these are the only packages that contain Hong Kong characters at all sizes.
    • Install xfonts-wqy for Simplified Chinese installations; ttf-newsung is not needed since it has already been merged into uming/ukai.
    • On the fontconfig topic, Korean Linux users are discussing which font should replace ttf-baekmuk as the distribution default; currently the favorite is ttf-unfonts, followed by ttf-alee. ["KoreanTeam"] will provide an up-to-date ["BeautifyKoreanFonts"] once a font package has been decided on.

    • In the Japanese case, there is no Japanese font that is both completely free and of high quality. Ubuntu uses the Kochi font, which is DFSG-free, but its quality is inferior to commercial Japanese fonts. This font issue is a barrier to wider use of completely free Linux distributions in Japan. "IPA Font" and "IPA Mona Font" are high-quality, free-for-use Japanese fonts. Unfortunately they are not completely free: the license requires that they be redistributed together with one of the software packages specified by the supplier. At present, the Japanese Team supplies Debian packages of IPA Font and IPA Mona Font for Breezy.
  4. CJK users should be able to display the ID3 tags of their MP3 files correctly. Historically, tagging has been a mess: everybody uses their own legacy encoding for MP3 tags, because there was no support for non-Western languages until very recent versions of the ID3 tag specification.
    • For applications that use GStreamer, setting GST_ID3_TAG_ENCODING can be an interim solution. There is more discussion on ["UTFEightCurrentProblems"].
  5. (?) Allow users to read/write CJK on the console.
    • Or, when this is impossible, change $LANGUAGE to C automatically so that users do not see lots of junk on the console.
  6. Better support for CJK fonts in OpenOffice.org.

  7. Configure Firefox to print CJK correctly
    • This can probably be done on a per-language basis, in the Firefox language packs.
  8. Beyond Firefox, printing support for CJK users also affects:
    • printing man pages from X terminals
    • printing from xpdf-korean, groff, a2ps and gnome-u2ps: check for hard-coded font names
    • It should be possible to convert the installation guide to PS/PDF for CJK languages
  9. Enable font emboldening by default for CJK users
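
The ID3 tag problem in item 4 can be illustrated with a minimal Python sketch. It assumes a tag field that is either UTF-8 or written in one legacy encoding (EUC-KR is used here only as an example). The helper name `decode_legacy_tag` and its `legacy_encoding` parameter are hypothetical; they are not part of GStreamer, Rhythmbox or any Ubuntu package, and real players may choose encodings differently.

```python
# Hypothetical helper for illustration only; not part of GStreamer or any player.
def decode_legacy_tag(raw: bytes, legacy_encoding: str = "euc-kr") -> str:
    """Decode an ID3 text field that is either UTF-8 or a legacy encoding."""
    try:
        # Well-formed UTF-8 is accepted as-is.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise assume the user's legacy encoding (an assumption of this
        # sketch) and replace any bytes that still cannot be decoded.
        return raw.decode(legacy_encoding, errors="replace")


# A song title stored the way an older tagging tool might have written it.
title = "제주도".encode("euc-kr")

print(decode_legacy_tag(title))   # prints 제주도
print(title.decode("latin-1"))    # the unreadable text a Latin-1-only player shows
```

Trying UTF-8 first keeps correctly tagged files untouched, so the legacy fallback only applies to files that would otherwise be shown as junk.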

Implementation

Code

Data preservation and migration

Packages affected

input methods:

font packages:

  • As of 2005-12-08, the ttf-arphic-uming/ukai packages have been moved to main, and the original Arphic fonts are obsolete.

freetype:

fontconfig:

firefox:

openoffice.org (done):

im-switch:

  • Improve automatic configuration; currently only those who know that this package exists and understand its ins and outs can configure input method settings.

language-selector:

  • It should set environment variables such as $LANGUAGE and $LANG according to real-life usage, not just dummy settings. For example, Hong Kong users mostly use the Taiwan translations, but may also have their own; thus the correct setting is LANGUAGE=zh_HK:zh_TW.

  • Add a variable, say $CONSOLE_NOT_LOCALIZED, and define it for each language. In particular, set it to "yes" for all CJK languages, so that bash startup can redefine $LANGUAGE to C on the console (and on the console ONLY!).
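
The two language-selector ideas above can be sketched in Python as follows. This is only an illustration of the intended logic, not the language-selector implementation: the function names, the `on_console` flag (how a console session is detected is left open here) and the `LANG=zh_HK.UTF-8` value are assumptions. $CONSOLE_NOT_LOCALIZED is the variable proposed above, and the colon-separated form of $LANGUAGE is the priority list that gettext reads.

```python
from __future__ import annotations


def language_priority(language: str) -> list[str]:
    """Split a colon-separated LANGUAGE value into an ordered fallback list."""
    return [code for code in language.split(":") if code]


def console_environment(env: dict[str, str], on_console: bool) -> dict[str, str]:
    """Return a copy of env with LANGUAGE forced to C on a non-localized console."""
    adjusted = dict(env)
    # Only override on the console, and only when the proposed flag says so.
    if on_console and env.get("CONSOLE_NOT_LOCALIZED") == "yes":
        adjusted["LANGUAGE"] = "C"
    return adjusted


# Example settings for a Hong Kong user (LANG value is an assumption).
hk_env = {
    "LANG": "zh_HK.UTF-8",
    "LANGUAGE": "zh_HK:zh_TW",
    "CONSOLE_NOT_LOCALIZED": "yes",
}

print(language_priority(hk_env["LANGUAGE"]))                      # ['zh_HK', 'zh_TW']
print(console_environment(hk_env, on_console=True)["LANGUAGE"])   # C
print(console_environment(hk_env, on_console=False)["LANGUAGE"])  # zh_HK:zh_TW
```

With this setting, Hong Kong translations are used where they exist and Taiwan translations fill the gaps, while the console falls back to untranslated messages instead of unreadable characters.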

rhythmbox:

totem-xine:

mplayer:

scim :

  • scim-hangul: Requested inclusion of the newer scim-hangul 0.2.1, which has fewer bugs. (Launchpad #5534)

gc-common(?) : lines 106 ~ 109 of /usr/share/defoma/scripts/gs.defoma

    if ($c eq 'truetype-cjk') {
        # FIXME: need to support the sub font id for the collection.
        print FFFF '/', $Id->{0}->[$i], ' << /FileType /TrueType /Path (', $f, ') /SubfontID ', '0', ' /CSI [(', $h[6], ') ', $hh{$h[6]}, "] >> ;\n";
    }

  Line 108 triggered the warning "use of uninitialized value ... /var/lib/defoma/scripts/gs.defoma"

Outstanding issues

BoF agenda and discussion

BetterCJKSupportSpecification (last edited 2008-08-06 16:22:10 by localhost)