From pander at users.sourceforge.net Mon Feb 6 10:53:53 2012
From: pander at users.sourceforge.net (Pander)
Date: Mon, 06 Feb 2012 10:53:53 +0100
Subject: [tex-hyphen] Improving hyphenation support for compounds
Message-ID: <4F2FA331.30101@users.sourceforge.net>

Hi all,

OpenTaal (https://en.wikipedia.org/wiki/OpenTaal) has started an initiative with several other groups and individuals, such as Taco Hoekwater and László Németh, to support hyphenation patterns for compounds (as used in German, Dutch, Afrikaans, Greek, Russian and more languages) and to fix some other bugs.

At the moment we are writing a grant proposal that describes the bugs and the benefits of fixing them. It will be sent to potential sponsors to request their support in getting this work done.

Send me a personal message if you would like access to this document, either to provide constructive feedback or simply to follow the development of this project.

Regards,

Pander

From jknappen at web.de Mon Feb 6 18:27:59 2012
From: jknappen at web.de (Jörg Knappen)
Date: Mon, 6 Feb 2012 18:27:59 +0100 (CET)
Subject: [tex-hyphen] Improving hyphenation support for compounds
In-Reply-To: <4F2FA331.30101@users.sourceforge.net>
References: <4F2FA331.30101@users.sourceforge.net>
Message-ID: <1090473644.1137753.1328549279604.JavaMail.fmail@mwmweb029>

Hi Pander,

I don't have the details at hand (I can look them up if you don't know them already), but at the EuroTeX conference in Arnhem in 1995 there was a talk about hyphenation patterns for compounds generated with patgen (using a correctly hyphenated list of compound words as input). The author claimed good success in finding the correct Haupttrennstellen (of course, some words, like German Staubecken or Wachstube, cannot be hyphenated with this approach).

Keep me posted on your work; it is very interesting. Unfortunately, E-TeX never implemented different weights for hyphenation points.

--Jörg Knappen

P.S.
Another (very different) approach was the Austrian SiSiSi project, still to be found somewhere on CTAN (in the form of documented change files to TeX.web!!!)

From wl at gnu.org Mon Feb 6 19:43:19 2012
From: wl at gnu.org (Werner LEMBERG)
Date: Mon, 06 Feb 2012 19:43:19 +0100 (CET)
Subject: [tex-hyphen] Improving hyphenation support for compounds
In-Reply-To: <1090473644.1137753.1328549279604.JavaMail.fmail@mwmweb029>
References: <4F2FA331.30101@users.sourceforge.net> <1090473644.1137753.1328549279604.JavaMail.fmail@mwmweb029>
Message-ID: <20120206.194319.66279542.wl@gnu.org>

> I don't have the details at hand (I can look them up if you don't
> know them already), but at the EuroTeX conference in Arnhem in 1995
> there was a talk about hyphenation patterns for compounds generated
> with patgen (using a correctly hyphenated list of compound words as
> input). The author claimed good success in finding the correct
> Haupttrennstellen (of course, some words, like German Staubecken or
> Wachstube, cannot be hyphenated with this approach).

People interested in this topic might also subscribe to the `trennmuster' list at DANTE:

https://lists.dante.de/mailman/listinfo/trennmuster

This is a German list that mainly discusses issues related to the German word list repository, which contains normal and compound hyphenation points:

http://repo.or.cz/w/wortliste.git

However, contributions or questions in English are also welcome!

> P.S. Another (very different) approach was the Austrian SiSiSi
> project, still to be found somewhere on CTAN (in the form of
> documented change files to TeX.web!!!)

Indeed. The main question is whether SiSiSi is flexible enough to handle other languages like Hungarian (which seems to be even more complicated w.r.t. hyphenation).
Werner

From pander at users.sourceforge.net Thu Feb 9 16:53:05 2012
From: pander at users.sourceforge.net (Pander)
Date: Thu, 09 Feb 2012 16:53:05 +0100
Subject: [tex-hyphen] Next action Improved Hyphenation Support: bug reports
Message-ID: <4F33EBE1.2070506@users.sourceforge.net>

Hi all,

I have received a lot of positive reactions and extra information concerning the draft Grant Proposal for Improved Hyphenation Support in FOSS. The read-only version can be found here:

https://docs.google.com/document/d/160jBGYzvffOr5LfwhpgBqkEtEjjeKAKpH873I6r3Ke4/edit?authkey=CNj33_4D

This document captures all the information needed to bring the hyphenation offered via patgen and libhyphen to the next level. For discussion on this matter, please join this mailing list:

http://tug.org/mailman/listinfo/tex-hyphen

At the moment we need to turn the desire for compound support, together with the collected information, into a clear bug report. It needs to include examples from several languages that fail at the moment and should work once the bug is fixed. Some other bug reports probably need to be written as well, for example on multiple hyphenation options, in order to formalise the description of this problem.

We need two to three people with experience in patgen and/or libhyphen to carry out this task. Please send me an email and I will give you edit rights to the aforementioned document.

Regards,

Pander

From somloieater at gmail.com Fri Feb 17 22:58:51 2012
From: somloieater at gmail.com (David Gardner)
Date: Fri, 17 Feb 2012 23:58:51 +0200
Subject: [tex-hyphen] International romani -
Message-ID:

From somloieater at gmail.com Fri Feb 17 23:27:53 2012
From: somloieater at gmail.com (David Gardner)
Date: Sat, 18 Feb 2012 00:27:53 +0200
Subject: [tex-hyphen] International romani - mid-word exclamation mark
Message-ID:

Greetings hyphenation experts...

I'm trying to get xetex to accept some very very alpha hyphenation patterns for the international Romani alphabet.
It seems to get more complex the more I look at it.

This is basically a Latin script with diacritics to mark stress and unusual vowels, ?, ? for some cross-dialect morpho-phonemic skulduggery, and (to mark vocatives) mid-word exclamation marks. The latter is my current source of pain.

The position of the exclamation mark can be before or after the vowel (e.g. "rrom?l!len" or "gr?st!a"), so sometimes it is a good hyphenation point, but not always.

At the moment initex is moaning about non-letters in the hyphenation pattern. I presume I need to change the catcode of ! to solve that - would that be in the hyphenation file, or somewhere else?

But also, is it possible for XeTeX to treat ! as a letter when it is mid-word, and as punctuation (so that the spacing rules work) when it is word-final? Would I need to make it active and do something clever, and if it is active, does that break hyphenation?

David

From jfkthame at googlemail.com Sat Feb 18 14:52:39 2012
From: jfkthame at googlemail.com (Jonathan Kew)
Date: Sat, 18 Feb 2012 13:52:39 +0000
Subject: [tex-hyphen] International romani - mid-word exclamation mark
In-Reply-To:
References:
Message-ID:

On 17 Feb 2012, at 22:27, David Gardner wrote:

> Greetings hyphenation experts...
> I'm trying to get xetex to accept some very very alpha hyphenation
> patterns for the international Romani alphabet. It seems to get more
> complex the more I look at it.
> This is basically a Latin script with diacritics to mark stress and
> unusual vowels, ?, ? for some cross-dialect morpho-phonemic
> skulduggery, and (to mark vocatives) mid-word exclamation marks. The
> latter is my current source of pain.
>
> The position of the exclamation mark can be before or after the
> vowel (e.g. "rrom?l!len" or "gr?st!a"), so sometimes it is a good
> hyphenation point, but not always.
>
> At the moment initex is moaning about non-letters in the hyphenation
> pattern. I presume I need to change the catcode of !
to solve that -
> would that be in the hyphenation file, or somewhere else?

No, what's important here is to give it a non-zero \lccode, because (xe)tex wants to be able to (in effect) apply \lowercase to text before looking up hyphens. So try setting

\lccode`\!=`\!

before loading the patterns. (You can do this in a "loader" file so that you don't clutter the actual patterns with extra tex commands; look at how things are set up in the hyph-utf8 package.)

>
> But also, is it possible for XeTeX to treat ! as a letter when it is
> mid-word, and as punctuation (so that the spacing rules work) when it
> is word-final? Would I need to make it active and do something clever,
> and if it is active, does that break hyphenation?

I think you can leave its \sfcode (which is what influences the sentence-final spacing) at 3000 or whatever the default is; that's independent of the (\catcode and) \lccode value that is important for whether it's considered part of the "word" (for hyphenation purposes).

JK
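
The loader-file suggestion above could look roughly like the following. This is only a sketch under stated assumptions: the file names loadhyph-romani.tex and hyph-romani.tex are hypothetical, and the layout follows the general idea of keeping TeX commands out of the pattern file rather than reproducing any actual hyph-utf8 loader.

```tex
% loadhyph-romani.tex: hypothetical loader; both file names are assumptions
\begingroup
  % A non-zero \lccode lets "!" appear inside \patterns without
  % initex rejecting it as a non-letter.
  \lccode`\!=`\!
  % The pattern file itself contains only \patterns{...}.
  % Patterns are stored globally, so they survive the \endgroup below.
  \input hyph-romani.tex
\endgroup
% \sfcode`\! is deliberately left untouched (3000 in plain TeX),
% so sentence-final spacing after "!" is unchanged.
```

Because the \lccode assignment is local to the group, it does not leak into the rest of the format. Depending on how the format is built, the same \lccode setting may also be needed at typesetting time for ! to count as part of a word when hyphens are looked up.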