Some extended Latin characters not getting kerned

tiro_j
Posts: 20
Joined: 26 Jan 2021

Some extended Latin characters not getting kerned

Post by tiro_j »

I am working on a font that includes a few extended Latin characters used in indigenous North American orthographies. This is not a complete set for all such languages: just a small subset for some specifically targeted languages. Some of these characters are being kerned by Kern On, but some are not. For some potential pairs, I used the manual auto button in KO to kern them, but there are too many possible pairs to make this viable.

Characters that exhibited either no kerning or minimal kerning include Ɬ Ɂ ɬ ʔ ɂ

I suspect there may be others in Unicode’s extended Latin blocks that are also being overlooked by KO.
Eben Sorkin
Posts: 38
Joined: 27 Apr 2021

Re: Some extended Latin characters not getting kerned

Post by Eben Sorkin »

Are you using the 0k feature?
Tim Ahrens
Site Admin
Posts: 424
Joined: 11 Jul 2019

Re: Some extended Latin characters not getting kerned

Post by Tim Ahrens »

Eben Sorkin wrote: 29 Jul 2023 Are you using the 0k feature?
This is always an interesting tool for analysis, to understand whether a pair is not kerned because it is simply not in the system, or whether it was lost because of its low frequency.

You can also have a look at the file pair_frequencies.txt inside KernOnGlyphs3.glyphsPlugin (use “Show package contents”). If a pair or glyph is not to be found there then it is not (yet) covered by Kern On.
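For several characters at once, the same check can be scripted. This is only a rough sketch: the path is a placeholder for wherever the file sits inside the plugin package on your machine, and it assumes the file is UTF-8 text that contains the characters themselves (it makes no assumption about the column layout).

```python
from pathlib import Path

def covered_characters(text, characters):
    """Map each character to True if it occurs anywhere in the given text."""
    return {ch: ch in text for ch in characters}

def check_file(path, characters):
    """Read a pair list as UTF-8 and report which characters occur in it."""
    return covered_characters(Path(path).read_text(encoding="utf-8"), characters)

if __name__ == "__main__":
    # Placeholder path (assumption): point this at pair_frequencies.txt
    # inside the plugin package.
    path = Path("pair_frequencies.txt")
    if path.exists():
        for ch, found in check_file(path, "\u0241\u0242\u026C\u0294\uA7AD").items():
            print(f"U+{ord(ch):04X} {ch}: {'present' if found else 'absent'}")
```

A character that is reported as absent cannot be part of any listed pair; a character that is present may still lack the specific pairs you need.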
Tim Ahrens
Site Admin
Posts: 424
Joined: 11 Jul 2019

Re: Some extended Latin characters not getting kerned

Post by Tim Ahrens »

To get a better overview:

Latin Extended-B:

Ɂ U+0241 LATIN CAPITAL LETTER GLOTTAL STOP (This is the uppercase of U+0242 and U+0294)
ɂ U+0242 LATIN SMALL LETTER GLOTTAL STOP (This is the lowercase of U+0241 and U+0294)

IPA Extensions:

ɬ U+026C LATIN SMALL LETTER L WITH BELT
ʔ U+0294 LATIN LETTER GLOTTAL STOP

Latin Extended-D:

Ɬ U+A7AD LATIN CAPITAL LETTER L WITH BELT (This is the uppercase of U+026C)

Hope I got that right. I will have a closer look and get back to you.
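The names and case mappings above can be cross-checked against the Unicode data that ships with Python. A minimal sketch, assuming a Python version recent enough to know U+A7AD (Unicode 7.0 or later); `case_info` is just an illustrative helper name:

```python
import unicodedata

def case_info(ch):
    """Return (name, uppercase, lowercase) using Python's built-in Unicode data."""
    return unicodedata.name(ch), ch.upper(), ch.lower()

for ch in "\u0241\u0242\u026C\u0294\uA7AD":
    name, upper, lower = case_info(ch)
    print(f"U+{ord(ch):04X} {name}: upper=U+{ord(upper):04X} lower=U+{ord(lower):04X}")
```

In the Unicode case mappings, U+0241 and U+0242 form a pair and U+026C maps to U+A7AD, while U+0294 has no case mapping at all. Treating U+0294 as related to U+0241/U+0242 is therefore an orthographic convention rather than something the Unicode data provides.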
Tim Ahrens
Site Admin
Posts: 424
Joined: 11 Jul 2019

Re: Some extended Latin characters not getting kerned

Post by Tim Ahrens »

The pipeline that generates the list of pairs has a list of to-be-kerned characters, which is the result of several whitelists and blacklists. I can confirm that all the mentioned characters are on the list: in principle, they are not excluded.

Some more details on the occurrence in my corpus:

ʔ U+0294 LATIN LETTER GLOTTAL STOP

We have some pairs from Wikipedia and Twitter but they are most likely not the ones we need for the languages you are working on.

ɬ U+026C LATIN SMALL LETTER L WITH BELT

Again, we have a decent number of pairs from Wikipedia but probably not for the languages you need.

Ɂ U+0241 LATIN CAPITAL LETTER GLOTTAL STOP (This is the uppercase of U+0242)
ɂ U+0242 LATIN SMALL LETTER GLOTTAL STOP

These do not occur in my corpus. They are used/mentioned in Wikipedia but practically not in combination with other characters. Therefore, they are not in the list of pairs at all.

Ɬ U+A7AD LATIN CAPITAL LETTER L WITH BELT (This is the uppercase of U+026C)

Does not occur in my corpus.

This means I need a sufficient corpus at least for ʔ U+0294 and ɬ U+026C (see below). Do you have texts in the languages you are working on? Dictionaries could also work; the relative pair frequencies may not correspond to the real-world usage but at least we could determine the required pairs.
Tim Ahrens
Site Admin
Posts: 424
Joined: 11 Jul 2019

Re: Some extended Latin characters not getting kerned

Post by Tim Ahrens »

Further:

When I generate the list of supported pairs (i.e. pair_frequencies.txt) I first crunch large amounts of text from various sources, combine them and then remove what appears to be junk data or does not make sense to me, and then synthesize pairs based on what I consider sensible pairs. “Synthesizing” means adding pairs at a given frequency if they do not exist, or increasing their frequency as applicable. As a last step, the code synthesizes derived pairs, such as deriving single guillemets from doubles and vice versa.
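To make the steps above concrete, here is a heavily simplified sketch. It is not the actual Kern On pipeline: the function names, the representation of a pair as a two-character string, and the reading of "increasing their frequency" as "raise to at least a minimum" are all assumptions, and the junk-removal step is left out because it relies on manual judgment.

```python
from collections import Counter

def combine(sources):
    """Sum pair counts from several sources into one Counter."""
    total = Counter()
    for counts in sources:
        total.update(counts)
    return total

def synthesize(freqs, wanted_pairs, min_freq):
    """Add missing pairs at min_freq, or raise existing ones up to it."""
    result = dict(freqs)
    for pair in wanted_pairs:
        result[pair] = max(result.get(pair, 0), min_freq)
    return result

def derive_single_guillemets(freqs):
    """Derive single-guillemet pairs from double ones (one direction only)."""
    table = str.maketrans("\u00AB\u00BB", "\u2039\u203A")
    result = dict(freqs)
    for pair, freq in freqs.items():
        derived = pair.translate(table)
        if derived != pair:
            result[derived] = max(result.get(derived, 0), freq)
    return result
```

For example, `synthesize(combine([wiki, twitter]), ["ʔa"], 4)` would guarantee that the pair ʔa is present with a frequency of at least 4, where `wiki` and `twitter` stand for hypothetical per-source count dictionaries.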

Plus, we get pairs derived from others through capitalization. The corpus contains hardly any all-caps text, yet all-caps setting is widely used in practice, so it is necessary to generate UC versions of pairs from their LC counterparts (this includes combinations with punctuation, of course). There are lots of tweaks, such as not capitalizing µ or letters next to @. Also, at some point I decided to completely exclude IPA from this mechanism; the comment line literally says “# strictly no derived pairs from pairs that contain IPA or UCAS:”. The assumption is that IPA is not generally turned to all-caps in a way that preserves its meaning. So, in order not to generate senseless pairs, IPA is excluded from derivation-through-capitalization.

Now it seems I am being too harsh and there are actually letters in the IPA block that it makes sense to capitalize in the classic all-caps or title case sense. Is this the case for U+0294 and U+026C? Then I can remove them from the set of IPA characters for this purpose, which will lead to the generation of pairs for the corresponding cased letters U+0241, U+0242 and U+A7AD.
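The exclusion and the proposed exemption could look roughly like this. Again, this is a sketch under assumptions rather than the real code: "IPA" is approximated by the Unicode block IPA Extensions (U+0250 to U+02AF), UCAS and the other tweaks are ignored, and `exempt` is a hypothetical parameter for the characters to be released from the exclusion.

```python
IPA_EXTENSIONS = range(0x0250, 0x02B0)  # Unicode block "IPA Extensions"

def is_excluded(ch, exempt):
    """True if ch is in the IPA Extensions block and not explicitly exempted."""
    return ord(ch) in IPA_EXTENSIONS and ch not in exempt

def derive_uppercase_pairs(freqs, exempt=frozenset()):
    """Add upper-cased versions of pairs, skipping pairs with excluded IPA."""
    result = dict(freqs)
    for pair, freq in freqs.items():
        if any(is_excluded(ch, exempt) for ch in pair):
            continue
        upper = pair.upper()
        # Skip unchanged pairs and mappings that change length (e.g. ß -> SS).
        if upper != pair and len(upper) == len(pair):
            result[upper] = max(result.get(upper, 0), freq)
    return result
```

With `exempt={"ɬ"}`, a pair such as ɬa yields ꞬA. Note that Unicode defines no uppercase for U+0294, so exempting ʔ alone would only capitalize its neighbour; mapping ʔ to Ɂ would need an explicit, language-specific rule on top of this.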
tiro_j
Posts: 20
Joined: 26 Jan 2021

Re: Some extended Latin characters not getting kerned

Post by tiro_j »

No, I do not have language corpora using these characters. Such corpora may exist online, but the languages I am targeting are digitally disadvantaged.

A lot of IPA characters are used in the orthographies of American and African languages. Most of these are identifiable by having case mappings in Unicode, although a few are used only in unicameral alphabets.
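As a rough illustration of that criterion, the cased characters in the IPA Extensions block can be listed with a few lines of Python. This is a sketch that relies on the Unicode version bundled with the interpreter, and it only looks at the block U+0250 to U+02AF; cased letters of IPA origin elsewhere in Unicode are not covered.

```python
import unicodedata

def cased_ipa_letters():
    """List IPA Extensions characters whose str.upper() differs from themselves."""
    found = []
    for cp in range(0x0250, 0x02B0):
        ch = chr(cp)
        if ch.upper() != ch:
            found.append((ch, ch.upper()))
    return found

for lower, upper in cased_ipa_letters():
    print(f"U+{ord(lower):04X} {lower} -> {upper}  {unicodedata.name(lower)}")
```

Characters used only in unicameral alphabets, such as ʔ U+0294, will not show up in this list and would have to be identified some other way.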

As a workaround, I created ‘User-set autopairs’ for the combinations that I wanted to ensure would be kerned, but when I clicked the Kern On button to kern the font, most of my 30 user-set autopairs did not get included. I think it should be standard KO behaviour that all user-set autopairs are included in final kerning. If a user has gone to the effort of manually setting autopairs, KO should assume that these pairs are wanted.
tiro_j
Posts: 20
Joined: 26 Jan 2021

Re: Some extended Latin characters not getting kerned

Post by tiro_j »

[If I first set an autopair and then change it to an independent pair, then it gets included in final kerning, but that second step should not be necessary.]