Back in 2016 I started a mini-project to play with neural networks[1]. I concluded that the approach I used didn't give me enough control over the procedural generation. I wanted to revisit the project and focus on word manipulation (the problem I wanted to solve) instead of neural networks (the tool I was learning).
1 Motivating example#
Last time I was looking for ways to modify the pronunciation of words and get new spellings. The way a language sounds and looks can be correlated with geography and climate[2] (altitude, temperature, humidity) and also with shapes[3] (round, sharp). It may be useful to make a procedural world generator also modify languages to match the characteristics of the location and culture of the people who speak those languages. I'm still interested in that larger problem but I also want to have some control over spelling, whether the pronunciation has changed or not. I decided to focus on spelling alone for now. Some examples:
- Maybe the elves in the west write the K S sound with the letters ks. For example the word tax would be written taks.
- Maybe the giants in the north drop the final T sound in a word, and write it with an apostrophe. For example the word bucket would be written bucke'.
- Maybe the goblins change the final Z sound to a z spelling, because they heard it makes them look cool. For example the word words would be written wordz.
For these spelling changes the simplest thing to do would be to use regexp string substitution.
- x → ks would turn tax into taks. But it would also turn xerox into kseroks; I want it to become xeroks. This means I need to know which x sounds are K S and which are Z. It also misses monarchs which should become monarks because the ch in that word is pronounced K.
- t$ → ' would turn bucket into bucke'. But it also turns valet into vale', which may or may not be what I wanted.
- s → z would turn words into wordz. But it also turns illinois (pronounced without the final s) into illinoiz which is not what the goblins expected.
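These failure cases are easy to reproduce with plain regex substitution. A minimal sketch (the rule patterns are my own illustration of the three rules above):

```python
import re

def naive_respell(word, pattern, replacement):
    """Apply a spelling rule as a plain regex substitution,
    with no knowledge of how the word is pronounced."""
    return re.sub(pattern, replacement, word)

print(naive_respell('tax', r'x', 'ks'))       # taks (as intended)
print(naive_respell('xerox', r'x', 'ks'))     # kseroks (wrong; want xeroks)
print(naive_respell('bucket', r't$', "'"))    # bucke'
print(naive_respell('illinois', r's$', 'z'))  # illinoiz (wrong; the final s is silent)
```

The regex only sees letters, so it can't distinguish an x pronounced K S from one pronounced Z, or a final s that's silent.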
To make these substitutions properly I need to know the sounds corresponding to the letters. Last time I trained a neural network to turn spelling (graphemes) into sounds (phonemes). It spelled some things well: kirk → cherk, kurch (instead of chirk, kirch) and other things poorly: clerc, instead of clerk or klurk. However it didn't tell me which letters corresponded to each sound. It was a black box. I wasn't able to figure out how to override spelling.
2 Letter-phoneme alignment#
I wanted to find something that told me which letters corresponded to each sound. It's not going to be perfect, as English spelling is pretty weird and not necessarily decomposable in this way. I found a paper by Jiampojamarn and Kondrak: Letter-Phoneme Alignment: An Exploration[4].
Letter-phoneme alignment decides which letters correspond to each phoneme. An example in their paper is the word accuse, which maps to the sounds AH K Y UW Z. Letter-phoneme alignment would tell us:

a | cc | u | s | e
AH | K | Y UW | Z | –

(I'm using the ARPAbet[5] phonemes from cmudict[6] instead of IPA[7])
3 m2m-aligner#
One reason I picked this paper instead of others is that it comes with code[8]! Let's try it:
cd ~/Projects/src
git clone --depth 1 \
    https://github.com/letter-to-phoneme/m2m-aligner.git
cd m2m-aligner
make
g++ -O3 -ffast-math -funroll-all-loops -fpeel-loops -ftracer -funswitch-loops -funit-at-a-time -pthread -c -I./tclap-1.2.1/include/ mmAligner.cpp -o mmAligner.o
g++ -O3 -ffast-math -funroll-all-loops -fpeel-loops -ftracer -funswitch-loops -funit-at-a-time -pthread -c -I./tclap-1.2.1/include/ mmEM.cpp -o mmEM.o
clang: warning: optimization flag '-funroll-all-loops' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-funroll-all-loops' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-fpeel-loops' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-fpeel-loops' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-ftracer' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-ftracer' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-funswitch-loops' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-funswitch-loops' is not supported [-Wignored-optimization-argument]
g++ -O3 -ffast-math -funroll-all-loops -fpeel-loops -ftracer -funswitch-loops -funit-at-a-time -pthread mmAligner.o mmEM.o -o m2m-aligner -lgcc_s -lpthread -lc -lm
ld: library not found for -lgcc_s
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [m2m-aligner] Error 1
Ok, that didn't work. It's written for Linux/gcc and I'm trying to compile on Mac/clang. I looked at the makefile and wondered if I really needed the linker flag -lgcc_s. I took it out:
perl -pi -e 's/-lgcc_s//' makefile
make
It compiled. Yay!
I tried the sample file they provided:
./m2m-aligner --delX --maxX 2 --maxY 2 -i toAlignEx
It took a few seconds and produced these files:
-rw-r--r-- 1 amitp wheel 809869 Jun 10 14:21 toAlignEx.m-mAlign.2-2.delX.1-best.conYX.align.model
-rw-r--r-- 1 amitp wheel      0 Jun 10 14:21 toAlignEx.m-mAlign.2-2.delX.1-best.conYX.align.err
-rw-r--r-- 1 amitp wheel  37213 Jun 10 14:21 toAlignEx.m-mAlign.2-2.delX.1-best.conYX.align
What's in them?
The conYX.align.model file has lines like this:
th T 5.12597733532431e-46
th TAY 8.6366493765573e-90
th TH 0.858903460553724
th THAH 1.25326766755916e-16
th THB 5.68427970865339e-85
th THEH 1.85924288138775e-13
th THER 9.67104430410393e-47
th THIH 0.0130481705894507
th THIY 0.00169761557891407
th THR 3.05144787378854e-32
They seem to be probabilities for a particular spelling to correspond to a particular set of phonemes. I haven't investigated further.
The conYX.align file (two columns, tab separated) is more useful to me:
b|a|b|y|l|o|n|	B|AE|B|AH|L|AA|N|
f|e:r|t|i|l|i|z|e|d|	F|ER|T|AH|L|AY|Z|_|D|
w:r|i|n|k|l|i|n:g|	R|IH|NG|K|L|IH|NG|
r|e:a|l|i|s|t:s|	R|IY|L|AH|S|S|
a|s:s|o|c|i|a|t:e|	AH|S|OW|SH|IY|EY|T|
p|r|i|s|m|	P|R|IH|Z|AH:M|
s|c|o|t:c|h|	S|K|AA|CH|_|
This gives me the correspondence between letters and sounds. You can see that sometimes multiple letters correspond to one sound (e r → ER), sometimes a letter corresponds to multiple sounds (m → AH M), and sometimes a letter corresponds to no sound (e → –).
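A small parser for this format is straightforward. This is my own sketch; it assumes the two tab-separated columns shown above, with ':' joining tokens inside a group, '|' separating groups, and '_' marking a letter with no sound:

```python
def parse_alignment(line):
    """Parse one line of m2m-aligner conYX output into a list of
    (letter-group, phoneme-group) pairs. ':' joins tokens within
    a group; '_' marks an empty (silent) phoneme slot."""
    graphemes, phonemes = line.strip().split('\t')
    gs = [g.replace(':', '') for g in graphemes.split('|') if g]
    ps = ['' if p == '_' else p.replace(':', ' ')
          for p in phonemes.split('|') if p]
    return list(zip(gs, ps))

pairs = parse_alignment('f|e:r|t|i|l|i|z|e|d|\tF|ER|T|AH|L|AY|Z|_|D|')
print(pairs)
# [('f','F'), ('er','ER'), ('t','T'), ('i','AH'), ('l','L'),
#  ('i','AY'), ('z','Z'), ('e',''), ('d','D')]
```

The pairs make the three cases explicit: ('er', 'ER') groups two letters, ('m', 'AH M') groups two phonemes, and ('e', '') is a silent letter.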
f | e r | t | i | l | i | z | e | d
F | ER | T | AH | L | AY | Z | – | D

p | r | i | s | m
P | R | IH | Z | AH M
Sometimes it does things I'd consider wrong. For example in scotch it seems like tch should correspond to the CH sound, but it instead assigned tc to CH and h to –. In associate it seems like the e at the end should be – but instead it's marked as part of T.
s | c | o | t c | h
S | K | AA | CH | –

a | s s | o | c | i | a | t e
AH | S | OW | SH | IY | EY | T
When running the program you have to tell it the maximum number of tokens it can group together. The sample command line allows 2 letters or 2 phonemes in a group. I tried 3 letters which improved some mappings but made others worse. Another improvement would be to filter out acronyms (see data file[9]) before running m2m-aligner.
It's not perfect but it's pretty good I think.
4 A bigger dictionary#
The sample file included with m2m-aligner has 1134 words. I wanted a larger set. I downloaded cmudict[10]:
curl -O \
http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b
I then converted it into the format needed for m2m-aligner (drop comments, drop multiple pronunciations for a word, drop words with digits and punctuation, drop emphasis markers):
import re

f = open('cmudict.txt', 'w')
for line in open('cmudict-0.7b', 'r', encoding='latin1'):
    if not re.match(r'^[a-zA-Z\d\s]+$', line):
        continue
    word, phonemes = line.strip().split(' ', 1)
    if not re.match(r'^[a-zA-Z]+$', word):
        continue
    print(' '.join(word.lower()), '\t',
          re.sub(r'\d\b', '', phonemes.strip()),
          sep='', file=f)
Before: AMIT  AA2 M IY1 T
After: a m i t	AA M IY T
Now I can run m2m-aligner on this new file, which has 116506 words:
./m2m-aligner --delX --maxX 2 --maxY 2 -i cmudict.txt
It took around ten minutes but now I have alignment data for most common English words. Great! Let's try using it.
(I put the output file here if you want to use it without compiling m2m-aligner yourself.)
5 Modifying spelling#
Let's try the examples at the beginning of the page.
The elves write the K S sound with the letters ks. Let's pick some words that end in the K S sound to see how they'd fare:
grep '|K|S|' cmudict.txt.m-mAlign.2-2.delX.1-best.conYX.align \
    | sort --random-sort | head
g|r|e:e|k|s|	G|R|IY|K|S|
a|t:t|a|c:k|s|	AH|T|AE|K|S|
i|s|a:a|c|s|	AY|Z|IH|K|S|
u|n|s|u|c|c|e|s:s|f|u|l:l|y|	AH|N|S|AH|K|S|EH|S|F|AH|L|IY|
t|a|r|m|a|c|s|	T|AA|R|M|AE|K|S|
b|a|c:k|s|l|a|s:h|e|s|	B|AE|K|S|L|AE|SH|AH|Z|
d|o:c|k|s|i|d|e|	D|AA|K|S|AY|D|_|
n|a|r|c|o|t|i|c|s|	N|AA|R|K|AA|T|IH|K|S|
w:r|e|c:k|s|	R|EH|K|S|
a:e|r|o|b|a|t|i|c|s|	EH|R|AH|B|AE|T|IH|K|S|
…
To make the change to the spelling, we find the letters corresponding to |K|S| and change them to ks. For example, in attacks, |c:k|s| corresponds to the |K|S| sound. We replace those letters (cks) with the new spelling rule (ks). Some words will keep their spelling (greeks), but some will change:
- attaks
- isaaks
- unsuksessfully
- tarmaks
- bakslashes
- dockside
- narcotiks
- wreks
- aerobatiks
Hey, that works well!
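The rule can be sketched in code. This is my own illustration; it takes (letters, phoneme) pairs like the alignment data above, here written out by hand for attacks:

```python
def respell(pairs, phoneme_seq, new_spelling):
    """Replace the letters aligned to a run of phonemes with a new
    spelling, keeping all the other letters untouched."""
    phones = [p for _, p in pairs]
    out, i = [], 0
    while i < len(pairs):
        if phones[i:i + len(phoneme_seq)] == phoneme_seq:
            out.append(new_spelling)
            i += len(phoneme_seq)
        else:
            out.append(pairs[i][0])
            i += 1
    return ''.join(out)

# a|t:t|a|c:k|s|  AH|T|AE|K|S|  from the grep output above
attacks = [('a', 'AH'), ('tt', 'T'), ('a', 'AE'), ('ck', 'K'), ('s', 'S')]
print(respell(attacks, ['K', 'S'], 'ks'))  # attaks
```

A word like greeks comes out unchanged, because the letters aligned to K S are already k s.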
Let's try the giants. They're changing a T sound, but only at the end of a word:
grep '|T|$' cmudict.txt.m-mAlign.2-2.delX.1-best.conYX.align \
    | sort --random-sort | head
a|b|r|i|d:g|e|m|e|n|t|	AH|B|R|IH|JH|_|M|AH|N|T|
l|i|n|k|e:d|	L|IH|NG|K|T|
m|a|l|n|o|u:r|i|s:h|e:d|	M|AE|L|N|_|ER|IH|SH|T|
p|o|d|i|a|t|r|i|s|t|	P|AH|D|AY|AH|T|R|IH|S|T|
t|r|a|n|s|a|c|t|	T|R|AE|N|Z|AE|K|T|
h|i|j|a|c:k|e:d|	HH|AY|JH|AE|K|T|
i|n|d|i|v|i|d|u|a|l|i|s|t|	IH|N|D|IH|V|IH|D|UW|AH|L|IH|S|T|
…
The giants would write:
- abridgemen'
- link'
- malnourish'
- podiatris'
- transac'
- hijack'
- individualis'
I'm not as happy with these results. It is following the rule, but my rule doesn't produce what I expected. Interesting. Maybe I only want to change S T and N T. I'm not sure.
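A word-final version of the rule is a small variation on the same idea. A sketch (the bucket alignment here is my own, following cmudict's B AH K AH T):

```python
def respell_final(pairs, phoneme, new_spelling):
    """Apply a spelling rule only when the word's last phoneme matches."""
    if pairs and pairs[-1][1] == phoneme:
        return ''.join(g for g, _ in pairs[:-1]) + new_spelling
    return ''.join(g for g, _ in pairs)

bucket = [('b', 'B'), ('u', 'AH'), ('ck', 'K'), ('e', 'AH'), ('t', 'T')]
print(respell_final(bucket, 'T', "'"))  # bucke'
```

Because the rule checks the final phoneme rather than the final letter, a word whose last letter is silent (like illinois for the goblin rule) is left alone.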
Let's try the goblins. They're changing a Z sound at the end of the word to be spelled with a z.
grep '|Z|$' cmudict.txt.m-mAlign.2-2.delX.1-best.conYX.align \
    | sort --random-sort | head
r|e|p|t|i|l|i|a|n|s|	R|EH|P|T|IH|L|Y|AH|N|Z|
g|a|r|g|o:y|l|e:s|	G|AA|R|G|OY|L|Z|
p|a:u|s|e|s|	P|AO|Z|AH|Z|
s|m|o|t:h|e:r|s|	S|M|AH|DH|ER|Z|
p|i|z|a|z:z|	P|IH|Z|AE|Z|
w|e:a|k|n|e|s:s|e|s|	W|IY|K|N|AH|S|IH|Z|
o|r|p:h|a|n|s|	AO|R|F|AH|N|Z|
…
The goblins would write:
- reptilianz
- gargoylz (dropped the e unnecessarily)
- pausez
- smotherz
- pizaz
- weaknessez
- orphanz
That worked pretty well. A variant of the rule would be to find the letters corresponding to the Z sound and change s to z in there. That would result in gargoylez instead of gargoylz and pizazz instead of pizaz.
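That variant is a small change: edit the letters inside the aligned group instead of replacing the group wholesale. A sketch (the gargoyles and pizazz alignments are from the grep output above):

```python
def respell_within(pairs, phoneme, old, new):
    """When the final phoneme matches, rewrite letters *within* the
    aligned letter group instead of replacing the whole group."""
    if pairs and pairs[-1][1] == phoneme:
        letters, p = pairs[-1]
        pairs = pairs[:-1] + [(letters.replace(old, new), p)]
    return ''.join(g for g, _ in pairs)

# g|a|r|g|o:y|l|e:s|  G|AA|R|G|OY|L|Z|
gargoyles = [('g', 'G'), ('a', 'AA'), ('r', 'R'), ('g', 'G'),
             ('oy', 'OY'), ('l', 'L'), ('es', 'Z')]
print(respell_within(gargoyles, 'Z', 's', 'z'))  # gargoylez
```

Since only the s inside the group is rewritten, the silent e in gargoyles survives, and pizazz (whose final group zz contains no s) is left unchanged.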
As you can see, the alignment data helps a lot with implementing these kinds of rules.
6 Demo#
For a demo, I only coded the simplest rules: a single phoneme to a single spelling, ignoring context.
(Abraham Lincoln's Gettysburg Address from Wikipedia[11])
A better demo would let you mouse over any of the rules and see what text would be affected. It'd also let you apply generalizations like voiced/unvoiced[12] matches (subscri*b*e vs subscri*p*tion, wol*f* vs wol*v*es, brea*th* vs brea*th*ing, hou*s*e vs hou*s*ing), but that starts getting into sound changes[13], which are harder.
Related:
7 Next steps#
Changing the spelling of words is nice, but it'd be even cooler to change the sounds and get back a new spelling. For example if there's a rule that says all G sounds turn into K, I could apply this to graft and get kraft.
- g|r|a|s|p|s| - G|R|AE|S|P|S| → krasps
- g|r|a:y|b|e:a|r|d| - G|R|EY|B|IY|R|D| → kreybeard
- c|o|n|g|r|u|e|n|c:e| - K|AO|N|G|R|UW|AH|N|S| → conkruence
- m|a|g:g|o|t|s| - M|AE|G|AH|T|S| → makots
- n|i|c|a:r|a|g|u|a|n|s| - N|IH|K|ER|AA|G|W|AH|N|Z| → nicarakuans
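One way to sketch this kind of sound change: substitute the phoneme in the aligned pairs, respell only the affected letter groups from a default phoneme-to-letters table (the table here is my own, hypothetical), and keep the original letters everywhere else:

```python
DEFAULT_SPELLING = {'K': 'k'}  # hypothetical phoneme -> letters table

def change_sound(pairs, old, new):
    """Replace one phoneme with another, respelling only the letter
    groups that were aligned to the changed sound."""
    out = []
    for letters, phoneme in pairs:
        if phoneme == old:
            out.append(DEFAULT_SPELLING.get(new, letters))
        else:
            out.append(letters)
    return ''.join(out)

# graft: G R AE F T in cmudict
graft = [('g', 'G'), ('r', 'R'), ('a', 'AE'), ('f', 'F'), ('t', 'T')]
print(change_sound(graft, 'G', 'K'))  # kraft
```

This purely local substitution is exactly what breaks down in the examples below, where changing one letter group changes how the surrounding letters are pronounced.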
Although these examples look okay, this runs into the problem I had with the original project: the rules of spelling are a mess. For example, I can't just change l in laughter (L AE F T ER) to d; it produces daughter (D AO T ER), which no longer has the F sound! I'd have to spell it dafter.
Here's another example: change the IH in min to AY. How should it be spelled? Most likely, i. So if I only change the spelling of that one sound, I end up with … min. Not what I want! I need it to be mine, right?
These two examples show that I can't modify the spelling locally and expect the word to be pronounced as I expect. There are many more examples in this poem[16].
For artificial languages, the book Language Construction Kit recommends a 1:1 mapping of letters to phonemes. That also allows for neat word modification tools like Sound Change Applier[17]. There are also writing systems for phonemes, like Unifon[18] and Shavian[19] and Alphanumerenglish[20].
Since I want to stick to English, it seems that I need a way to solve this problem. A neural network may be the next step. Is it practical to train a neural network to learn the rules of English spelling and pronunciation, especially for new words that I create? The answer seems to be no, from this paper[21]: "Since English spelling conventions are notoriously inconsistent, there is no algorithm for accurately predicting the pronunciation of an out-of-vocabulary word. The current state-of-the-art letter-to-phoneme (L2P) converters are typically reported with 10-30% error rates on dictionary words". So where should I go from here? I could learn more about neural networks to attempt this, but it seems like it probably won't work that well. I think I'll rethink my procedural name generation project.
8 More reading#
- The language rules we know but don't know we know[22] has some examples of how we may not even realize all the things we need to know
- Hagrid's spelling rules[23] are quirky
- Phonics Engine[24] has some potentially useful data
- Phonetisaurus's[25] phoneticize.py can also provide letter-phoneme alignment
- Neural networks can generate new names; see British place names[26] and English village names[27] and Oregon place names[28] and character names[29] and pub names[30].
- Connectionist Temporal Classification[31] neural networks can train data with variable alignment.
- Kondrak's paper Automatic Generation of English Respellings[32] is worth a read.
- Heteronyms[33] are two different words with the same spelling but different pronunciations (many differ by part of speech).