Thursday, February 03, 2005

That's not a phoneme!!!

I recently read a review (from 2002) by Jon Udell about Fast-Talk, which is now Nexidia. Their product uses what they call phonetic searching in order to mine audio - do keyword search and that sort of thing. This paragraph is from their website, under "The Science of Phonetics":

Phonetics is the systematic study of human speech-sounds. It provides means of describing and classifying virtually all the sounds that can be produced by human vocal tracts. This study is based on “phonemes”, which are the smallest unit of human speech.

All utterances made in the entire world have been catalogued within a 400 phoneme range. The majority of languages use a 40 phoneme range, and the most widely spoken languages fall within an 80 phoneme range.

Is that really a phoneme? First of all, phonemes are the minimally *distinctive* units of human speech, not just the minimal units. What they were thinking of is a "phone". Plus, a phoneme is completely, utterly language-dependent. You can't talk about a phoneme without already having a particular language in mind. This is because different languages draw the boundaries between their phonemes in different ways. English speakers generally can't tell the difference between Hindi's dental and retroflex t's and d's, for example.

I don't know if there really are 400 phones in all the world's languages. I wish there was a single copy of Ian Maddieson's Patterns of Sounds available in Singapore (there aren't any at the main libraries) so I could check.

At first I thought that by "the majority of languages use a 40 phoneme range" they meant that each individual language has ~40 phonemes. This is true for English but the usual range is 20-37 (according to this and UPSID) and the mode is at 25. Then I realised that they meant that there are 40 or so sounds that cover a respectable percentage of the world's languages. (Again, does Maddieson - or anyone else - say anything about this?) I guess that's plausible enough; after all, vowels (for example) tend to have quite similar distributions around the vowel space, don't they?

Elsewhere on Nexidia's site they say that for each language one has to define a "phonetic grammar" - and mention that

[a] phonetic grammar likewise depends upon the natural language in use (particularly the set of phonemes used to represent basic sounds and meanings of the input speech)...

So I *think* they've hit on the right thing. In any case, their system seems to work quite well whether or not they are actually using phonemes or phone groups or whatever - they're more or less market leaders and all I've seen are good reviews. I just wish they'd get their terminology right.

Well, that's what you get when you hire a bunch of electrical engineers and no linguists, I guess.