Thursday, September 30, 2004

Words, words, just words

I remember the good ol' days in Ling 101 when we watched this series of videos about language. One of the videos was called "American Tongues", but I don't think the bit I'm thinking of now comes from that particular video. It was a discussion about what a word is - and I remember that the only really convincing - yet unsatisfying - reply was that it was the string that went between two spaces.

I remember also looking in amazement at "words" from polysynthetic languages like Chukchi which were, basically, a whole sentence - predicate and arguments all mashed up into one - and thinking to myself, "Why don't they just put spaces in between and split them up into 'logical' words?"

Well, now I "know" there're more complicated ways of figuring out what a word is - by prosody and minimal units and so on - though I don't think there's 100% agreement on a definition - witness the qualifier "sometimes" repeated in the SIL definition.

But there seems to be something psychologically "real" about the notion of a "word", doesn't there? I mean, would any native Turkish speaker object to writing this word (Çekoslovakyalılaştıramadıklarımızdan) as a single string? (Of course, there's ambivalence over some words, especially compound words - should they be written as one word (redbrick), a hyphenated word (red-brick) - in which case does it count as one word or two? - or two words separated by a space (red brick)? But there's probably genuine ambiguity there due to different parses of such words by different native speakers.)

We're so used nowadays to the notion that a word is the thing that goes between the spaces that we still turn to that as the easiest, most accurate and yet most artificial definition of the word "word". This article about the history of punctuation (via BoingBoing) reminded me that once upon a time, words weren't separated by spaces at all. They just ran one into the other. So here's my question: when the practice of inserting spaces between words first came into general use, did people have difficulties with the whole idea? Were there ambiguities about what was to be separated? What sorts of mistakes, if any, did people make? Or are words such psychologically real objects that they wouldn't have had any trouble at all?

Tuesday, September 28, 2004

中秋节快乐

Tonight's the 15th day of the 8th lunar month in the Chinese calendar (look out your window - the moon's completely round!) and therefore it's Mid-Autumn Festival, a time for mooncakes, lion dances - and lantern riddles. So, in honour of Mid-Autumn Festival, I'm going to give a little exposition about lantern riddles. In particular, word riddles, or 字谜 (zi4 mi2), my favourite kind of riddle, in which the answer takes the form of a single Chinese character.

Probably the closest equivalent to the Chinese word riddle in English is the cryptic crossword puzzle clue, which can involve (1) breaking a word down into its component morphemes; and (2) breaking a word down into its component letters (e.g. by scrambling them). In Chinese, these would correspond to (1) breaking down a character into its component radicals, and (2) breaking a character down into its component strokes, respectively. One major difference, however, is that cryptic crossword clues always contain a clue to the meaning of the word; this is, in general, not true for at least one major strain of Chinese word riddles. There are also Chinese word riddles that are entirely to do with semantics. I don't find these as interesting, so I shan't discuss them here.

Disclaimer: I'm no expert on Chinese riddles, nor even Chinese itself. Probably native speakers of Chinese would have a totally different conception and method of solving word riddles than I. I'll just give a few choice examples to illustrate how ingenious and fun to solve they are.

Let's start with some easy examples of the second type, which involve thinking about characters at the stroke level.

(1) 九点 jiu3 dian3 lit. “nine dot”, or “nine o'clock”

This riddle involves noticing that the second word, 点, is also the name of a stroke in Chinese. The riddle is essentially giving you complete instructions for writing the character: write 九, then add a dot. The answer, therefore, is 丸.
Now for a variation on the first riddle.

(2) 十二点 shi2 er4 dian3 lit. “ten two dot”, or “twelve o'clock”

Any ideas? Again, this riddle is telling you exactly what strokes to write: those that comprise 十, then two dots, which yields 斗 dou4 "fight" as the answer.
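
(For the programmatically inclined, the stroke-level game in riddles (1) and (2) is really just a lookup over character decompositions. Here's a toy sketch in Python - the decomposition table is hand-made for these two riddles and purely illustrative, not a real stroke database.)

# A toy rendering of the stroke-level riddles above: a tiny, hand-made table
# of decompositions, and a lookup that finds which character a given pile of
# components builds. Purely illustrative.
COMPONENTS = {
    "丸": ["九", "丶"],        # riddle (1): 九 plus a dot
    "斗": ["十", "丶", "丶"],  # riddle (2): 十 plus two dots
}

def solve(components):
    """Return the character(s) whose component list matches, ignoring order."""
    target = sorted(components)
    return [char for char, parts in COMPONENTS.items() if sorted(parts) == target]

print(solve(["九", "丶"]))        # ['丸']
print(solve(["丶", "十", "丶"]))  # ['斗']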

Here's a third simple one:

(3) 田中 tian2 zhong1 lit. “(in) the middle (中) of a field (田)”

Again, think in terms of characters. The riddle is asking for what's in the middle of the character 田. The answer, therefore, is just a horizontal stroke and a vertical stroke: the numeral 十 shi2 “ten”.

Now for a more challenging riddle.

(4) 夫人何处去 fu1 ren2 he2 chu4 qu4
lit. "Mrs. [wife] (夫人) where (何处) go (去)

Since "Mrs" 夫人 (a term of address for a married woman) is a word in Chinese, most people take it to be a unit, in which case the riddle seems to mean "where did Mrs. go?". But if you split them up and read the riddle like this: 夫 (the character), 人 (the character), 何处 (where), 去 (go), it means something like "in 夫, where did 人 go?". So - what happens if we take the character 夫 and make the character 人 leave it? We are left with just the two horizontal strokes, which make up the numeral 二 er4 "two", which is the answer to the riddle.

[I've corrected the explanation to (4) based on the comment from Anonymous. Thanks for pointing out the mistake!]

Now for a few examples of the first type, which really allow for more variation and ingenuity. Radical thinking, one could say.

(5) 挥手告别 hui1 shou3 gao4 bie2 lit. “wave hand (to) say goodbye”

For this sort of riddle, it's crucial that one is familiar with the system of Chinese radicals, and what they're called. The left hand side of the 挥 character is called the "hand" radical. The second character, 手, means "hand". The last two characters indicate that something is going away, or being left out. Putting them together, the riddle tells us to say goodbye to the "hand" part of the first character, 挥, which leaves us with the right hand side, 军 jun1 "soldier".

(6) 春和秋都不热 chun1 he2 qiu1 dou1 bu2 re4
lit. “spring and autumn (are) both not hot”

This is one of my favourites. First, let's look at the clue about heat. What gives off heat? The sun, written 日 ri4, and fire, written 火 huo3. Now look very closely at the first and third characters: 春 chun1 "spring", and 秋 qiu1 "autumn". That's right, the lower radical of the "spring" character means "sun", and the right hand radical of the "autumn" character means "fire". Now, the riddle says that they're not supposed to be hot, so we take away their heat sources, so to speak, and combine what's left in the only way possible to yield 秦 qin2, which is apparently a variety of rice, as well as the name of the first dynasty in China.

(7) 眼看田上长青草
yan3 = eye, kan4 = see, tian2 = field, shang4 = on, zhang3 = grow, qing1 = green, cao3 = grass
“one (the eye) sees green grass growing on the field”

One thing to seize on is the adposition “on”, whose object in this case is the noun before it, “field”. That's a pretty clear indication that we're supposed to take the character for “field” (or some synonym – but in this case it's just “field”) and put something on top of it. There's “green grass growing” on that field. We can't put any of the whole characters for “green”, “grass” or “grow” on top of “field” to make a proper character, but there is a radical called the grass radical, consisting of one line across and two short lines down (the upper bit of 草 “grass”, which refers to any sort of flora). So, we put that on top of the word for field, which gives us 苗. But we still haven't used the first two words of the riddle. At this stage it's fairly clear that all the first two words can contribute is another radical, which must be the “eye” radical, or 目, which is the left hand side of 眼 “eye” and the lower bit of 看 “see”. It doesn't give any indication of where to put it, but in this case there's only one place that makes sense: to the left of 苗, which finally gives us 瞄 miao2, meaning to stare or look hard.

One last one, which I'm also rather fond of.

(8) 两狗谈天 liang3 gou3 tan2 tian1 “two dogs talking”

This one really requires some lateral thinking, so I'm going to work backwards and give you the answer: 狱 yu4 "jail". This breaks up into no fewer than three radicals: 犭 (the animal radical), 讠 (the speech radical) and 犬 quan3 (a character meaning "dog"). So: we have one dog, an animal (represented by 犭) and a second dog (represented by 犬), and there's speech going on between them (represented by 讠).

I guess I like these riddles because they require you to think on so many levels. A riddle might look like a perfectly innocuous sentence, but it mightn't give you any clues until you start parsing it in a totally different way, a way that your grammar probably hasn't trained you for. And then you have to go deeper and analyse the characters in terms of radicals - which I suppose one might do on a sub-conscious level, but not in the conscious way required by the riddle. And then you may have to go still deeper and analyse the characters at the stroke level, which I doubt anybody really thinks much about except when they're taking dictation. So it really requires exploration of all the levels of language that we only use subconsciously, which I find rather fun.

(I suppose that if one wants to be pedantic, one might argue that all of language is somewhat subconscious - we don't actively have to think, "Oh, how does one say that?" but somehow I feel that these levels are deeper even than the subconscious, because our ordinary use of language doesn't really call for them.)

By the way, all these riddles come from this website, where you can find others (with answers).

Thursday, September 23, 2004

Cross-linguistic entropy

I stumbled across this paper today: Estimating and Comparing Entropy across Written Natural Languages using PPM Compression [PS], by Behr, Fossum, Mitzenmacher and Xiao. I've mentioned before that I've been unable to find papers on the entropy values of different languages, so this seems to fill the gap somewhat, even though it relates to text rather than speech, which is what I was discussing.

Basically, they took two kinds of text: (1) the King James Version of the Bible and its translations into Spanish, French, Chinese, Korean, Arabic, Japanese and Russian, and (2) the UN Treaty Collection and its translations into Spanish, French, Chinese, Arabic and Russian. They estimated the entropy of the various texts by compressing them and comparing their sizes in bytes after compression. In the first case, the translations all compressed to within about 15% of the (17th-century) English original. In the second, Russian was about 20% off, while the others were within the 15% range. Interestingly, although the different scripts made the uncompressed sizes of the various documents vary by quite a bit (e.g. the Chinese text was half the size of the English text in bytes), once compressed the ratio of English to Chinese was 1:0.864.
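
(If you want to play with this kind of comparison yourself, the basic idea is easy to sketch in a few lines of Python. In the sketch below, bz2 stands in for the PPM compressor the authors actually used, and the filenames are made-up stand-ins for parallel translations of the same text.)

import bz2

# Hypothetical filenames for parallel translations of the same text.
files = {
    "english": "kjv_english.txt",
    "chinese": "kjv_chinese.txt",
    "russian": "kjv_russian.txt",
}

sizes = {}
for lang, path in files.items():
    with open(path, "rb") as f:
        raw = f.read()
    # bz2 is just a stand-in here for the PPM compressor used in the paper.
    compressed = bz2.compress(raw)
    sizes[lang] = (len(raw), len(compressed))

base_raw, base_compressed = sizes["english"]
for lang, (raw_len, comp_len) in sizes.items():
    # Sizes relative to the English original, before and after compression.
    print(lang, round(raw_len / base_raw, 3), round(comp_len / base_compressed, 3))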

They also took the English KJV and translated it into French, Spanish, German and Italian using the Systran machine translation tool (which powers Babelfish). In this case the resultant machine-translations were larger than the original. This may have been due to faulty translation; also, the choice of text was somewhat injudicious, as Systran couldn't handle archaic words like "giveth" and "taketh" and just left them untranslated (which, I suppose, amounts to a faulty translation!).

The paper also has an interesting discussion of the relationship between expressibility and entropy of different languages.

We suggest that the compressed size of texts with the same information content should remain close to constant across languages, even when the uncompressed texts vary in size...[o]ur hypothesis is based on the following intuition. ...[T]he estimates of the entropy of English are based on a finite stochastic model of the language. The relevant attributes of these models can be applied to all natural languages. The first is the set of statements that can be expressed in this language...[o]ur conclusions rely on the assumption that S^{L} [= the set of statements that can be expressed in the language] is the same for all natural languages...Over this set, we have a probability distribution describing the likelihood that a statement is expressed, or output by the source...for large samples of statements, the probability distributions [p^{L}] for different languages are likely to be quite similar...[i]f our assumptions that p^{L} is roughly the same across all languages is true, we would expect compressed translations to have approximately the same size.
Now, this section made me think of the recent debate over Pirahã, which supposedly displays:

(i) the absence of creation myths and fiction; (ii) the simplest kinship system yet documented; (iii) the absence of numbers of any kind or a concept of counting; (iv) the absence of color terms; (v) the absence of embedding in the grammar; (vi) the absence of 'relative tenses' ... (viii) the absence of any individual or collective memory of more than two generations past; ... (ix) the absence of any terms for quantification...

If you're anything like me, you're probably wondering what on earth these people talk about. But anyway, supposing Pirahã really does have these gaps. Then it seems to me that they would have far fewer sentences (without embedding, can they even have an infinite number of sentences?) in their set of expressible statements, S^{Pirahã}.

So here's my question. Would a test like the one Behr et al. conducted be able to detect a much smaller S^L? I'm guessing that a smaller S^L would mean a lower entropy - it'd be much easier to guess what's going to come next when you have fewer possible things to express. Here's the bit I'm unsure about. If you're translating an English text into Pirahã, then presumably you've got the basic information across, in which case, regardless of the syntactic structure of the language, the compressed texts should fall within about 20% of each other. After all, English and Chinese have very different syntactic structures and yet they fall well within 15% for the cases given above. So you wouldn't be able to detect gaps in the language outside the set of sentences you'd actually translated. So it wouldn't work.

On the other hand, I guess translating from English to Pirahã would result in a lot of simplifications. You couldn't say "fifty men" but just "a large number of men", you couldn't say "a yellow bird", just "a bird" - or maybe there's some paraphrase available. In which case the compressed size of the document might be a lot smaller, since a lot of information is left out. But then again (in another bold twist of the plot!) no compression algorithm is going to recognise that, say, "a large number of men" is less information than "fifty men". If a lot of paraphrases are necessary, then the size of the file might be larger.

So I guess the whole issue is over what "translate" means. Is it still an accurate translation if a lot of the little details are left out?

OK, enough rambling. It's time for bed.

Friday, September 17, 2004

Library Lookup P.P.S.

What I had to do to convert it -

(1) change the URL of the website, in my case to vistaweb.nlb.gov.sg
(2) change the number after cw_cgi? to 10100 (from the original 10002)
(3) change the database (after use_Database) from 735 to 3002.

I don't know whether both (2) and (3) were necessary - I'm too lazy to experiment to find out. I did need to change (at least one of) them, though, otherwise it wouldn't work. How did I find these numbers? The intermediate webpage when loading CARLweb (the page that says: "Loading CARLweb3.8.7.2; one moment please...." or similar) has the URL http://vistaweb.nlb.gov.sg/cgi-bin/cw_cgi?10100+REDIRX+useDatabase_3002, and that's where I got the numbers. Probably there are other numbers that work too, but somehow 10100 and 735 didn't work for me.

Library LookUp redux - success!

I was complaining a few weeks ago about not being able to use the LibraryLookup bookmarklet on my local (= national = Singaporean) library system's catalogue, which uses Carlweb as its OPAC system. Well, someone smarter than me (Ian Olsen-Clark at Notes From The Box Factory - thanks Ian!) managed to find a way to get Carlweb to cooperate. I've modified his JavaScript code slightly so you can use it to search the Singapore library system's catalogue from any webpage with an ISBN in its URL, such as Amazon's (Amazon.uk might work better for some of the books). And it works!!! Oh frabjous day, hurrah, hurray!

Unfortunately, performance isn't so great. With IE6, it takes about 10-15 seconds, while with Mozilla, it seems to take even longer. But I think this is an artifact of Vistaweb (the NLB catalogue) - it takes quite a long time even when searching directly from the site.

Here is the link: click and drag it to your Links toolbar (in IE) or Personal toolbar (in Mozilla) and pretend that you trust me. Then, when you're on any page with an ISBN in the URL, just click on the button (called NLB Library Look-up - but you can rename it) and it will look the book up in NLB. Note that it will just open in the same window.

NLB Library Look-Up

Wednesday, September 15, 2004

Lynne Truss in Singapore

The other evening, I went to a talk by Lynne Truss, who's in Singapore for two days to promote her book. It was a wonderful evening, full of wry British humour (the talk was arranged by the British Council in Singapore), phonetic punctuation (in a tribute to Victor Borge) and more hilarious punctuation errors. Here are two - drop the apostrophes to appreciate the humour:

Residents' refuse to go in the bins
Those old things over there are my husband's

Lynne Truss does come across as a lot less mavenish in person than in her book, I must say. Contrary to popular belief, she really doesn't go around correcting every punctuation mistake she meets, but is often quite shy about pointing such errors out. Well, that's what she claims, anyway. And she seemed quite resigned to the loss of proper punctuation.

I suppose I'm sympathetic because I feel her pain. As someone who's studied linguistics, I know that common usage dictates what's right and what's wrong in grammar, but still, punctuation and spelling mistakes like saying "Apple's 50 cents" drive me up the wall. After all, if it ISN'T an abbreviation for "is" and it ISN'T a possessive, then there's no apostrophe!!! I guess the discussion on Language Log recently, particularly by Arnold Zwicky, about "the thin line between error and mere variation" is of significance here.

So that's how they do it!

I was looking at some ventriloquism websites yesterday, wondering how it works. Turns out that ventriloquists use three techniques:

(1) strenuously avoid words with labial consonants, which are the only "visible" consonants, but
(2) if they must, pronounce labial consonants (mostly initial ones, because later consonants can be slurred over more easily) by making substitutions, such as:

(according to one site - different people seem to have different ideas about what substitutions are appropriate; a toy version of this table is sketched just after point (3) below)

/g/ for /b/
/θ/ for /f/ and /v/
/n/ for /m/
/kl/ for /p/
/ku/ for /kw/ (e.g. in quality)
/u/ for /w/

(3) priming the audience to expect a problematic word, by saying it beforehand in one's original voice, then having the dummy say the word with problematic consonants substituted.
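
(Just for fun, here's the substitution table above turned into a few lines of Python. The phonemes are written in a rough ASCII form - "th" for the IPA theta, and so on - and the example transcription is simplified; it's only meant to show the mechanics.)

# A toy version of the substitution table above, in rough ASCII notation.
SUBSTITUTIONS = {
    "b": "g",
    "f": "th",
    "v": "th",
    "m": "n",
    "p": "kl",
    "kw": "ku",
    "w": "u",
}

def ventriloquise(phonemes):
    """Replace the 'visible' labial phonemes according to the table above."""
    return [SUBSTITUTIONS.get(p, p) for p in phonemes]

# A simplified transcription of "maybe": m-ey-b-iy comes out as n-ey-g-iy.
print(ventriloquise(["m", "ey", "b", "iy"]))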

Given the success of ventriloquism over the centuries (it's been around since the Greeks, and the Zulus are supposed to have (had?) it too), its techniques would appear to be good confirmation that there's a lot of room in speech for errors that nevertheless make no difference to the comprehensibility of the speech stream. Fitting the signal to the lexicon, with the benefit of context, is enough to smooth over things like wrong consonants.

Related ponderings:
throwing one's voice

I wonder how people speech-read with any success, since they would be unable to observe featural differences such as [+/- voice]. I'll have to get a book out on it.

All these issues must be related to the issue of the entropy of natural languages. As I understand it, a language with a large amount of entropy could have basically no phonotactics at all - any letter (considering the written language) can follow any other letter. So mishearing a single consonant could be particularly bad - there's a pretty high chance that the misheard version is another possible sequence of the language. But with less entropy, there's more room for mistakes. I wonder if different languages differ in their entropy values? What does having a different phoneme inventory and system of phonotactics do to the entropy value of a language? Here's some interesting linguistics research that's been carried out regarding the entropy of natural language: link. I still can't find any estimates of the entropy of any natural language besides English, however.
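
(One toy way to see the connection between phonotactics and entropy, at least for written text: estimate the conditional entropy of each letter given the one before it, and compare real text with the same letters shuffled into random order - the shuffled version approximates the "no phonotactics at all" situation. This little Python illustration is my own, not anything from the research linked above.)

import math
import random
from collections import Counter

def conditional_entropy(text):
    """Estimate H(next letter | current letter), in bits, from bigram counts."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    total_pairs = sum(pair_counts.values())
    h = 0.0
    for (a, b), n in pair_counts.items():
        p_pair = n / total_pairs
        p_b_given_a = n / context_counts[a]
        h -= p_pair * math.log2(p_b_given_a)
    return h

sample = "the quick brown fox jumps over the lazy dog " * 50
shuffled = "".join(random.sample(sample, len(sample)))
print(conditional_entropy(sample))    # lower: letter order is constrained
print(conditional_entropy(shuffled))  # higher: any letter can follow any other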

I wonder, also, how well a speech recognition system would be able to pick up on these ventriloquistic differences. Probably too well - it wouldn't be able to make any sense of it.

Thursday, September 09, 2004

Chinese Tones in Music

I was originally going to entitle this post, "Who needs Chinese tones, anyway?" because of something that I noted before but never really thought much about: that when one sings Chinese music, the tonal information on the syllables is deleted, since the musical tones overwrite those of the syllables. And yet in most cases people have no problem understanding what's going on in Chinese songs. Of course, this may just be due to context - but nevertheless, I feel (felt) that perhaps Chinese tones were becoming irrelevant, especially in view of the fact (?) that Chinese words of more than one syllable are probably more or less unique at the syllable level, even ignoring tone. That is, if you have two syllables (for example, chang and ge), the chances of there being another combination changge, with tonal features different from the one I am thinking of (namely chang4 ge1), are pretty remote. (It would be neat to do an analysis of this - if only I could find a list of Chinese words as opposed to just characters - to see if what I'm claiming really is true. I'm just speaking from rather limited personal experience.)
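
(Here's a sketch of the analysis I have in mind, assuming one could get hold of a word list with pinyin - say, a CC-CEDICT-style file with lines like "傳統 传统 [chuan2 tong3] /tradition/". The filename and the line-parsing below are guesses at such a format, so treat this as an outline rather than working code for any particular dictionary.)

import re
from collections import defaultdict

words_by_toneless = defaultdict(set)

# Hypothetical CC-CEDICT-style file: "traditional simplified [pin1 yin1] /gloss/"
with open("cedict.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):
            continue
        match = re.match(r"\S+ (\S+) \[([^\]]+)\]", line)
        if not match:
            continue
        simplified, pinyin = match.groups()
        syllables = pinyin.lower().split()
        if len(syllables) < 2:
            continue  # only words of more than one syllable are of interest
        toneless = " ".join(re.sub(r"\d", "", s) for s in syllables)
        words_by_toneless[toneless].add((simplified, pinyin))

ambiguous = {k: v for k, v in words_by_toneless.items() if len(v) > 1}
print(len(ambiguous), "of", len(words_by_toneless), "toneless forms are ambiguous")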

But I've decided to reserve judgment until I finish reading this paper, When Tones Are Sung, by Lian-Hee Wee. Seems that the case of Chinese tones in music might be more complex than I thought. Reactions once I've fully digested the paper.

This reminds me of a little paper I wrote explaining why the rule for placing tonal markers on vowels in hanyu pinyin, the commonest method of transcribing Mandarin Chinese, is the way it is. I really must blog about that at some point, since it's pretty neat - but not of any real consequence, and hence not something to publish.

Oh, and, the paper is hosted at the National University of Singapore, much to my surprise. I really ought to get in touch with some of the linguistics people there.

Tuesday, September 07, 2004

Last week's good readings

So you want to learn Japanese - read it for yourself!

----------------------

I've always preferred the Library of Congress classification system to the Dewey Decimal System. I don't know why, since DDS is what I grew up with. But here's a good reason not to like the DDS: Western bias.

----------------------

An amusing essay by Arthur Phillips about how he overcame "Hemingway's tyrannical proverb... [w]rite what you know" to write The Egyptologist. With the help of the British Museum, of course. Followed by two interviews with him, one about The Egyptologist and the other about Prague, his earlier first novel, which I really should get around to reading. That I should, in fact, have read while I was in Budapest. But no matter. I particularly like this line of his:

"My 'ultimate destinations' tend to be a little more difficult to explain to a travel agent. Prague in 1913. Budapest in 1931. Rome in 1964."

I feel pretty much the same: there are just these magical intersections of ages and places in my memory's reckoning that I wish I could visit but never can. Because, of course, "the past is a foreign country", and one that's very, very difficult to visit.

----------------------

"The Age of the Essay", by Paul Graham. There's some really interesting stuff in this essay, about why writing skills are always taught in English literature classes in high schools, about what it really means to write an essay (essai, of course, is the French word for attempt), how to write a good essay (follow the example of the Meander river in Turkey - though probably not in the way you think), etc. A couple of good bits from his essay:

To some extent it's like learning history. When you first read history, it's just a whirl of names and dates. Nothing seems to stick. But the more you learn, the more hooks you have for new facts to stick onto-- which means you accumulate knowledge at what's colloquially called an exponential rate.

History seems to me so important that it's misleading to treat it as a mere field of study. Another way to describe it is all the data we have so far.

----------------------

French police stumbled upon an underground cinema and bar among the 170 miles of tunnels, caves, galleries and catacombs - which were formerly quarries. Patrick Alt, a "cataphile" who has published a book on urban underground exploration, said there were "a dozen more where that one came from".

Monday, September 06, 2004

Links from the past week

The beach as a miracle of self-organised and emergent behavior (Prospect Magazine) - and guess where the whole idea of going to the beach originated? Not in some sunny clime, as you might expect! Although avid readers of Enid Blyton might be able to hazard a guess!

Miracle on Probability Street (Scientific American) - on the laws of large numbers: "miracles" happen much more frequently than you think - a one-in-a-million event occurs 295 times per day in America, and in the course of a normal person's life, miracles should happen at a rate of one per month. This is actually a review of Georges Charpak and Henri Broch's Debunked! (I have the French edition, Devenez sorciers, devenez savants and highly recommend it).

An informative piece on Language Log about how computer-assisted transcription is done - this is used for closed-captioning and other purposes.

Hearing colours, eating sounds: two half-hour programmes about synaesthesia from BBC Radio 4 . As I was listening to these programmes, it struck me that the links between, for example, the senses of sight and of sound in some random homo sapiens could have given rise to the first proto-language and given them the idea. And of course, someone has already come up with this idea - it's mentioned towards the last bit of the second programme. I wonder if any other animals experience synaesthetic effects?

Mosque names in Singapore

One last thing for the night. I've noticed something about the transliterated names of mosques here: they transliterate some post-velar consonants, such as /ʔ/ (glottal stop) and /ʕ/ (voiced pharyngeal fricative), as "k". The relevant examples are names in which "k" replaces the glottal stop hamza, and names in which "k" replaces the pharyngeal fricative 'ain. I suppose it makes sense: /k/ is the nearest English/Malay consonant to these two, excepting perhaps /h/ for /ʕ/. Just a little linguistically-related tidbit.

El cheapo in the bookshop

Well, our entire national library system may have fewer books than a university library in a small city in upstate New York, but one good thing about the local book scene: we get plenty of cheap international textbook editions! These are usually paperback and printed on cheaper paper (though not necessarily, since I got Kandel et al.'s Principles of Neural Science in what seems to be exactly the same edition that's sold in the U.S., for about half the price). The other day I got Russell and Norvig's Artificial Intelligence for about 1/5 the U.S. price - this was a paperback edition, but the paper doesn't look to be of inferior quality.

And today, I found the strangest thing: linguistics textbooks by a variety of publishers re-published by a Chinese publisher (Foreign Language Teaching and Research Press) - again, in paperback and this time with cheap paper. I got The Handbook of Contemporary Syntactic Theory (Baltin & Collins, eds.) for S$32.35, which makes it about a quarter of the U.S. price. Funny thing is, the prefaces and introduction were all translated into Chinese (as well, of course, as being reproduced in English). The rest of the book remains the same, with the same pagination.

Now, if only we had a few more decent used bookstores, where I wouldn't go in only to find books that I recognised from five years ago, and university libraries that I could stroll into without paying exorbitant membership fees, I would be happy.

Sunday, September 05, 2004

Reflections on Old Entish

I was just watching Lord of the Rings: The Two Towers the other day, quite by accident since I didn't know it was going to be shown on Singapore TV. One thing struck me as a bit odd: Old Entish, which raised two questions in my mind.

Firstly, if there's an Old Entish, that more or less seems to imply the existence of a New Entish. And if there's a New Entish, why are the Ents still using the old one? One reason I thought of was that Old Entish is something like Latin - no one speaks it as a first language any more, but it's the language to use when you're discussing diplomatic affairs and such-like. And I guess deciding whether or not to go to war against Saruman is a diplomatic affair. A second reason might be that the Ents from a very large geographic area were gathered for the Ent-moot (though this reason seems a little suspect since (1) the Ents converged very quickly, and (2) the Ents were supposed to be only found in Fangorn forest, which doesn't seem to cover a very wide geographic distance - not for the Ents, anyway), and so there may have been different dialects spoken among the Ents, and the only language that would be mutually intelligible to them all was Old Entish. Alternatively, it was more or less agreed-upon that in such situations, Old Entish would be the language adopted. I suppose this is more or less akin to the diglossic situation of Arabic - though no one would call "Modern Standard Arabic" anything like "Old Arabic". Probably the first reason is more reasonable.

A second question that Old Entish raised was as follows: are there significant differences in human languages in terms of the amount of time used to say something? OK, so there're languages like Hawaiian that have a small phonemic inventory and therefore have words many syllables long - but is there a corresponding reduction in the length of a syllable, since with a smaller set of possible syllables to choose from, careful enunciation of every syllable is not required?

Let me put it another way. Assume that we have a certain sentence translated into languages A and B. The sentences mean exactly the same thing in languages A and B - let's not bother for the moment about cultural differences in the way languages express different concepts - nothing like schadenfreude, no metaphors, not even any compound tense and aspect or whatever, just simple sentences like, oh, I don't know, "John kicked the dog". Presumably, these sentences convey the same amount of information. Taking this amount of information (how is it to be quantified?) and dividing it by the amount of time needed to communicate these sentences, and averaging this over a whole bunch of random sentences gives you the efficiency of the language. You'll also need to figure out what the average syllables/second rate is for the language, I suppose. Ultimately, my question boils down to this: is any one language significantly more efficient than another in communicating concepts? It seems to me that Old Entish is a singularly inefficient method of communication.
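
(If one had matched recordings and transcripts, a crude back-of-the-envelope version of this calculation might look like the Python sketch below: the compressed size of the transcript stands in for information content, divided by how long the recording takes. The texts and timings are invented, and for anything meaningful you'd want far longer samples than these toy strings.)

import bz2

def bits_per_second(transcript, duration_seconds):
    """Compressed bits of the transcript per second of speech - a crude proxy."""
    compressed_bits = 8 * len(bz2.compress(transcript.encode("utf-8")))
    return compressed_bits / duration_seconds

# Invented parallel renderings of the same little story, with invented timings.
samples = {
    "language A": ("John kicked the dog. The dog ran away.", 3.0),
    "language B": ("A longer string of syllables saying the very same thing.", 4.5),
}

for lang, (text, seconds) in samples.items():
    print(lang, round(bits_per_second(text, seconds), 1), "bits/second")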

Stop press! A couple of articles I've just found relating to the discussion above:

Peter Roach. 1998. "Some languages are spoken more quickly than others". In L. Bauer and P. Trudgill (eds.), Language Myths. London: Penguin, pp. 150-158.

Plus, a surprisingly detailed "definition" of Ents, including a discursus on Old and "New" Entish, from wordIQ. Turns out "New" Entish is a mixed language, with Quenya vocabulary (and presumably morphology and phonology) but with Old Entish grammar, so it took just about as long to say anything as in Old Entish.