Tuesday, October 26, 2004

Emotionally-coloured synaesthesia and category-specific deficits

ITEM 1. More on a synaesthetic theme: an article in Nature News about a synaesthetic woman who sees colours around faces and names:
G.W. is a young woman who sees colours around words or things only when the object has an emotional association for her. Many synaesthetes see letters as coloured, for example in the word 'love', 'l' might be green, 'o' might be cream-yellow, 'v' might be crimson, and 'e' royal blue.

But instead G.W. sees the whole word 'love' as pink or orange because it is a positive word. She sees the word 'James', or James himself, as pink for the same reason: she likes him. Her case is described by Jamie Ward, a psychologist at University College London in the latest issue of Cognitive Neuropsychology.
I suppose this woman has a foolproof way of knowing whether she's gone "off" someone: wait for the pink hue around their name to fade to whatever colour non-positive words take on.

ITEM 2. Snooping around Cognitive Neuropsychology (the journal in which the above research was published), I came across an issue (vol. 20, 3-6, 2003) devoted entirely to category-specific deficits in the mental lexicon. This subfield began in 1983 when Warrington & McCarthy reported "a patient with preserved knowledge for animals, foods, and flowers, relative to inanimate objects". After that, all sorts of deficits began to be reported in different patients - e.g. human vs. non-human, animate vs. inanimate, implying that objects of similar category were in some way grouped together in the brain. Even more strongly, it's been claimed that these brain groupings are the result of evolutionary pressures, restricting a natural conceptual category to 'categories for which rapid and efficient identification could have had survival and reproductive advantages. Plausible candidates are the categories "animals", "plant life", "conspecifics", and possibly "tools".' (from Mahon & Caramazza 2003 in the issue cited).

I think it'd be interesting if syntactic correlates could be found for many of these specific concepts. We know already that many languages are sensitive to semantic categories such as human vs. non-human and animate vs. inanimate. In English, for instance, animate and inanimate nouns differ in what form of possession they preferentially take: "the boy's sister" is more natural to us than "the sister of the boy", while "the fruit's core" is less felicitous than "the core of the fruit". There's also the alienable vs. inalienable possession distinction, which reflects the animacy hierarchy. And so on.

My question is - are the less intuitive categories such as natural vs. man-made reflected in any language? I'm thinking of languages that divide up their nouns into much more specific categories: Chinese, for example, which assigns its nouns different classifiers [link to MSWord file], or Swahili, which has several noun classes whose divisions may be said to be loosely semantics-based. Do these divisions have any correlation with the sorts of "natural" conceptual categories that neuropsychologists have been studying through brain lesions and fMRI?

Language variation/change & community size

Haven't been posting much on linguistics recently, partly because I haven't been thinking a whole lot about it, other than trying to resolve a paradox in my own mind concerning the effect community size has on language variation/change in a closed monolingual environment (i.e. we'll forget about language change due to borrowing and lexical diffusion).

I'd always thought that language change should be slower in a smaller community. Part of the reason I held onto this belief so tenaciously for so long was (1) stories (myths, in some cases) about, for example, how isolated communities speak like their 16th-century ancestors, and (2) my failure to separate out from these stories other variables such as trade and contact with the outside world - and hence with other dialects and languages.

Most of all, though, I think my misconception boils down to the difference between variation and change: a small community might not have very much in the way of variation between its speakers, but a single change can travel through the whole population and become part of the language much faster than it could in a larger community.

I suppose part of the difficulty is teasing out the difference between "language" as a product of the community and "language" as a product of one's own internal grammar. Language change is mostly a property of the first, while variation - how much one's own internal grammar varies from that of the community - is mostly a property of the second.

In the course of thinking about this I came across a few interesting papers, for example "Is the rate of linguistic change constant?" by Daniel Nettle and this simulation of language change, which you may want to check out.

In a moment I'll post about a couple of vaguely linguistically-related pieces I came across today. So the linguistics drought is over.

Monday, October 18, 2004

The Distributed Library Project

This is an idea that has been simmering in my head for a long time, and now I've found that someone's actually implemented it for real! Introducing the Distributed Library Project (via LibrarianInBlack):

The Distributed Library Project is a library catalogue that allows anyone to nominate their own collection of books as a library, and themselves as a librarian. The software was developed initially in the western United States; see communitybooks.org. The London DLP is based at the UO in Limehouse, at the domains http://dlp.theps.net and dlpdev.theps.net.

The original blurb for the library is as follows:

The distributed library project is part of an experiment in sharing information and building community. Unfortunately, the traditional library system doesn't do much to foster community. Patrons come and go, but there is very little opportunity to establish relationships with people or groups of people. In fact, if you try to talk with someone holding a book you like - you'll probably get shushed. The Distributed Library Project works in exactly the opposite way, where the very function of the library depends on interaction.
I'd been thinking that something similar for us book-starved Singaporeans might be a good idea, but there were certain issues that needed to be worked out, e.g. how do you know your books will be returned? They solve this issue using a feedback system similar to eBay's, whereby borrowers who return their books on time and in good condition get positive feedback, and vice versa. They don't say anything about how books get from one user to another - presumably users arrange that between themselves. But anyway, it's a wonderful idea. I only hope that something like this can be implemented here, and save us all the horrible price of postage.

Thursday, October 14, 2004

Randomness post

Item 1: Google Desktop Search, via John Battelle's Searchblog and SearchEngineWatch. A step towards solving the somewhat bizarre predicament we currently find ourselves in, in which it's easier to find things on the web than on one's much smaller and better-organised disk drives!

I've just downloaded and tried it. Unfortunately, right now it's all geared towards Microsoft-based non-open-source stuff. It searches Word, Excel, and PowerPoint files - but not OpenOffice ones. And that's what I use, OpenOffice. I still save some of my files in Word and Excel formats, but only the ones that I know I'll be distributing to other people. The ones I create for myself, I just save in .sxw and .sxc formats (and so on), because I don't need to go through an extra couple of mouse-clicks (so I'm lazy). It also doesn't search PDFs - how weird is that? It only retrieves them if the keyword is in the title. It searches AOL IM chats - and I think that's pretty cool - but most of the rest of the world (including me) uses ICQ. Oh well, I'm sure they'll add the capabilities for all these things at some point, and if you desperately want something, you can send in a request. Or perhaps someone will hack it to include other file formats and make it a Mozilla add-on.

One thing that's kind of cool about it is that you can basically search your own browsing history with it (provided you don't delete your history, I suppose). Naturally, right now it only supports Internet Explorer (blast!), but this is a function that I've wanted for a long time. It's really frustrating when you've browsed through things for hours and then think, wait a minute, there was that neat thing, but I've forgotten which page it was on - even though you know you saw it just hours ago - and you have to look through incomprehensible URL titles to find it. Having to Google the whole web for it, instead of searching within the much smaller subset that you know your computer would be able to find if only it were enabled for it, is the ultimate in frustration. As John Battelle observes, this challenges A9's new personalised search history feature - anyone who downloads the Desktop Search tool automatically has it, and therefore about a third of A9's advantage is gone. The Desktop Search also does some cool things with caching that I didn't realise because I don't use IE - look at the rather comprehensive SearchEngineWatch article for details.

Another interesting thing is that searches are ranked not by relevance or anything like that, but in chronological order according to when each item was last accessed. I think this is a good thing. But there are improvements that could be made - for example, I think a paned search (like the interface A9 uses) would be really good. I'd like one pane for documents that you've created yourself or have had sent to you, one for documents in your browsing history, and another for all others, e.g. help documentation. I find it rather irritating to get results from Microsoft documentation when looking for documents that I've authored myself - and 99% of the time you probably want stuff from the former two categories rather than the third.

John Battelle also has some pretty cool predictions for how Google can expand on this service - such as "a lightweight word processor so you can take notes on your searching". Oh my gosh, now that would really be something. Can't wait till they enable all the other media formats, either.

(2) Lucky Luke, my favourite bande dessinée, has been made into a film! Or even several. Now, I don't think I would find this at my local video store. I wonder if Netflix would have it. Hmm, apparently not. The good thing about America is that you can get almost anything there - any used book you want, any DVD you want - unless of course it's in a foreign language. Now that's a different story.

(3) I never thought I would get interested in Chinese serials. My mother used to try switching on the TV to Channel 8 (then the only Chinese channel on TV) when we were learning Chinese in school to interest us in the programmes there, but we never watched them for longer than 5 minutes. Now I find myself captivated by the series "Heavenly Sword and Dragon Sabre", or 倚天屠龙记 (yi3 tian1 tu2 long2 ji4), which just finished its run on Channel 8. It's an adaptation of a martial arts novel by the famous Jin Yong, also known as Louis Cha. OK, I call him "famous", but I actually had never heard of him before watching "Heavenly Sword and Dragon Sabre".

Now, why were his books (and TV serial adaptations) never mentioned in my eleven years of formal schooling in Chinese? Surely I wasn't just not paying attention the whole time? Recently there's been a lot of debate over the teaching of Chinese in schools here in Singapore, because the authorities are realising that the system just doesn't work for a large majority of the students. Students are getting turned off from having anything to do with Chinese, as I did, because it's forced down their throats, with lots of memorization and stuffy old stories to read - when they could have sat us down in front of these TV serials and got us interested that way. I never thought I would, but now I find myself actually reading Chinese novels!

Part of the attraction for me is the fact that these martial arts, or wuxia, novels, although fantastical, have their basis in reality. The sects in China mentioned in the serial - Shaolin (everyone knows about them), Wudang, E-mei, Ming, etc. - all exist. It's also funny to think that for so long I thought that Europe had a monopoly on knights errant and chivalry, when halfway around the world there were similar "knights" (xia - "knight" is not the best translation, since it's really too culture-specific, but it's hard to find a closer one) and codes of chivalry. And apparently European chivalry was adapted from an earlier Islamic code. I think I first heard about this at the Institut du Monde Arabe in Paris, but it was only a snippet stored at the back of my mind.

Well, you live and learn. To learn more about the stuff I'm nattering on about, check out some online (non-official) translations of Jin Yong's works.

Friday, October 08, 2004

Arabs "decoded" hieroglyphic Egyptian?

Firstly, go read the post "An Arab Champollion?" on Language Hat's blog for background. Basically, an Egyptologist named Okasha El Daly has claimed that hieroglyphs were "decoded" (I hate that word) hundreds of years before Champollion, by Abu Bakr Ahmad ibn Wahshiyah, an Arab alchemist. This post immediately jumped out at me because I have wondered how much knowledge of that culture was retained by the descendants of the ancient Egyptians. In folk dances and such, for example, they still retain knowledge of the legend of the battle between Osiris and Seth (see Nefertiti Lived Here by Mary Chubb for a description of such a dance, performed in the 1930s). And I believe that there was work carried out at the Dar al-Hikmah, which was charged with the translation of foreign texts, on unknown scripts - certainly hieroglyphic Egyptian would have been interesting to them. But I never really had the time or inclination to go any further than posing the question to myself.

In a comment to LH's post, John Hardy points to the English translation of ibn Wahshiyah's work (the Arabic text is on the same site as well) published in 1806 - prior to Champollion's breakthrough, it may be noted. So I hotfooted it over there (well, actually the text hotfooted its way to me) to take a gander. I haven't read it very closely, just taken a peek at the bits about hieroglyphic Egyptian (or the "Hermean" alphabet, as it's called in this work).

OK, so El Daly claims that "[t]he important thing is they realised that these hieroglyphs were not pictures, which was the prevailing view among classical writers". OK, that seems to be the case. On page 16, there's what seems to be an indication of some knowledge of determinative signs:

These expressions consist in innumerable figures and signs, which are to lead the mind directly, and immediately to the object expressed thereby, viz: there is a sign which signifies the name of God Almighty, simply and alone. If they wished to express one of the particular attributes of God they added something to the original sign, and proceded [sic] in this manner, as you will perceive by the alphabet in question.

I think this may be the paragraph El Daly refers to when he says, "Ibn Wahshiya was the first scholar ever to talk about determinatives, describing them in a paragraph which any modern scholar would be proud of". Though I don't know that any modern scholar would really have expressed himself in such a fashion, or been proud of it.

And then on page 43, there's a discussion of the "Shimshim alphabet", which "was inspired by divine revelation, and varied in four manners by the people who used it", one group of whom were the "Hermesians". Some of these look like hieroglyphs to me, and they are given phonetic values, unlike in the previous tables, which are all given meanings. The phonetic values all consist of a single phone(me), like Z or "H hard". Although El Daly claims that some of the phonetic values given are correct, most of them seem pretty wrong to me when checked against Budge's Egyptian Hieroglyphic Dictionary (be aware, though, that I checked only a few signs which I recognised as hieroglyphs I'd seen before - I'm no expert, I just have the books!). For example, the crook-shaped character is given the phonetic value /l/, whereas the accepted value today for that character is /s/. And there's no indication of any awareness of the possible syllabic nature of the phonetic elements. There's a sign on page 46 (fifth down) consisting of an enclosure with a straight line underneath it; to the best of my knowledge, the straight line indicates that the ideographic meaning of the sign above should be taken, so the character as a whole indicates a house, with the corresponding phonetic content /pr/. The whole character, however, is instead given purely phonetic content: a single /p/. The determinatives, similarly, don't seem to match up with any of the ones I see in Budge's dictionary.

So it seems to me that although it certainly is interesting that ibn Wahshiyah saw that hieroglyphic Egyptian was more than ideographic, I certainly wouldn't call it a "decoding". Surely he just had a deeper insight into the character of the script?

Anyhow, the English translation seems a pale shadow of the original Arabic text - 54 vs 136 pages. (The Arabic text, by the way, seems to be set in a strikingly modern typeface - the same one used in the textbooks from which I studied Arabic. But I digress.) I haven't made any attempt at all to read the Arabic (or, indeed, to read the translation in full), which seems to have much more content. So perhaps the Arabic text contains much more exciting stuff than what I found in the English translation - and then again, maybe not. Those familiar with hieroglyphic Egyptian may care to take a look at page 48 and see whether the sequences of characters given there really mean what ibn Wahshiyah claims.

My experience with Cantonese

I was reading this article, "Why do people learn languages?" at Mark Rosenfelder's Metaverse site. It has some interesting discussion about child language acquisition, particularly his assertion, which is in all likelihood true, that "children don't learn a language if they can get away with not learning it" (emphasis his). He gives several examples of children exposed to two or more languages who nevertheless wind up speaking only one, because they realise that, for example, the one caregiver who speaks to them in a secondary language actually does understand their first language, and begins communicating with that caregiver in the primary language, gradually losing their fluency in the secondary language.

I had a similar experience with Cantonese. My father and mother both speak fluent English; the extended family with whom I have significant contact all speak English with varying degrees of fluency. But my mother's side of the family usually communicate in Cantonese - with some Mandarin and English thrown in. I don't think anyone ever directed any significant amount of Cantonese speech to me, so I wound up just learning English at home. Mandarin had to wait for school, and as for Cantonese - I now can understand it (with some help from context), but not speak it (apart from a few stock phrases).

I used to think to myself, "Why didn't I learn Cantonese when I had the chance? After all, I had all this linguistic input - maybe I just didn't put enough effort into it." And I guess I didn't, but it wasn't a conscious decision on my part. Some acquisition occurred, but not enough to make me fluent. If anything, I got only the parsing part of the grammar and not the generating part - now try devising a model of language processing that can take care of that! I don't think this is an uncommon situation. I think there are quite a lot of people who can understand certain languages but not speak them.

Anyway, I'm now trying to better my Cantonese. I've decided to try "leveraging" my knowledge of Mandarin Chinese to help me learn Cantonese. Even though they're far enough apart to be separate languages, their grammatical structure seems similar enough for me to do direct transliteration, at least for the time being. As for phonology, there are tables available for converting Cantonese syllables to Mandarin and vice versa, and even one for converting tones. The conversion isn't one-to-one by any measure, but there are patterns and principles to be found. For example, the final (i.e. rhyme) -im in Cantonese almost always converts to -ian in Mandarin.

The trouble with this approach, I've found, is that Cantonese is more conservative than Mandarin. In itself, this is not such a bad thing; however, Mandarin has undergone a lot of mergers, resulting in a much smaller syllabary. This means that a Mandarin syllable may correspond to many more Cantonese syllables than vice versa. So it's a lot easier to convert from Cantonese to Mandarin (fewer possibilities), than Mandarin to Cantonese (many more possibilities) - which is the direction I want!

To give the statistics: for any one Cantonese syllable, the mean number of corresponding Mandarin syllables is 2.47; for any one Mandarin syllable, the mean number of corresponding Cantonese syllables is 3.58 - a significant difference. For 60% of Cantonese syllables, you have a 50-50 chance or better of guessing the correct Mandarin syllable; for Mandarin syllables, the figure is only 46%. Only 2% of Cantonese syllables have 7 or more possible correspondences, but an incredible 13% of Mandarin syllables have 7 or more possible Cantonese correspondences. You begin to see the problem. Add to that the fact that the original 8 tones of Middle Chinese merged into four (plus one neutral) in Mandarin, while Cantonese preserves 7 tones - gack!
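The asymmetry between the two conversion directions can be measured as the mean "fan-out" of a correspondence table in each direction. Here's a minimal Python sketch of that computation, using a tiny made-up sample - the syllable pairs below are illustrative only, not taken from the actual tables:

```python
from collections import defaultdict

# A handful of illustrative Cantonese-Mandarin syllable correspondences.
# These pairs are invented for the example; the real tables are far larger.
pairs = [
    ("dim", "dian"), ("tim", "tian"),   # -im regularly converts to -ian
    ("si", "shi"), ("sik", "shi"),      # one Mandarin syllable...
    ("sat", "shi"), ("sek", "shi"),     # ...fanning out to many Cantonese ones
]

def fan_out(pairs):
    """Mean number of target syllables per source syllable, in each direction."""
    c_to_m, m_to_c = defaultdict(set), defaultdict(set)
    for c, m in pairs:
        c_to_m[c].add(m)
        m_to_c[m].add(c)
    mean = lambda table: sum(len(v) for v in table.values()) / len(table)
    return mean(c_to_m), mean(m_to_c)

c_mean, m_mean = fan_out(pairs)  # Cantonese->Mandarin vs. Mandarin->Cantonese
```

Run over the real tables, this is the computation behind the 2.47 vs. 3.58 means: the larger Mandarin-to-Cantonese fan-out is exactly what makes guessing in that direction harder.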

So I don't know if this approach is really going to work, but it's fun going through the table I compiled from the tables linked to above and seeing all the regularities.

Thursday, October 07, 2004

Recent exciting book news

I've been getting quite excited by a lot of book-related news recently. Here are some things that really got me going:

The Long Tail at Wired: we're moving from a world of hit-driven economics, where what you read, watch, and consume is dictated by "mainstream culture", to a world of niche-driven economics, because publishers and sellers are discovering that there's just as much money to be made, if not more, from "the long tail" - the non-hits that are bought only by a very few people:

What's really amazing about the Long Tail is the sheer size of it. Combine enough nonhits on the Long Tail and you've got a market bigger than the hits. Take books: The average Barnes & Noble carries 130,000 titles. Yet more than half of Amazon's book sales come from outside its top 130,000 titles. Consider the implication: If the Amazon statistics are any guide, the market for books that are not even sold in the average bookstore is larger than the market for those that are ... In other words, the potential book market may be twice as big as it appears to be, if only we can get over the economics of scarcity. Venture capitalist and former music industry consultant Kevin Laws puts it this way: "The biggest money is in the smallest sales."
This especially resonates with me because I know I'm a niche reader. There aren't that many people interested in linguistics in Singapore. There are two shelves of linguistics books in our local Borders. Kinokuniya has more, but they're really, really expensive.

Then there's Brewster Kahle's call to arms at the Web 2.0 conference yesterday [via Boing-Boing]. Here are the key points I gathered from listening to the MP3 (available here):

- Universal access to all human knowledge is possible.
- There are 26 million books in the Library of Congress, of which more than half are out of copyright, then there are 8 million in copyright and out of print, and a comparatively small number of in-print books.
- A book in ASCII takes up about 1 MB, so it would take 26 terabytes to store the whole LOC - which would cost $60,000.
- It costs $10 to scan a book. Efforts ongoing at the Library of Alexandria, in China and India (not to mention Project Gutenberg, etc.)
- So, it would cost $260 million to scan the whole LOC. Which isn't that much!
- It makes no sense that we're not allowed to put out-of-print but still-copyrighted stuff on the net, where people can actually *use* it. Brewster Kahle calls these works "orphans".
- So he's suing John Ashcroft in the Supreme Court for the right to bring these orphans onto the net. Go, Brewster, go!
- Furthermore, it costs $1 to print and bind a 100-page black and white book. Harvard says it costs them $2. In other words, it's cheaper to make books available for people to bring home - forever, and free - than to go to all the effort of getting them back again!
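The storage and scanning figures above can be sanity-checked with a few lines of arithmetic (a sketch using the numbers as quoted from the talk; the $60,000 storage figure would imply a 2004 disk price of roughly $2.30 per gigabyte):

```python
books = 26_000_000        # titles in the Library of Congress
mb_per_book = 1           # a book as plain ASCII text is ~1 MB
scan_cost_per_book = 10   # dollars per book scanned

total_tb = books * mb_per_book / 1_000_000  # MB -> TB (decimal units)
total_scan_cost = books * scan_cost_per_book

print(total_tb)          # 26.0 terabytes for the whole LOC
print(total_scan_cost)   # 260000000 dollars to scan everything
```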

Now, to take the idea further from the "Long Tail" article, it makes perfect economic sense to make *all* books available in this way - digitized, then cheaply printable - because sometime, somewhere, someone might pay some money for any given one. And if you add up all the dollars and cents from each of these out-of-print books, that's a heck of a lot of money. Plus, it's perfect for places outside the US and EU - it costs us a tonne of money to get books shipped over here. It'd cost me $8.98 to get a single book shipped to me from Amazon - half the price of the book itself! What we need is things like the Bookmobile (or more permanent, stationary print-and-bind-on-demand kiosks in bookshops). It's an old idea, but it only makes economic sense if there's the inventory and the demand. The demand's been demonstrated; we just need the inventory. The out-of-copyright stuff is coming along, being digitized by efforts like Project Gutenberg. If Kahle succeeds in his lawsuit against Ashcroft, that would add a huge number of 20th-century works, timely and relevant. And then get some major publishers to jump aboard and make their works available in the same way...

And once we have all this stuff, Google will help us search it: Google has started a print service to digitize in-print books and make them searchable, clearly meant to compete with Amazon and A9's Search-Inside-the-Book feature.

Oh, and, an update to the LibraryLookup bookmarklet thing. One of the big problems with it as it currently stands is that you might be looking at a certain edition of a book on Amazon with a certain ISBN, but your library has a different edition with a different ISBN, and there's no way to jump from one to the other. Ah, but in gallops the OCLC with its xISBN service, which takes an ISBN and looks for all editions, translations, etc., of that book and returns you a list of the ISBNs. Now, take the output from that and send it through your LibraryLookup bookmarklet and you should get all editions, translations, etc., available at your library! Wonderful! The OCLC already has some ready-made bookmarklets available here, but naturally not Singapore's. Next on the project list: figure out how to modify my NLB bookmarklet to take advantage of the xISBN service.
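Chaining the two services together could look roughly like this - a Python sketch in which the xISBN response format, the sample ISBNs, and the catalogue URL template are all invented for illustration (the real xISBN schema and the NLB catalogue's query format would need to be checked):

```python
import xml.etree.ElementTree as ET

# Hypothetical xISBN response: a flat list of ISBNs covering every
# edition/translation of the queried book. The real schema may differ.
sample_response = """<idlist>
  <isbn>0140449078</isbn>
  <isbn>0192833987</isbn>
</idlist>"""

# Hypothetical catalogue search template, standing in for whatever
# URL pattern the actual library-catalogue bookmarklet uses.
CATALOGUE_URL = "https://catalogue.example.org/search?isbn={isbn}"

def lookup_urls(xisbn_xml):
    """Turn an xISBN response into one catalogue query per edition."""
    isbns = [e.text for e in ET.fromstring(xisbn_xml).iter("isbn")]
    return [CATALOGUE_URL.format(isbn=i) for i in isbns]

urls = lookup_urls(sample_response)
```

The point of the design is that the bookmarklet no longer queries the catalogue with just the one ISBN on the Amazon page: it first expands that ISBN into the whole family of editions, then checks the catalogue for each.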