Saturday, February 26, 2005

Optical delusions

This is a really nice site on optical illusions that includes animations demonstrating the different effects and, for some of the illusions, explains what's going on in our visual perception to cause them. [Via BrainWaves].

My favourite one is the hollow-face illusion, but I won't give it away by describing it. A related activity that demonstrates this effect is this dragon's head optical illusion [via Mindhacks], which I've made myself. For some people it takes a while to see (some people just peer at it and say, "so what's supposed to happen?"), but eventually most people seem to get it.

Mindhacks, by the way, is the blog for the book Mind Hacks, which I've just bought and can't wait to get into. It looks pretty good.

Monday, February 21, 2005

More about ISBNs than you ever cared to know

It's rather sad that I should find this fascinating, but did you know you could identify where a book was published simply from its ISBN? The first few digits will tell you. 0 and 1 are for English-speaking countries (Australia, Canada, NZ, S. Africa, UK, Gibraltar, USA, Ireland - and, strangely, Namibia, Swaziland and Zimbabwe), 2 for French-speaking countries, etc., etc., then you get to 8 and the double digits - 80 for the Czech Republic and Slovakia, up to India at 93, then the 3-digit ones beginning with Argentina at 950 (don't know where 94 went), the 4-digit ones such as the Dominican Republic with 9945, and then the 5-digit ones from Bahrain at 99901 up to Eritrea at 99948.

Of course, this is in proportion to the countries' abilities (or proclivities) to churn out books, since Eritrea can only publish 10,000 books under this scheme - 9994800000 through to 9994899996. The last digit is a check digit, so there are only four digits left for them to play around with. You can check out the full list of country identifiers here.
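Both ideas above fit in a few lines of Python. This is only a sketch: the table holds just the handful of group identifiers mentioned in this post (a real lookup would need the full list), and the check digit uses the standard ISBN-10 mod-11 weighting. The names `GROUPS`, `group_of` and `isbn10_check_digit` are illustrative, not from any library.

```python
# Only the group identifiers mentioned in this post; the real list is much longer.
GROUPS = {
    "0": "English-speaking countries",
    "1": "English-speaking countries",
    "2": "French-speaking countries",
    "80": "Czech Republic and Slovakia",
    "93": "India",
    "950": "Argentina",
    "9945": "Dominican Republic",
    "99901": "Bahrain",
    "99948": "Eritrea",
}

def group_of(isbn: str):
    """Return (identifier, region) for an ISBN-10, or None if not in the table."""
    digits = isbn.replace("-", "").replace(" ", "")
    # No identifier is a prefix of another, so the first hit is the only one.
    for length in range(1, 6):
        prefix = digits[:length]
        if prefix in GROUPS:
            return prefix, GROUPS[prefix]
    return None

def isbn10_check_digit(first9: str) -> str:
    """Compute the ISBN-10 check digit for the first nine digits."""
    if len(first9) != 9 or not first9.isdigit():
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2; the check digit (weight 1) makes the
    # weighted sum a multiple of 11. A value of 10 is written as "X".
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

print(group_of("99948-0000-0"))
print(isbn10_check_digit("999480000"), isbn10_check_digit("999489999"))
```

Running it prints `('99948', 'Eritrea')` and then the check digits `0 6`, which give Eritrea's first and last possible ISBNs, 9994800000 and 9994899996.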

The next few digits are similarly assigned to unique publishers. Again, bigger publishers get smaller codes, so they have more room to play around with the number of ISBNs they can assign.

You know, this looks a lot like a prefix-free Huffman code (if you're into information theory), where things that occur more frequently get a shorter identifier. The funny thing is, in information theory, that implies that these major publishers (and countries) put out stuff containing less information! This page breaks down the distribution of publishers for English-speaking countries (look down at the bottom of the gory details).
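For anyone who hasn't met Huffman coding, here's a minimal Python sketch of the textbook algorithm: repeatedly merge the two least frequent subtrees, prefixing their codewords with 0 and 1. The symbols and frequencies are made up purely for illustration. This isn't how ISBN agencies actually allocate ranges; it's just the idea the allocation resembles.

```python
import heapq
from itertools import count

def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Return a prefix-free binary code; frequent symbols get shorter codewords."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Made-up frequencies standing in for big, medium and small publishers.
code = huffman_code({"big": 50, "medium": 25, "small": 15, "tiny": 10})
for sym, word in sorted(code.items(), key=lambda kv: len(kv[1])):
    print(sym, word)
```

With these numbers "big" ends up with a 1-bit codeword, "medium" with 2 bits and "small" and "tiny" with 3 bits each, and no codeword is a prefix of another, which is the same property that lets you read an ISBN group identifier off the front without a separator.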

Further reading:

This is a really good textbook about information theory available on the web: Information Theory, Inference, and Learning Algorithms by David MacKay. I especially recommend the section about estimating the entropy of English given the fact that we can construct American-style crosswords. (British-style crosswords are harder to solve though!)

Other useful websites:
ISBN check
Wikipedia's entry on ISBNs

Oh, and ISBNs are becoming 13 digits long - current ISBNs will have 978 prefixed to them and a recalculated check digit (which is what appears on most barcodes anyway - the 978 indicates that it's a "Bookland" EAN). This is, of course, because the 10-digit scheme limits how many ISBNs can be handed out. Looks like those Library Lookup regexes are going to need a tweak before too long.
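Here's a sketch of what that tweak might look like in Python, assuming the conversion described above: prefix 978, drop the old check digit, and recompute it with the EAN-13 weighting (alternating 1 and 3, total a multiple of 10). `ISBN_PATTERN` is a hypothetical regex that accepts either form (also allowing a 979 prefix); it is not the actual LibraryLookup pattern.

```python
import re

def isbn13_check_digit(first12: str) -> str:
    """EAN-13 rule: weights alternate 1, 3, 1, 3, ...; total must be a multiple of 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def isbn10_to_isbn13(isbn10: str) -> str:
    digits = re.sub(r"[-\s]", "", isbn10)
    if not re.fullmatch(r"\d{9}[\dXx]", digits):
        raise ValueError("not an ISBN-10: " + isbn10)
    core = "978" + digits[:9]  # drop the old check digit, add the Bookland prefix
    return core + isbn13_check_digit(core)

# Hypothetical pattern matching a 13-digit or a 10-digit ISBN (no hyphens).
ISBN_PATTERN = re.compile(r"\b(?:97[89]\d{10}|\d{9}[\dXx])\b")

print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
print(ISBN_PATTERN.search("isbn=9780306406157&x").group())
```

The 13-digit alternative comes first in the regex so that a full EAN isn't cut short to its first ten digits.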

Job advice from George Orwell

Bookshop Memories, by George Orwell. [Via Language Hat]:

But the real reason why I should not like to be in the book trade for life is that while I was in it I lost my love of books. A bookseller has to tell lies about books, and that gives him a distaste for them; still worse is the fact that he is constantly dusting them and hauling them to and fro. There was a time when I really did love books — loved the sight and smell and feel of them, I mean, at least if they were fifty or more years old. Nothing pleased me quite so much as to buy a job lot of them for a shilling at a country auction ... But as soon as I went to work in the bookshop I stopped buying books. Seen in the mass, five or ten thousand at a time, books were boring and even slightly sickening.

Reason #2 not to open a bookshop, which used to be one of my dreams. The first reason, of course, is that I wouldn't make any money, since I wouldn't want to sell any of my books, because I'd want them all for myself! I wonder if working in a library would be any better...I think I would really like to become a university librarian.

Friday, February 18, 2005

Buckwalter Arabic transliteration - now a Windows keyboard

Back in the day when I was studying Arabic, I used to use the Buckwalter transliteration all the time. Mostly because I was using the XRCE Arabic Morphological Analyzer as an Arabic-English dictionary - and believe me, it's better than any dictionary! Especially since I never bothered to memorise the order of the Arabic alphabet (alef, beh, um...). That proved a problem, though, when I started wanting to type in Arabic, since the standard Arabic keyboard layout pays no attention to which English letter an Arabic letter sounds like. So what's on the t key doesn't sound like t. This was not good. I basically gave up typing anything in Arabic.

Finally I decided that I had to do something about it, so I looked to see how I could modify keyboard mappings so that the mapping was the same as Buckwalter's. This was much more logical to my mind, since sounds that are similar in English and Arabic share the same key. So 'qaf' is a 'q'. Of course, this doesn't work for every letter - Arabic and English have very different phonological inventories - but it works for the majority of the cases, and anyway I had more or less committed the mapping to memory from using the morphological analyzer so much.
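To make the idea concrete, here's a minimal Python sketch of the letter part of the Buckwalter table, turning a transliterated string into Arabic script. It's a subset only: no short-vowel marks, no shadda, and only the bare hamza rather than its seated forms, so check it against Buckwalter's own table before relying on it. The names `BUCKWALTER` and `from_buckwalter` are illustrative.

```python
# Subset of the Buckwalter transliteration: ASCII character -> Arabic letter.
BUCKWALTER = {
    "'": "\u0621",  # hamza
    "A": "\u0627",  # alef
    "b": "\u0628",  # beh
    "p": "\u0629",  # teh marbuta
    "t": "\u062A",  # teh
    "v": "\u062B",  # theh
    "j": "\u062C",  # jeem
    "H": "\u062D",  # hah
    "x": "\u062E",  # khah
    "d": "\u062F",  # dal
    "*": "\u0630",  # thal
    "r": "\u0631",  # reh
    "z": "\u0632",  # zain
    "s": "\u0633",  # seen
    "$": "\u0634",  # sheen
    "S": "\u0635",  # sad
    "D": "\u0636",  # dad
    "T": "\u0637",  # tah
    "Z": "\u0638",  # zah
    "E": "\u0639",  # ain
    "g": "\u063A",  # ghain
    "f": "\u0641",  # feh
    "q": "\u0642",  # qaf
    "k": "\u0643",  # kaf
    "l": "\u0644",  # lam
    "m": "\u0645",  # meem
    "n": "\u0646",  # noon
    "h": "\u0647",  # heh
    "w": "\u0648",  # waw
    "Y": "\u0649",  # alef maksura
    "y": "\u064A",  # yeh
}

def from_buckwalter(text: str) -> str:
    """Map each Buckwalter character to Arabic; anything unknown passes through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)

print(from_buckwalter("ktb"))  # kaf, teh, beh
```

Most of the mapping is the "sounds alike" rule (q is qaf, k is kaf), with capitals and symbols covering the letters English has no equivalent for.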

I found a few utilities that seemed to do something similar, but the ones I looked at didn't seem suitable for me: not for XP, or you can only map a few key combinations, that sort of thing. I wanted to be able to create a whole new keyboard within, say, Arabic (Egypt). Then I found that Microsoft had a keyboard layout creator - for free (!!!??? - did you say Microsoft?). Catch is, you have to download the Microsoft .NET framework, which is 20-some MB. (BTW, I should mention that Unix/Linux users can do this really easily - they have utilities built in to help with this.)

The keyboard layout creator is pretty simple. It gives you a blank keyboard. You click on a key then define the Unicode character to associate with it. So you just do this for key after key, and then you build it, and it makes a nice little dll and installation file for you. Then you install it, add it to your language options, and voila!

The only thing was, I'd forgotten all about Arabic question marks being the "wrong" way round. So there I was, gaily typing away, and then ? - oops, that looks very wrong, when you're typing right to left! So I just went back to the keyboard layout creator and changed that.
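Stripped of the GUI, what the layout creator builds is essentially a table from keys to Unicode characters. Here's a rough Python sketch of a tiny, hypothetical piece of such a table with the fix applied: the `?` key now produces U+061F ARABIC QUESTION MARK instead of the ASCII one, and `,` and `;` give their Arabic counterparts. It ignores shift states and the creator's real file format entirely.

```python
import unicodedata

# Hypothetical partial layout: key -> character it should type.
LAYOUT = {
    "q": "\u0642",  # qaf
    "k": "\u0643",  # kaf
    "t": "\u062A",  # teh
    "b": "\u0628",  # beh
    "?": "\u061F",  # reversed Arabic question mark instead of ASCII '?'
    ",": "\u060C",  # Arabic comma
    ";": "\u061B",  # Arabic semicolon
}

def describe(layout: dict[str, str]) -> list[str]:
    """List each key with the code point and Unicode name it is mapped to."""
    return [f"{key}  U+{ord(ch):04X}  {unicodedata.name(ch)}" for key, ch in layout.items()]

for row in describe(LAYOUT):
    print(row)
```

Printing the Unicode names is a cheap way to double-check that each key really produces the character you meant, which is exactly the kind of slip the question mark was.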

So, if it so happens that you're in the same boat as me - too used to Buckwalter to use Microsoft's silly mapping - drop me an e-mail or write a comment to the effect that you would like a copy, and I'll send it to you. Or, you can give yourself a bit of fun and go through the whole rigmarole yourself! (But seriously, if you'd like this - only for Windows - let me know.)

Wednesday, February 16, 2005

The - national - library - is - driving - me - nuts!

For a long time I wondered why books would be listed in the NLB (Singapore) library catalogue as "not yet available". That in itself wasn't so bad...after all they could be waiting for them to be delivered, or catalogued, or whatever. But then when I found that books from *1984* were listed as "not yet available", I began to wonder. That's kind of long to wait for a delivery, isn't it? And then books that I searched for three months ago were still listed as "not yet available"...hmmm.

Today the mystery was solved by a kindly librarian - apparently these books have *never been ordered*!!! Why, then, were they in the system? Because any book that is "acquire-able" is listed in the system. I was so shellshocked I just stood there staring at the librarian. I couldn't really think of anything to say, the illogic was just so dumbfounding.

So (1) why is it that there are books that aren't even listed in the library catalogue? Are they not even order-able? (My, what a lot of productive morphology I'm using today.) And (2) does it make any sense to list these books in the library catalogue? Surely the catalogue should include just books you can actually get your hands on. It's really, really annoying to think "aha! They have the book!" and then have your hopes dashed. And what's with the "not yet available" thing, anyway? I'm sure that's violating some implicature somewhere...doesn't *yet* indicate that it's almost certainly going to happen? But it's not going to happen if you haven't ordered the book!!!

In other rantings regarding the library, I was tinkering around with the library's Carlweb OPAC the other day and to my delight I found an alternative URL syntax (using +'s) that seemed like it might be more flexible than the one I already knew about (using underscores), since it might be able to handle other Boolean operators. But alas, when I came back after a week away, that alternative syntax no longer worked. No clue what happened. Maybe some administrator got spooked by me trying to find all the ways of searching the catalogue without actually going through the catalogue's search page.

Tuesday, February 15, 2005

Psychology experiment interrupted by the alarm

Garn. The most annoying thing happened this morning. I was having this really interesting dream: I was playing hangman (in a library...go figure) and I was guessing letters and they were coming up. But, I didn't know the answer to the puzzle!

So I thought (well, I don't know if this entire thought formed while I was having my dream but it came to me all at once when I woke up) that this would be interesting: if the puzzle turned out to be a real English phrase (it comprised four words) then that had to mean my brain knew the answer - but then whoever was guessing didn't know the answer! So there's some kind of divide between what you're dreaming of and what the character you're playing is thinking. To put it another way, there's a sort of conscious unconscious - the one playing the puzzle, who didn't know the answer - and the unconscious unconscious, which set the puzzle.

Unfortunately, I had just guessed the second letter (e, which turned up in _ _ _ E E _, the third word of the four) and I was going through words it could be (career, exceed...) when the alarm rang and the dream instantly dissipated. Dash and bother. Oh well, it wouldn't have been a real experiment anyway - no way to repeat it. I suppose this is what's called the introspective method.
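Out of curiosity, here's how that step of the game could be done mechanically: a small Python filter (the name `hangman_candidates` is just illustrative) that turns the revealed pattern into a regular expression. One subtlety it captures is that a blank can't hide a letter that has already been guessed, so with e guessed, "exceed" would actually be ruled out (its first e would have shown up), while "career" still fits. The word list is a few examples, not a dictionary.

```python
import re

def hangman_candidates(pattern: str, guessed: str, words: list[str]) -> list[str]:
    """Keep the words consistent with a hangman pattern such as '___ee_'.

    '_' marks an unrevealed slot; it may hold any letter except one that was
    already guessed (otherwise that letter would have been revealed there too).
    """
    letters = "".join(sorted(set(guessed)))
    blank = f"[^{letters}]" if letters else "."
    regex = "".join(blank if ch == "_" else re.escape(ch) for ch in pattern)
    return [w for w in words if re.fullmatch(regex, w)]

print(hangman_candidates("___ee_", "e", ["career", "exceed", "screen", "street"]))
```

This prints `['career', 'screen', 'street']`.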

Monday, February 14, 2005

Where in the world have I been?

Thought this was kind of neat: a map showing what countries and cities I've visited. It's a zoomable Flash file, so zoom on in and out as you like. Green countries are ones I've visited. Blue cities/towns are ones that I've called "home" for one reason or another. Orangey-red means cities I've visited more than once or for a week or more. Yellow indicates I've only spent days there and don't really know them all that well.

I made this map using this software by Social Design Notes (thanks guys!). Was pointed there by WorldChanging. Basically, I took one of their already-created XML datafiles and modified it by inputting latitude and longitude coordinates for the cities I'd visited. A bit time-consuming, but pretty easy. There are websites where you input what countries or states you've been to and they generate a map, but this software allows you a bit more flexibility. You can put Timbuktu in the middle of the Atlantic if you want to. And you can even draw lines indicating a round-the-world trip! Neat!
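For the curious, editing a datafile like that amounts to adding one element per place with its coordinates. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`places`, `city`, `lat`, `lon`, `category`) are hypothetical, since the software's actual schema may well differ. The range check only catches impossible coordinates; it won't stop you putting Timbuktu in the Atlantic.

```python
import xml.etree.ElementTree as ET

def add_city(root: ET.Element, name: str, lat: float, lon: float, category: str) -> ET.Element:
    """Append a hypothetical <city> element with latitude/longitude attributes."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"impossible coordinates for {name}: {lat}, {lon}")
    return ET.SubElement(root, "city", name=name, lat=f"{lat:.4f}",
                         lon=f"{lon:.4f}", category=category)

root = ET.Element("places")
add_city(root, "Singapore", 1.35, 103.82, "home")  # approximate coordinates
print(ET.tostring(root, encoding="unicode"))
```

Storing the colour category as an attribute keeps the "home / visited often / only a few days" distinction in the data rather than in the drawing.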

Odd folk medicine remedy

Was pointed to UCLA's folk medicine database [via ResearchBuzz]. Look at this cure for migraine headaches:

Woman was told she should “have her head cut off and thrown to the hogs.” She became very angry and had no more headaches.

That's just great. Yeah, I wish someone would tell me that and my migraines'd just stop...

Tuesday, February 08, 2005

Math jokes

An article in Nature all about math jokes. This one's my favourite:

A mathematician awoken from his hotel bed by a fire sees a fire hose in the hall, exclaims, "Ah, a solution exists!" and goes back to bed.

They also feature the only math joke the author (or I) have ever seen that is based on integration. Go check it out, it's a hoot.

Thursday, February 03, 2005

That's not a phoneme!!!

I recently read a review (from 2002) by Jon Udell about Fast-Talk, which is now Nexidia. Their product uses what they call phonetic searching in order to mine audio - do keyword search and that sort of thing. This paragraph is from their website, under "The Science of Phonetics":

Phonetics is the systematic study of human speech-sounds. It provides means of describing and classifying virtually all the sounds that can be produced by human vocal tracts. This study is based on “phonemes”, which are the smallest unit of human speech.

All utterances made in the entire world have been catalogued within a 400 phoneme range. The majority of languages use a 40 phoneme range, and the most widely spoken languages fall within an 80 phoneme range.

Is that really a phoneme? First of all, phonemes are the minimal *distinctive* units of human speech, not just the minimal units. What they're describing is a "phone". Plus, a phoneme is completely, utterly language-dependent. You can't talk about a phoneme without already having a certain language in mind, because different languages draw the boundaries between their phonemes in different ways. English speakers generally can't tell the difference between Hindi's dental and retroflex t's and d's, for example.

I don't know if there really are 400 phones in all the world's languages. I wish there was a single copy of Ian Maddieson's Patterns of Sounds available in Singapore (there aren't any at the main libraries) so I could check.

At first I thought that when they said "the majority of languages use a 40 phoneme range", they meant that each individual language had ~40 phonemes. That's true for English, but the usual range is 20-37 (according to this and UPSID) and the mode is at 25. Then I realised that they meant that there are 40 or so sounds that cover a respectable percentage of the world's languages. (Again, does Maddieson - or anyone else - say anything about this?) I guess that's plausible enough; after all, vowels (for example) tend to have quite similar distributions around the vowel space, don't they?

Elsewhere on Nexidia's site they say that for each language one has to define a "phonetic grammar" - and mention that

[a] phonetic grammar likewise depends upon the natural language in use (particularly the set of phonemes used to represent basic sounds and meanings of the input speech)...

So I *think* they've hit on the right thing. In any case, their system seems to work quite well whether or not they're actually using phonemes or phone groups or whatever - they're more or less market leaders and all I've seen are good reviews. I just wish they'd get their terminology right.

Well, that's what you get when you hire a bunch of electrical engineers and no linguists, I guess.

On irregular verb conjugations

Thinking about the forsook/forsoke problem, two questions came to mind:

(1) Suppose someone doesn't "know" the past tense of a verb that happens to be irregular. What makes them decide which verb's conjugation to draw the analogy from - if they don't just go the regular route? OK, let me make that more concrete. I have no idea whether I already *knew* that "forsook" was the past tense of "forsake" back in primary school, when I argued with my teacher about it. After all, I couldn't have been more than twelve. But it's not a very common word, so suppose I didn't. Of all the possible verb conjugations to choose, why that one? After all, there are at least four conjugations I could have followed:

bake - baked - baked (regular)
make - made - made
take - took - taken (the one "forsake" follows)
wake - woke - woken (the one Logos decided to follow)
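One crude way to model question (1) is pure rhyme-based analogy: find the known verbs that share the longest ending with the new verb, work out how each one changes (take to took is "ake" to "ook"), and apply the same change. This toy Python sketch uses only the four model verbs listed above, and the function names are illustrative. It generates the candidates but deliberately says nothing about which one a speaker would pick, which is exactly the open question; frequency or semantic weighting would have to be added on top.

```python
# The four model verbs from the list above: base -> (past, past participle).
MODELS = {
    "bake": ("baked", "baked"),
    "make": ("made", "made"),
    "take": ("took", "taken"),
    "wake": ("woke", "woken"),
}

def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def common_suffix_len(a: str, b: str) -> int:
    return common_prefix_len(a[::-1], b[::-1])

def rule(base: str, form: str) -> tuple[str, str]:
    """Turn e.g. take -> took into the rewrite 'ake' -> 'ook'."""
    k = common_prefix_len(base, form)
    return base[k:], form[k:]

def apply_rule(verb: str, old: str, new: str):
    if not verb.endswith(old):
        return None
    return verb[: len(verb) - len(old)] + new

def analogies(verb: str) -> dict[str, tuple]:
    # Only the model verbs that rhyme most closely with the new verb compete.
    best = max(common_suffix_len(verb, m) for m in MODELS)
    out = {}
    for model, forms in MODELS.items():
        if common_suffix_len(verb, model) == best:
            out[model] = tuple(apply_rule(verb, *rule(model, f)) for f in forms)
    return out

for model, forms in analogies("forsake").items():
    print(model, forms)
```

For "forsake" all four models tie on the ending "ake", so the sketch produces forsaked, forsade, forsook/forsaken and forsoke/forsoken: the regular route, the "make" route, the route "forsake" actually follows, and the one Logos chose.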

Is it something to do with the frequencies of the conjugations? Or perhaps the verbs' semantic closeness to "forsake"? Though I don't see why "take" should be any closer than "wake" or "make". This reminds me of the example we looked at in historical linguistics class, of how the plural of "dwarf" morphed from "dwarfs" to "dwarves" on analogy with elf-elves, since they were so similar semantically. Anyway, on to question (2)...

(2) Why is it that the past tense and past participle in English are so irregular, while the 3rdSgPres -s (as in "he teases", "she laughs") and the continuous -ing ("they were doing", "I was playing") are so regular? The only irregularities I can think of with -s are "he does" (vowel change), "he is" (rather than "bes") and "he has" (rather than "haves"), and I can't think of any irregularities with "-ing". But why? Are these endings newer, so they haven't had time to get corrupted by morphological madness? Or have they, for some reason, never changed - i.e. there has never been another way to form the continuous, whereas for the past tense there have been any number of ways? But then you'd have to explain why those are so stable while past tenses are not. Interesting question; I can't think of a simple way to answer it.

Tuesday, February 01, 2005

Singular "they"

Language Log has another post up regarding singular "they", in the context of an SAT grammar test, and it links back to a bunch of previous posts about the phenomenon. In discussing the sentences

This person is not ignorant.
They are a prophet...

Geoffrey Pullum comments:
The sequence they are exhibits, of course, the syntactically correct plural verb agreement. The following phrase a prophet is a singular predicative NP complement.

So here's my question. If you're using they in a singular manner, what's the reflexive form? Themself, or themselves??? The reason I'm asking is because my "significant other" (there's them air quotes John McWhorter was talking about in his LL post!) actually said "themself" the other day. I can't remember the actual sentence, but it was something like "they bought themself an X" where the antecedent for "they" was clearly singular.

I don't know which of the two would be "right"; I can see arguments for either. Searching for "themself" on Google yields references to an Emily Dickinson poem, some clearly incorrect instances where "themselves" was what the writer wanted, as well as genuine singular-themself instances such as

Everyone post a photo of themself?
inspire one person to better themself on 43 Things

as well as a discussion of this very phenomenon here.