The Big Sing

John H. McWhorter

A new account of the origins of language.

September/October 2006

I must admit that the basic premise of Steven Mithen's The Singing Neanderthals—that human language was an outgrowth of Homo sapiens' natural propensity for singing and music-making—makes me a little itchy. When Mithen gives us a chapter subhead like "Nina Simone singing 'Feeling Good': Homo sapiens at Blombos Cave, wearing shell beads, painting their bodies and feeling good," I know he's just making the text reader-friendly. But this is the kind of idea that attracts many readers less because of its scientific coherence than because it's such a cool notion.

Traditional analysis of our mental capacity for language depicts it in a coldly abstract manner couched in a user-unfriendly jargon, focused on obscure wrinkles in how we talk that strike the layman as dry and trivial, such as the fact that you can say Who do you think will say what? but not What who do you think will say? How much more appealing the idea that language actually piggybacked on something as warm, visceral, and social as music. And an evolutionary tale featuring jamming Neanderthals will find a ready audience even among many academics, happy to be exempted from dealing with the navel-gazing abstruseness of modern theoretical linguistics.

After all, the aforementioned savants have yet to present a scenario for how language could have emerged among a species that began without it. It's a tough nut to crack: listen to apes grunting, and then explain how to get from that to Oscar Wilde's "I can resist everything but temptation"—or even "Pass the salt." In these circumstances, we need as much fresh thinking as possible, and Mithen's is certainly that.

The usual idea is the intuitive one: that language began with words—Leopard! Ouch! and so on—and that gradually humans began putting the words together as sentences expressing more complex thoughts; i.e., grammar was born. But Mithen argues, on the contrary, that humans first communicated via little strings of syllables expressed with musical intonations, similar to animal calls. Indeed, Mithen thinks that people would have started by imitating such calls themselves, rather as if cavemen watched a pack of wolves hunting down an elk and started saying "ruffRUFFruff" among themselves when gathering to hunt down big game. After a while, ruffRUFFruff would come to "mean" Let's hunt. Animals such as birds and, more significantly, our closer relatives, monkeys, develop a considerable repertoire of calls to signify warning, fear, domination, submission, and so on; Mithen suggests that early humans did the same and then some.

The reader is to be pardoned if so far it sounds a little far-fetched that we ever got from here to We, the people, in order to form a more perfect union. But Mithen, drawing on work by Cardiff University linguist Alison Wray, proposes a clever mechanism for doing just that. Imagine that when you wanted to tell someone to give something to a woman, you would warble tebima!, and that there was another warble, kumapi!, for when you wanted to tell someone to share something with a woman. Tebima and kumapi would not be "words" or "sentences" but just calls, like ruffRUFFruff. But then suppose a smart human noticed that both calls had -ma- in them, so that abstracted, ma could be taken to mean "her." Here would be the birth of a word. And then imagine that humans abstracted lots of words like this, and started combining them to express whole thoughts: maybe, for instance, something like ma ruff to mean she hunts.

Mithen supports his scenario with various observations. The South American Huambisa tribe call one bird a chunchuíkit and one fish a máuts. Asked which word named a bird and which a fish, 98 percent of a group of American students answered correctly. Hence, Mithen argues, such "musical" intuition could have generated the "calls" from which the human proto-language was constructed. And parents worldwide talk to their babies in a simplified singsong manner decorated with musical coos; in turn, the babies respond to such vocalizing more readily than to ordinary "flat" speech. Infants seem to begin life with a musical intuition that serves as a springboard for learning language.

But why should this musical communication have evolved into language only among Homo sapiens and not also among the countless earlier members of the genus Homo or its direct precursors, such as the famous Lucy, a kind of post-ape? Mithen has it that earlier Homos lived in such small, tight groups that there was no need for sophisticated communication, since experience was so tightly shared and predictable. Real language was an advantage only when humans developed more populous societies, with social stratification and frequent contact with other groups. Bipedalism would also have helped, since having hands free would have allowed aggressive and transformative interactions with the environment that are clearly impossible for our housecats, whose front paws are (almost) always supporting them on the ground. Predictably, then, Homo sapiens has been found to have a gene connected with language, a gene that chimpanzees and other apes have only in an alternate form.

This is one of those books where an infectious passion bubbles under the prose. I have held off revealing until now, to avoid its infelicitously lightweight air, that Mithen terms his hypothetical musical-call precursor to real language "Hmmmmm," an acronym for Holistic Manipulative Multimodal Musical and Mimetic communication. Hats off to him for venturing an account of how language arose that could square with the tenets of Darwinian natural selection, given that leading theoretical linguists seem oddly uninterested in explaining just how their notions of what is configured in our brains could possibly have upped the odds of survival for early Homo sapiens individuals running around on the African savannas. Yet ultimately, Mithen's scenario leaves more questions than it answers.

For example, if language grew out of music (i.e., if Nina Simone albums shed light on how Sanskrit and Mandarin Chinese emerged), then we would like to find that music is processed in the same part of the brain as language. But Mithen is honest enough to show us that mostly, it isn't—language is centered on the left side. (Apparent counterexamples, including the notorious case of a woman with brain damage who lost the ability to hear intonation in speech—and could no longer appreciate music either—turn out to have little traction; the woman in question had suffered damage on both sides of her brain.) By and large, the brain does not process language and music as variations on the same process.

Another problem: Traditional linguists who think language started with words cobbled together in a primitive way can show intermediate steps, as when chimpanzees are taught sign language, which they use to combine two or three words at a time in primitive ways. Infants' first attempts at talking are similar: Me fall down. But babies do not go through a phase in which they communicate only or even mostly with singsong phrases, nor do chimps vocalize in any like fashion, despite sharing about 98 percent of our DNA and even having a variant of our language gene.

And we can't help but wonder just why a group of early people would get to the point of having singsong fragments making distinctions so fine as "give that to a woman," as opposed to a man. Give it: that's one thing. But why bother singing about the fact that the giving is to someone who is not a man when, if one were uttering a phrase about giving, the non-man would presumably be right there? Why would there be "calls" referring specifically to sharing as contrasted with giving? Indeed, just how many words would "Hmmmmm" yield? For example, do we really imagine that there would have been multiple "calls" that you could have abstracted a concept like heavy out of? How, precisely, would early Homos have come up with a word for heavy from spiky battle-cries, yelps of warning, and burbles of affection? Mithen does not even broach that question.

Then there is the "civilization" problem. Mithen has it that Homo sapiens adopted a gene for real language when humans got beyond the state of tiny bands of hunter-gatherers, society stratified into distinct classes, and groups began encountering one another more often, thus needing to communicate often and intimately beyond their very dearest. This implies that Homo sapiens was closer to "urbanity" than earlier Homos, but in this, Mithen is tens of millennia too early. The "language gene" FOXP2 traces back almost 200,000 years, and there is not the slightest evidence that humans drifted into anything anyone would recognize as "civilizations" until much more recently: perhaps 100,000 years ago, give or take ten or twenty thousand. Homo sapiens definitely had something on Neanderthals and Lucy in terms of painting caves and burying people with style, and this may well have meant, as Mithen and others suppose, that Homo sapiens had real language and his predecessors didn't. But the Homo sapiens who distinguished himself thus was, nevertheless, a humble hunter-gatherer just like the Neanderthals, albeit aesthetic, ceremonious, and articulate, just as modern ones are.

In the end, Mithen's idea reminds me of an artificial language that an enterprising Frenchman came up with in the early 1800s called Solresol. It was to be a language based on pitches of the scale, a language that one could sing or even play on an instrument. Do-re-mi meant day. Do-re-fa meant week, do-re-sol meant month, do-re-la meant year, and so on. Mi-sol meant good, while sol-mi meant bad.

As utterly adorable as that seems, we hit a wall in wondering how in Solresol one would have said Look how he can't even jump halfway over the wall they put up in Lyon just because that guy threw him when he asked him if he was really up to it.

Yet even a drunk could belch out that sentence without pausing to think. That kind of sentence, flexible enough to encompass the utterances of Shakespeare, Bob Dylan, and George W. Bush, is what our genetically specified language competence allows us to generate effortlessly, and figuring out how humans got to this point, when neither our house pets nor the chimpanzees coached at multimillion-dollar research centers can come remotely close to communicating on that level, is currently a matter of supreme frustration for scientists of language. Sympathetic though I am to an insightful scholar like Mithen, I cannot say that he has brought us much closer to a compelling solution than anyone else.

John H. McWhorter is a senior fellow at the Manhattan Institute. He is the author most recently of Winning the Race: Beyond the Crisis in Black America (Gotham Books). Among his other books is Defining Creole (Oxford Univ. Press).

