New Working Paper on Zulu published

I recently gave a talk on Zulu morphosyntax in which I (hopefully politely and respectfully) challenged some of the mainstream approaches to Zulu syntax. The working paper is now out, in the Proceedings of the Linguistic Society of America, available here (pdf download under "full text").

It's not a fun read for a layperson, but the general gist is that (1) a lot of previous syntax work doesn't pay enough attention to the phonology, (2) the justifications for arguing that the noun augment is really a determiner are a little shaky, and (3) if we just treat the 'linking vowel' as a determiner, everything is simpler. This has the unexpected outcome of also suggesting that Zulu has construct state, something known (and controversial) in Semitic languages, but not known to exist in Bantu languages. To paraphrase a colleague at Penn, I've reduced a seemingly unique thorny problem to an already known thorny problem, which is about as good as you can hope for in syntax.




©Taylor Jones 2018

Have a question or comment? Share your thoughts below!

What would Wakanda sound like?

Today, Marvel's Black Panther is released. The Black Panther, aka T'Challa (played by Chadwick Boseman), is the king of Wakanda, a fictional country in Africa (neighbored by other fictional countries like Azania and Narobia, but not Nambia). While I'm extremely excited for the movie (NO SPOILERS PLEASE), I don't have high hopes for a surprise fictional language in the movie, given the pre-film hype about the inspiration for design elements, costume, and even T'Challa's accent. In previous films, T'Challa's father was played by a Xhosa-speaking actor, and it now seems that Xhosa being spoken in Wakanda is Marvel Cinematic Universe head-canon.

Geographic improbability aside, I don't have a problem with this, as Chadwick Boseman does a great Xhosa accent --- far better than, say, Morgan Freeman in Invictus. But, given that Wakanda is supposedly 5,000km away from South Africa (where the non-Wakandan Xhosa people are), what would the languages of Wakanda sound like? This is just a short blog post to (shallowly) explore that question with some links for the interested.

Location, Location, Location

Wakanda is situated somewhere in East Africa, by either Lake Victoria, or Lake Turkana. That means it's somewhere around Uganda, Kenya, Rwanda, Ethiopia, and South Sudan. What's great about this is that it's an area where a lot of languages from different language families are spoken. So the five major ethnic groups in Wakanda could all potentially have their own very different languages.

What about the comics?

The character and country were created by Stan Lee and Jack Kirby in 1966. Both white guys, neither linguists. So there are a lot of elements of the Black Panther mythos that have names that sound, well, like what a white guy would make up to sound exotic and African (or look it on the page). That said, certain things are just part of the canon. So the kings have an (evidently) ejective /t'/ as the first part of their names. The all-female fighting force, the Dora Milaje, are called what they're called. Anyone contracted to construct languages for the MCU will have to work with the existing material, much like how Marc Okrand developed Klingon by building around what was already uttered on-screen in Star Trek. And that will have an effect on the backstory and character development. To my knowledge, Ta-Nehisi Coates and other recent writers have not done a deep dive into the linguistic side of Wakanda, but we can't really expect Ta-Nehisi to solve everything for us.

What's spoken in that area?

As I mentioned above, that particular (vague) part of East Africa has representation from a few of the major families: Afro-Asiatic, Niger-Congo A, Niger-Congo B ("Bantu" languages), and Nilo-Saharan languages.

In Kenya, there is, of course, Swahili --- a Bantu language spoken by 50 to 100 million people and a lingua franca for the region. Swahili's huge number of speakers means you can hear it on internet radio if you want. It also means that it has lost lexical tone (when the pitch of a word or syllable changes the meaning), and because it's used for trade by so many people who speak so many other languages as their native language, it is relatively regular, meaning there's not a lot of unpredictable grammatical stuff.

But there's also a lot else spoken there. Kenya alone is home to 68 languages, the most prominent of which are Kikuyu, with 8 million speakers, and Dholuo, or Luo, with 4 to 5 million speakers.

The latter, Dholuo, is not a Bantu language, but a Nilo-Saharan language. What's the difference? The main difference is that all the Bantu languages group nouns into types (think gender in European languages, except there's 10-17 of them). Every noun has a prefix for its noun class, and the prefixes generally come in pairs (singular vs plural). So in Zulu (and Swahili!) the base form for the noun 'person' is ntu. But this doesn't just show up on its own. Rather, it has one of these noun class prefixes, as in:

  • umuntu 'person'
  • abantu 'people' (hence the name for the languages...they all call people some form of "bantu")
  • ubuntu 'humanity, humanness' (whence also the operating system).

So you can get phrases like umuntu ngumuntu ngabantu: "a person is a person through other people".

Bantu languages also generally have a LOT of sounds, but simple syllable types, almost always CV --- Consonant Vowel. You'll never see a word like English strengths. This is obscured by the writing a bit, so for instance, <ng> in Zulu is one sound, not two (the sound of <ng> in sing). Swahili also has syllabic nasals, so for instance, the <m> in mzungu 'white person' is its own syllable: m-zu-ngu.

Back to Luo: Luo has vowel harmony, meaning all the vowels in a word have to share the same feature. What's the separating factor? How advanced your tongue root is. So words with the vowels in (an American pronunciation of) bean, bait, bot, boat, and boot, are one class, and words with the vowels in bin, bet, bat, bought, and foot are in another. A single word will not have vowels from both groups, only one.
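As a rough illustration of the idea, here is a minimal Python sketch of a harmony check. It assumes the two vowel groups are the ones in the English keyword analogy above, written in IPA. That is a stand-in, not the actual Luo vowel inventory, and the function name and test words are made up for the example.

```python
# Two vowel groups, following the English keyword analogy above.
# These are illustrative stand-ins, NOT the real Luo vowel inventory.
ADVANCED = set("ieɑou")    # bean, bait, bot, boat, boot
RETRACTED = set("ɪɛæɔʊ")   # bin, bet, bat, bought, foot


def is_harmonic(word: str) -> bool:
    """Return True if all vowels in `word` come from a single group."""
    uses_advanced = any(ch in ADVANCED for ch in word)
    uses_retracted = any(ch in RETRACTED for ch in word)
    return not (uses_advanced and uses_retracted)


# Hypothetical, made-up word shapes (not real Luo words):
print(is_harmonic("kitobu"))  # vowels i, o, u: all from one group
print(is_harmonic("kɪtobu"))  # mixes ɪ with o and u
```

The check only asks whether both groups show up in the same word; it says nothing about which group a given root or suffix prefers.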

Even cooler, Luo grammatically distinguishes between alienable and inalienable possession, so for instance, the word for a dog's bone has different forms depending on whether you mean the bone is part of the dog's skeleton, or a cow bone it's chewing on. If it can be taken away, it's got a suffix marking that fact.

Wakanda is also close to Ethiopia and South Sudan, where Afro-Asiatic languages are spoken. The most well-known subset of these are the Semitic languages, which include Arabic and Hebrew, but also languages Americans are often less familiar with, like Amharic, spoken in Ethiopia.

Amharic, like other Semitic languages, has what's called non-concatenative morphology, meaning that words aren't always built by adding prefixes, suffixes, or infixes, but are instead built with a system of (unpronounceable) roots that combine with vowels in between. The standard example linguists use is from Arabic (also spoken in that region), where k-t-b is always in things related to books and writing, but the vowels make it mean different things: kitaab 'book', kataba 'he wrote', kutib 'was written', etc. Amharic, like Swahili, has a massive number of speakers: roughly 22 million. It also has an objectively cool writing system.

Semitic languages like Amharic and Ge'ez are not the only Afro-Asiatic languages, though. To the south of Lake Victoria (so, somewhere sort of near Wakanda?) Iraqw, a Cushitic language, is spoken by approximately 460,000 people (because it's spoken by a much smaller number of people, the best video I could find was about porcine cysticercosis --- tapeworm in pigs).

And of course, we've established that Xhosa is MCU head canon (I really want to know the back story of how they first arrived in Wakanda, reversing the Bantu Migration, and how they rose to power!), which means that one could expect to hear clicks in Wakanda, too.

Wakanda Forever!

Given pre-release ticket sales alone, it seems like Hollywood has been sleeping on Black Panther's type of pan-African magic just the way the rest of the world has been sleeping on Wakanda's advanced technological civilization. If we're lucky, BP is going to be a smash hit with future films, TV series, spinoffs...and maybe we'll get to hear the sounds of Wakanda just as we hear the sounds of Essos and Valyria, Middle Earth, and Qo'noS.

A great resource for the IPA

One of the best tools a linguist uses is the International Phonetic Alphabet; however, learning it can feel daunting. I have historically referred students to the Wikipedia page on the IPA, because it has links to individual pages for each sound, with descriptions of how the sound is produced, and audio recordings.

Now, there's another tool: an interactive IPA chart with a cross-sectional MRI so you can see the position of the tongue, lips, velum, etc. while a sound is being produced.

It's courtesy of the UCLA Speech Production and Articulation Knowledge Group, and can be found here.

One caveat: of the five available speakers, John Esling is the only one who pronounces the alveolar click /!/ correctly. Everything else seems to be great across all speakers.



Fun With Morphology!

Causative Smallening

Friends and family members have recently said some morphologically interesting things, and I decided to take a quick second to put them down here, for posterity, because they're so freaking cool.

The context for the first was manipulating images for a slideshow. The sentence used was:

I smallened it

Everyone clearly understood it as "I made it smaller," and also knew that it was non-standard. But why?

Well, some adjectives can be made into inchoative verbs. This means if you have some adjective X you can make a verb that means 'to become X'. It's super easy: just add an -en to the word:

  • darken: to become dark
  • redden: to become red
  • liven: to become more alive/lively
  • quicken: to become quick.
  • leaven: to raise (from an older word in English we no longer have, ultimately from Latin levare 'raise')
  • toughen: to become tough
  • smarten (up): to become smart

These can also then be made transitive and are then causative verbs, meaning someone causes something to become X.

The thing is, it's normally taken to apply only to what linguists call a "closed set" which is a fancy way of saying you can only do it to some adjectives and not others. That is, it sounds weird to say "dumben it" (instead of "dumb it down") or "absurden the story" or "spicen the food."

And yet, we all have the grammatical competence to be able to (playfully) generalize to new instances, so everyone knew what "I smallened it" meant.


When linguists get to the morphology segment of Intro to Linguistics, we teach "bracketing" as a tool for recognizing the internal structure of words. It's literally drawing brackets around word-pieces (let's call them morphemes). For example:

  • [ nation ]
  • [[ nation ] al ]
  • [ inter [[ nation ] al ]]

Some kinds of ambiguity are then easy to explain, as in, the door is:

  • [ un [ [ lock ] able] ]  == unable to be locked ~ un-lockable
  • [ [ un [ lock ] ] able ] == able to be unlocked ~ unlock-able
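For the programmers in the audience, the two bracketings can be sketched as nested data. This is just a minimal Python illustration, not a standard linguistic tool: the variable and function names are made up, and the notation is simplified so that bare roots are not given their own brackets.

```python
# A word's structure as nested tuples: each tuple groups the pieces
# that belong together, mirroring the brackets in the examples above.
UNABLE_TO_BE_LOCKED = ("un", ("lock", "able"))   # un + lockable
ABLE_TO_BE_UNLOCKED = (("un", "lock"), "able")   # unlock + able


def spell(structure) -> str:
    """Flatten a bracketed structure into the word as it is written."""
    if isinstance(structure, str):
        return structure
    return "".join(spell(part) for part in structure)


def show(structure) -> str:
    """Render the structure with explicit brackets."""
    if isinstance(structure, str):
        return structure
    return "[ " + " ".join(show(part) for part in structure) + " ]"


for analysis in (UNABLE_TO_BE_LOCKED, ABLE_TO_BE_UNLOCKED):
    print(spell(analysis), show(analysis))
```

Both structures spell out the same string, which is exactly why the written word is ambiguous: the difference lives in the grouping, not in the letters.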

Similarly, we can bracket words that go together in sentences:

  • [ [that ridiculous man ] [ looks [ dumb ] ] ]

Sometimes, though, things break free. A classic example is the suffix -ish, which for many people now can modify much more than adjectives:

  • It was a yellowish color.
  • I guess I was excited about it, ish

All of that was to get to a family member recently saying:

There's no point in waiting to leave, it's not going to get any not dark er

That is, it's not going to get any [ [ not [ dark ] ] er ], where -er is modifying the complex structure not dark.

Often, linguists will treat these kinds of examples as mistakes, play, or somehow not part of the object of study (and make pronouncements like "inchoatives derived from adjectives are a closed set" and sometimes even claim that words like smallen are "impossible"). I think it's important that we take these kinds of novel forms --- forms that sometimes challenge theory we've learned in grad school --- seriously. In part, because if you start listening for them, they happen all. the. time.

Happy listening!



©Taylor Jones 2017



Habitual hiring

I recently came across a couple of images of African American English use in hiring signs. I think they could be an excellent tool for teaching about AAE in Introduction to Linguistics, or Intro to Sociolinguistics, since

  1. neither is 'standard' English
  2. they have a difference in meaning
  3. the difference in meaning affects strategy for someone on a job hunt.

So without further ado, let's say you've been out hunting for work, and you're down to your last resume. Which of these two places do you take your last resume?


If you don't speak AAE and don't know about its system of tense and aspect (which is more complex than mainstream American English), you may think it's a toss-up between the two.

However, you'd be wrong.

  • we hiring features what's called copula deletion, which is common in many languages (including Russian, Arabic, and others). It means "we're hiring (right now)".
  • we be hiring makes use of habitual 'be' which is a grammatical marker of, well, habitualness. It means "we are usually/habitually/often hiring."

Therefore, if we're to trust the signs, you've got a better chance of being hired right now going to the first store.



©Taylor Jones 2017



Bare Subject Relatives and the Sophisticated Complexity of AAE

[Trigger warning: My focus here is on a syntactic phenomenon, but a video example I'll be focusing on includes the threat of police violence and a man hypothesizing about his death at the hands of the police. The man in question is alive and unharmed.]

I've been thinking a lot lately about the complexity and sophistication of AAE syntax. Much of the work and outreach around AAE in the last 50 years has been trying to demonstrate that AAE is neither deficient nor wrong. There's a big jump, however, between not wrong, which is the end goal for many linguists, and marvelously rich, which seemingly hasn't percolated through the field much beyond AAE specialists. Christopher Hall, a colleague and friend, often implores lay people to flip the script and think about language with the starting point that AAE is the default against which other dialects should be judged.

My focus here is a syntactic feature of some varieties of African American English that doesn't get as much attention, but that is surprisingly common (especially in the South). It is referred to as null subject relatives, or bare subject relative (clauses).

A recent, salient example can be found in this video a motorist took of himself chastising a cop for approaching his car with his (the officer's) service weapon drawn, about 15 seconds from the end. It probably goes without saying, but the video may be triggering for some.

[Link to video here]


The subtitles say "Dad shot dead by a cop who made a mistake"; however, this is yet another case of reporters "translating" AAE. The gentleman actually said:

"Dad shot dead by a cop made a mistake".

What's going on here?

Well, first, a relative clause is like a little mini sentence or sentence fragment that adds more information about a part of the main sentence. For instance:

  • That is the man [who I saw yesterday]
  • That is the man [who saw me yesterday]
  • The book [that I recommended to you] is on sale now.

In most varieties of English, you can delete -- that is, not say -- the relative marker (who, which, that), if it refers to the object of the relative clause.

To take the first example, the man who I saw yesterday, we can rework the relative clause as meaning something like "I saw him yesterday." In fact, many varieties of English make use of such resumptive pronouns, so it would be perfectly natural to say "That's the man who I saw him yesterday." And unsurprisingly, this kind of thing is cross-linguistically common, and in some languages it's obligatory.

So if it's:

  • I (subject) saw him (object)

Most varieties of English allow you to do away with the relative marker:

  • That's the man who I saw yesterday
  • That's the man ___ I saw yesterday

AAE is interesting in that it also allows deletion of the relativizer if it marks the subject.

  • That's the man who saw me yesterday.
  • That's the man ___ saw me yesterday.

This is pretty well described in the literature, so for instance, Stefan Martin and Walt Wolfram have a chapter in Salikoko Mufwene's book African American English: Structure, History, and Use that gives a ton of excellent examples:

  • He the man ___ got all the old records
  • Wally the teacher ___ wanna retire next year
  • Jill like the man ___ met her brother last week

The above example in the video was particularly interesting because the syntactic structure of the full utterance is extremely complex.

There's a pernicious and widespread view that AAE, or "ebonics," is somehow inferior or defective. It's widely regarded as both "simpler" than "standard" English, and simpler in ways that are "broken" or "wrong." However, not only does it have more complex grammar in some respects, but AAE speakers deploy sophisticated combinations of syntactic structures even under extreme stress. The sentence the motorist in the above video uttered makes use of:

  1. An "imposter" construction in which the speaker is understood to mean himself when using a name/title ("Daddy") instead of a first person pronoun ("I").
  2. Copula deletion ("Daddy shot" instead of "Daddy was shot"). This is very common cross-linguistically, and is standard in Arabic, Chinese, Russian, etc.
  3. A resultative complement to the verb ("shot dead")
  4. Passive voice --- with copula deletion --- which we understand because of the resultative. Compare "Daddy shot a gun" vs. "Daddy shot dead."
  5. A bare subject relative ("a cop ___ made a mistake").

This is a sophisticated interlocking clockwork of syntactic structures, produced under extreme stress. A tree diagram of this sentence would show all kinds of movement and deletion. And there's some evidence that people who speak other dialects do not have the complex grammatical knowledge to correctly parse this kind of utterance. And yet, people like this motorist are routinely treated as though their language is deficient.

It's a starting point for us linguists to point out that AAE is rule-governed and syntactically well-formed. However, I don't think this goes nearly far enough. "Technically not inferior" is a far cry from the truth: AAE is a varied, complex, sophisticated language variety that makes use of many complex grammatical rules that "standard" English lacks. AAE speakers are doing things other people don't understand, and not because the AAE speakers are wrong, but because they have a fuller syntactic toolbox.




©Taylor Jones 2017



Lately, I've been noticing a particular phenomenon in speech, in which the word "particularly" is pronounced more or less as "partickerly" or "partickly."

It turns out I'm not the only one to notice this, as Mark Liberman has an excellent, and much more in-depth description of the phenomenon at Language Log, with a ton of excellent audio.



©Taylor Jones 2017


A linguist's take on the Great GIF Controversy

The Conflict:

For years, the English-speaking internet has been divided. We cannot agree on how to pronounce gif, the acronym for graphics interchange format. Much as with the dress, each side thinks their own position is the only correct one, and that the other side is absolutely crazy. And much as with the dress, it's probably a little more complicated.

People write articles with titles like you are 100 percent wrong about how to pronounce gif. People share mocking gifs with arguments bolstering their point of view. People yell at one another. Things get entirely too heated.

I intend to shed some light on this situation.

The Options:

There are technically three ways you could pronounce gif in English, although the conflict is over the first two. The three are:

  1. so-called "hard g" which linguists represent with /g/. This is <g> as in "gift".
  2. so-called "soft g" which linguists represent with /d͡ʒ/. This is <g> as in "George." It is also sometimes represented with <j> as in "Jazz".
  3. The "French" or "super soft g", which linguists represent with /ʒ/. It is in (some) pronunciations of "rouge". (Note that some English speakers "nativize" words with this to have the /d/ sound in the "soft g", so what I call "baton rouge" they may call "baton roudge".)

While I relish ironically using the third option and watching people on both sides of the hard/soft g debate lose their minds, I recognize that nobody is going to take seriously the argument that "French g" is correct.

The Arguments:

Arguments for "hard g":

  1. It's an acronym, and the word the <g> comes from is one where it is pronounced "hard" (namely, "graphics").
  2. We often pronounce acronyms differently than we would pronounce a word spelled the same way (CIA is "see eye aye" and not "kia").
  3. Feelings. People have really strong feelings that this is the only correct way.

Arguments for "soft g":

  1. Lots of words spelled with <gi> are pronounced with a "soft g": ginger, gin, giraffe, giant...
  2. It's easier to pronounce gif as a word and not as an acronym. Nobody is actually saying "gee eye eff". If you're going to make it a word, then make it a word!
  3. "Foreign" words often have a "soft g" (giraffe...).
  4. Feelings. People have really strong feelings this is the only correct way.

A dash of science:

I decided to take a look at this list of over 58,000 (relatively common) English words, and see what the patterns are for g-words.

There are 1836 words that start with <g> in this list, and there's not a clear rhyme or reason to the choice of "hard" versus "soft" g, so one would have to look at each of them to get a sense of the overall pattern. That's a pain in the ass. However, there is a helpful fun fact from linguistics that can constrain this problem a bit more:

"Soft g" often comes from a combination of sounds, historically: a "hard g" followed by a non-low front vowel. What does that mean? That means that for the vowels /i/ "bead", /e/ "bade", /ɪ/ "bid", and /ɛ/ "bed", your tongue is actually higher in your mouth, and closer to the front of the mouth than it is for the vowels /u/ "booed", /o/ "bode", etc. The "hard g" sound is made by the back of the tongue forming a closure at the back of your mouth. These non-low front vowels tend to cause people to move their tongues slightly forward, and over time (we're talking hundreds of years) the sound changes to one made intentionally further forward. "Soft g" is created by a tongue closure further forward in your mouth than "hard g". Try saying words with them and pay attention to where your tongue is. (Try it! It's fun!)

This fact is part of why Italian spelling is so weird, for anyone who's tried to learn Italian.

All of that means I don't need to bother with words like "goof" because nobody is going to pronounce that with a "soft g."

So I chose to limit myself to words that start with <gi>. It turns out there are 102 of them, which meant I could simply read them and split them into "hard" and "soft". Of those, 30 are "soft" and almost all of these are of foreign origin.
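If you want to try this on a word list of your own, the filtering step is easy to sketch in Python. This is a hedged illustration rather than the code behind the numbers here: the function name is made up, the sample list is a tiny invented stand-in for the real 58,000-word list, and the hard/soft labels still have to be entered by a human.

```python
def words_starting_with(words, prefix):
    """Return the sorted, de-duplicated lowercase words that start with `prefix`."""
    cleaned = {w.strip().lower() for w in words}
    return sorted(w for w in cleaned if w.startswith(prefix))


# Tiny made-up sample standing in for the real word list.
sample = ["gift", "Giraffe", "goof", "gin", "gene", "give", "ginger\n"]

gi_words = words_starting_with(sample, "gi")
print(gi_words)

# The hard/soft judgment is made by a person reading the list;
# these labels are hand-entered for the sample only.
soft_by_hand = {"gin", "ginger", "giraffe"}
soft = [w for w in gi_words if w in soft_by_hand]
print(len(soft), "of", len(gi_words), "sample <gi> words are soft")
```

Lowercasing and stripping whitespace first keeps entries like "Giraffe" or a line with a trailing newline from being missed.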

30/102 (29.4%) of words that start with <gi> have a "soft g."

It's not entirely unreasonable then to think that gif should perhaps be pronounced with a "soft g." People will argue that there are more with a hard g, and that's true, but the same people will say that "soft g" is crazy, which is clearly not true.

BUT WAIT. What about words with <ge> you ask? I'm glad you asked. There were 223 of those. Of them, 197 were pronounced with a "soft g" (e.g., gene, gender, geriatric, geology, gelatinous...).


197/223 (88.3%) of words that start with <ge> have a "soft g."

This means that:

Of all of the words with <g> where it could be pronounced hard or soft, 227/325 (69.8%) are pronounced with a "soft g".

It's also worth noting that in the particular list I have, fully 38% of the words are <g> followed by either <i> or <e> and then <n>. This is important, because many people have what is referred to as the PIN-PEN merger, meaning that <i> and <e> before <n> are pronounced the same. That means Jim and gem are both pronounced the same (namely, as Jim). This is a feature of Southern American English, pretty much the entirety of the West, most of Canadian English, and most of African American English. A LOT of people do this.

This means that even if they're limiting themselves to only words that are pronounced <gi>, there are 109 more words in this list that they believe are pronounced with the "ih" vowel than if they don't have the PIN-PEN merger.


For people with the PIN-PEN merger, 139/211 (65.9%) of <gi> words are pronounced with a "soft g."
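The arithmetic behind these figures is simple enough to check in a few lines of Python. This is only a sketch using the counts reported in this post; the function and variable names are made up, and the last line assumes (as the 139/211 figure implies) that all 109 additional words have a "soft g".

```python
def soft_share(soft: int, total: int) -> float:
    """Percentage of words with a "soft g", rounded to one decimal place."""
    return round(100 * soft / total, 1)


# Counts as reported in the post.
gi_soft, gi_total = 30, 102
ge_soft, ge_total = 197, 223
merged_extra = 109  # <ge> + <n> words that sound like <gi> with the PIN-PEN merger

print(soft_share(gi_soft, gi_total))                       # <gi> only
print(soft_share(ge_soft, ge_total))                       # <ge> only
print(soft_share(gi_soft + ge_soft, gi_total + ge_total))  # <gi> and <ge> together
print(soft_share(gi_soft + merged_extra, gi_total + merged_extra))  # merger speakers
```

The point of the comparison is that the same word list gives a different "soft g" share depending on which words a speaker hears as starting with the "ih" vowel.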

The Takeaway:

Even if people are being completely rational about their decision about how to pronounce gif, it's informed by their dialect, and their personal pronunciations of other words. While it is rational to say "it's from graphics, which has a 'hard g'," nobody is saying "gee eye eff" (which coincidentally, has a "soft g"). While it's rational to say that foreign words are often nativized with a "soft g" (like giraffe), nobody says "gift" with a "soft g".

Finally, even if people are thinking statistically about it (even if it's sort of "fuzzy" math based on what they have heard in their life and not hard numbers), the conclusions they come to are dependent on their dialect, speech community, and vocabulary.

This is why I ironically go with the "French g": if you have strong feelings about the pronunciation of gif, no matter what they are, you're probably wrong. And if you're having the argument, it's because someone tried to share an image with you. Why not just be nice, instead of pedantically (and no matter what side you choose, wrongly) lecturing your acquaintances on how to say words?




©Taylor Jones 2017


Bill Maher, the N-word, and that pesky R

[Trigger warning: n-words]

Bill Maher is in the news right now for dropping the n-bomb on his show in a context that many, many people found offensive. Predictably, people are coming to his defense with two arguments: (1) he was referring to himself, and (2) he "didn't say the /r/."

As a linguist, and as one of the handful of us who has given serious thought to the n-word(s), (shout out to Christopher Hall, to Arthur Spears, and to Geneva Smitherman) I want to weigh in with a (socio)linguistic perspective. My argument is:

  1. It was not ok for him to say either, and,
  2. White folks (in general) should not say either if they don't want to offend, because
  3. It is an artificial distinction for most white people, if they are borrowing from a dialect they do not speak...and the vast majority of white people do not speak (or understand) African American English, natively or otherwise. And also,
  4. In most white people's native dialect, the only n-word is a slur.

Elsewhere, Christopher Hall and I have written about the grammatical and social functions of the n-words in some varieties of AAE. We argued that there are multiple words that all include the "n-word" that fulfill various grammatical and pragmatic functions: from first person pronouns to social distance markers, to politeness (yes, politeness) forms. If you are not a native speaker of AAE, it is easy to misunderstand these uses because they are what Arthur Spears coined the term "camouflage constructions" to describe. That is, they look like they might mean something else, and so people assume they understand when they don't. Recent pilot work on cross-dialect comprehension that I worked on with a team at U Penn and NYU confirms that in general, white folks don't understand the range of uses of the n-words.

More importantly, these are uses that occur in African American English, which is a dialect that has its own accent (really, range of accents, but we'll set that aside for now). Crucially, most forms of AAE are what linguists call non-rhotic, meaning /r/s after vowels are often not pronounced. Many white dialect varieties are not non-rhotic, including Bill Maher's normal speech. So Maher will make the argument that nigga and nigger are different words, and that he said the "acceptable" one.

HOWEVER, Maher, I would argue, only has nigga in his vocabulary as a taboo deformation of the word nigger. It's the same as claiming he didn't call someone bitch, he called them betch, or bish. The point is to say "I technically didn't say the word" while still saying the word.

Here's the crux of my argument: If you don't speak AAE, whether you borrow AAE sounds or not to say nigger doesn't change what you're saying. For people to be comfortable (or less uncomfortable) with Maher's use of nigga, he'd have to (1) use it in the appropriate social context, which this was not, and (2) back it up with literally any other features of AAE... and this would still probably not make it ok. As is, he was just "being edgy" by saying a taboo word he knew would offend.

That is, we white folks don't get to say "I was using that word like you people do!" without actually being able to use any other words like AAE speakers. If the accent is right, if the word choice is right, if the grammar is right (yes, you can butcher AAE grammar --- it is as systematic and rule governed as any other language variety), and if the cultural context is right you can maybe get away with speaking AAE as a white person. Notice I didn't say "saying the n-word". That's still pretty much off the table. Even if you understand the grammar, social function, and pragmatics of use. 

Here are some tips and general rules of thumb around the n-words if you don't want to offend, and you're white in America:

When you can say "nigger" without offending:

  1. maybe in citation, either directly quoting old racist stuff, or discussing the word itself, best if at a linguistics conference or conference on race, and even then you might encounter pushback.
  2. never in casual conversation.

So basically, you can't.

When you can say "nigga" without offending:

  1. To a POC who has specifically said to you "yo, we cool, you can call me nigga. You get a pass." To that person ONLY. Probably not within earshot of anyone else. I've never heard of this situation occurring, but who knows. Also, even if you find yourself in that situation, if you actually do it, I'm not saying it's gonna go great, or that I endorse that path. 
  2. discussing the word nigga in citation form at a linguistics conference. And even then, not everyone will agree.
  3. Never in casual speech.

So theoretically it's possible, but maybe just don't.

The distinction between r-full and r-less forms has a long history, and linguists are not remotely settled as to the history of the word (for instance, Hiram Smith argues the semantically neutral r-less form goes back 200 years or more). While it's interesting, it's completely orthogonal to the question of whether it's appropriate for white people to say it. Because it has been a slur in white English from its beginnings to literally right now, in both r-full and r-less varieties of white English, people like Bill Maher don't get to decide that it no longer has all that historical baggage.

And even if you deeply understand its use in AAE speaking communities, and participate in those communities, if you actually care about the people in those communities, you still won't say it.  Even when it's linguistically appropriate. Because our language use is culturally and socially situated.




©Taylor Jones 2017

Have a question or comment? Share your thoughts below!



I ran all Trump's tweets through a neural net to try and figure out the meaning of Covfefe. Here is what I learned.

Unless you've been living under a rock, by now you probably know that just after midnight two days ago, Donald Trump tweeted:

Despite the negative press covfefe

Twitter went wild, predictably. For two days now, there has been heated debate about (1) how to pronounce covfefe, and (2) what covfefe means. Yesterday, Trump's press secretary Sean Spicer declared that Trump meant to type covfefe, and that its meaning was known to Trump and a select few others.

In an attempt to get to the bottom of this mystery, I decided that semantic Word Embedding Models might be useful. I have written about such models elsewhere. The (extreme oversimplification) general gist is that if you treat some document or documents as a big bag of words, you can start to treat individual words as being related to one another by their position in a (high dimensional) space. Like words cluster together, dissimilar words are far apart in this vector space. The actual implementation is technically a "Feed forward neural net" that "fine tunes through back propagation," but this is all linear algebra and code and ignores the fun of it.

In order to try to get at the Mystery of Covfefe, I decided to train a word2vec (that is, word to vector) model using Ben Schmidt's wonderful package in R. In order to do so, I first needed to gather all 30,999 Trump tweets (at the time I gathered them). I did so by cloning the Trump Twitter Data Archive (note: if you have a cool coding idea, chances are someone did most of the work already. I'm learning that half of coding well and fast is just finding the appropriate already collected data, already written module/library, or already worked recipes).

Once I gathered all 30,999 Trump tweets, I needed to clean them. I did minimal cleaning on the data set, so I just made all words lowercase, eliminated punctuation, and eliminated common "stopwords" -- words like "and, are, in, at, be, there, no, such" etc. This has the effect of normalizing a bit, so sad and SAD! are treated as the same word. I have not yet gotten around to lemmatization: grouping words like ran, run, running all under "run", but I'm not sure to what extent that will really affect the output.
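For the curious, the cleaning step looks roughly like the sketch below. This is not the code I actually ran (that was in R); it is a minimal Python illustration, and the stopword list and the `clean_tweet` name are made up for the example.

```python
import string

# A tiny, hypothetical stopword list; a real run would use a much longer one.
STOPWORDS = {"and", "are", "in", "at", "be", "there", "no", "such",
             "the", "a", "of", "to", "is"}

def clean_tweet(text):
    """Lowercase, strip ASCII punctuation, and drop stopwords from one tweet."""
    lowered = text.lower()
    # string.punctuation only covers ASCII; curly quotes would survive this step.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOPWORDS]

# "sad" and "SAD!" end up as the same token.
print(clean_tweet("So sad"))
print(clean_tweet("SAD! There is no such thing."))
print(clean_tweet("Despite the negative press covfefe"))
```

Lemmatization would be one more step after this one, mapping each surviving token to a base form.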

Having run the results through Word2Vec, I did some quick sanity checks by investigating which words are the closest to a handful of given words. Closest to could? would, honestly, and can. Closest to america? safe, again, outsider, make, lets. Closest to new? york, hampshire, albany, yorkers

Clearly, it's working the way we would want it to, but are these really Trump's tweets? Closest to hillary? clinton, email, unfit, crooked, judgement, 33000, temperament. Closest to rosie? odonnell, theview, unprofessional, rude, bully.  IT WORKS!
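If you're wondering what "closest to" means here: the usual measure is cosine similarity between word vectors. Below is a toy Python sketch of that idea. The three-dimensional vectors are invented for illustration (real ones are learned from the tweets and have hundreds of dimensions), and `closest_to` here is my own small function, not the one from the R package.

```python
import math

# Made-up vectors for illustration only.
TOY_VECTORS = {
    "hillary": [0.9, 0.1, 0.0],
    "clinton": [0.8, 0.2, 0.1],
    "crooked": [0.7, 0.3, 0.0],
    "golf": [0.0, 0.1, 0.9],
    "scotland": [0.1, 0.0, 0.8],
}

def cosine(u, v):
    """Cosine similarity: near 1 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_to(word, vectors, n=2):
    """Return the n other words whose vectors are most similar to `word`."""
    target = vectors[word]
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(target, vectors[w]), reverse=True)[:n]

print(closest_to("hillary", TOY_VECTORS))
print(closest_to("golf", TOY_VECTORS, n=1))
```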

As I did before, I chose to visualize the word embedding space by using t-SNE (for t-distributed stochastic neighbor embedding). This does not preserve relationships exactly, but keeps near things near to one another and far things far. I present the full results for your enjoyment:

 Tremendous. (Clicking "view image" should give a slightly bigger, clearer version)


Some really fun/interesting/hilarious clusters emerge. There's read book art deal. There's barackobama obama obamas china iran. There's my favorite: totally sad bad terrible wrong. There's the small cluster of bush cruz. There's scotland golf course.

What's missing? Covfefe.

So I decided to up the size of the model and include more words. Normally, you want vectors with 200-500 dimensions in a model like this. I gave it 1000. The results are even better.



This model results in a cluster: realdonaldtrump mr awesome 2016. And, as a quality check, crooked is still right next to hillary.

But where's covfefe?

STILL not in the model. When I manually search for it, it shows up as excluded from these findings, and is returned next to realdonaldtrump, you, and i. Which is, frankly, perfect. Perhaps Covfefe is the word for all of us together with realdonaldtrump.

I know that's kind of a cop-out, but in the process, I learned a few other interesting things. In no particular order:

First, pick almost any word and the top 10-20 nearest words in either of the resulting vector spaces will include some negative sentiment. GOP? Establishment. Christian? Jailed. Beheading. Media? Fake.   He's even hard on Russia in tweets: Russia? Traitor, laughs, taunting.

Second, closest to Ivanka? Daughter. For Barron, you have to wait till number 5 for "son" (most of the top 10 are family related words, or the names of family members).

Third, closest words to usa are miss, pageant, missuniverse, and perplexingly moscow. If you subtract pageant the closest word to usa is...balls. Checks out. Also, further down the list needs, trump, and businessman.

This brings me to one of my favorite findings. A classic example of word embeddings capturing something about semantics is that on other data sets these models have been trained on, you can add and subtract vectors meaningfully. So for instance,

paris - france + italy = rome

...which is intuitively correct. The classic example is:

king - man + woman = queen

Trump doesn't use the words man or woman all that much, actually, so in Trump's world:

king - man + woman = larry

I'm certain there are other relationships in the data that I've missed, but if there's anything that's clear from the above, it's that word embedding models really, really, really work (even if adding or subtracting "man" and "woman" is basically adding and subtracting zero, in Trump's tweets). I love the examples from cookbooks, historical newspapers, and RateMyProfessor reviews, but there's something really validating about these results, in part because Trump's speech (and twitter speech) is so colorful, and the above so clearly accurately captures it.
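To make the vector arithmetic concrete, here is a toy Python sketch. All the numbers are invented: the first dictionary is rigged so that man and woman differ along one dimension, and the second is rigged so that they are identical, which is my guess at roughly what happens when a corpus barely uses either word. Neither comes from the actual model.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hypothetical "ordinary corpus": dimensions are roughly (royalty, maleness).
GENERIC = {
    "king": [0.9, 0.9], "queen": [0.9, 0.1],
    "man": [0.1, 0.9], "woman": [0.1, 0.1],
    "larry": [0.5, 0.5],
}

# Hypothetical "Trump corpus": man and woman are indistinguishable,
# so king - man + woman is just king again, and king's neighbor wins.
TRUMPISH = {
    "king": [0.9, 0.9], "queen": [0.9, 0.1],
    "man": [0.1, 0.5], "woman": [0.1, 0.5],
    "larry": [0.8, 0.9],
}

print(analogy("king", "man", "woman", GENERIC))
print(analogy("king", "man", "woman", TRUMPISH))
```

In the second case the subtraction and addition cancel out, so the answer is simply whatever already sat closest to king.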

Finally, it looks like covfefe is off the charts, even for the surprisingly regular logic of Trump's twitter.



©Taylor Jones 2017


Cablese and Wirespeak

I'm always interested in jargons, cants, patois(es?) and codes, and recently learned that my father-in-law, a career newsman, didn't just have a problem with his throat this whole time, but rather has been communicating with me in Cablese (and movie references).  I knew it wasn't that he had a problem with his throat, but I had no idea what was going on. Allow me to explain:

For about a hundred years after the invention of the telegraph, the main way news was shared was using the telegraph. Converting a story written in regular old English to Morse code was time consuming, expensive, and crucially, charged by the word. And you couldn't just stick words together, like say joining phrasal verbs like WRITEUP for "write up." That's obviously two words smushed together to get around being charged by the word. However, you could get away with making a new word like UPWRITE. Newsmen developed a complex system they used on the cables ("Cablese") as well as their own codes for the news wire ("wirespeak") and each news agency also had their own secret codes (so they couldn't get scooped). Even though they don't use the telegraph anymore, Cablese and Wirespeak live on. 

This last week, my father-in-law gave me the Rosetta Stone: the book Wirespeak: Codes and Jargon of the News Business. It is fantastic.

The book has chapters on Cablese, Wirespeak, and various news agency codes. Its chapter on Cablese is entitled "backwards run the words."

So how does it work? Well, first, anything that can be joined is. But backwards, so it's clearly a different word. So for instance, DOWNHOLD for "hold down." There's a story that when British writer Evelyn Waugh was asked to investigate a rumor a British nurse had been killed in an air raid he received the cable from his editor: SEND TWO HUNDRED WORDS UPBLOWN NURSE. Waugh investigated, found the rumors were untrue, and wrote back NURSE UNUPBLOWN.

That brings me to the second part of how it works: prefixes. everywhere. Most of them are Latin, but some are French, or other.


  1. CUM = with
  2. EX = from
  3. ET = and (e.g., MOM ETDAD)
  4. PAR = by
  5. PRO = for
  6. AD = to
  7. ANTI = against
  8. DANS = in (e.g., DANSRIVER 'in the river')
  9. UN = no, not
  10. POST = after
  11. PRE = before
  12. SUPER = on, over
  13. OMNI = all (e.g., OMNICHEERED 'everyone cheered')
  14. UNI = one
  15. SANS = without
  16. SUR = on

There are also suffixes:

  1. WARD = toward
  2. WISE = manner adverb
  3. EST = most. (Why we don't just say "est" for all superlatives will get its own blog post, to come later)
  4. ING = makes a verb from a noun, or light verb construction (This will also get its own post).
  5. SOME = full of (e.g., GLADSOME TIDINGS)
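The two mechanisms so far (flipping phrasal verbs and gluing on prefixes) are regular enough to sketch in a few lines of Python. This is a toy, not anything from the book: the prefix table is a subset of the list above, and the function names are my own.

```python
# Prefix glosses copied from the list above (a subset).
PREFIXES = {
    "CUM": "with", "EX": "from", "ET": "and", "PAR": "by", "PRO": "for",
    "AD": "to", "ANTI": "against", "DANS": "in", "UN": "not",
    "POST": "after", "PRE": "before", "OMNI": "all", "SANS": "without",
}

def flip_phrasal_verb(verb, particle):
    """'hold', 'down' -> 'DOWNHOLD': join the pair with the particle first."""
    return (particle + verb).upper()

def gloss_prefix(word):
    """Split off one known prefix (longest first); return (gloss, rest) or None."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            return PREFIXES[prefix], word[len(prefix):].lower()
    return None

print(flip_phrasal_verb("hold", "down"))
print(gloss_prefix("DANSRIVER"))
print(gloss_prefix("OMNICHEERED"))
print(gloss_prefix("UNUPBLOWN"))
```

Note that the decoder is naive: it would happily gloss an ordinary word like EXAMPLE as "from ample", so a human newsman is still required.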

There is an apocryphal story that an international correspondent quit their job with the cable:


There are also a ton of one offs, like SMORNING for "this morning" and SNIGHT for 'last night.'


(taken from the excellent blog post on the subject Onwriting: Unearthing a lost language, which explains some of the more specific terms as well, like "thumbsucker" for "news analysis" and "art" for "photographer".)

So when my wife etme downwent DCward sweek advisit mother etfather inlaw, it outturns father unupmade weirdtalking. It was preupmade parnewsmen prehim. And now, postwise, I understand his texts meward.



©Taylor Jones 2017


Benedict Cumberbatch: Ye Shall Know Him By His Dactyls

[EDIT: I've been beaten to the punch, unsurprisingly by Gretchen McCulloch. I'm only 4 years late to this party, apparently.]

For some time, people on the internet have been playing with British actor Benedict Cumberbatch's name. They call him all manner of other names. And yet we all know who is being discussed. This post is about the simple reason why.

First, an example:

 This image is legit saved to my computer as "Crindlesnatch.jpg"


Now, a few more examples. And why not throw in Reddit's favorites?

So what is going on here? A few things:

  1. Meter
  2. Vowels
  3. Bs and Cs???
  4. Context?

Meter is the most important. Cinnamon Thundercat has a distinctive name in that (1) he's almost always referred to by both first and last name, and (2) both his first and last name are dactyls. That is, they are a stressed syllable followed by two unstressed syllables. Once you know the context, any string of STRESSED-unstressed-unstressed STRESSED-unstressed-unstressed referred to as a person can be easily recovered as actually referring to Brandywine Crumplepuss.

Second, the replacements often have the same kinds of vowels in the same places. Most important seems to be that the last vowel be an /æ/ as in "batch."

Third, people often, but not always, use replacements that start with B and C.

Lastly, there's often a picture, or an introduction like "British actor ______." From here, it's clear who Battleship Crustybrunch refers to.

I don't have the time at the moment, but a true overkill analysis for the Hashtag SCIENCE fans out there would be something like:

  1. collect a corpus of name replacements
  2. have study participants rank them on felicity: how good are they at being "Benedict Cumberbatch names"?

Once you've got some large number of good ones:

  1. count how many start with B--- C---, B---- X----, or X---- C---- (where X is any other letter).
  2. count how many conform to a two dactyl pattern.
  3. run them all through some tokenizer, and associate each part with a pronouncing dictionary pronunciation (e.g., the CMUDICT pronouncing dictionary).
  4. Evaluate how well each maps its vowels to those in the original name.
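The first two counting steps might look something like the Python sketch below. To keep it self-contained I've hand-annotated the stress patterns instead of looking them up in CMUDICT, and I've flattened secondary stress to "unstressed", so treat the annotations as my assumptions. "Banana Computer" is a made-up counterexample, not a real entry from the wild.

```python
from collections import Counter

# 1 = stressed syllable, 0 = unstressed. Hand-annotated, not from a dictionary.
CANDIDATES = {
    "Cinnamon Thundercat": ("100", "100"),
    "Brandywine Crumplepuss": ("100", "100"),
    "Battleship Crustybrunch": ("100", "100"),
    "Enterprise Custardshirt": ("100", "100"),
    "Banana Computer": ("010", "010"),  # hypothetical non-dactylic name
}

def initial_pattern(name):
    """Classify a two-word name as 'B C', 'B X', 'X C', or 'X X' by its initials."""
    first, last = name.split()
    b = "B" if first.upper().startswith("B") else "X"
    c = "C" if last.upper().startswith("C") else "X"
    return f"{b} {c}"

def is_double_dactyl(stress):
    """True if both words are STRESSED-unstressed-unstressed."""
    return all(pattern == "100" for pattern in stress)

initial_counts = Counter(initial_pattern(name) for name in CANDIDATES)
dactylic = [name for name, stress in CANDIDATES.items() if is_double_dactyl(stress)]

print(initial_counts)
print(dactylic)
```

With a real corpus and felicity rankings, you could then check whether meter or initials does a better job of predicting which names people rate as good.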

The real question is how can his name and the game people are playing with it be so distinctive that when I talk about Enterprise Custardshirt, you think of Khan:

 The guy on the right...




...and not Kirk?!

For good measure, here's a Benedict Cumberbatch name generator. Enjoy!



©Taylor Jones 2017


Why you probably didn't understand that one guy from Atlanta

When the first few episodes of Donald Glover's show Atlanta on FX aired, a lot of people were blown away by the writing and acting on the show and praised the authenticity of the characters. Others, however, were blown away by the fact that they simply did not understand what some of the characters were saying.




In particular, there's a short scene in which Donald Glover's character is in a detention center and is subjected to a short monologue by a man he met there, about how the man came to be in jail. I have included it here (under educational "fair use" --- FX still retains all rights. The scene is from Season 1, Episode 2. Season 1 can be purchased on YouTube, iTunes, etc.).

Much of my research is on regional variation, and on to what extent people understand different accents and dialects. I should say first, and most importantly, there is nothing wrong with this man. The character is not in any way intended as impaired (yes, I'm addressing this because it has been expressed to me). The way he speaks is a regional dialect, not any sort of personal idiosyncrasy. Plenty of other people from Atlanta, including some very famous people from Atlanta have the same accent. Some of them make a living as wordsmiths (Rich Homie Quan, Plies).

I have taken the liberty of transcribing the clip, and analyzing what the main triggers for misunderstanding are. I've been led to believe that his speech is typical of a certain subset of the population in Atlanta, and while there are more and more characterizations of regional varieties of African American English, to my knowledge Atlanta AAE is still under studied and under described.

The Transcript, for the curious:

I don’t believe this shit.

Ridiculous, man.

What'd you, uh, what'd you do to get in here?


Damn, man!

I should've just went home, boy.

Instead I’m in here locked up 'cuz of this fool I ain’t seen in about eleven years, man.

Boy I was at Five Points, bout to catch the bus, you feel me?

and this nigga I ain’t seen in eleven years come here talkin' 'bout “man, hey, listen here, hey boy I ain’t seen you in about eleven years boy, let’s hang out. go get a beer.”

So I follow him to the god damn gas station.

We get two beers.

We aint get but two of them, but they was the big ones, though.

They were the big ones.

Mmm, anyway, so, nigga like “man, come on let’s go-- go-- go to the house and drink ‘em.”

So we get to the house he’s like “man. My old lady.”

And so we just gonna drink ‘em on the porch. Feel me?

I’m like “boy, APD be rollin through here boy.”

And he, and he done talked me into it.

So sure enough, APD done rolled up and seen the god damn two cans out there, locked me up for public intoxication.

You know what I’m talking about?

Man, I’m in here man cuz this nigga, man, I ain’t seen (in) eleven years. Man, I’m gonna be in here till tuesday cuz I ain’t cashed my check.

[That’s messed up]

Oh man, I should’ve went home, boy. Shit!

[Damn, man, I said I was sorry. I just ain’t seent you in like twelve years—]

Man! Fuck you Grady! Shut up!   

While some of the things people may misunderstand are questions of morphology (what word forms he uses) and syntax (how they fit together), I think far, far more important is his accent. Not only is the US still quite segregated, but --- some rappers notwithstanding --- much of the mainstream has almost no exposure to African American English from Atlanta. The triggers for misunderstanding are:

  1. AAE vowels: He has a number of features that are common across many regions, including the PIN-PEN merger (both words sound like "pin", and this extends to all words with en or em), monophthongization of ay so five might be more like fahv, etc.
  2. A shift in AAE vowels unique to the south: Just as white people from Chicago have vowels that have "rotated" from where many other Americans pronounce them ("I ride the boss to my jab!"), this guy's vowels are different than what you might expect if you're not familiar with his accent. For instance, his catch sounds like kitsch to most people, and his just sounds like jest to most people (the vowel is [ɛ], not [ʌ]).
  3. A strong preference for "open syllables." If we represent consonants with C, vowels with V, and syllable breaks with "." then there's a strong preference to reduce syllables to CV.CV.CV. This generally means deleting anything after the vowel, unless it's an n or m, in which case that ends up just making the vowel nasalized, as in French. This means that believe (CV.CVC) is pronounced belee (in IPA: [bəli:]), five is pronounced fa, let's is pronounced leh, and just (CVCC) is pronounced as jeh (CV) (in IPA: [d͡ʒɛ]). For many people this kills their ability to understand what they're hearing, although interestingly, it shouldn't necessarily.
  4. Deletion of unstressed syllables: public intoxication is pub toxication, eleven is lebm.
  5. AAE specific syntax: talmbout to introduce quotes (this nigga I ain’t seen in eleven years come here talmbout “man, hey, listen here, hey boy I ain’t seen you in about eleven years boy, let’s hang out. go get a beer.” to mean "this guy I hadn't seen in 11 years was like..."); nigga to refer to specific people; habitual be to mark usual or habitual behavior (APD be rollin' through here meaning "APD often comes this way"); perfective done to mark completed actions (And he, and he done talked me into it meaning "he (successfully) talked me into it"); and so on.
  6. Word final devoicing: sounds like b,d,g,v,z, are realized as p,t,k,f,s respectively, if they are at the end of a word. Moreover, b,d,g and p,t,k can become a glottal stop (the sound in the middle of 'uh-oh') at the ends of words.
  7. Atlanta specific knowledge: APD is the Atlanta Police Department, Five Points is a place. If you understand the rest, you can figure this out from context. If not, you're cooked.
  8. The use of bwa ("boy") as a term of address, along with other (reduced) filler, like his very fast, very reduced "you know what I'm talking about."
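The open-syllable preference in item 3 can be written as a toy rule. The Python sketch below is a deliberately crude illustration of the description above, not a phonological analysis: the syllabifications are entered by hand, the vowels are already in their Atlanta form (so five starts out with a monophthong), and the "pin" example is my own addition to show the nasal case.

```python
NASALS = {"n", "m"}
TILDE = "\u0303"  # combining tilde, marks a nasalized vowel

def reduce_syllable(onset, vowel, coda):
    """Drop everything after the vowel; a nasal in the coda nasalizes the vowel."""
    if any(segment in NASALS for segment in coda):
        vowel = vowel + TILDE
    return onset + vowel

def reduce_word(syllables):
    """Apply the reduction to each (onset, vowel, coda) syllable; '.' marks breaks."""
    return ".".join(reduce_syllable(*syllable) for syllable in syllables)

# Hand-syllabified inputs: (onset, vowel, list of coda consonants).
BELIEVE = [("b", "ə", []), ("l", "i", ["v"])]   # CV.CVC
JUST = [("dʒ", "ɛ", ["s", "t"])]                # CVCC
FIVE = [("f", "a", ["v"])]                      # CVC
PIN = [("p", "ɪ", ["n"])]                       # my own example of the nasal case

for word in (BELIEVE, JUST, FIVE, PIN):
    print(reduce_word(word))
```

Real speech is variable, of course; the rule here applies every time, which no actual speaker does.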

All of these factors interact, so boy, I was at five points about to catch the bus ends up sounding to many white folks like bwa awa a' fapoi bouda kitschabuh yafih me? So many viewers who have never been exposed to Atlanta AAE could not even begin to figure out where the word boundaries are, let alone what the words themselves were. And even if you do figure out the word boundaries, many people might still be confused: I should a jeh went home bwa is just different enough for some people to think "man, I'm not sure what that was."

Some Notes

Syllable Codas:

Lots of research on AAE discusses deletion or reduction of things that happen at the ends of syllables or the ends of words, but they're all taken (justifiably) as different phenomena. So there's a rich literature on AAE that discusses:

  • possessive -s deletion: this is how you get things like baby mama for baby's mama. Or my best friend apartment door for my best friend's apartment's door. Basically, sometimes word final -s is deleted.
  • consonant cluster reduction: if a word ends in consonants that are pronounced with the tongue in the same location, you can drop the second if both are voiced or both are unvoiced: e.g., hand -> han', just -> juss. Basically, clusters of consonants sometimes lose some of those consonants.
  • deletion or vocalization ( = making a vowel) of r after vowels: The speaker above does this a lot. Vocalization is most clear in how he pronounces beer as biyuh. Basically, r sometimes is deleted.
  • deletion or vocalization of l after vowels: The speaker above does this a lot as well. An example is his pronunciation of fool as foow. Basically, l is sometimes deleted.

HOWEVER, there are a ton of phenomena I've noticed but which are practically absent in the literature on AAE. For instance, the deletion of /v/ after vowels, which to my knowledge is only mentioned in one sentence in one article on AAE (Thomas, 2007). Most AAE speakers I know do this all the time, and the guy above is no exception: five points is fa poi (for the linguists: [fa.pɔ͡ɪʔ ]), believe is belee, etc.

Moreover, the discussion of the above syllable "coda phenomena" does not explain a lot of what the above speaker does. Entire syllable codas just disappear. The current literature on AAE states that people may delete the /t/ in just, but there's no real account for people who say things like jeh for just or gluh for gloves (in this case, I'm thinking of a famous-to-sociolinguists speaker from Philadelphia, recorded in 1981), or krima lih for Christmas List (e.g., everyone's favorite rapper, Plies.) Often, it's multiple morphemes (meaningful word 'pieces').

This is a topic I'm currently working on, and hope to have more to say later about the seeming dis-preference for codas in some varieties of AAE. For many, many words, it does not affect your ability to recover exactly what word was uttered. For instance, my fingers are cold because I forgot my gluh should be really easy to parse, because (1) there's no word gluh that would make you have to choose between possible words, and (2) context. We do this kind of thing all the time, since we don't always hear (or say) all the sounds in words. Spoken language does a really good job with a "noisy channel."

(For the linguists: While I'm writing about it, I might as well be the first to claim that: All obstruents higher on the sonority hierarchy than stops can be deleted syllable or word finally, and stops can all be realized as a glottal stop alone, for some varieties of AAE. Today, for instance, I heard [bli:ʔɪn] for bleeding)

The above speaker has pretty extreme reduction of codas, so let's hang out is leheygao:


but many viewers might be listening for something more like:



There seems to be a further vowel shift in progress in Atlanta AAE which has not been discussed much in the literature on AAE. Beyond what you would expect from southern AAE, a lot of Atlanta speakers have a couple of different vowels than what might be expected. A lot of linguists use what are called Lexical Sets to discuss accents. What this means is that we can talk about an entire class of words that all have the same vowel, and then state "the vowel in all of those words is thus-and-such in thus-and-such accent."  For instance, in most varieties of American English, the STRUT vowel (the vowel in words like strut, just, cuss, bus, cub, rub, hum, lunch) is written in IPA as [ʌ]. In the above clip:

  • the STRUT vowel is sometimes [ɛ].  So words like just and shut up sound a little like jest and shet epp. But bus is still [ʌ].
  • the TRAP vowel is sometimes [ɪ], which for most Americans is the vowel in words like ship, rip, dim. This is most pronounced in catch from catch the bus.

Overall difference:

There is a wealth of research on how we parse accents, and a couple of factors are at play here. First, AAE is heavily stigmatized in the US. The more it differs from middle class, 'standard', white speech, the more stigmatized it is. Second, because of the segregation in this country, many white folks simply do not understand AAE, even when we think we do (e.g., Rickford 1998, Rickford & King 2016, Jones & Kalbfeld 2017). Third, regardless of the accent, when it's perceived to be difficult to understand, rather than improving with more exposure, experiments show that people basically shut down, and stop trying to parse it. Lastly, given racial/ethnic cues, people perceive accents where there aren't any. Here, there is clearly an accent, but the relevance of the last point is that people may already be predisposed to consider a black man in a jail detention center "hard to understand," or even "impossible to understand," and "not worth the effort."

A handful of my non-black friends assumed that the point of the scene was basically a gag -- that the guy was incomprehensible. I don't think that was the case, and that doesn't seem to be the impression my AAE speaking friends had either. He's just real Atlanta. That's part of why people love the show: there are tons of types of people that viewers know from their daily lives that you just don't see on TV, but Atlanta gives them a spotlight, if only for a minute.

More broadly, though, the above points to a lot of interesting historical and sociological phenomena. Language change occurs when populations are separated. Generally, the way this is taught is by giving examples of European villages separated by mountains, where one town speaks differently than the next town over, because they don't interact often. However, as I'm going to argue in my dissertation, some populations in the US are separated by invisible mountains: residential and educational segregation. For some people, popular music, film, and television (including Atlanta) are now providing limited contact with people from "the other side of the mountain."




©Taylor Jones 2017



Linguists have been discussing "Shit Gibbon." I argue it's not entirely about gibbons.


Earlier this week a Pennsylvania state senator called Donald Trump a "fascist, loofa-faced shit-gibbon."

There was an excellent post on Strong Language, a blog about swearing, discussing what makes "shit gibbon" so arresting, so fantastic, so novel, and yet... so right (for English swearing. Whether you believe "shit gibbon" is "right" as a characterization of Donald Trump is a personal assessment each person must make for themselves).

The post, The Rise of the ShitGibbon can be found here. I highly recommend reading it.

Most of the post was dedicated to tracing the origins and rise of "shitgibbon." The end of the post, however, catalogues insults in the same vein:

wankpuffin, cockwomble, fucktrumpet, dickbiscuit, twatwaffle, turdweasel, bunglecunt, shitehawk

And some variants: cuntpuffin, spunkpuffin, shitpuffin; fuckwomble, twatwomble; jizztrumpet, spunktrumpet; shitbiscuit, arsebiscuits, douchebiscuit; douchewaffle, cockwaffle, fartwaffle, cuntwaffle, shitwaffle (lots of –waffles); crapweasel, fuckweasel, pissweasel, doucheweasel.

I've actually been thinking about insults like this a surprising amount. Ben Zimmer points out about "Shitgibbon" that "...Metrically speaking, these words are compounds consisting of one element with a single stressed syllable and a second disyllabic element with a trochaic pattern, i.e., stressed-unstressed. As a metrical foot in poetry, the whole stressed-stressed-unstressed pattern is known as antibacchius."

I argue that this is correct, but that (1) there's a little bit more to say about it, and (2) there are exceptions.


First: I argue that the rule for making a novel insult of this type is a single syllable expletive (e.g., dick, cock, douche, cunt, slut, fart, spunk, splooge, piss, jizz, vag, fuck, etc.) plus a trochee. A trochee, as a reminder, is a word that's two syllables with stress on the first. Examples are puffin, womble, trumpet, biscuit, waffle, weasel, and of course, gibbon. Tons of words in English are trochees (have a relevant XKCD! In fact, have two! Wait, no, three! No one expects the Spanish Inquisition!). Because so many words are trochees, you'll have to pick wisely --- something like ninja might not be as humorously insulting as waffle.

That said, in principle, monosyllable expletive + trochee seems to give really good results. Behold:

fart basket, shit whistle, turd helmet, cock bucket, douche blanket, vag weasel, (I'm gonna be so much fun when I get old and have dementia. Good luck grandkids!), shit mandrill, piss gopher, jizz weevil, etc. etc. I can do this all day.

So, it's not the fact of being a gibbon per se. Various other monkeys would work: vervet, mandrill, etc. However, crucially, baboons, macaques, black howlers, and pygmy marmosets are out.
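The rule so far (monosyllabic expletive + trochee) is mechanical enough to sketch in Python. The word lists are pulled from this post, but the stress annotations are typed in by hand, so consider them my assumptions. A real version would look stress up in a pronouncing dictionary.

```python
from itertools import product

# Monosyllabic expletives from the post (a subset).
EXPLETIVES = ["shit", "fart", "turd", "douche"]

# 1 = stressed syllable, 0 = unstressed. Hand-annotated assumptions.
SECOND_WORDS = {
    "gibbon": "10", "weasel": "10", "waffle": "10", "mandrill": "10",
    "baboon": "01", "macaque": "01", "marmoset": "100",
}

def is_trochee(pattern):
    """Two syllables, stress on the first."""
    return pattern == "10"

def make_insults(expletives, second_words):
    """Pair every expletive with every trochee, giving stressed-stressed-unstressed."""
    trochees = [word for word, pattern in second_words.items() if is_trochee(pattern)]
    return [f"{expletive} {trochee}" for expletive, trochee in product(expletives, trochees)]

insults = make_insults(EXPLETIVES, SECOND_WORDS)
print(len(insults))
print(insults[:4])
```

The filter only checks meter. As the next few paragraphs show, meter gets you a candidate, not a guaranteed hit.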

Moreover, it's not completely unlimited. Some words fit but don't make much sense as an insult: cock bookshelf, fart saucepan (which I quite like, actually), dick pension, belch welder.

Others sound like the kind of thing a child would say: fart person! poop human! turd foreman!

Yet others are too Shakespearean: fart monger! piss weasel!

Clearly some words (waffle, weasel, gibbon, pimple, bucket) are better than others (bookshelf, doctor, ninja, icebox), and some just depend on delivery (e.g., ironic twat hero, turd ruler, spunk monarch, dick duchess).


For a while, I've been discussing vowels in insults with fellow linguist Lauren Spradlin. Note that when we talk about vowels, we mean sounds, not letters. Don't worry about the spelling, try saying the below aloud. Spradlin has brought my attention to the importance of repeating vowels increasing the viability of a new insult of this form: crap rabbit, jizz biscuit, shit piston, spunk puffin, cock waffle, etc.

I would argue that having the right vowels actually gives you some leeway, so you can get away with following the first word with --- gasp! ---- a non-trochee! Be it an iamb (remember iambic pentameter?) as in douche-canoe, splooge caboose, or the delightfully British bunglecunt (h/t Jeff Lidz), or even more syllables: Kobey Schwayder's charming mofo-bonobo.

As you can see, this is a hot topic in the hallowed halls of the ivory tower. If the above simple formulae have motivated even one person to go out and exercise their own creativity to make a novel contribution to the English language, then I've done my job here as a linguist. Different people get into linguistics for different reasons, but this, this is what I live for. Get out there and make a difference!




©Taylor Jones 2017



Linguistically, why does it sound like Trump thinks Frederick Douglass is still alive?

Normally, when I write a blog post, I am pretty confident I know exactly what I'm talking about. I think of this as a place to communicate interesting findings or facts about linguistics to a lay audience. However, today something happened that I can't quite explain, but I want to discuss. Today, in honor of the first day of Black History Month, Trump gave a talk in which, among other things, he said:

Frederick Douglass is an example of somebody who's done an amazing job and is being recognized more and more, I notice.

Plenty of people have already remarked that the phrasing sounds odd. It seems as though Trump is unaware that Frederick Douglass is not still alive. His press secretary made similar remarks:

I think he wants to highlight the contributions that he has made

My goal here is not to suss out whether Trump and Spicer know that Frederick Douglass is dead, and has been for over a century. Rather, I want to discuss why many people have the intuition from the above utterances that Trump and Spicer think Frederick Douglass is alive. The linguistosphere (note: not a real thing) is abuzz right now, and there's quite a bit of discussion on my facebook and twitter about this.

First things first: there's something about the fact that he used the present perfect instead of the simple past. That is, "Frederick Douglass has done an amazing job", not "Frederick Douglass did an amazing job."

The present perfect indicates that something is completed, now. However, the fact that it's not morphologically past can't quite be it, because there are plenty of things that are over and done with and not likely to continue that can be marked this way. For instance:

I have eaten the plums that were in the icebox

There's something about the relation to the present, though, that makes it so it sounds very strange to use when discussing someone who died a century ago. But there's more to it. For instance, we could rebut what we take Trump's assumptions to be with:

Frederick Douglass has died (though). He did so in 1895.

Even that is odd, though. We'd expect something like "Frederick Douglass is dead."

Perhaps it's a requirement that the doer of the action marked with the present perfect be theoretically capable of still doing actions in the present? My current working hypothesis is that a (non-passive) clause marked with the present perfect requires a subject that exists in the present. For a passive, that goes out the window ("the plums which were in the icebox have been eaten").

BUT, and this is a huge caveat: I am very much not a semanticist.

I'm hoping for linguists who focus on semantics to weigh in, and will update here. There's clearly some structural presupposition of relevance to the present that makes statements of historical fact sound bizarre with the present perfect. See for yourself, with other sentences:

Napoleon has taken over Europe.

Muhammad has had a vision.

Julius Caesar has been stabbed.

Cleopatra has taken a lover.

...Frederick Douglass has done an amazing job.

It sounds like he either has just now done an amazing job, or he's done an amazing job and will probably continue to do so.





©Taylor Jones 2016


New working paper on Quotative "Talkin' 'bout"/"talmbout" published

I have a new paper out, on talkin' 'bout or talmbout in African American English, used as a verb of quotation similar to quotative be like. The paper can be found here. A blog post (or blog posts) on it will be forthcoming -- once I've finished my much more pressing dissertation proposal.



©Taylor Jones 2016


Why the International Phonetic Alphabet (IPA) is the best thing ever

In this post, I'm going to explain why knowing the International Phonetic Alphabet is like seeing the matrix.

A common misconception that linguists often have to deal with, be it from students in Intro to Linguistics or from family members at holiday gatherings, is that Language (capital L) is basically written language. This has all kinds of ramifications, from people thinking that stylistic conventions for writing are somehow "rules" that languages follow, to thinking that people whose pronunciation differs significantly from what we think of as being "how it's spelled" are somehow dumb.

But linguists know a secret: writing is a secondary technology, and spoken language (as opposed to signed language) is all about sounds.

There are tons of different writing systems, and languages can be represented --- well or poorly --- by many different systems. For instance, Turkish was historically written using the Arabic abjad (basically an alphabet that only has consonants), but is now written with roman letters. Tajik is the same language as Farsi, but one's written in Cyrillic and one in a modified abjad. Hell, people get tattoos in English that are written in Tengwar (Tolkien's Elvish script). All these different scripts represent sounds in ways that are sometimes more and sometimes less faithful to the (abstract) sound units that a given language uses. For instance, standard varieties of English have two sounds that we totally suck at writing, so we just slap two letters together. For both of them. When you blow air between your tongue and front teeth, whether your vocal cords are moving or not, we just write it <th> in English and call it a day, even though <t> refers to one sound and <h> refers to another and this combination makes no damn sense.

Rather than wade into the ridiculous morass of writing systems every time we want to talk about a language, even if it's just to give one example before moving on to other data from other languages, linguists have developed basically the best tool ever: The International Phonetic Alphabet.

How is it different than other alphabets? First, it's based on all the different sounds that human mouths and vocal tracts can make, and second, it's got one (and only one) character for each sound.

The beauty of it is that it is independent of accents, and in fact, you can represent different accents clearly using the IPA. So every linguist's pet peeve is reading language learning books that say things like "i as in the i in kit." Well, that's great, except that different accents/dialects/varieties of English pronounce the word "kit" differently.

So how does it work? Let's take a brief tour of SOUNDS YOU MAKE WITH YOUR MOUTH AND STUFF:

Sounds You Make With Your Mouth (and Stuff)!

I'm going to simplify a lot of this discussion, so expect nitpicky comments from fellow linguists about what a consonant really is, but basically, you can divide the sounds we make into two main classes, if you don't think too hard about it:

  1. Consonants
  2. Vowels

Consonants, for our purposes, are basically any sound where the flow of air out from your lungs is obstructed in some way. Vowels are sounds where air flows freely.

Notice: I did not say "vowels are [insert list of letters (and sometimes other letters!)]." I said "vowels are sounds where air flows freely."

The cool thing is that each of these two classes can be completely described (for our purposes, again), using 3 parameters.

For Consonants:

  1. place
  2. manner
  3. voicing

Let's take these one at a time.

Place is the location of the obstruction of airflow. This can be closure at the lips, the tongue at the teeth, at the alveolar ridge, at the hard palate, the back of the tongue at the velum, etcetera.

Manner is the way airflow is obstructed. If it's completely blocked off, it's a "stop." If it's just partially blocked and creates a turbulent airflow, it's a "fricative" (think "friction").

Voicing refers to whether your vocal cords are vibrating.

 Places. In your mouth.

Armed with these three, we can start specifying sounds. For instance, the <t> in <stop> is an unvoiced coronal stop (it's not voiced, it's made with the "crown" of the tongue -- that is, the tip -- and airflow is completely stopped for a second. Technically, less than a second, but whatever).

Since we can characterize all the meaningful units of sound that a language uses in this way (#thatsThePoint), wouldn't it be nice if there were one and only one symbol for that sound? GOOD NEWS: THERE IS! And, since the IPA was made up by a bunch of Europeans, it's exactly what we'd expect: <t>. If it were voiced? <d>. If it was in the same place, unvoiced and a fricative? <s>. Voiced? <z>. Same manner and voicing (stop, unvoiced), but made at the lips? <p>.
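If it helps to see the "one description, one symbol" idea spelled out, here is a minimal sketch in Python. It covers only the five sounds just mentioned, uses the same labels as above, and the table layout and the names `symbol_for` and `describe` are my own illustration, not part of any official IPA resource or library.

```python
# Toy table: (place, manner, voiced) -> IPA symbol.
# Only the five sounds discussed above; a real chart has many more rows.
CONSONANTS = {
    ("coronal", "stop", False): "t",
    ("coronal", "stop", True): "d",
    ("coronal", "fricative", False): "s",
    ("coronal", "fricative", True): "z",
    ("labial", "stop", False): "p",
}


def symbol_for(place, manner, voiced):
    """Return the symbol for a feature bundle, or None if it isn't in the toy table."""
    return CONSONANTS.get((place, manner, voiced))


def describe(symbol):
    """Inverse lookup: symbol -> 'voicing place manner' description."""
    for (place, manner, voiced), sym in CONSONANTS.items():
        if sym == symbol:
            voicing = "voiced" if voiced else "unvoiced"
            return f"{voicing} {place} {manner}"
    return None


print(symbol_for("coronal", "stop", False))  # t
print(describe("z"))                         # voiced coronal fricative
```

Because each key maps to exactly one symbol and each symbol appears under exactly one key, you can go from description to symbol and back without ambiguity, which is the whole point.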

What's really amazing about this, and will be the subject of a different post, is that the way languages work seems to be by reference not to spelling and stuff, but to these sub-classifications of sound, which we call distinctive features because (1) they're features that (2) distinguish sounds from one another. In fact, languages tend to change based on natural classes of these features. For instance, some sound change might affect all stops, but not fricatives.
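To make "natural class" a bit more concrete, here is a rough Python sketch. The feature table is a tiny hand-picked sample, and the sound change (voicing every unvoiced stop) is entirely made up for illustration; it is not a claim about any real language. The names `natural_class` and `voice_stops` are hypothetical.

```python
# Hypothetical mini feature table (illustration only).
FEATURES = {
    "p": {"manner": "stop", "voiced": False},
    "t": {"manner": "stop", "voiced": False},
    "k": {"manner": "stop", "voiced": False},
    "b": {"manner": "stop", "voiced": True},
    "d": {"manner": "stop", "voiced": True},
    "g": {"manner": "stop", "voiced": True},
    "s": {"manner": "fricative", "voiced": False},
    "z": {"manner": "fricative", "voiced": True},
}

# Each unvoiced stop paired with its voiced counterpart at the same place.
VOICED_COUNTERPART = {"p": "b", "t": "d", "k": "g"}


def natural_class(**wanted):
    """All sounds in the table whose features match every requested value."""
    return sorted(
        sound
        for sound, feats in FEATURES.items()
        if all(feats[name] == value for name, value in wanted.items())
    )


def voice_stops(word):
    """Made-up sound change: unvoiced stops become voiced, everything else is untouched."""
    targets = natural_class(manner="stop", voiced=False)
    return "".join(VOICED_COUNTERPART[ch] if ch in targets else ch for ch in word)


print(natural_class(manner="stop"))  # ['b', 'd', 'g', 'k', 'p', 't']
print(voice_stops("paste"))          # basde
```

Notice that the rule never mentions individual letters. It picks out a class by features, and the fricative in the middle of the word is left alone because it isn't in that class.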

The IPA chart for consonants can be found on Wikipedia here, and you can click on each symbol and hear the audio. In fact, each sound has its own Wikipedia article.

Similarly, vowels can be classified along three parameters:

For Vowels:

  1. Tongue Height
  2. Tongue Backness
  3. Lip Rounding

Height refers to how high the body of your tongue is in your mouth.

Backness refers to how far front or back the highest point of your tongue is (again, a simplification, but basically right).

Rounding refers to whether you round your lips or not.

 Vowel chart, and schematic of tongue height for front vowels.

So for instance, linguists interested in dialects of American English may talk about whether a particular variety, say California English, is fronting /u/ to [ʉ] or even [y] and this is meaningful -- we're describing an accent, but doing so in a more precise way than if you were to say "it sounds like 'kyewl' when they say 'cool'!"
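Here is a small Python sketch of what "fronting" means in terms of the three vowel parameters. It only includes four high vowels, and the function name `front_one_step` and the three-way backness scale are simplifying assumptions of mine, not a standard model.

```python
# Backness scale from front to back (a simplification).
BACKNESS = ["front", "central", "back"]

# (height, backness, rounded) -> IPA symbol, high vowels only.
VOWELS = {
    ("high", "front", False): "i",
    ("high", "front", True): "y",
    ("high", "central", True): "ʉ",
    ("high", "back", True): "u",
}

# Reverse table: symbol -> (height, backness, rounded).
FEATURES_OF = {symbol: feats for feats, symbol in VOWELS.items()}


def front_one_step(symbol):
    """Move a vowel one step forward, keeping height and rounding the same."""
    height, backness, rounded = FEATURES_OF[symbol]
    position = BACKNESS.index(backness)
    if position == 0:
        return symbol  # already as far front as it goes
    fronted = (height, BACKNESS[position - 1], rounded)
    # If the toy table has no vowel at that spot, leave the vowel unchanged.
    return VOWELS.get(fronted, symbol)


print(front_one_step("u"))                  # ʉ
print(front_one_step(front_one_step("u")))  # y
```

Only the backness value changes at each step, which is exactly the sense in which [ʉ] and [y] are "fronted" versions of /u/.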

Wikipedia, again, has a super useful chart, where you can hear the sounds. It's the same as the above chart --- "front" vowels are on the left, "high" vowels are on the top, and they come in pairs with the unrounded form on the left and rounded on the right. Add a little tilde on the top of the vowel (literally it's just a little n above the vowel) and you have a nasalized vowel --- a vowel where your velum is dropped a bit and you allow airflow out through your nose. French has a ton of these, so what's written as <on> is pronounced /ɔ̃/. Without the little squiggly, it's /ɔ/, the vowel in a New York accent's "coffee" (/kɔfi/), as opposed to a Canadian or Californian's "coffee" (/kɑfi/).

So how many vowels does English have? Well, depending on the accent and how you count, somewhere around 14 to 20, not 5. And you can discuss them all using the IPA: bead = /bid/, bed = /bɛd/, bad = /bæd/, and so on. If I boo someone, I /bu/ them, but someone from California might /bʉ/ them, and we have a precise way of telling, and discussing, the difference.

Some helpful observations

The IPA was designed to be intuitive, and useful. So most of the symbols are exactly what you would expect. The vowels are a little more complicated, but think about what would make sense for Europeans: /i/ is the vowel in 'beat', and we have a special character for the sound in 'bid' (it's /bɪd/). Consonants are basically what you'd expect, except <y> is /j/ (like in German Ja!). That shitty combination in English of <ng> for a voiced velar nasal is just one symbol, an <n> with the tail of a <g>, called engma: /ŋ/.

Brackets and slashes: Linguists use slashes to indicate a more abstract level, and brackets to deal with the sounds that are actually made. So in English /t/ can have a bunch of different realizations in speech:

  • [t] in 'stop' [stɑp]
  • [tʰ] in 'tap' [tʰæp]
  • [ɾ] in American English 'butter' [bʌɾəɹ]
  • [ʔ] (a glottal stop, the sound in the middle of 'uh-oh!') in Cockney 'butter' [bʌʔə]

and so on. Most normal people are just not aware of these differences, at all (especially the first two, but put your hand directly in front of your mouth and see how different the airflow is!).
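The slash-versus-bracket idea can be sketched as a function from an abstract sequence to what actually comes out of someone's mouth. The Python below is a deliberately crude toy: the three rules (aspirate /t/ at the start of a word, flap or glottalize it between vowels depending on accent, otherwise plain [t]) are my own simplification of the four examples above, and real English is messier (stress matters, for one). The name `realize` and the accent labels are assumptions for illustration.

```python
# Vowel symbols that appear in the examples (not a full inventory).
VOWELS = set("ɑæʌəɪiɛu")


def realize(phonemes, accent="General American"):
    """Turn a list of abstract segments (the /slashes/ level) into a [bracketed] string."""
    output = []
    for i, segment in enumerate(phonemes):
        if segment != "t":
            output.append(segment)  # this toy only has rules for /t/
            continue
        previous = phonemes[i - 1] if i > 0 else None
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if previous in VOWELS and following in VOWELS:
            if accent == "Cockney":
                output.append("ʔ")  # glottal stop between vowels
            elif accent == "General American":
                output.append("ɾ")  # flap between vowels
            else:
                output.append("t")
        elif previous is None:
            output.append("tʰ")     # aspirated at the start of the word
        else:
            output.append("t")      # plain, e.g. after /s/
    return "[" + "".join(output) + "]"


print(realize(["s", "t", "ɑ", "p"]))                 # [stɑp]
print(realize(["t", "æ", "p"]))                      # [tʰæp]
print(realize(["b", "ʌ", "t", "ə", "ɹ"]))            # [bʌɾəɹ]
print(realize(["b", "ʌ", "t", "ə"], "Cockney"))      # [bʌʔə]
```

The input is the same /t/ every time; only the environment and the accent decide which bracketed sound you get.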

Notice, too, that you can clearly represent accents using this tool. 'butter' is [bʌɾəɹ] in General American, but [bʌʔə] in Cockney, and [bəˈtœʁ] in a bad French accent. With the slashes, we can just talk about things that happen to (abstract) segments (of sound) in a language without having to specify all the different little things that happen in specific environments, and with the brackets we can get as specific as we want to, so I might say 'specific' /spəsɪfɪk/ as [spəsiɪfɪˀ]. And once you know the IPA, you can tell from that HOW I SOUND WHEN I SAY IT.


With the IPA, we don't have to resort to all kinds of weird ways of talking about things ("the vowel in a New York accent when they say 'coffee'" or "the ü sound in German, if you know German," or "That thing that some French people do when they say 'oui' in a nonstandard way.") We can just use the IPA and talk about place, manner, and voicing, or height, backness, and rounding (/ɔ/, /y/, and /ɕ/, for the previous long-winded and confusing examples).

While it takes a little (let's be real, very little) effort to learn the IPA, the payoff is immense for anyone who wants to learn another language, learn another accent, or understand any discussion of sounds that humans make in a clear and concise way. It's basically seeing the matrix.




©Taylor Jones 2016


Gender, Gender, Gender

A Good Question:

I'm still getting a surprising number of comments and emails about the short post I wrote on Jordan Peterson's slip-up with grammatical gender. While most are incoherent and silly (and have a seasonally and statistically unlikely preponderance of the use of the word "snowflake"), there is one in particular that seemed earnest, and that I think warrants a full response. Brele asked:


Taylor, can u help me understand how there are more than two genders? I ask just having watched J. Peterson on a YouTube show and hearing his thoughts on the matter.

I think it's important here to distinguish three related phenomena, and where they do and don't overlap: biological sex, gender (and gender expression), and grammatical gender. The conflation of biological sex with gender, and the subsequent conflation of grammatical gender with both, is where most of the confusion and anger comes from, I think.

Biological Sex:

Biological sex is what it sounds like: the biological properties we associate with sexual reproduction in a species. We assume that there are two sexes in humans: male, and female. This is not strictly true, as biological sex is determined by a constellation of factors, among them:

  • chromosomes: while we're familiar with XX as female and XY as male, there are people with XXY, or other unusual (but extant!) combinations. Roughly 1:1666 births have atypical chromosomal combinations. That's roughly 210,000 Americans.
  • gonads: most people have reproductive organs that fall broadly into one of the two expected categories, but again, not all people do. Roughly 1:1500 births have atypical gonads (which means about 230,000 Americans).
  • hormones: some people have atypical hormonal patterns. For instance, the Sikh woman who has polycystic ovarian syndrome, and therefore has a full beard.
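The head counts in the first two bullets are just a rate multiplied by a population. Here is a minimal Python sketch of that arithmetic, assuming a US population of about 323 million (my assumption, roughly the 2016 figure; the exact counts shift with whatever population you plug in). The function name `expected_count` is hypothetical.

```python
# Assumption: roughly the US population in 2016. Swap in your own figure.
US_POPULATION = 323_000_000


def expected_count(population, one_in):
    """People you'd expect in a population if 1 in `one_in` has the trait."""
    return population / one_in


for label, one_in in [("atypical chromosomes", 1666), ("atypical gonads", 1500)]:
    count = expected_count(US_POPULATION, one_in)
    # Round to the nearest thousand; more precision than that would be false confidence.
    print(f"{label}: about {round(count, -3):,.0f} people")
```

Either way you slice it, the answer is on the order of a couple hundred thousand people, which is the point: rare is not the same as nonexistent.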

The vast majority of people will have phenotypes that 'line up', but a sizeable minority don't. So what we think of as physically binary -- male/female -- is, in reality, a bit more complicated than that, but generally true. Not always true.

Gender and Sexuality:

Gender, in the social sciences, is distinct from biological sex. It is also a complicated constellation of factors, including:

  • who you are attracted to.
  • how you physically present yourself, and how you behave, according to (or going against) culturally defined patterns of behavior. For instance "boys wear blue, girls wear pink" is a completely arbitrary, culturally defined dichotomy with no basis in biology, and which is absolutely not universal.
  • How masculine or feminine (or neither, or both or whatever) you personally feel. That is, maybe I feel really girly (whatever that means, just go with it), but I don't present myself in accordance with that because it's easier to just follow my culture's rules about What Men Do than it is to deal with people's reactions if I start wearing dresses.
  • a bunch of other stuff I'm probably leaving out.

The key here is that gender is about how you feel, how you behave, and who you are attracted to, and is not about your chromosomes, gonads, and hormones.

For a more science-y take: there are multiple parameters, which may be either binary or have multiple levels, along which people can vary continuously. This is a high-dimensional space that we generally try to collapse to a single-dimensional sub-space to then classify with a binary score. Increasingly, people who are hard to classify on that one dimension (studs, bears, beardos, genderqueer, agender, genderfluid [do we need a time dimension?], flannel-heads, balloon-poppers -- yes, I made some of these up, but not the balloon one) are saying you can't collapse things to a single binary parameter, but that you need a higher-dimensional space to accurately categorize people without losing important information.

Grammatical Gender:

This is how languages group nouns. The name is an unfortunate misnomer, given the conflation of the above two things -- it's etymologically related to genre and that's probably a much better way to think of it. Some languages have two genders, which they call masculine and feminine because noun classes in those languages sort of line up with how actual masculine people and feminine people are classified grammatically. That said, in one such language, French, there's no clear reason a table should be semantically feminine. The genre of the noun just happens to be the same as for women, but in this case it's largely a phonological thing, not a semantic (i.e., meaning) thing. Moreover, some words come in gendered pairs: le tour 'the tour' (as in, the tour of France), versus la tour 'the tower' (as in, the Eiffel Tower).

In other languages, there are two genders, but they don't line up with sex: Dutch has two genders, but they're common and neuter. Both man 'man' and vrouw 'woman' are common, and meisje 'girl' is neuter (along with all other diminutives, so mannetje 'little man' is also neuter).

In other languages, there are more than two genders. German and Russian have masculine, feminine, and neuter.

In yet other languages, there are many more genders: Zulu has 14, and none of them have anything to do with sex. Some are for humans, some are for long, stick-y things (although there's arguments about this), and one is for abstract concepts: umu-ntu is a person, aba-ntu is 'people' (whence "Bantu"), and ubu-ntu is the quality of being human (personhood, or humanity).

Finally, many languages mark all nouns and noun-y things with gender, but many don't. English, for instance, only explicitly marks gender on some pronouns (he and she, but not you), and a handful of nouns for kinds of people ("actress").

The Takeaway:

"Gender" is often interchangeably used to mean any of three things: biological sex, sexual(ity) gender, and grammatical gender. Moreover, each of these things is complex, and non-binary (although biological sex comes close to being binary in everyday life for most people).

English obligatorily marks gender on third person singular pronouns (and that's about it). This gender marking generally overlaps with biological sex and 'mainstream' gender expressions related to cultural assumptions about biological sex. People who do not feel like they are necessarily well described by he or she have been asking to be referred to with a different term -- many ask that we use they, which has the benefits of (1) already existing in English, and (2) being gender neutral already. Others ask for ze or something else. 

The point is, marking gender on third person singular pronouns (only) is a weird quirk of the grammatical structure of English, and not representative of objective biological reality, and certainly not reflective of culture. My comments on Jordan Peterson's remarks were solely to laugh at the irony of someone claiming they refused to use gender-neutral pronouns while using gender neutral they to express that contrarianism.

Hopefully, the above answered the question of how there could be 'more than two genders.'



©Taylor Jones 2016
