My work was just cited by a crank, here's a response

I recently came across an article written for Quillette by Heather Mac Donald which uses a research paper of mine published in American Speech in 2015 to defend a frankly stupid position. The article was shared by Steven Pinker, which means increased visibility, so naturally I want to make sure the record is straight as far as concerns my research. The position she uses my work to justify is a position I disagree with not on political grounds, but on empirical grounds. I'm going to contextualize all of this for those unfamiliar with the players involved, before adding my response. [Note, this post uses a racial slur, a sex/gender slur, and some colorful Quebecois in citation form.]

Some Context:


Quillette is a for-profit 'safe space' from 'political correctness' and 'leftist bias' created by a grad school dropout (one who, in interviews, claims explicitly that she is "actually trained as a psychologist" despite not, you know, actually having finished her training). You may have come across it recently when they published an article written by an undergrad that purported to challenge Ta-Nehisi Coates' work (among others). 

This article serves as a pretty good explanation of what Quillette is, and what it's trying to be. (Highlights include: "Quillette makes tired alt-right talking points sound erudite", and "Instead of writing off the academic left — and, generally speaking, women and people of color — as crybabies or social justice warriors, Quillette’s writers use the classical liberal tradition of 'mature debate' to dismiss marginalized voices".)

Heather Mac Donald

The author of this particular piece, Heather Mac Donald, is most notable for authoring such works as The War on Cops, The Illegal Alien Crime Wave, In Defense of Fascism and The Diversity Delusion. (Ok, one of those is fake, but the other three are real). I think her works speak for themselves.


Steven Pinker is a well-known cognitive psychologist who does some work in linguistics, and who has been relatively influential. He also has gone off the rails on Twitter lately, so, for instance, in tweeting the link to this Quillette article, he complains about "PC/SJW." That is, Political Correctness (which is, more or less, trying not to intentionally say mean, hurtful, or offensive things by thinking about your choice of words before speaking) and Social Justice Warriors. I'm not 100% clear on what's wrong with social justice, but it's clear from use that SJW is intended as derisive, and directed toward people who --- I don't know. Want equality? Anyway, the point is Pinker is well known and is amplifying Quillette's signal, using in-group signals for the alt-right (whether intentionally or not). In this case, it's the writings of a woman who believes that "phantom police racism" is a cover to keep people from discussing the "uncomfortable problem" of "black on black crime". One who then cites my research out of context, evidently to defend her desire to say nigger (no, really, this is not an embellishment; see below). 

The Quillette Article

I will reiterate that Quillette is for profit, so keep that in mind when deciding to click through. The article in question can be found here, if you, dear reader, wish to read it for yourself (perhaps use an ad blocker?). 

The article is ostensibly a defense of the poet Anders Carlson-Wee, who was the subject of a minor online tiff last week, after The Nation published a poem of his written in an approximation of African American English. John McWhorter, with whom I do not always agree, wrote an excellent, thoughtful piece in defense of Carlson-Wee, which can be found here.

Heather Mac Donald, however, has taken the controversy as a jumping-off point to dive into her feelings. In this case, her feelings about censorship. My goal here is not to catalogue all the things wrong with her article, as I simply don't have the time to do so, and others have done so better (especially with regard to her bizarre reading of Plato). I do want to touch, however, on a few points. 

First, she refers to African American English as "black street dialect". I object to this not on "SJW" grounds (that is, that it is a clearly offensive dogwhistle: what is the function of "street" in this description? It's not location; it's judgment. Is whatever Heather speaks only spoken indoors?), but rather I object to it on scientific grounds. There is a wealth of literature on the speech of African Americans going back at least 60 years, and that is simply not the term used by anyone who knows even the slightest bit about the subject. You may have feelings about AAE versus AAL versus AAVE, but if you're discussing a language variety it would behoove you to use really any of the actual names for it. It would be like me discussing "Iranian town dialect" instead of Persian/Farsi. I'd just look dumb and unnecessarily prejudiced.

Second, she argues strongly that there is some boogeyman mob that will ruin your life if you ever mention a taboo word, in citation form or otherwise. As a linguist who researches and says taboo words, this is total nonsense. People are generally extremely good at, well, context. I am a cis/het white man, and part of my job is to discuss taboo words publicly. And you know what? No ill has come of it yet, because I do so in (1) appropriate contexts, (2) with academic rigor, and (3) with respect for both the communities that hold those taboos and respect for the people described by those words (when those words describe people). 

It's the third point that's going to take a little work. The paragraph that cites my work, is, well, absurd. In that paragraph, Mac Donald writes a lot of garbage: 

"The elaborate rituals around the ‘n-word’ evince the same double standard regarding authorial intention. According to existing conventions, whites may never use the full word without elision, even if they are doing so not to refer to anyone but as reported speech. Its mere presence in the mouth of a white person launches a nuclear bomb against blacks; the transgressor will be punished accordingly, as the founder of Papa John’s pizza discovered after using the full word as an embedded quote from chicken impresario Colonel Sanders. Blacks, however, can use the word in toto to refer to actual people, because their intentions matter and it is assumed that blacks are incapable of racist intent. Black Twitter users used the n-word 6.2 million times in one month, according to a 2015 study; it is ubiquitous in urban vernacular and in rap music, with black entertainers like Jay Z, Beyoncé, and Kanye West tossing it off with impunity."

Let's unpack this.

  1. "According to existing conventions..."  --- What conventions? In what contexts? This has the appearance of social science without any of the social science. 
  2. "whites may never use the full word without elision, even if they are doing so not to refer to anyone but as reported speech."  --- This is untrue, but as I've written elsewhere, not a bad rule of thumb if you want to avoid pissing people off. 
  3. "Its mere presence in the mouth of a white person launches a nuclear bomb against blacks; the transgressor will be punished accordingly, as the founder of Papa John’s pizza discovered after using the full word as an embedded quote from chicken impresario Colonel Sanders" --- This is patently, obviously untrue, and just wildly hyperbolic. A nuclear bomb? As I've written elsewhere, there's context for when it is possible to say the n-word (and that's a separate question from whether you should say it). ALSO, it's important to note that while the founder of Papa John's did use "nigger" in citation form, he did so while complaining about how he can't say it, but someone else got away with it! That's like getting mad that people call you misogynist when you complain that 'feminazis' are preventing you from calling women 'bitch'. It's just a sneaky way of trying to say it anyway. It's like me saying "Why can't I tell everyone that 'Heather Mac Donald is an idiot.'?" Just because it's embedded doesn't mean it loses its force, right Heather? (this was actually the subject of an academic talk at this year's Annual Meeting of the Linguistic Society of America).
  4. "Blacks,"  --- really? Listen. You can call people that, but I'm pretty sure it pisses off black people to be called "blacks" just as much as it pisses off white people to be called "whites". In general, taking an adjective and then using it as a noun for a group of people you think it describes is not well received. This is basic stuff here.
  5. "Blacks, however, can use the word in toto to refer to actual people, because their intentions matter and it is assumed that blacks are incapable of racist intent. " --- Justify this statement. According to whom? Under what circumstances? This is the sentence before citing my work, and the implication is that my work in some way justifies this stupid statement. If you want to draw on arguments that prejudice and racism are different and that black people can be prejudiced, but not systemically racist against white people, then make that explicit and attribute the argument. This is weak writing that I wouldn't tolerate from undergrads (but then again, we know where Quillette stands on publishing academically lazy, poorly written articles by undergrads).
  6. "Black Twitter users used the n-word 6.2 million times in one month, according to a 2015 study" --- This is a bait-and-switch using my work to justify something other than what it says. As Christopher S. Hall and I have written extensively about: there is not just one n-word in African American English (note: NOT Christopher J. Hall, although I assume he's lovely). Mac Donald, here, is attempting to justify using a racist slur in one dialect by saying a similar word exists in another dialect. It's like saying tabernak is not a swear word in Quebec French because "tabernacle" is a totally mundane word in Quebec English. They're not pronounced the same, they refer to different things, and they're not used in the same linguistic or social contexts.
  7. "it is ubiquitous in urban vernacular and in rap music, with black entertainers like Jay Z, Beyoncé, and Kanye West tossing it off with impunity." --- Define "urban vernacular". More importantly, again, you're comparing apples to slurs. Also, Beyoncé? When?

You've cherry picked a line from my research that isn't actually applicable to your argument in the hopes that my academic reputation will somehow add a veneer of respectability to your weak reasoning. 

More broadly, the point is reasonable people generally don't have a problem with other reasonable people discussing a slur when it's clear that they are doing so with rigor and from a place of respect. When you disingenuously demand to know why "blacks" get to say "the n-word"  but you don't, it's clear you just want to say offensive shit. Then people (correctly) call you an asshole and tell you to stop. When they do things like protest your speaking engagements, or say mean things to you on Twitter, THAT'S NOT CENSORSHIP. That's other people also exercising their freedom of speech, and is a natural result of your exercise of your freedom of speech. You are playing the victim in an attempt to silence other people's free speech, because evidently you want to say "nigger" at people without repercussions.

The takeaway:

You can say "the n-word" and nobody can stop you. However, there will be social ramifications. That's how pretty much all of language works. The real question is why do you want to say it so badly, Heather?



©Taylor Jones 2018


A Malefactive in African American English

This is a quick post about something I've heard all my life in AAE speech communities but haven't seen discussed, well, really anywhere. 

Benefactives and Malefactives in English

A lot of languages can take a verb and mark whether it was done with kind or harmful intent toward someone else. In ('standard') English, the benefactive marker is a separate word that introduces the recipient, and that word is for. For example:

  • She baked a cake for me. (meaning either, she baked a cake with the intention that I eat it, or she baked a cake so I wouldn't have to).
  • He made a phone call for me. (meaning he made a phone call so I wouldn't have to, or on my behalf).

Other languages may mark this differently (for instance, Zulu adds the infix -el- just before the end of the verb). 

English also has a very limited malefactive marker: on. For instance:

  • She hung up on me
  • She walked out on me
  • He told on me

But you can't just use it with anything:

  • ??? She baked a cake on me

That said, some non-standard varieties allow for much more productive use of malefactive on. For instance, my (somewhat Southern) grammar lets me say it so long as the verb is prefaced with up and, as in:

  • She up and baked a cake on me (meaning: She surprised me by baking a cake, contrary to my expectations and possibly with some negative effect on me...but not physically on me in any sense).

An AAE-only Malefactive

I've been thinking about this recently, and noticed something that's not grammatical in other varieties of English: to (tell a) lie on someone. Examples:

  • She told a lie on him
  • He would never tell a lie on her

I've asked a few people who use this, and they agree it's equivalent in meaning (but not in mood!) to telling a lie about someone, and doesn't mean to tell a lie to someone. 



©Taylor Jones 2018


What, really, is a word?

I just received the new issue of Language in the mail, and the first paper is a paper I'd read a draft of a while back, and loved. Not only for its snazzy title ("The Lexicalist Hypothesis: Both Superfluous and Wrong"), but because it gets at the heart of a really interesting issue in linguistics.

The core argument Benjamin Bruening proposes is that if we assume there's such a thing as a "word" and that words are fundamentally different from "phrases" in the grammar (that is, the rules in the mind of a native speaker), then we end up with a lot of difficulty and more assumed grammatical structure that doesn't really give us much. Not only that, but we have to explain some weird observed behavior that this model doesn't predict. This contention, that wordhood is assumed by people who speak European languages, has been an issue for people working on agglutinating languages (like Zulu, Iroquois, etc.) for a while. The genius of his paper is, in part, that he uses all English examples.

So the argument is more-or-less that "word" is a phonological object, and not a grammatical/syntactic object. That is, we manipulate syntactic structure in our minds, but where we put the breaks in speech is really a property of sound and not the structure we are manipulating. Some examples should help clarify.

First, however, I want to point out that we already all know that words (in the traditional sense) themselves often have structure. So for instance, we can look at a word like:

  • unlockable

and know that it has three pieces that carry meaning: un-, lock, and -able. One of those can stand on its own, the others can't. AND, interestingly, we can think of them as representing two different structures with two different meanings:

  • [un [ lock-able ] ]  == not able to be locked
  • [ [ un-lock] able ] == able to be unlocked

So what Bruening does is give a ton of examples where the adjective modifying a noun is really an entire phrase:

  • she gave me a don't-you-dare! look

He argues that if you have a model of syntax that assumes the existence of words as atomic pieces, and these words have categories (like "adjective," "noun", "verb") then you can't really account for don't-you-dare! looking both like an imperative and an adjective. 

The paper itself is a lot more complicated, since he gets into some really thorny issues in syntax that are probably not appropriate for a blog post, but the paper itself is delightful and its example sentences are great. 

One of my favorite things in reading linguistics literature is the special schadenfreude I get when reading someone point out that another linguist's example sentences are wrong, and it's even got that, where he points out the grammaticality (contra another linguist's analysis) of utterances like:

  • I have to go re-tuck in my kids.
  • he was re-sworn in as governor.

I have been thinking about this word/phrase distinction for a long time, but evidently not on the same level as Bruening. I have, however, been collecting examples of sentences like these for years, and now have a good reason to share them. I have generally put the phrase in brackets, and in some instances if there's an unsaid element (like "I would wear it" in example 2), I have left an underscore where we might expect another syntactic element. So without further ado:

  1. The one I had at tale was [I can't even handle it] sweet.
  2. It was totally an [I would wear____] style.
  3. You put your computer in the [my computer] spot.
  4. It's a really [hard to open] door.
  5. It's not entirely a [nigga, we made it] moment. (Childish Gambino in an interview)
  6. Did you swallow a [too big] piece?
  7. Sometimes when I cough it sound like it's a sickness cough but it's really a [my-lungs-aren't-ok] cough.
  8. It's always bad when it's *too* [too big].
  9. Please return for a [left behind] item. (over the intercom at JFK airport)
  10. Is 250 texts a good number, or a [not enough____] number?
  11. As a [[i don't have to be there for very long] ____] I don't really mind it.
  12. It was almost [knock you over] wind.
  13. I don't have a specific [it has to look like this] idea.
  14. I'm sure you'll be past the [a thousand] mark.
  15. It's a vacation house, it's not a [___ live there] house.
  16. It was a [my lungs are tight] kind of cough.
  17. I don't mind the [making my own lunches] part of it...
  18. I'll find, like, *old* [my hair], and be like, "how did this happen?"
  19. It's a writing desk, not a [leave a pile of books and papers on it] desk.
  20. Go see Ailey. It's [change your life] good. (Advertisement in the subway for Alvin Ailey Dance Theater)
  21. It was too [not enough time].
  22. Wow, this is a really [___ sink into it] couch!
  23. It's [if you're desperate you'll eat it] bread.
  24. Now we have a [thank you] reason to send that card.
  25. I need an overnight flight, not a [during the day] flight.
  26. It's stupid. I wore a pair of boots on a [slightly too warm] day and it gave me a rash.
  27. That's the [be careful because if you sit on it wrong the chair might break] chair.

I really, thoroughly enjoyed this paper, which can be found [here].



©Taylor Jones 2018


Know it all

Trump recently tweeted something that was linguistically interesting (shocking, right?). Tweeting about James Comey, he wrote:

"Comey knew it all, and much more!"

This is the kind of thing that would be starred as an infelicitous utterance in an intro to pragmatics. The reason is what we call "scalar implicature." If I said:

"Comey knew some of it."

That carries the implicature that he, well, knew only some of it, not all. I can cancel that implicature by stating:

"In fact, he knew all of it!"

The same goes with everything less than all. For instance:

"Comey knew most of it. In fact, he knew all of it."

However, because all inherently means "everything" it makes no sense to say he knew all "and more".
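To make the logic concrete, here is a minimal sketch of a scale in Python. It is a toy, not a real model of pragmatics: the three-item scale and the function names are my own assumptions for illustration. The idea is that asserting a weaker term implicates that the stronger ones don't hold, and that "in fact..." or "and more" only works when there is something stronger left on the scale.

```python
# A toy scale, ordered from weakest to strongest claim (an assumed, simplified scale).
SCALE = ["some", "most", "all"]

def implicatures(term):
    """List what asserting `term` implicates: that each stronger term is false."""
    idx = SCALE.index(term)
    return [f"not {stronger}" for stronger in SCALE[idx + 1:]]

def can_strengthen(term):
    """'In fact, ...' or 'and more' needs a stronger term left on the scale."""
    return bool(implicatures(term))

print(implicatures("some"))    # ['not most', 'not all']
print(can_strengthen("most"))  # True
print(can_strengthen("all"))   # False
```

Since all sits at the top of the scale, there is nothing left to strengthen to, which is why "all, and much more" sounds off.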

It's the kind of thing you might see as an insult with negation:

"Comey knew nothing. In fact, he knew less than nothing."

But it still doesn't quite make sense in that sentence frame:

*"Comey knew none (of it), and much less!"

I'm not entirely sure what to make of this, since it's not an off the cuff remark that can be attributed to a speech production error or brainfart. Perhaps it's further evidence that people tend to "tweet how they speak."




©Taylor Jones 2018


New Working Paper on Zulu published

I recently gave a talk on Zulu morphosyntax in which I (hopefully politely and respectfully) challenged some of the mainstream approaches to Zulu syntax. The working paper is now out, in the Proceedings of the Linguistic Society of America, available here (pdf download under "full text").

It's not a fun read for a layperson, but the general gist is that (1) a lot of previous syntax work doesn't pay enough attention to the phonology, (2) the justifications for arguing that the noun augment is really a determiner are a little shaky, and (3) if we just treat the 'linking vowel' as a determiner, everything is simpler. This has the unexpected outcome of also suggesting that Zulu has construct state, something known (and controversial) in Semitic languages, but not known to exist in Bantu languages. To paraphrase a colleague at Penn, I've reduced a seemingly unique thorny problem to an already known thorny problem, which is about as good as you can hope for in syntax.




©Taylor Jones 2018


What would Wakanda sound like?

Today, Marvel's Black Panther is released. The Black Panther, aka T'Challa (played by Chadwick Boseman), is the king of Wakanda, a fictional country in Africa (neighbored by other fictional countries like Azania and Narobia, but not Nambia). While I'm extremely excited for the movie (NO SPOILERS PLEASE), I don't have high hopes for a surprise fictional language in the movie, given the pre-film hype about the inspiration for design elements, costume, and even T'Challa's accent. In previous films, T'Challa's father was played by a Xhosa-speaking actor, and it now seems that Xhosa being spoken in Wakanda is Marvel Cinematic Universe head-canon.

Geographic improbability aside, I don't have a problem with this, as Chadwick Boseman does a great Xhosa accent --- far better than, say, Morgan Freeman in Invictus. But, given that Wakanda is supposedly 5,000km away from South Africa (where the non-Wakandan Xhosa people are), what would the languages of Wakanda sound like? This is just a short blog post to (shallowly) explore that question with some links for the interested.

Location, Location, Location

Wakanda is situated somewhere in East Africa, by either Lake Victoria, or Lake Turkana. That means it's somewhere around Uganda, Kenya, Rwanda, Ethiopia, and South Sudan. What's great about this is that it's an area where a lot of languages from different language families are spoken. So the five major ethnic groups in Wakanda could all potentially have their own very different languages.

What about the comics?

The character and country were created by Stan Lee and Jack Kirby in 1966. Both white guys, neither linguists. So there are a lot of elements of the Black Panther mythos that have names that sound, well, like what a white guy would make up to sound exotic and African (or look it on the page). That said, certain things are just part of the canon. So the kings have an (evidently) ejective /t'/ as the first part of their names. The all-female fighting force, the Dora Milaje, are called what they're called. Anyone contracted to construct languages for the MCU will have to work with the existing material, much like how Marc Okrand developed Klingon by building around what was already uttered on-screen in Star Trek. And that will have an effect on the backstory and character development. To my knowledge, Ta-Nehisi Coates and other recent writers have not done a deep dive into the linguistic side of Wakanda, but we can't really expect Ta-Nehisi to solve everything for us.

What's spoken in that area?

As I mentioned above, that particular (vague) part of East Africa has representation from a few of the major families: Afro-Asiatic, Niger-Congo A, Niger-Congo B ("Bantu" languages), and Nilo-Saharan languages.

In Kenya, there is, of course, Swahili --- a Bantu language spoken by 50 to 100 million people and a lingua franca for the region. Swahili's huge number of speakers means you can hear it on internet radio if you want. It also means that it has lost lexical tone (when the pitch of a word or syllable changes the meaning), and because it's used for trade by so many people who speak so many other languages as their native language, it is relatively regular, meaning there's not a lot of unpredictable grammatical stuff.

But there's also a lot else spoken there. Kenya alone is home to 68 languages, the most prominent of which are Kikuyu, with 8 million speakers, and Dholuo, or Luo, with 4 to 5 million speakers.

The latter, Dholuo, is not a Bantu language, but a Nilo-Saharan language. What's the difference? The main difference is that all the Bantu languages group nouns into types (think gender in European languages, except there's 10-17 of them). Every noun has a prefix for its noun class, and the prefixes generally come in pairs (singular vs plural). So in Zulu (and Swahili!) the base form for the noun 'person' is ntu. But this doesn't just show up on its own. Rather, it has one of these noun class prefixes, as in:

  • umuntu 'person'
  • abantu 'people' (hence the name for the languages...they all call people some form of "bantu")
  • ubuntu 'humanity, humanness' (whence also the operating system).

So you can get phrases like umuntu ngumuntu ngabantu: "a person is a person through other people".
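If it helps to see the prefix-plus-root idea mechanically, here is a minimal sketch in Python using only the three Zulu forms above. The dictionary and function name are my own illustrative choices, and real noun class systems involve a lot more (agreement, sound changes) than simple string concatenation.

```python
# The bare root from the examples above; it never surfaces on its own.
ROOT = "ntu"

# Noun class prefixes paired with the glosses given above.
PREFIXES = {
    "umu": "person",
    "aba": "people",
    "ubu": "humanity, humanness",
}

def build_noun(prefix, root=ROOT):
    """Attach a noun class prefix to a root."""
    return prefix + root

for prefix, gloss in PREFIXES.items():
    print(f"{build_noun(prefix)} '{gloss}'")
```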

Bantu languages also generally have a LOT of sounds, but simple syllable types, almost always CV --- Consonant Vowel. You'll never see a word like English strengths. This is obscured by the writing a bit, so for instance, <ng> in Zulu is one sound, not two (the sound of <ng> in sing). Swahili also has syllabic nasals, so for instance, the <m> in mzungu 'white person' is its own syllable: m-zu-ngu.

Back to Luo: Luo has vowel harmony, meaning all the vowels in a word have to share the same feature. What's the separating factor? How advanced your tongue root is. So words with the vowels in (an American pronunciation of) bean, bait, bot, boat, and boot, are one class, and words with the vowels in bin, bet, bat, bought, and foot are in another. A single word will not have vowels from both groups, only one.
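Here is a rough sketch of what "no mixing" means, in Python. Everything specific in it is an assumption for illustration: the two vowel sets are a simplified stand-in using IPA symbols (I've left out the low vowels entirely), and the test strings are made up, not actual Luo words.

```python
# Simplified, assumed vowel sets: advanced vs. retracted tongue root.
ADVANCED = set("ieou")
RETRACTED = set("ɪɛɔʊ")
ALL_VOWELS = ADVANCED | RETRACTED

def is_harmonic(word):
    """True if the word's vowels all come from one set (or it has none of them)."""
    vowels = [ch for ch in word if ch in ALL_VOWELS]
    uses_advanced = any(v in ADVANCED for v in vowels)
    uses_retracted = any(v in RETRACTED for v in vowels)
    return not (uses_advanced and uses_retracted)

# Made-up strings, only to exercise the check.
print(is_harmonic("tiko"))  # True: both vowels are in the advanced set
print(is_harmonic("tɪkɔ"))  # True: both vowels are in the retracted set
print(is_harmonic("tikɔ"))  # False: one vowel from each set
```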

Even cooler, Luo grammatically distinguishes between alienable and inalienable possession, so for instance, the word for a dog's bone has different forms depending on whether you mean the bone is part of the dog's skeleton, or a cow bone it's chewing on. If it can be taken away, it's got a suffix marking that fact.

Wakanda is also close to Ethiopia and South Sudan, where Afro-Asiatic languages are spoken. The most well-known subset of these are the Semitic languages, which include Arabic and Hebrew, but also languages Americans are often less familiar with, like Amharic, spoken in Ethiopia.

Amharic, like other Semitic languages, has what's called non-concatenative morphology, meaning that words aren't always built by adding prefixes, suffixes, or infixes, but are instead built with a system of (unpronounceable) roots that combine with vowels in between. The standard example linguists use is from Arabic (also spoken in that region), where k-t-b is always in things related to books and writing, but the vowels make it mean different things: kitaab 'book', kataba 'he wrote', kutib 'was written', etc. Amharic, like Swahili, has a massive number of speakers: roughly 22 million. It also has an objectively cool writing system.
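A quick way to picture root-and-pattern morphology is as slot filling. The Python sketch below is a toy under my own assumptions: the template notation (a capital C marks a consonant slot) and the function name are invented for illustration, and it only reproduces the three Arabic forms just mentioned.

```python
def apply_pattern(root, pattern):
    """Fill each 'C' slot in the pattern with the next consonant of the root."""
    consonants = iter(root.split("-"))
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)

# Templates for the three forms above; 'C' is a consonant slot, vowels are literal.
examples = [
    ("CiCaaC", "book"),
    ("CaCaCa", "he wrote"),
    ("CuCiC", "was written"),
]

for pattern, gloss in examples:
    print(f"{apply_pattern('k-t-b', pattern)} '{gloss}'")
```

The root stays constant and only the vowel template changes, which is the whole point: the consonants carry the "writing" meaning and the pattern carries the grammar.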

Semitic languages like Amharic and Ge'ez are not the only Afro-Asiatic languages, though. To the south of Lake Victoria (so, somewhere sort of near Wakanda?) Iraqw, a Cushitic language, is spoken by approximately 460,000 people (because it's spoken by a much smaller number of people, the best video I could find was about porcine cysticercosis --- tapeworm in pigs).

And of course, we've established that Xhosa is MCU head canon (I really want to know the back story of how they first arrived in Wakanda, reversing the Bantu Migration, and how they rose to power!), which means that one could expect to hear clicks in Wakanda, too.

Wakanda Forever!

Given pre-release ticket sales alone, it seems like Hollywood has been sleeping on Black Panther's type of pan-African magic just the way the rest of the world has been sleeping on Wakanda's advanced technological civilization. If we're lucky, BP is going to be a smash hit with future films, TV series, spinoffs...and maybe we'll get to hear the sounds of Wakanda just as we hear the sounds of Essos and Valyria, Middle Earth, and Qo'noS.

A great resource for the IPA

One of the best tools a linguist uses is the International Phonetic Alphabet. However, learning it can feel daunting. I have historically referred students to the Wikipedia page on the IPA, because it has links to individual pages for each sound, with descriptions of how the sound is produced, and audio recordings.

Now, there's another tool: an interactive IPA chart with a cross-sectional MRI so you can see the position of the tongue, lips, velum, etc. while a sound is being produced.

It's courtesy of the UCLA Speech Production and Articulation Knowledge Group, and can be found here.

One caveat: of the five available speakers, John Esling is the only one who pronounces the alveolar click /!/ correctly. Everything else seems to be great across all speakers.



Fun With Morphology!

Causative Smallening

Friends and family members have recently said some morphologically interesting things, and I decided to take a quick second to put them down here, for posterity, because they're so freaking cool.

The context for the first was manipulating images for a slideshow. The sentence used was:

I smallened it

Everyone clearly understood it as "I made it smaller," and also knew that it was non-standard. But why?

Well, some adjectives can be made into inchoative verbs. This means if you have some adjective X you can make a verb that means 'to become X'. It's super easy: just add an -en to the word:

  • darken: to become dark
  • redden: to become red
  • liven: to become more alive/lively
  • quicken: to become quick.
  • leaven: to raise (from an older word in English we no longer have, ultimately from Latin levare 'raise')
  • toughen: to become tough
  • smarten (up): to become smart

These can also then be made transitive and are then causative verbs, meaning someone causes something to become X.

The thing is, it's normally taken to apply only to what linguists call a "closed set" which is a fancy way of saying you can only do it to some adjectives and not others. That is, it sounds weird to say "dumben it" (instead of "dumb it down") or "absurden the story" or "spicen the food."

And yet, we all have the grammatical competence to be able to (playfully) generalize to new instances, so everyone knew what "I smallened it" meant.
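One way to think about this is as a lookup that falls back to a rule. The Python sketch below is only an illustration under my own assumptions: the table holds a handful of the verbs listed above, the fallback naively sticks -en on the end, and the function name is made up. It is not a claim about how speakers actually store or compute these forms.

```python
# A few of the established -en verbs from the list above (the "closed set").
ATTESTED = {
    "dark": "darken",
    "red": "redden",
    "quick": "quicken",
    "tough": "toughen",
    "smart": "smarten",
}

def inchoative(adjective):
    """Return (verb, is_attested); novel forms fall back to plain adjective + 'en'."""
    if adjective in ATTESTED:
        return ATTESTED[adjective], True
    return adjective + "en", False

print(inchoative("red"))     # ('redden', True)
print(inchoative("small"))   # ('smallen', False)
print(inchoative("absurd"))  # ('absurden', False)
```

The second value is the interesting part: the form is perfectly interpretable even when it isn't in the table.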


When linguists get to the morphology segment of Intro to Linguistics, we teach "bracketing" as a tool for recognizing the internal structure of words. It's literally drawing brackets around word-pieces (let's call them morphemes). For example:

  • [ nation ]
  • [[ nation ] al ]
  • [ inter [[ nation ] al ]]

Some kinds of ambiguity are then easy to explain, as in, the door is:

  • [ un [ [ lock ] able] ]  == unable to be locked ~ un-lockable
  • [ [ un [ lock ] ] able ] == able to be unlocked ~ unlock-able
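The same two bracketings can be written as nested structures. Here is a minimal Python sketch, with nested tuples standing in for brackets; the variable and function names are my own, purely for illustration. Both structures spell out as the same string, which is exactly the ambiguity.

```python
# Nested tuples stand in for brackets; plain strings are morphemes.
NOT_LOCKABLE = ("un", ("lock", "able"))   # unable to be locked
CAN_UNLOCK = (("un", "lock"), "able")     # able to be unlocked

def spell_out(node):
    """Flatten a structure into the word as it is written."""
    if isinstance(node, str):
        return node
    return "".join(spell_out(child) for child in node)

def bracket(node):
    """Render a structure with explicit brackets."""
    if isinstance(node, str):
        return node
    return "[" + " ".join(bracket(child) for child in node) + "]"

for structure in (NOT_LOCKABLE, CAN_UNLOCK):
    print(spell_out(structure), bracket(structure))
```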

Similarly, we can bracket words that go together in sentences:

  • [ [that ridiculous man ] [ looks [ dumb ] ] ]

Sometimes, though, things break free. A classic example is the suffix -ish, which for many people now can modify much more than adjectives:

  • It was a yellowish color.
  • I guess I was excited about it, ish

All of that was to get to a family member recently saying:

There's no point in waiting to leave, it's not going to get any not dark er

That is, it's not going to get any [ [ not [ dark ] ] er ], where -er is modifying the complex structure not dark.

Often, linguists will treat these kinds of examples as mistakes, play, or somehow not part of the object of study (and make pronouncements like "inchoatives derived from adjectives are a closed set" and sometimes even claim that words like smallen are "impossible"). I think it's important that we take these kinds of novel forms --- forms that sometimes challenge theory we've learned in grad school --- seriously. In part, because if you start listening for them, they happen all. the. time.

Happy listening!



©Taylor Jones 2017

Have a question or comment? Share your thoughts below!


Habitual hiring

I recently came across a couple of images of African American English use in hiring signs. I think they could be an excellent tool for teaching about AAE in Introduction to Linguistics, or Intro to Sociolinguistics, since

  1. neither is 'standard' English
  2. they have a difference in meaning
  3. the difference in meaning affects strategy for someone on a job hunt.

So without further ado, let's say you've been out hunting for work, and you're down to your last resume. Which of these two places do you take your last resume?


If you don't speak AAE and don't know about its system of tense and aspect (which is more complex than mainstream American English), you may think it's a toss-up between the two.

However, you'd be wrong.

  • we hiring features what's called copula deletion, which is common in many languages (including Russian, Arabic, and others). It means "we're hiring (right now)".
  • we be hiring makes use of habitual 'be' which is a grammatical marker of, well, habitualness. It means "we are usually/habitually/often hiring."

Therefore, if we're to trust the signs, you've got a better chance of being hired right now going to the first store.





Bare Subject Relatives and the Sophisticated Complexity of AAE

[Trigger warning: My focus here is on a syntactic phenomenon, but a video example I'll be focusing on includes the threat of police violence and a man hypothesizing about his death at the hands of the police. The man in question is alive and unharmed.]

I've been thinking a lot lately about the complexity and sophistication of AAE syntax. Much of the work and outreach around AAE in the last 50 years has been trying to demonstrate that AAE is neither deficient nor wrong. There's a big jump, however, between not wrong, which is the end goal for many linguists, and marvelously rich, which seemingly hasn't percolated through the field much beyond AAE specialists. Christopher Hall, a colleague and friend, often implores lay people to flip the script and think about language with the starting point that AAE is the default against which other dialects should be judged.

My focus here is a syntactic feature of some varieties of African American English that doesn't get as much attention but is surprisingly common (especially in the South): what are referred to as null subject relatives, or bare subject relative (clauses).

A recent, salient example can be found in this video a motorist took of himself chastising a cop for approaching his car with his (the officer's) service weapon drawn. The example comes about 15 seconds from the end. It probably goes without saying, but the video may be triggering for some.

[Link to video here]


The subtitles say "Dad shot dead by a cop who made a mistake." However, this is yet another case of reporters "translating" AAE. The gentleman actually said:

"Dad shot dead by a cop made a mistake".

What's going on here?

Well, first, a relative clause is like a little mini sentence or sentence fragment that adds more information about a part of the main sentence. For instance:

  • That is the man [who I saw yesterday]
  • That is the man [who saw me yesterday]
  • The book [that I recommended to you] is on sale now.

In most varieties of English, you can delete -- that is, not say -- the relative marker (who, which, that), if it refers to the object of the relative clause.

To take the first example, the man who I saw yesterday, we can rework the relative clause as meaning something like "I saw him yesterday." In fact, many varieties of English make use of such resumptive pronouns, so it would be perfectly natural to say "That's the man who I saw him yesterday." And unsurprisingly, this kind of thing is cross-linguistically common, and in some languages it's obligatory.

So if it's:

  • I (subject) saw him (object)

Most varieties of English allow you to do away with the relative marker:

  • That's the man who I saw yesterday
  • That's the man ___ I saw yesterday

AAE is interesting in that it also allows deletion of the relativizer if it marks the subject.

  • That's the man who saw me yesterday.
  • That's the man ___ saw me yesterday.

This is pretty well described in the literature, so for instance, Stefan Martin and Walt Wolfram have a chapter in Salikoko Mufwene's book African American English: Structure, History, and Use that gives a ton of excellent examples:

  • He the man ___ got all the old records
  • Wally the teacher ___ wanna retire next year
  • Jill like the man ___ met her brother last week

The above example in the video was particularly interesting because the syntactic structure of the full utterance is extremely complex.

There's a pernicious and widespread view that AAE, or "ebonics," is somehow inferior or defective. It's widely regarded as both "simpler" than "standard" English, and simpler in ways that are "broken" or "wrong." However, not only does it have more complex grammar in some respects, but AAE speakers deploy sophisticated combinations of syntactic structures even under extreme stress. The sentence the motorist in the above video uttered makes use of:

  1. An "imposter" construction in which the speaker is understood to mean himself when using a name/title ("Daddy") instead of a first person pronoun ("I").
  2. Copula deletion ("Daddy shot" instead of "Daddy was shot"). This is very common cross linguistically, and is standard in Arabic, Chinese, Russian, etc.
  3. A resultative complement to the verb ("shot dead")
  4. Passive voice --- with copula deletion --- which we understand because of the resultative. Compare "Daddy shot a gun" vs. "Daddy shot dead."
  5. A bare subject relative ("a cop ___ made a mistake").

This is a sophisticated interlocking clockwork of syntactic structures, produced under extreme stress. A tree diagram of this sentence would show all kinds of movement and deletion. And there's some evidence that people who speak other dialects do not have the complex grammatical knowledge to correctly parse this kind of utterance. And yet, people like this motorist are routinely treated as though their language is deficient.

It's a starting point for us linguists to point out that AAE is rule-governed and syntactically well-formed. However, I don't think this goes nearly far enough. "Technically not inferior" is a far cry from the truth: AAE is a varied, complex, sophisticated language variety that makes use of many complex grammatical rules that "standard" English lacks. AAE speakers are doing things other people don't understand, and not because the AAE speakers are wrong, but because they have a fuller syntactic toolbox.






Lately, I've been noticing a particular phenomenon in speech, in which the word "particularly" is pronounced more or less as "partickerly" or "partickly."

It turns out I'm not the only one to notice this, as Mark Liberman has an excellent, and much more in-depth description of the phenomenon at Language Log, with a ton of excellent audio.




A linguist's take on the Great GIF Controversy

The Conflict:

For years, the English-speaking internet has been divided. We cannot agree on how to pronounce gif, the acronym for graphics interchange format. Much as with the dress, each side thinks their own position is the only correct one, and that the other side is absolutely crazy. And much as with the dress, it's probably a little more complicated.

People write articles with titles like you are 100 percent wrong about how to pronounce gif. People share mocking gifs with arguments bolstering their point of view. People yell at one another. Things get entirely too heated.

I intend to shed some light on this situation.

The Options:

There are technically three ways you could pronounce gif in English, although the conflict is over the first two. The three are:

  1. so-called "hard g" which linguists represent with /g/. This is <g> as in "gift".
  2. so-called "soft g" which linguists represent with /d͡ʒ/. This is <g> as in "George." It is also sometimes represented with <j> as in "Jazz".
  3. The "French" or "super soft g", which linguists represent with /ʒ/. It is in (some) pronunciations of "rouge". (Note that some English speakers "nativize" words with this to have the /d/ sound in the "soft g", so what I call "baton rouge" they may call "baton roudge".)

While I relish ironically using the third option and watching people on both sides of the hard/soft g debate lose their minds, I recognize that nobody is going to take seriously the argument that "French g" is correct.

The Arguments:

Arguments for "hard g":

  1. It's an acronym, and the word the <g> comes from is one where it is pronounced "hard" (namely, "graphics").
  2. We often pronounce acronyms differently than we would pronounce a word spelled the same way (CIA is "see eye aye" and not "kia").
  3. Feelings. People have really strong feelings that this is the only correct way.

Arguments for "soft g":

  1. Lots of words spelled with <gi> are pronounced with a "soft g": ginger, gin, giraffe, giant...
  2. It's easier to pronounce gif as a word and not as an acronym. Nobody is actually saying "gee eye eff". If you're going to make it a word, then make it a word!
  3. "Foreign" words often have a "soft g" (giraffe...).
  4. Feelings. People have really strong feelings this is the only correct way.

A dash of science:

I decided to take a look at this list of over 58,000 (relatively common) English words, and see what the patterns are for g-words.

There are 1836 words that start with <g> in this list, and there's not a clear rhyme or reason to the choice of "hard" versus "soft" g, so one would have to look at each of them to get a sense of the overall pattern. That's a pain in the ass. However, there is a helpful fun fact from linguistics that can constrain this problem a bit more:

"soft g" often comes from a combination of sounds, historically: a "hard g" followed by a non-low front vowel. What does that mean? That means that for the vowels /i/ "bead", /e/ "bade", /ɪ/ "bid", and /ɛ/ "bed", your tongue is actually higher in your mouth, and closer to the front of the mouth than it is for the vowels /u/ "booed", /o/ "bode", etc. The "hard g" sound is made by the back of the tongue forming a closure at the back of your mouth. These non-low front vowels tend to cause people to move their tongues slightly forward, and over time (we're talking hundreds of years) the sound changes to one made intentionally further forward. "Soft g" is created by a tongue closure further forward in your mouth than "hard g". Try saying words with them and pay attention to where your tongue is. (Try it! It's fun!)

This fact is part of why Italian spelling is so weird, for anyone who's tried to learn Italian.

All of that means I don't need to bother with words like "goof" because nobody is going to pronounce that with a "soft g."

So I chose to limit myself to words that start with <gi>. It turns out there are 102 of them, which meant I could simply read them and split them into "hard" and "soft". Of those, 30 are "soft" and almost all of these are of foreign origin.

30/102 (29.4%) of words that start with <gi> have a "soft g."

It's not entirely unreasonable, then, to think that gif should perhaps be pronounced with a "soft g." People will argue that there are more with a hard g, and that's true, but the same people will say that "soft g" is crazy, which is clearly not true.

BUT WAIT. What about words with <ge> you ask? I'm glad you asked. There were 223 of those. Of them, 197 were pronounced with a "soft g" (e.g., gene, gender, geriatric, geology, gelatinous...).


197/223 (88.3%) of words that start with <ge> have a "soft g."

This means that:

Of all of the words with <g> where it could be pronounced hard or soft, 227/325 (69.8%) are pronounced with a "soft g".

It's also worth noting that in the particular list I have, fully 38% of the words are <g>, then either <i> or <e>, and then <n>. This is important, because many people have what is referred to as the PIN-PEN merger, meaning that <i> and <e> before <n> are pronounced the same. That means Jim and gem are both pronounced the same (namely, as Jim). This is a feature of Southern American English, pretty much the entirety of the West, most of Canadian English, and most of African American English. A LOT of people do this.

This means that even if they're limiting themselves to only words that are pronounced <gi>, there are 109 more words in this list that they believe are pronounced with the "ih" vowel than if they don't have the PIN-PEN merger.


For people with the PIN-PEN merger, 139/211 (65.9%) of <gi> words are pronounced with a "soft g."
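If you want to check the arithmetic yourself, here is a quick sketch that recomputes the proportions from the counts quoted above. The helper `percent` and the variable names are made up for illustration. The one real assumption is that all 109 extra words for PIN-PEN speakers have a "soft g", which is what the 139/211 figure implies.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)

# Counts quoted in the post.
gi_soft, gi_total = 30, 102     # words starting with <gi>
ge_soft, ge_total = 197, 223    # words starting with <ge>
merged_extra = 109              # extra words a PIN-PEN speaker hears as "gi" words

gi_share = percent(gi_soft, gi_total)
ge_share = percent(ge_soft, ge_total)
combined_share = percent(gi_soft + ge_soft, gi_total + ge_total)

# Assumption: every one of the 109 extra words has a "soft g".
merged_share = percent(gi_soft + merged_extra, gi_total + merged_extra)

print(gi_share, ge_share, combined_share, merged_share)
```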

The Takeaway:

Even if people are being completely rational about their decision about how to pronounce gif, it's informed by their dialect, and their personal pronunciations of other words. While it is rational to say "it's from graphics which has a 'hard g'," nobody is saying "gee eye eff" (which coincidentally, has a "soft g"). While it's rational to say that foreign words are often nativized with a "soft g" (like giraffe), nobody says "gift" with a "soft g".

Finally, even if people are thinking statistically about it (even if it's sort of "fuzzy" math based on what they have heard in their life and not hard numbers), the conclusions they come to are dependent on their dialect, speech community, and vocabulary.

This is why I ironically go with the "French g": if you have strong feelings about the pronunciation of gif, no matter what they are, you're probably wrong. And if you're having the argument, it's because someone tried to share an image with you. Why not just be nice, instead of pedantically (and no matter what side you choose, wrongly) lecturing your acquaintances on how to say words?





Bill Maher, the N-word, and that pesky R

[Trigger warning: n-words]

Bill Maher is in the news right now for dropping the n-bomb on his show in a context that many, many people found offensive. Predictably, people are coming to his defense with two arguments: (1) he was referring to himself, and (2) he "didn't say the /r/."

As a linguist, and as one of the handful of us who has given serious thought to the n-word(s), (shout out to Christopher Hall, to Arthur Spears, and to Geneva Smitherman) I want to weigh in with a (socio)linguistic perspective. My argument is:

  1. It was not ok for him to say either, and,
  2. White folks (in general) should not say either if they don't want to offend, because
  3. It is an artificial distinction for most white people, if they are borrowing from a dialect they do not speak...and the vast majority of white people do not speak (or understand) African American English, natively or otherwise. And also,
  4. In most white people's native dialect, the only n-word is a slur.

Elsewhere, Christopher Hall and I have written about the grammatical and social functions of the n-words in some varieties of AAE. We argued that there are multiple words that all include the "n-word" that fulfill various grammatical and pragmatic functions: from first person pronouns to social distance markers, to politeness (yes, politeness) forms. If you are not a native speaker of AAE, it is easy to misunderstand these uses because they are what Arthur Spears coined the term "camouflage constructions" to describe. That is, they look like they might mean something else, and so people assume they understand when they don't. Recent pilot work on cross-dialect comprehension that I worked on with a team at U Penn and NYU confirms that in general, white folks don't understand the range of uses of the n-words.

More importantly, these are uses that occur in African American English, which is a dialect that has its own accent (really, range of accents, but we'll set that aside for now). Crucially, most forms of AAE are what linguists call non-rhotic, meaning /r/s after vowels are often not pronounced. Many white dialect varieties are not non-rhotic, including Bill Maher's normal speech. So Maher will make the argument that nigga and nigger are different words, and that he said the "acceptable" one.

HOWEVER, Maher, I would argue, only has nigga in his vocabulary as a taboo deformation of the word nigger. It's the same as claiming he didn't call someone bitch, he called them betch, or bish. The point is to say "I technically didn't say the word" while still saying the word.

Here's the crux of my argument: If you don't speak AAE, whether you borrow AAE sounds or not to say nigger doesn't change what you're saying. For people to be comfortable (or less uncomfortable) with Maher's use of nigga, he'd have to (1) use it in the appropriate social context, which this was not, and (2) back it up with literally any other features of AAE... and this would still probably not make it ok. As is, he was just "being edgy" by saying a taboo word he knew would offend.

That is, we white folks don't get to say "I was using that word like you people do!" without actually being able to use any other words like AAE speakers. If the accent is right, if the word choice is right, if the grammar is right (yes, you can butcher AAE grammar --- it is as systematic and rule governed as any other language variety), and if the cultural context is right you can maybe get away with speaking AAE as a white person. Notice I didn't say "saying the n-word". That's still pretty much off the table. Even if you understand the grammar, social function, and pragmatics of use. 

Here are some tips and general rules of thumb around the n-words if you don't want to offend, and you're white in America:

When you can say "nigger" without offending:

  1. maybe in citation, either directly quoting old racist stuff, or discussing the word itself, best if at a linguistics conference or conference on race, and even then you might encounter pushback.
  2. never in casual conversation.

So basically, you can't.

When you can say "nigga" without offending:

  1. To a POC who has specifically said to you "yo, we cool, you can call me nigga. You get a pass." To that person ONLY. Probably not within earshot of anyone else. I've never heard of this situation occurring, but who knows. Also, even if you find yourself in that situation, if you actually do it, I'm not saying it's gonna go great, or that I endorse that path. 
  2. discussing the word nigga in citation form at a linguistics conference. And even then, not everyone will agree.
  3. Never in casual speech.

So theoretically it's possible, but maybe just don't.

The distinction between r-full and r-less forms has a long history, and linguists are not remotely settled as to the history of the word (for instance, Hiram Smith argues the semantically neutral r-less form goes back 200 years or more). While it's interesting, it's completely orthogonal to the question of whether it's appropriate for white people to say it. Because it has been a slur in white English from its beginnings to literally right now, in both r-full and r-less varieties of white English, people like Bill Maher don't get to decide that it no longer has all that historical baggage.

And even if you deeply understand its use in AAE speaking communities, and participate in those communities, if you actually care about the people in those communities, you still won't say it.  Even when it's linguistically appropriate. Because our language use is culturally and socially situated.







I ran all Trump's tweets through a neural net to try and figure out the meaning of Covfefe. Here is what I learned.

Unless you've been living under a rock, by now you probably know that just after midnight two days ago, Donald Trump tweeted:

Despite the negative press covfefe

Twitter went wild, predictably. For two days now, there has been heated debate about (1) how to pronounce covfefe, and (2) what covfefe means. Yesterday, Trump's press secretary Sean Spicer declared that Trump meant to type covfefe, and that its meaning was known to Trump and a select few others.

In an attempt to get to the bottom of this mystery, I decided that semantic Word Embedding Models might be useful. I have written about such models elsewhere. The (extreme oversimplification) general gist is that if you treat some document or documents as a big bag of words, you can start to treat individual words as being related to one another by their position in a (high dimensional) space. Like words cluster together; dissimilar words are far apart in this vector space. The actual implementation is technically a "Feed forward neural net" that "fine tunes through back propagation," but this is all linear algebra and code and ignores the fun of it.

In order to try to get at the Mystery of Covfefe, I decided to train a word2vec (that is, word to vector) model using Ben Schmidt's wonderful package in R. In order to do so, I first needed to gather all 30,999 Trump tweets (at the time I gathered them). I did so by cloning the Trump Twitter Data Archive (note: if you have a cool coding idea, chances are someone did most of the work already. I'm learning that half of coding well and fast is just finding the appropriate already-collected data, already-written module/library, or already-worked recipes).

Once I gathered all 30,999 Trump tweets, I needed to clean them. I did minimal cleaning on the data set, so I just made all words lowercase, eliminated punctuation, and eliminated common "stopwords" -- words like "and, are, in, at, be, there, no, such" etc. This has the effect of normalizing a bit, so sad and SAD! are treated as the same word. I have not yet gotten around to lemmatization: grouping words like ran, run, running all under "run", but I'm not sure to what extent that will really affect the output.
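For the curious, the cleaning step can be sketched in a few lines. To be clear, this is not the code I ran (that was in R). It is a minimal Python illustration, and both the stopword list and the function name `clean` are stand-ins invented for the example.

```python
import string

# Tiny illustrative stopword list (an assumption, not the list actually used).
STOPWORDS = {"and", "are", "in", "at", "be", "there", "no", "such", "the"}

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)

def clean(tweet):
    """Lowercase a tweet, drop punctuation, and remove stopwords."""
    words = tweet.lower().translate(STRIP_PUNCTUATION).split()
    return [word for word in words if word not in STOPWORDS]

print(clean("SAD!"))
print(clean("Despite the negative press covfefe"))
```

Note that stripping punctuation also strips the @ from handles, so a mention ends up as a plain lowercase word.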

Having run the results through Word2Vec, I did some quick sanity checks by investigating which words are the closest to a handful of given words. Closest to could? would, honestly, and can. Closest to america? safe, again, outsider, make, lets. Closest to new? york, hampshire, albany, yorkers.

Clearly, it's working the way we would want it to, but are these really Trump's tweets? Closest to hillary? clinton, email, unfit, crooked, judgement, 33000, temperament. Closest to rosie? odonnell, theview, unprofessional, rude, bully.  IT WORKS!

As I did before, I chose to visualize the word embedding space by using t-SNE (for t-distributed stochastic neighbor embedding). This does not preserve relationships exactly, but keeps near things near to one another and far things far. I present the full results for your enjoyment:

 Tremendous. (Clicking "view image" should give a slightly bigger, clearer version)


Some really fun/interesting/hilarious clusters emerge. There's read book art deal. There's barackobama obama obamas china iran. There's my favorite: totally sad bad terrible wrong. There's the small cluster of bush cruz. There's scotland golf course.

What's missing? Covfefe.

So I decided to up the size of the model and include more words. Normally, you want 200-500 vectors in a model like this. I gave it 1000. The results are even better.



This model results in a cluster: realdonaldtrump mr awesome 2016. And, as a quality check, crooked is still right next to hillary.

But where's covfefe?

STILL not in the model. When I manually search for it, it shows up as excluded from these findings, and is returned next to realdonaldtrump, you, and i. Which is, frankly, perfect. Perhaps Covfefe is the word for all of us together with realdonaldtrump.

I know that's kind of a cop-out, but in the process, I learned a few other interesting things. In no particular order:

First, pick almost any word and the top 10-20 nearest words in either of the resulting vector spaces will include some negative sentiment. GOP? Establishment. Christian? Jailed. Beheading. Media? Fake. He's even hard on Russia in tweets: Russia? Traitor, laughs, taunting.

Second, closest to Ivanka? Daughter. For Barron, you have to wait till number 5 for "son" (most of the top 10 are family related words, or the names of family members).

Third, closest words to usa are miss, pageant, missuniverse, and perplexingly moscow. If you subtract pageant the closest word to usa is...balls. Checks out. Also, further down the list: needs, trump, and businessman.

This brings me to one of my favorite findings. A classic example of word embeddings capturing something about semantics is that on other data sets these models have been trained on, you can add and subtract vectors meaningfully. So for instance,

paris - france + italy = rome

...which is intuitively correct. The classic example is:

king - man + woman = queen

Trump doesn't use the words man or woman all that much, actually, so in Trump's world:

king - man + woman = larry
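If you've never seen how this kind of vector arithmetic is computed, here is a toy sketch. The three-dimensional vectors are invented by hand so the example runs on its own (a real model learns hundreds of dimensions from data), and `cosine` and `analogy` are illustrative names, not functions from any package. The usual recipe is: do the arithmetic, then return the nearest remaining word by cosine similarity, skipping the words you started with.

```python
import math

# Hand-made toy vectors, invented purely for illustration.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "golf":  [0.2, 0.5, 0.4],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word nearest to a - b + c, excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [word for word in VECTORS if word not in {a, b, c}]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))

print(analogy("king", "man", "woman"))
```

With these made-up vectors the answer is queen. With Trump's tweets, as noted, it was larry.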

I'm certain there are other relationships in the data that I've missed, but if there's anything that's clear from the above, it's that word embedding models really, really, really work (even if adding or subtracting "man" and "woman" is basically adding and subtracting zero, in Trump's tweets). I love the examples from cookbooks, historical newspapers, and RateMyProfessor reviews, but there's something really validating about these results, in part because Trump's speech (and twitter speech) is so colorful, and the above so clearly accurately captures it.

Finally, it looks like covfefe is off the charts, even for the surprisingly regular logic of Trump's twitter.




Cablese and Wirespeak

I'm always interested in jargons, cants, patois(es?) and codes, and recently learned that my father-in-law, a career newsman, didn't just have a problem with his throat this whole time, but rather has been communicating with me in Cablese (and movie references). I knew it wasn't that he had a problem with his throat, but I had no idea what was going on. Allow me to explain:

For about a hundred years after the invention of the telegraph, the main way news was shared was using the telegraph. Converting a story written in regular old English to Morse code was time consuming, expensive, and crucially, charged by the word. And you couldn't just stick words together, like say joining phrasal verbs like WRITEUP for "write up." That's obviously two words smushed together to get around being charged by the word. However, you could get away with making a new word like UPWRITE. Newsmen developed a complex system they used on the cables ("Cablese") as well as their own codes for the news wire ("wirespeak") and each news agency also had their own secret codes (so they couldn't get scooped). Even though they don't use the telegraph anymore, Cablese and Wirespeak live on. 

This last week, my father-in-law gave me the Rosetta Stone: the book Wirespeak: Codes and Jargon of the News Business. It is fantastic.

The book has chapters on Cablese, Wirespeak, and various news agency codes. Its chapter on Cablese is entitled "backwards run the words."

So how does it work? Well, first, anything that can be joined is. But backwards, so it's clearly a different word. So for instance, DOWNHOLD for "hold down." There's a story that when British writer Evelyn Waugh was asked to investigate a rumor a British nurse had been killed in an air raid he received the cable from his editor: SEND TWO HUNDRED WORDS UPBLOWN NURSE. Waugh investigated, found the rumors were untrue, and wrote back NURSE UNUPBLOWN.

That brings me to the second part of how it works: prefixes. everywhere. Most of them are Latin, but some are French, or other.


  1. CUM = with
  2. EX = from
  3. ET = and (e.g., MOM ETDAD)
  4. PAR = by
  5. PRO = for
  6. AD = to
  7. ANTI = against
  8. DANS = in (e.g., DANSRIVER 'in the river')
  9. UN = no, not
  10. POST = after
  11. PRE = before
  12. SUPER = on, over
  13. OMNI = all (e.g., OMNICHEERED 'everyone cheered')
  14. UNI = one
  15. SANS = without
  16. SUR = on

There are also suffixes:

  1. WARD = toward
  2. WISE = manner adverb
  3. EST = most. (Why we don't just say "est" for all superlatives will get its own blog post, to come later)
  4. ING = makes a verb from a noun, or light verb construction (This will also get its own post).
  5. SOME = full of (e.g., GLADSOME TIDINGS)
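Just for fun, the joining and prefixing rules above can be sketched as a toy encoder. This is a playful approximation, not how newsmen actually did it. The function names are invented, the prefix table is only a subset of the list above, and dropping articles is an assumption based on the DANSRIVER example.

```python
# Subset of the prefix list above, keyed by the plain English word.
PREFIXES = {
    "with": "CUM", "from": "EX", "and": "ET", "by": "PAR", "for": "PRO",
    "to": "AD", "against": "ANTI", "in": "DANS", "not": "UN",
    "after": "POST", "before": "PRE", "without": "SANS", "on": "SUR",
}
ARTICLES = {"the", "a", "an"}  # assumption: articles are simply dropped

def flip_phrasal(phrase):
    """Join a two-word phrasal verb backwards: 'hold down' -> 'DOWNHOLD'."""
    verb, particle = phrase.split()
    return (particle + verb).upper()

def to_cablese(phrase):
    """Glue function words onto the next word as prefixes."""
    output, pending = [], ""
    for word in phrase.lower().split():
        if word in ARTICLES:
            continue
        if word in PREFIXES:
            pending += PREFIXES[word]      # hold the prefix for the next word
        else:
            output.append(pending + word.upper())
            pending = ""
    if pending:                            # a trailing prefix with nothing to attach to
        output.append(pending)
    return " ".join(output)

print(flip_phrasal("hold down"))
print(to_cablese("mom and dad"))
print(to_cablese("in the river"))
```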

There is an apocryphal story that an international correspondent quit their job with the cable:


There are also a ton of one offs, like SMORNING for "this morning" and SNIGHT for 'last night.'


(taken from the excellent blog post on the subject Onwriting: Unearthing a lost language, which explains some of the more specific terms as well, like "thumbsucker" for "news analysis" and "art" for "photographer".)

So when my wife etme downwent DCward sweek advisit mother etfather inlaw, it outturns father unupmade weirdtalking. It was preupmade parnewsmen prehim. And now, postwise, I understand his texts meward.




Benedict Cumberbatch: Ye Shall Know Him By His Dactyls

[EDIT: I've been beaten to the punch, unsurprisingly by Gretchen McCulloch. I'm only 4 years late to this party, apparently.]

For some time, people on the internet have been playing with British actor Benedict Cumberbatch's name. They call him all manner of other names. And yet we all know who is being discussed. This post is about the simple reason why.

First, an example:

 This image is legit saved to my computer as "Crindlesnatch.jpg"


Now, a few more examples. And why not throw in Reddit's favorites?

So what is going on here? A few things:

  1. Meter
  2. Vowels
  3. Bs and Cs???
  4. Context?

Meter is the most important. Cinnamon Thundercat has a distinctive name in that (1) he's almost always referred to by both first and last name, and (2) both his first and last name are dactyls. That is, they are a stressed syllable followed by two unstressed syllables. Once you know the context, any string of STRESSED-unstressed-unstressed STRESSED-unstressed-unstressed referred to as a person can be easily recovered as actually referring to Brandywine Crumplepuss.

Second, the replacements often have the same kinds of vowels in the same places. Most important seems to be that the last vowel be an /æ/ as in "batch."

Third, people often, but not always, use replacements that start with B and C.

Lastly, there's often a picture, or an introduction like "British actor ______." From here, it's clear who Battleship Crustybrunch refers to.

I don't have the time at the moment, but a true overkill analysis for the Hashtag SCIENCE fans out there would be something like:

  1. collect a corpus of name replacements
  2. have study participants rank them on felicity: how good are they at being "Benedict Cumberbatch names"?

Once you've got some large number of good ones:

  1. count how many start with B--- C---, B---- X----, or X---- C---- (where X is any other letter).
  2. count how many conform to a two dactyl pattern.
  3. run them all through some tokenizer, and look up a pronunciation for each part in a pronouncing dictionary (e.g., CMUdict, the CMU Pronouncing Dictionary).
  4. Evaluate how well each maps its vowels to those in the original name.
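For the Hashtag SCIENCE fans, the counting steps above could be sketched in Python roughly like this. It is only a sketch: `TOY_DICT` is a tiny hand-written stand-in for a real pronouncing dictionary (the entries are illustrative ARPAbet-style transcriptions written for this example, not copied from CMUdict), and all of the function names are made up here.

```python
# Toy pronouncing dictionary (hypothetical, hand-written for illustration).
# ARPAbet-style: vowels carry a stress digit (1 primary, 2 secondary, 0 none).
TOY_DICT = {
    "benedict":    ["B", "EH1", "N", "AH0", "D", "IH2", "K", "T"],
    "cumberbatch": ["K", "AH1", "M", "B", "ER0", "B", "AE2", "CH"],
    "cinnamon":    ["S", "IH1", "N", "AH0", "M", "AH0", "N"],
    "thundercat":  ["TH", "AH1", "N", "D", "ER0", "K", "AE2", "T"],
    "captain":     ["K", "AE1", "P", "T", "AH0", "N"],
    "kirk":        ["K", "ER1", "K"],
}


def vowels(word):
    """Return the vowel phones of a word (the ones ending in a stress digit)."""
    return [p for p in TOY_DICT[word.lower()] if p[-1].isdigit()]


def is_dactyl(word):
    """True if the word has three syllables with primary stress only on the first."""
    stresses = [p[-1] for p in vowels(word)]
    return len(stresses) == 3 and stresses[0] == "1" and "1" not in stresses[1:]


def initial_pattern(first, last):
    """Classify initials as 'B C', 'B X', 'X C', or 'X X' (X = any other letter)."""
    f = "B" if first[0].upper() == "B" else "X"
    l = "C" if last[0].upper() == "C" else "X"
    return f"{f} {l}"


def score_name(first, last):
    """Collect the three measurements for one candidate name."""
    return {
        "two_dactyls": is_dactyl(first) and is_dactyl(last),
        "initials": initial_pattern(first, last),
        "ends_in_ae": vowels(last)[-1].startswith("AE"),
    }


print(score_name("Cinnamon", "Thundercat"))
print(score_name("Captain", "Kirk"))
```

With these toy entries, "Cinnamon Thundercat" passes the two-dactyl test and the final /æ/ test without starting with B or C, while "Captain Kirk" fails both. A real version would need a full dictionary plus some fallback for made-up words like "Crumplepuss," which no dictionary will contain.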

The real question is how can his name and the game people are playing with it be so distinctive that when I talk about Enterprise Custardshirt, you think of Khan:

The guy on the right...



...and not Kirk?!

For good measure, here's a Benedict Cumberbatch name generator. Enjoy!




Why you probably didn't understand that one guy from Atlanta

When the first few episodes of Donald Glover's show Atlanta on FX aired, a lot of people were blown away by the writing and acting on the show and praised the authenticity of the characters. Others, however, were blown away by the fact that they simply did not understand what some of the characters were saying.




In particular, there's a short scene in which Donald Glover's character is in a detention center and is subjected to a short monologue by a man he met there, about how the man came to be in jail. I have included it here (under educational "fair use" --- FX still retains all rights. The scene is from Season 1, Episode 2. Season 1 can be purchased on YouTube, iTunes, etc.).

Much of my research is on regional variation, and on the extent to which people understand different accents and dialects. I should say first, and most importantly, there is nothing wrong with this man. The character is not in any way intended as impaired (yes, I'm addressing this because it has been expressed to me). The way he speaks is a regional dialect, not any sort of personal idiosyncrasy. Plenty of other people from Atlanta, including some very famous people from Atlanta, have the same accent. Some of them make a living as wordsmiths (Rich Homie Quan, Plies).

I have taken the liberty of transcribing the clip, and analyzing what the main triggers for misunderstanding are. I've been led to believe that his speech is typical of a certain subset of the population in Atlanta, and while there are more and more characterizations of regional varieties of African American English, to my knowledge Atlanta AAE is still understudied and underdescribed.

The Transcript, for the curious:

I don’t believe this shit.

Ridiculous, man.

What'd you, uh, what'd you do to get in here?


Damn, man!

I should've just went home, boy.

Instead I’m in here locked up 'cuz of this fool I ain’t seen in about eleven years, man.

Boy I was at Five Points, bout to catch the bus, you feel me?

and this nigga I ain’t seen in eleven years come here talkin' 'bout “man, hey, listen here, hey boy I ain’t seen you in about eleven years boy, let’s hang out. go get a beer.”

So I follow him to the god damn gas station.

We get two beers.

We aint get but two of them, but they was the big ones, though.

They were the big ones.

Mmm, anyway, so, nigga like “man, come on let’s go-- go-- go to the house and drink ‘em.”

So we get to the house he’s like “man. My old lady.”

And so we just gonna drink ‘em on the porch. Feel me?

I’m like “boy, APD be rollin through here boy.”

And he, and he done talked me into it.

So sure enough, APD done rolled up and seen the god damn two cans out there, locked me up for public intoxication.

You know what I’m talking about?

Man, I’m in here man cuz this nigga, man, I ain’t seen (in) eleven years. Man, I’m gonna be in here till tuesday cuz I ain’t cashed my check.

[That’s messed up]

Oh man, I should’ve went home, boy. Shit!

[Damn, man, I said I was sorry. I just ain’t seent you in like twelve years—]

Man! Fuck you Grady! Shut up!   

While some of the things people may misunderstand are questions of morphology (what word forms he uses) and syntax (how they fit together), I think far, far more important is his accent. Not only is the US still quite segregated, but --- some rappers notwithstanding --- much of the mainstream has almost no exposure to African American English from Atlanta. The triggers for misunderstanding are:

  1. AAE vowels: He has a number of features that are common across many regions, including the PIN-PEN merger (both words sound like "pin", and this extends to all words with en or em), monophthongization of ay so five might be more like fahv, etc.
  2. A shift in AAE vowels unique to the south: Just as white people from Chicago have vowels that have "rotated" from where many other Americans pronounce them ("I ride the boss to my jab!"), this guy's vowels are different than what you might expect if you're not familiar with his accent. For instance, his catch sounds like kitsch to most people, and his just sounds like jest to most people (the vowel is [ɛ], not [ʌ]).
  3. A strong preference for "open syllables." If we represent consonants with C, vowels with V, and syllable breaks with "." then there's a strong preference to reduce syllables to CV.CV.CV. This generally means deleting anything after the vowel, unless it's an n or m, in which case that ends up just making the vowel nasalized, as in French. This means that believe (CV.CVC) is pronounced belee (in IPA: [bəli:]), five is pronounced fa, let's is pronounced leh, and just (CVCC) is pronounced as jeh (CV) (in IPA: [d͡ʒɛ]). For many people this kills their ability to understand what they're hearing, although interestingly, it shouldn't necessarily.
  4. Deletion of unstressed syllables: public intoxication is pub toxication, eleven is lebm.
  5. AAE specific syntax: talmbout to introduce quotes (this nigga I ain’t seen in eleven years come here talmbout “man, hey, listen here, hey boy I ain’t seen you in about eleven years boy, let’s hang out. go get a beer.” to mean "this guy I hadn't seen in 11 years was like..."); nigga to refer to specific people; habitual be to mark usual or habitual behavior (APD be rollin' through here meaning "APD often comes this way"); perfective done to mark completed actions (And he, and he done talked me into it meaning "he (successfully) talked me into it"); and so on.
  6. Word final devoicing: sounds like b,d,g,v,z, are realized as p,t,k,f,s respectively, if they are at the end of a word. Moreover, b,d,g and p,t,k can become a glottal stop (the sound in the middle of 'uh-oh') at the ends of words.
  7. Atlanta specific knowledge: APD is the Atlanta Police Department, Five Points is a place. If you understand the rest, you can figure this out from context. If not, you're cooked.
  8. The use of bwa ("boy") as a term of address, along with other (reduced) filler, like his very fast, very reduced "you know what I'm talking about."
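To make the "open syllable" point (item 3) concrete, here is a minimal Python sketch. It assumes each syllable has already been split by hand into an onset, a vowel, and a coda; the transcriptions are simplified (vowel length is ignored) and the function names are invented for illustration. It models only the one pattern described above: drop the coda, and if the coda had an n or m, nasalize the vowel.

```python
NASALS = {"n", "m"}
NASALIZATION = "\u0303"  # combining tilde, the IPA mark for a nasalized vowel


def open_syllable(onset, nucleus, coda):
    """Drop the coda; if it held a nasal, nasalize the vowel instead."""
    if any(seg in NASALS for seg in coda):
        nucleus += NASALIZATION
    return onset + nucleus


def reduce_word(syllables):
    """Apply open_syllable to each (onset, nucleus, coda) triple."""
    return ".".join(open_syllable(o, n, c) for o, n, c in syllables)


# Simplified, hand-made transcriptions (assumptions, not measurements).
believe = [("b", "ə", []), ("l", "i", ["v"])]
just = [("dʒ", "ɛ", ["s", "t"])]
seen = [("s", "i", ["n"])]

print(reduce_word(believe))  # bə.li
print(reduce_word(just))     # dʒɛ
print(reduce_word(seen))     # si with a nasalized vowel
```

Real speech is variable, of course, so a rule that applies every single time is a cartoon of what the speaker is doing. It does show why a listener expecting CVC and CVCC syllables can lose track of where the words are.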

All of these factors interact, so boy, I was at five points about to catch the bus ends up sounding to many white folks like bwa awa a' fapoi bouda kitschabuh yafih me? So many viewers who have never been exposed to Atlanta AAE could not even begin to figure out where the word boundaries are, let alone what the words themselves are. And even if you do figure out the word boundaries, many people might still be confused: I should a jeh went home bwa is just different enough for some people to think "man, I'm not sure what that was."

Some Notes

Syllable Codas:

Lots of research on AAE discusses deletion or reduction of things that happen at the ends of syllables or the ends of words, but they're all taken (justifiably) as different phenomena. So there's a rich literature on AAE that discusses:

  • possessive -s deletion: this is how you get things like baby mama for baby's mama. Or my best friend apartment door for my best friend's apartment's door. Basically, sometimes word final -s is deleted.
  • consonant cluster reduction: if a word ends in consonants that are pronounced with the tongue in the same location, you can drop the second if both are voiced or both are unvoiced: e.g., hand -> han', just -> juss. Basically, clusters of consonants sometimes lose some of those consonants.
  • deletion or vocalization ( = making a vowel) of r after vowels: The speaker above does this a lot. Vocalization is most clear in how he pronounces beer as biyuh. Basically, r sometimes is deleted.
  • deletion or vocalization of l after vowels: The speaker above does this a lot as well. An example is his pronunciation of fool as foow. Basically, l is sometimes deleted.

HOWEVER, there are a ton of phenomena I've noticed but which are practically absent in the literature on AAE. For instance, the deletion of /v/ after vowels, which to my knowledge is only mentioned in one sentence in one article on AAE (Thomas, 2007). Most AAE speakers I know do this all the time, and the guy above is no exception: five points is fa poi (for the linguists: [fa.pɔ͡ɪʔ ]), believe is belee, etc.

Moreover, the discussion of the above syllable "coda phenomena" does not explain a lot of what the above speaker does. Entire syllable codas just disappear. The current literature on AAE states that people may delete the /t/ in just, but there's no real account for people who say things like jeh for just or gluh for gloves (in this case, I'm thinking of a famous-to-sociolinguists speaker from Philadelphia, recorded in 1981), or krima lih for Christmas List (e.g., everyone's favorite rapper, Plies.) Often, it's multiple morphemes (meaningful word 'pieces').

This is a topic I'm currently working on, and hope to have more to say later about the seeming dis-preference for codas in some varieties of AAE. For many, many words, it does not affect your ability to recover exactly what word was uttered. For instance, my fingers are cold because I forgot my gluh should be really easy to parse, because (1) there's no word gluh that would make you have to choose between possible words, and (2) context. We do this kind of thing all the time, since we don't always hear (or say) all the sounds in words. Spoken language does a really good job with a "noisy channel."

(For the linguists: While I'm writing about it, I might as well be the first to claim that: All obstruents higher on the sonority hierarchy than stops can be deleted syllable or word finally, and stops can all be realized as a glottal stop alone, for some varieties of AAE. Today, for instance, I heard [bli:ʔɪn] for bleeding.)

The above speaker has pretty extreme reduction of codas, so let's hang out is leheygao:


but many viewers might be listening for something more like:



There seems to be a further vowel shift in progress in Atlanta AAE which has not been discussed much in the literature on AAE. Beyond what you would expect from southern AAE, a lot of Atlanta speakers have a couple of different vowels than what might be expected. A lot of linguists use what are called Lexical Sets to discuss accents. What this means is that we can talk about an entire class of words that all have the same vowel, and then state "the vowel in all of those words is thus-and-such in thus-and-such accent." For instance, in most varieties of American English, the STRUT vowel (the vowel in words like strut, just, cuss, bus, cub, rub, hum, lunch) is written in IPA as [ʌ]. In the above clip:

  • the STRUT vowel is sometimes [ɛ]. So words like just and shut up sound a little like jest and shet epp. But bus is still [ʌ].
  • the TRAP vowel is sometimes [ɪ], which for most Americans is the vowel in words like ship, rip, dim. This is most pronounced in catch from catch the bus.

Overall difference:

There is a wealth of research on how we parse accents, and a couple of factors are at play here. First, AAE is heavily stigmatized in the US. The more it differs from middle class, 'standard', white speech, the more stigmatized it is. Second, because of the segregation in this country, many white folks simply do not understand AAE, even when we think we do (e.g., Rickford 1998, Rickford & King 2016, Jones & Kalbfeld 2017). Third, regardless of the accent, when it's perceived to be difficult to understand, rather than improving with more exposure, experiments show that people basically shut down, and stop trying to parse it. Lastly, given racial/ethnic cues, people perceive accents where there aren't any. Here, there is clearly an accent, but the relevance of the last point is that people may already be predisposed to consider a black man in a jail detention center "hard to understand," or even "impossible to understand," and "not worth the effort."

A handful of my non-black friends assumed that the point of the scene was basically a gag -- that the guy was incomprehensible. I don't think that was the case, and that doesn't seem to be the impression my AAE speaking friends had either. He's just real Atlanta. That's part of why people love the show: there are tons of types of people that viewers know from their daily lives but that you just don't see on TV, and Atlanta gives them a spotlight, if only for a minute.

More broadly, though, the above points to a lot of interesting historical and sociological phenomena. Language change occurs when populations are separated. Generally, the way this is taught is by giving examples of European villages separated by mountains, where one town speaks differently than the next town over, because they don't interact often. However, as I'm going to argue in my dissertation, some populations in the US are separated by invisible mountains: residential and educational segregation. For some people, popular music, film, and television (including Atlanta) are now providing limited contact with people from "the other side of the mountain."






Linguists have been discussing "Shit Gibbon." I argue it's not entirely about gibbons.


Earlier this week a Pennsylvania state senator called Donald Trump a "fascist, loofa-faced shit-gibbon."

There was an excellent post on Strong Language, a blog about swearing, discussing what makes "shit gibbon" so arresting, so fantastic, so novel, and yet... so right (for English swearing. Whether you believe "shit gibbon" is "right" as a characterization of Donald Trump is a personal assessment each person must make for themselves).

The post, The Rise of the ShitGibbon, can be found here. I highly recommend reading it.

Most of the post was dedicated to tracing the origins and rise of "shitgibbon." The end of the post, however, catalogues insults in the same vein:

wankpuffin, cockwomble, fucktrumpet, dickbiscuit, twatwaffle, turdweasel, bunglecunt, shitehawk

And some variants: cuntpuffin, spunkpuffin, shitpuffin; fuckwomble, twatwomble; jizztrumpet, spunktrumpet; shitbiscuit, arsebiscuits, douchebiscuit; douchewaffle, cockwaffle, fartwaffle, cuntwaffle, shitwaffle (lots of –waffles); crapweasel, fuckweasel, pissweasel, doucheweasel.

I've actually been thinking about insults like this a surprising amount. Ben Zimmer points out about "Shitgibbon" that "...Metrically speaking, these words are compounds consisting of one element with a single stressed syllable and a second disyllabic element with a trochaic pattern, i.e., stressed-unstressed. As a metrical foot in poetry, the whole stressed-stressed-unstressed pattern is known as antibacchius."

I argue that this is correct, but that (1) there's a little bit more to say about it, and (2) there are exceptions.


First: I argue that the rule for making a novel insult of this type is a single syllable expletive (e.g., dick, cock, douche, cunt, slut, fart, spunk, splooge, piss, jizz, vag, fuck, etc.) plus a trochee. A trochee, as a reminder, is a word that's two syllables with stress on the first. Examples are puffin, womble, trumpet, biscuit, waffle, weasel, and of course, gibbon. Tons of words in English are trochees (have a relevant XKCD! In fact, have two! Wait, no, three! No one expects the Spanish Inquisition!). Because so many words are trochees, you'll have to pick wisely --- something like ninja might not be as humorously insulting as waffle.

That said, in principle, monosyllable expletive + trochee seems to give really good results. Behold:

fart basket, shit whistle, turd helmet, cock bucket, douche blanket, vag weasel, (I'm gonna be so much fun when I get old and have dementia. Good luck grandkids!), shit mandrill, piss gopher, jizz weevil, etc. etc. I can do this all day.

So, it's not the fact of being a gibbon per se. Various other monkeys would work: vervet, mandrill, etc. However, crucially, baboons, macaques, black howlers, and pygmy marmosets are out.
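The "monosyllable expletive + trochee" template is simple enough to write down as a few lines of Python. This is a toy sketch: the stress patterns are hand-annotated here ("1" for a stressed syllable, "0" for an unstressed one), the word lists are a small sample from this post, and the function names are made up. It checks meter only, not whether the result is actually funny.

```python
import itertools

# Hand-annotated stress patterns: "1" = stressed, "0" = unstressed.
EXPLETIVES = {"fart": "1", "shit": "1", "turd": "1", "douche": "1"}
SECOND_WORDS = {
    "gibbon": "10", "mandrill": "10", "vervet": "10",  # trochees
    "baboon": "01", "macaque": "01",                   # iambs
    "marmoset": "100",                                 # three syllables
}


def fits_template(first, second):
    """True for a one-syllable expletive followed by a trochee."""
    return EXPLETIVES.get(first) == "1" and SECOND_WORDS.get(second) == "10"


def generate():
    """List every expletive + second-word pair that fits the template."""
    pairs = itertools.product(EXPLETIVES, SECOND_WORDS)
    return [f"{a} {b}" for a, b in pairs if fits_template(a, b)]


insults = generate()
print(len(insults))  # 12: four expletives times three trochees
print(insults[:3])
```

Swap in your own word lists and the template does the rest; the iambs and the three-syllable word never make it through the filter.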

Moreover, it's not completely unlimited. Some words fit but don't make much sense as an insult: cock bookshelf, fart saucepan (which I quite like, actually), dick pension, belch welder.

Others sound like the kind of thing a child would say: fart person! poop human! turd foreman!

Yet others are too Shakespearean: fart monger! piss weasel!

Clearly some words (waffle, weasel, gibbon, pimple, bucket) are better than others (bookshelf, doctor, ninja, icebox), and some just depend on delivery (e.g., ironic twat hero, turd ruler, spunk monarch, dick duchess).


For a while, I've been discussing vowels in insults with fellow linguist Lauren Spradlin. Note that when we talk about vowels, we mean sounds, not letters. Don't worry about the spelling; try saying the examples below aloud. Spradlin has drawn my attention to how repeating vowels increases the viability of a new insult of this form: crap rabbit, jizz biscuit, shit piston, spunk puffin, cock waffle, etc.
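The vowel-repetition idea can be checked the same way. In this small Python sketch, the stressed vowel of each word is annotated by hand, assuming typical American pronunciations; those annotations are assumptions, and the names are hypothetical.

```python
# Stressed vowel of each word, hand-annotated (typical American pronunciations).
STRESSED_VOWEL = {
    "crap": "æ", "rabbit": "æ",
    "jizz": "ɪ", "biscuit": "ɪ",
    "shit": "ɪ", "piston": "ɪ", "gibbon": "ɪ",
    "spunk": "ʌ", "puffin": "ʌ",
    "cock": "ɑ", "waffle": "ɑ",
}


def vowels_repeat(first, second):
    """True if both words have the same stressed vowel."""
    return STRESSED_VOWEL[first] == STRESSED_VOWEL[second]


print(vowels_repeat("shit", "gibbon"))  # True
print(vowels_repeat("crap", "rabbit"))  # True
print(vowels_repeat("shit", "waffle"))  # False
```

A mismatch doesn't rule an insult out (shitwaffle is attested, after all); matching vowels just seem to make a new coinage land better.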

I would argue that having the right vowels actually gives you some leeway, so you can get away with following the first word with --- gasp! --- a non-trochee! Be it an iamb (remember iambic pentameter?) as in douche-canoe, splooge caboose, or the delightfully British bunglecunt (h/t Jeff Lidz), or even more syllables: Kobey Schwayder's charming mofo-bonobo.

As you can see, this is a hot topic in the hallowed halls of the ivory tower. If the above simple formulae have motivated even one person to go out and exercise their own creativity to make a novel contribution to the English language, then I've done my job here as a linguist. Different people get into linguistics for different reasons, but this, this is what I live for. Get out there and make a difference!



