Bare Subject Relatives and the Sophisticated Complexity of AAE

September 13, 2017 by Taylor Jones

[Trigger warning: My focus here is on a syntactic phenomenon, but a video example I'll be focusing on includes the threat of police violence and a man hypothesizing about his death at the hands of the police. The man in question is alive and unharmed.]

I've been thinking a lot lately about the complexity and sophistication of AAE syntax. Much of the work and outreach around AAE in the last 50 years has tried to demonstrate that AAE is neither deficient nor wrong. There's a big jump, however, between "not wrong," which is the end goal for many linguists, and "marvelously rich," an idea that seemingly hasn't percolated through the field much beyond AAE specialists. Christopher Hall, a colleague and friend, often implores lay people to flip the script and think about language starting from the premise that AAE is the default against which other dialects should be judged.

My focus here is a syntactic feature of some varieties of African American English that doesn't get much attention but is surprisingly common (especially in the South): what is referred to as null subject relatives, or bare subject relative clauses.

A recent, salient example comes about 15 seconds from the end of this video, which a motorist took of himself chastising a cop for approaching his car with his (the officer's) service weapon drawn. It probably goes without saying, but the video may be triggering for some.

[Link to video here]

The subtitles say "Dad shot dead by a cop who made a mistake," but this is yet another case of reporters "translating" AAE. The gentleman actually said:

"Dad shot dead by a cop made a mistake".

What's going on here?

Well, first, a relative clause is like a mini sentence or sentence fragment that adds more information about a part of the main sentence. For instance:

That is the man [who I saw yesterday]
That is the man [who saw me yesterday]
The book [that I recommended to you] is on sale now.

In most varieties of English, you can delete -- that is, not say -- the relative marker (who, which, that), if it refers to the object of the relative clause.

To take the first example, the man who I saw yesterday, we can rework the relative clause as meaning something like "I saw him yesterday." In fact, many varieties of English use resumptive pronouns like that him, so it would be perfectly natural to say "That's the man who I saw him yesterday." Unsurprisingly, this kind of thing is cross-linguistically common, and in some languages it's obligatory.

So if it's:

I (subject) saw him (object)

Most varieties of English allow you to do away with the relative marker:

That's the man who I saw yesterday
That's the man ___ I saw yesterday

AAE is interesting in that it also allows deletion of the relativizer if it marks the subject.

That's the man who saw me yesterday.
That's the man ___ saw me yesterday.

This is pretty well described in the literature, so for instance, Stefan Martin and Walt Wolfram have a chapter in Salikoko Mufwene's book African American English: Structure, History, and Use that gives a ton of excellent examples:

He the man ___ got all the old records
Wally the teacher ___ wanna retire next year
Jill like the man ___ met her brother last week

The example in the video above was particularly interesting because the syntactic structure of the full utterance is extremely complex.

There's a pernicious and widespread view that AAE, or "ebonics," is somehow inferior or defective. It's widely regarded as both "simpler" than "standard" English and simpler in ways that are "broken" or "wrong." However, not only does it have more complex grammar in some respects, but AAE speakers deploy sophisticated combinations of syntactic structures even under extreme stress. The sentence the motorist in the above video uttered makes use of:

An "imposter" construction in which the speaker is understood to mean himself when using a name/title ("Daddy") instead of a first person pronoun ("I").
Copula deletion ("Daddy shot" instead of "Daddy was shot"). This is very common cross-linguistically, and is standard in Arabic, Chinese, Russian, etc.
A resultative complement to the verb ("shot dead")
Passive voice --- with copula deletion --- which we understand because of the resultative. Compare "Daddy shot a gun" vs. "Daddy shot dead."
A bare subject relative ("a cop ___ made a mistake").

This is a sophisticated interlocking clockwork of syntactic structures, produced under extreme stress. A tree diagram of this sentence would show all kinds of movement and deletion. And there's some evidence that people who speak other dialects do not have the complex grammatical knowledge to correctly parse this kind of utterance. And yet, people like this motorist are routinely treated as though their language is deficient.

It's a starting point for us linguists to point out that AAE is rule-governed and syntactically well-formed. However, I don't think this goes nearly far enough. "Technically not inferior" is a far cry from the truth: AAE is a varied, complex, sophisticated language variety that makes use of many complex grammatical rules that "standard" English lacks. AAE speakers are doing things other people don't understand, and not because the AAE speakers are wrong, but because they have a fuller syntactic toolbox.

-----

©Taylor Jones 2017

Have a question or comment? Share your thoughts below!

Partickly

September 13, 2017 by Taylor Jones

Lately, I've been noticing a particular phenomenon in speech, in which the word "particularly" is pronounced more or less as "partickerly" or "partickly."

It turns out I'm not the only one to notice this: Mark Liberman has an excellent and much more in-depth description of the phenomenon at Language Log, with a ton of excellent audio.

-----

A linguist's take on the Great GIF Controversy

July 01, 2017 by Taylor Jones

The Conflict:

For years, the English-speaking internet has been divided. We cannot agree on how to pronounce gif, the acronym [edit: “initialism” is the technical term] for graphics interchange format. Much as with the dress, each side thinks their own position is the only correct one, and that the other side is absolutely crazy. And much as with the dress, it's probably a little more complicated.

People write articles with titles like you are 100 percent wrong about how to pronounce gif. People share mocking gifs with arguments bolstering their point of view. People yell at one another. Things get entirely too heated.

I intend to shed some light on this situation.

The Options:

There are technically three ways you could pronounce gif in English, although the conflict is over the first two. The three are:

so-called "hard g" which linguists represent with /g/. This is <g> as in "gift".
so-called "soft g" which linguists represent with /d͡ʒ/. This is <g> as in "George." It is also sometimes represented with <j> as in "Jazz".
The "French" or "super soft g", which linguists represent with /ʒ/. It is the sound in (some) pronunciations of "rouge." (Note that some English speakers "nativize" words with this sound by adding the /d/ of the "soft g," so what I call "Baton Rouge" they may call "Baton Roudge.")

While I relish ironically using the third option and watching people on both sides of the hard/soft g debate lose their minds, I recognize that nobody is going to take seriously the argument that "French g" is correct.

The Arguments:

Arguments for "hard g":

It's an acronym [initialism], and the word the <g> comes from is one where it is pronounced "hard" (namely, "graphical").
We often pronounce acronyms and initialisms differently than we would pronounce a word spelled the same way (CIA is "see eye aye" and not "kia").
Feelings. People have really strong feelings that this is the only correct way.

Arguments for "soft g":

Lots of words spelled with <gi> are pronounced with a "soft g": ginger, gin, giraffe, giant...
It's easier to pronounce gif as a word and not as an acronym/initialism. Nobody is actually saying "gee eye eff". If you're going to make it a word, then make it a word!
"Foreign" words often have a "soft g" (giraffe...).
Feelings. People have really strong feelings that this is the only correct way.

A dash of science:

I decided to take a look at this list of over 58,000 (relatively common) English words, and see what the patterns are for g-words.

There are 1836 words that start with <g> in this list, and there's not a clear rhyme or reason to the choice of "hard" versus "soft" g, so one would have to look at each of them to get a sense of the overall pattern. That's a pain in the ass. However, there is a helpful fun fact from linguistics that can constrain this problem a bit more:

"Soft g" often comes, historically, from a combination of sounds: a "hard g" followed by a non-low front vowel. What does that mean? For the vowels /i/ "bead", /e/ "bade", /ɪ/ "bid", and /ɛ/ "bed", your tongue is higher in your mouth, and closer to the front of the mouth, than it is for the vowels /u/ "booed", /o/ "bode", etc. The "hard g" sound is made by the back of the tongue forming a closure at the back of your mouth. These front vowels tend to pull the tongue slightly forward, and over time (we're talking hundreds of years) speakers come to make the sound further forward on purpose. "Soft g" is created by a tongue closure further forward in your mouth than "hard g". Try saying words with them and pay attention to where your tongue is. (Try it! It's fun!)

This fact is part of why Italian spelling is so weird, for anyone who's tried to learn Italian.

All of that means I don't need to bother with words like "goof" because nobody is going to pronounce that with a "soft g."

So I chose to limit myself to words that start with <gi>. It turns out there are 102 of them, which meant I could simply read them and split them into "hard" and "soft". Of those, 30 are "soft", and almost all of these are of foreign origin.

30/102 (29.4%) of words that start with <gi> have a "soft g."

It's not entirely unreasonable, then, to think that gif should perhaps be pronounced with a "soft g." People will argue that there are more words with a "hard g," and that's true, but the same people will say that "soft g" is crazy, which is clearly not true.

BUT WAIT. What about words with <ge>, you ask? I'm glad you asked. There were 223 of those. Of them, 197 were pronounced with a "soft g" (e.g., gene, gender, geriatric, geology, gelatinous...).

So...

197/223 (88.3%) of words that start with <ge> have a "soft g."

This means that:

Of all of the words with <g> where it could be pronounced hard or soft (the <gi> and <ge> words), 227/325 (69.8%) are pronounced with a "soft g".
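For the curious, the arithmetic above is easy to double-check. This is a minimal Python sketch that only re-derives the percentages from the counts reported in this post; it does not re-read the 58,000-word list, and the helper name `pct` is my own invention.

```python
# Counts reported in this post (not recomputed from the original word list).
GI_SOFT, GI_TOTAL = 30, 102    # words starting with <gi>
GE_SOFT, GE_TOTAL = 197, 223   # words starting with <ge>


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


combined_soft = GI_SOFT + GE_SOFT     # 227
combined_total = GI_TOTAL + GE_TOTAL  # 325

print(f"<gi>: {GI_SOFT}/{GI_TOTAL} = {pct(GI_SOFT, GI_TOTAL)}%")
print(f"<ge>: {GE_SOFT}/{GE_TOTAL} = {pct(GE_SOFT, GE_TOTAL)}%")
print(f"both: {combined_soft}/{combined_total} = {pct(combined_soft, combined_total)}%")
```

Run as-is, this prints 29.4%, 88.3%, and 69.8%, matching the figures above.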

It's also worth noting that in the particular list I have, fully 38% of the words in question begin with <g>, then <i> or <e>, then <n>. This is important, because many people have what is referred to as the PIN-PEN merger, meaning that <i> and <e> before <n> (and before other nasal sounds, like <m>) are pronounced the same. That means Jim and gem are both pronounced the same (namely, as Jim). This is a feature of Southern American English and of most African American English, and it turns up in pockets elsewhere as well. A LOT of people do this.

This means that even if merged speakers limit themselves to words pronounced with <gi>, there are 109 more words in this list that they hear with the "ih" vowel than speakers without the PIN-PEN merger do.

So...

For people with the PIN-PEN merger, 139/211 (65.9%) of <gi> words are pronounced with a "soft g."
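To make the merger's effect concrete, here is a rough Python sketch. It is an assumption-heavy approximation: it treats the merger as a spelling rule (word-initial <gen> read as <gin>), whereas the real merger operates on sounds, and the mini word list is hypothetical. The last lines just redo the reported arithmetic, where the post's totals imply that all 109 merged words have a "soft g."

```python
def pin_pen_spelling(word: str) -> str:
    """Crude spelling-based stand-in for the PIN-PEN merger.

    A word-initial <gen> is rewritten as <gin>, mimicking a speaker for whom
    /ɛ/ and /ɪ/ sound the same before /n/. Real mergers operate on sounds,
    not letters, so words like "gene" (a different vowel) would be mislabeled.
    """
    w = word.lower()
    if w.startswith("gen"):
        return "gin" + w[3:]
    return w


# Hypothetical mini word list, for illustration only.
sample = ["gin", "ginger", "gift", "gender", "general", "get", "gesture"]
heard_as_gi = [w for w in sample if pin_pen_spelling(w).startswith("gi")]
print(heard_as_gi)  # ['gin', 'ginger', 'gift', 'gender', 'general']

# Reported counts: 30 soft-g words out of 102 <gi> words, plus 109 merged
# <gen> words (the post's totals imply all 109 have a "soft g").
soft, total = 30 + 109, 102 + 109
print(f"{soft}/{total} = {round(100 * soft / total, 1)}%")  # 139/211 = 65.9%
```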

The Takeaway:

Even if people are being completely rational about how to pronounce gif, their decision is informed by their dialect and by their personal pronunciations of other words. While it is rational to say "it's from graphics, which has a 'hard g,'" nobody is saying "gee eye eff" (which, coincidentally, has a "soft g"). While it's rational to say that foreign words are often nativized with a "soft g" (like giraffe), nobody says "gift" with a "soft g".

Finally, even if people are thinking statistically about it (even if it's sort of "fuzzy" math based on what they have heard in their life and not hard numbers), the conclusions they come to depend on their dialect, speech community, and vocabulary.

This is why I ironically go with the "French g": if you have strong feelings about the pronunciation of gif, no matter what they are, you're probably wrong. And if you're having the argument, it's because someone tried to share an image with you. Why not just be nice, instead of pedantically (and no matter what side you choose, wrongly) lecturing your acquaintances on how to say words?

-----

Bill Maher, the N-word, and that pesky R

June 03, 2017 by Taylor Jones

[Trigger warning: n-words]

Bill Maher is in the news right now for dropping the n-bomb on his show in a context that many, many people found offensive. Predictably, people are coming to his defense with two arguments: (1) he was referring to himself, and (2) he "didn't say the /r/."

As a linguist, and as one of the handful of us who have given serious thought to the n-word(s) (shout out to Christopher Hall, to Arthur Spears, and to Geneva Smitherman), I want to weigh in with a (socio)linguistic perspective. My argument is:

It was not ok for him to say either, and,
White folks (in general) should not say either if they don't want to offend, because
It is an artificial distinction for most white people, if they are borrowing from a dialect they do not speak...and the vast majority of white people do not speak (or understand) African American English, natively or otherwise. And also,
In most white people's native dialect, the only n-word is a slur.

Elsewhere, Christopher Hall and I have written about the grammatical and social functions of the n-words in some varieties of AAE. We argued that there are multiple words that all include the "n-word" that fulfill various grammatical and pragmatic functions: from first person pronouns to social distance markers, to politeness (yes, politeness) forms. If you are not a native speaker of AAE, it is easy to misunderstand these uses because they are what Arthur Spears coined the term "camouflage constructions" to describe. That is, they look like they might mean something else, and so people assume they understand when they don't. Recent pilot work on cross-dialect comprehension that I worked on with a team at U Penn and NYU confirms that in general, white folks don't understand the range of uses of the n-words.

More importantly, these are uses that occur in African American English, which is a dialect that has its own accent (really, range of accents, but we'll set that aside for now). Crucially, most forms of AAE are what linguists call non-rhotic, meaning /r/s after vowels are often not pronounced. Many white dialect varieties are rhotic, including Bill Maher's normal speech. So Maher will make the argument that nigga and nigger are different words, and that he said the "acceptable" one.

HOWEVER, Maher, I would argue, only has nigga in his vocabulary as a taboo deformation of the word nigger. It's the same as claiming he didn't call someone bitch, he called them betch, or bish. The point is to say "I technically didn't say the word" while still saying the word.

Here's the crux of my argument: If you don't speak AAE, whether or not you borrow AAE sounds when you say nigger doesn't change what you're saying. For people to be comfortable (or less uncomfortable) with Maher's use of nigga, he'd have to (1) use it in the appropriate social context, which this was not, and (2) back it up with literally any other features of AAE... and this would still probably not make it ok. As is, he was just "being edgy" by saying a taboo word he knew would offend.

That is, we white folks don't get to say "I was using that word like you people do!" without actually being able to use any other words like AAE speakers. If the accent is right, if the word choice is right, if the grammar is right (yes, you can butcher AAE grammar --- it is as systematic and rule governed as any other language variety), and if the cultural context is right you can maybe get away with speaking AAE as a white person. Notice I didn't say "saying the n-word". That's still pretty much off the table. Even if you understand the grammar, social function, and pragmatics of use.

Here are some tips and general rules of thumb around the n-words if you don't want to offend, and you're white in America:

When you can say "nigger" without offending:

maybe in citation, either directly quoting old racist stuff, or discussing the word itself, best if at a linguistics conference or conference on race, and even then you might encounter pushback.
never in casual conversation.

So basically, you can't.

When you can say "nigga" without offending:

To a POC who has specifically said to you "yo, we cool, you can call me nigga. You get a pass." To that person ONLY. Probably not within earshot of anyone else. I've never heard of this situation occurring, but who knows. Also, even if you find yourself in that situation, if you actually do it, I'm not saying it's gonna go great, or that I endorse that path.
discussing the word nigga in citation form at a linguistics conference. And even then, not everyone will agree.
Never in casual speech.

So theoretically it's possible, but maybe just don't.

The distinction between r-full and r-less forms has a long history, and linguists are not remotely settled as to the history of the word (for instance, Hiram Smith argues the semantically neutral r-less form goes back 200 years or more). While it's interesting, it's completely orthogonal to the question of whether it's appropriate for white people to say it. Because it has been a slur in white English from its beginnings to literally right now, in both r-full and r-less varieties of white English, people like Bill Maher don't get to decide that it no longer has all that historical baggage.

And even if you deeply understand its use in AAE speaking communities, and participate in those communities, if you actually care about the people in those communities, you still won't say it. Even when it's linguistically appropriate. Because our language use is culturally and socially situated.

-----

I ran all Trump's tweets through a neural net to try and figure out the meaning of Covfefe. Here is what I learned.

June 01, 2017 by Taylor Jones

Unless you've been living under a rock, by now you probably know that just after midnight two days ago, Donald Trump tweeted:

Despite the negative press covfefe

Twitter went wild, predictably. For two days now, there has been heated debate about (1) how to pronounce covfefe, and (2) what covfefe means. Yesterday, Trump's press secretary Sean Spicer declared that Trump meant to type covfefe, and that its meaning was known to Trump and a select few others.

In an attempt to get to the bottom of this mystery, I decided that semantic word embedding models might be useful. I have written about such models elsewhere. The general gist (an extreme oversimplification) is that if you treat some document or documents as a big bag of words, you can start to treat individual words as being related to one another by their position in a (high-dimensional) space. Similar words cluster together, and dissimilar words are far apart in this vector space. The actual implementation is technically a "feed-forward neural net" that "fine-tunes through backpropagation," but that description is all linear algebra and code, and it leaves out the fun.

In order to try to get at the Mystery of Covfefe, I decided to train a word2vec (that is, word to vector) model in R, using Ben Schmidt's wonderful package. To do so, I first needed to gather all 30,999 Trump tweets (as of when I gathered them). I did so by cloning the Trump Twitter Data Archive. (Note: if you have a cool coding idea, chances are someone did most of the work already. I'm learning that half of coding well and fast is just finding the appropriate already-collected data, already-written module or library, or already-worked recipe.)

Once I gathered all 30,999 Trump tweets, I needed to clean them. I did minimal cleaning on the data set: I just made all words lowercase, eliminated punctuation, and eliminated common "stopwords" (words like "and, are, in, at, be, there, no, such," etc.). This has the effect of normalizing a bit, so sad and SAD! are treated as the same word. I have not yet gotten around to lemmatization (grouping words like ran, run, and running all under "run"), but I'm not sure to what extent that would really affect the output.
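The cleaning step is simple enough to sketch. My actual pipeline was in R; what follows is a minimal Python approximation of the same three operations, and the stopword set is an illustrative assumption rather than the exact list I used. Punctuation is deleted rather than replaced with spaces, which is why a name like O'Donnell comes out as odonnell.

```python
import re

# Illustrative stopword set; not the exact list used for the model.
STOPWORDS = {"the", "a", "and", "are", "in", "at", "be", "there", "no", "such", "is"}


def clean_tweet(text: str) -> list[str]:
    """Lowercase, delete punctuation, split on whitespace, and drop stopwords."""
    text = text.lower()
    # Delete (rather than space out) punctuation so "O'Donnell" -> "odonnell".
    text = re.sub(r"[^\w\s]", "", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(clean_tweet("SAD!"))                                # ['sad']
print(clean_tweet("Rosie O'Donnell is rude"))             # ['rosie', 'odonnell', 'rude']
print(clean_tweet("Despite the negative press covfefe"))  # ['despite', 'negative', 'press', 'covfefe']
```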

Having run the cleaned tweets through word2vec, I did some quick sanity checks by investigating which words are the closest to a handful of given words. Closest to could? would, honestly, and can. Closest to america? safe, again, outsider, make, lets. Closest to new? york, hampshire, albany, yorkers.
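"Closest to" in these checks means nearest in the vector space, usually measured with cosine similarity. Here is a minimal Python sketch of that lookup. It is not the R package's API, and the 3-dimensional vectors are invented for illustration; a trained model learns hundreds of dimensions per word from the tweets themselves.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_to(word: str, vectors: dict[str, list[float]], n: int = 3) -> list[str]:
    """Return the n words whose vectors are most similar to `word`'s vector."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:n]]


# Toy vectors invented for illustration only.
toy = {
    "new":       [0.9, 0.1, 0.0],
    "york":      [0.8, 0.2, 0.1],
    "hampshire": [0.7, 0.3, 0.0],
    "crooked":   [0.0, 0.9, 0.4],
    "hillary":   [0.1, 0.8, 0.5],
}

print(closest_to("new", toy, n=2))      # ['york', 'hampshire']
print(closest_to("crooked", toy, n=1))  # ['hillary']
```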

Clearly, it's working the way we would want it to, but are these really Trump's tweets? Closest to hillary? clinton, email, unfit, crooked, judgement, 33000, temperament. Closest to rosie? odonnell, theview, unprofessional, rude, bully. IT WORKS!

As I did before, I chose to visualize the word embedding space by using t-SNE (for t-distributed stochastic neighbor embedding). This does not preserve relationships exactly, but keeps near things near to one another and far things far. I present the full results for your enjoyment:

Tremendous.

Some really fun/interesting/hilarious clusters emerge. There's read book art deal. There's barackobama obama obamas china iran. There's my favorite: totally sad bad terrible wrong. There's the small cluster of bush cruz. There's scotland golf course.

What's missing? Covfefe.

So I decided to increase the size of the model and include more words. Normally, you want 200-500 dimensions per word vector in a model like this. I gave it 1000. The results are even better.

This model results in a cluster: realdonaldtrump mr awesome 2016. And, as a quality check, crooked is still right next to hillary.

But where's covfefe?

STILL not in the model. When I manually search for it, it shows up as excluded from these findings, and is returned next to realdonaldtrump, you, and i. Which is, frankly, perfect. Perhaps Covfefe is the word for all of us together with realdonaldtrump.

I know that's kind of a cop-out, but in the process, I learned a few other interesting things. In no particular order:

First, pick almost any word and the top 10-20 nearest words in either of the resulting vector spaces will include some negative sentiment. GOP? Establishment. Christian? Jailed. Beheading. Media? Fake. He's even hard on Russia in tweets: Russia? Traitor, laughs, taunting.

Second, closest to Ivanka? Daughter. For Barron, you have to wait till number 5 for "son" (most of the top 10 are family related words, or the names of family members).

Third, the closest words to usa are miss, pageant, missuniverse, and, perplexingly, moscow. If you subtract pageant, the closest word to usa is...balls. Checks out. Further down the list are needs, trump, and businessman.

This brings me to one of my favorite findings. A classic example of word embeddings capturing something about semantics is that on other data sets these models have been trained on, you can add and subtract vectors meaningfully. So for instance,

paris - france + italy = rome

...which is intuitively correct. The classic example is:

king - man + woman = queen

Trump doesn't use the words man or woman all that much, actually, so in Trump's world:

king - man + woman = larry

I'm certain there are other relationships in the data that I've missed, but if anything is clear from the above, it's that word embedding models really, really, really work (even if adding or subtracting "man" and "woman" is basically adding and subtracting zero in Trump's tweets). I love the examples from cookbooks, historical newspapers, and RateMyProfessor reviews, but there's something really validating about these results, in part because Trump's speech (and twitter speech) is so colorful, and the above so clearly and accurately captures it.
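Here is a minimal Python sketch of the vector arithmetic behind king - man + woman, using one common formulation: compute the target vector, then return the nearest word by cosine similarity, excluding the three input words. All vectors are invented for illustration. The second case mimics the situation described above, where "man" and "woman" contribute almost nothing, so the answer is simply whatever else sits near "king."

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Toy vectors: dimension 0 ~ "royalty", dimension 1 ~ "male vs. female",
# dimension 2 ~ everything else. All values are invented for illustration.
toy = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "crown": [0.9, 0.0, 0.4],
}
print(analogy("king", "man", "woman", toy))  # queen

# If "man" and "woman" are nearly zero vectors, the arithmetic barely moves
# away from "king", so the nearest remaining word wins.
sparse = dict(toy, man=[0.0, 0.01, 0.0], woman=[0.0, -0.01, 0.0])
print(analogy("king", "man", "woman", sparse))  # crown
```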

Finally, it looks like covfefe is off the charts, even for the surprisingly regular logic of Trump's twitter.

-----

Cablese and Wirespeak

May 26, 2017 by Taylor Jones

I'm always interested in jargons, cants, patois(es?) and codes, and recently learned that my father-in-law, a career newsman, didn't just have a problem with his throat this whole time, but rather has been communicating with me in Cablese (and movie references). I knew it wasn't that he had a problem with his throat, but I had no idea what was going on. Allow me to explain:

For about a hundred years after the invention of the telegraph, news was mainly shared by telegraph. Converting a story written in regular old English to Morse code was time-consuming and expensive, and, crucially, it was charged by the word. And you couldn't just stick words together, say by turning the phrasal verb "write up" into WRITEUP. That's obviously two words smushed together to get around being charged by the word. However, you could get away with making a new word like UPWRITE. Newsmen developed a complex system they used on the cables ("Cablese"), as well as their own codes for the news wire ("wirespeak"), and each news agency also had its own secret codes (so they couldn't get scooped). Even though they don't use the telegraph anymore, Cablese and Wirespeak live on.

This last week, my father-in-law gave me the Rosetta Stone: the book Wirespeak: Codes and Jargon of the News Business. It is fantastic.

The book has chapters on Cablese, Wirespeak, and various news agency codes. Its chapter on Cablese is entitled "backwards run the words."

So how does it work? Well, first, anything that can be joined is, but backwards, so it's clearly a different word. For instance, DOWNHOLD for "hold down." There's a story that when British writer Evelyn Waugh was asked to investigate a rumor that a British nurse had been killed in an air raid, he received the cable from his editor: SEND TWO HUNDRED WORDS UPBLOWN NURSE. Waugh investigated, found the rumors were untrue, and wrote back NURSE UNUPBLOWN.
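The "backwards" joining rule is mechanical enough to sketch in a few lines of Python. This is only an illustration of the word-order trick; the function name is my own, and it does no inflection, so forms like UPBLOWN (which needs the participle "blown") are out of its reach.

```python
def cablese_join(phrasal_verb: str) -> str:
    """Fuse a two-word phrasal verb into one cable word, particle first.

    Reversing the order (HOLD DOWN -> DOWNHOLD) makes the result read as a
    new word rather than two words smushed together.
    """
    verb, particle = phrasal_verb.split()
    return (particle + verb).upper()


print(cablese_join("hold down"))  # DOWNHOLD
print(cablese_join("write up"))   # UPWRITE
```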

That brings me to the second part of how it works: prefixes, everywhere. Most of them are Latin, but some are French or from elsewhere.

Examples:

CUM = with
EX = from
ET = and (e.g., MOM ETDAD)
PAR = by
PRO = for
AD = to
ANTI = against
DANS = in (e.g., DANSRIVER 'in the river')
UN = no, not
POST = after
PRE = before
SUPER = on, over
OMNI = all (e.g., OMNICHEERED 'everyone cheered')
UNI = one
SANS = without
SUR = on

There are also suffixes:

WARD = toward
WISE = manner adverb
EST = most. (Why we don't just say "est" for all superlatives will get its own blog post, to come later)
ING = makes a verb from a noun, or light verb construction (This will also get its own post).
SOME = full of (e.g., GLADSOME TIDINGS)
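Decoding works in the other direction: spot a known affix and expand it. Here is a deliberately naive Python sketch using a subset of the glosses listed above (only WARD among the suffixes, since WISE, EST, ING, and SOME don't reduce to a simple word swap). It strips the first affix it recognizes, so ordinary words that happen to begin with these letters (EXTRA, ADULT) would be mangled; a real decoder would need a dictionary to check against.

```python
# A subset of the prefix and suffix glosses listed above.
PREFIXES = {
    "CUM": "with", "EX": "from", "ET": "and", "PAR": "by", "PRO": "for",
    "AD": "to", "ANTI": "against", "DANS": "in", "UN": "not", "POST": "after",
    "PRE": "before", "SANS": "without", "SUR": "on", "OMNI": "all",
}
SUFFIXES = {"WARD": "toward"}


def gloss(word: str) -> str:
    """Expand one known Cablese affix into plain English, e.g. DANSRIVER -> 'in RIVER'."""
    w = word.upper()
    for suffix, meaning in SUFFIXES.items():
        if w.endswith(suffix) and len(w) > len(suffix):
            return f"{meaning} {w[:-len(suffix)]}"
    # Try longer prefixes first, in case one prefix is the start of another.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if w.startswith(prefix) and len(w) > len(prefix):
            return f"{PREFIXES[prefix]} {w[len(prefix):]}"
    return w


for cable_word in ["DANSRIVER", "TOKYOWARD", "ETDAD", "OMNICHEERED", "ADVISIT", "UNUPBLOWN"]:
    print(cable_word, "->", gloss(cable_word))
```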

There is an apocryphal story that an international correspondent quit their job with the cable:

UPSHOVE JOB ASSWISE

There are also a ton of one-offs, like SMORNING for "this morning" and SNIGHT for "last night."

So you get cables like: SOS ETWIFE HEADS TOKYOWARD SMORNING SANSTOP. MUCHLY APC EYEBALL ARRIVAL. URGENTEST NEED THUMBSUCKER CUM ART.

(taken from the excellent blog post on the subject Onwriting: Unearthing a lost language, which explains some of the more specific terms as well, like "thumbsucker" for "news analysis" and "art" for "photographs".)

So when my wife etme downwent DCward sweek advisit mother etfather inlaw, it outturns father unupmade weirdtalking. It was preupmade parnewsmen prehim. And now, postwise, I understand his texts meward.

-----

Benedict Cumberbatch: Ye Shall Know Him By His Dactyls

May 01, 2017 by Taylor Jones

[EDIT: I've been beaten to the punch, unsurprisingly by Gretchen McCulloch. I'm only 4 years late to this party, apparently.]

For some time, people on the internet have been playing with British actor Benedict Cumberbatch's name. They call him all manner of other names. And yet we all know who is being discussed. This post is about the simple reason why.

First, an example:

This image is legit saved to my computer as "Crindlesnatch.jpg"

Now, a few more examples. And why not throw in Reddit's favorites?

So what is going on here? A few things:

Meter
Vowels
Bs and Cs???
Context?

Meter is the most important. Cinnamon Thundercat has a distinctive name in that (1) he's almost always referred to by both first and last name, and (2) both his first and last name are dactyls. That is, they are a stressed syllable followed by two unstressed syllables. Once you know the context, any string of STRESSED-unstressed-unstressed STRESSED-unstressed-unstressed referred to as a person can be easily recovered as actually referring to Brandywine Crumplepuss.
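The meter test is easy to state as code. This minimal Python sketch writes stress with CMUdict-style digits per syllable (1 = primary, 2 = secondary, 0 = none) and counts secondary stress as "unstressed," which is a simplifying assumption on my part. The stress patterns are annotated by hand for illustration, not looked up in CMUdict.

```python
def is_dactyl(stresses: str) -> bool:
    """True for a STRESSED-unstressed-unstressed foot, e.g. '100' or '102'."""
    return len(stresses) == 3 and stresses[0] == "1" and all(s in "02" for s in stresses[1:])


def fits_cumberbatch_meter(first: str, last: str) -> bool:
    """Both first and last name must be dactyls."""
    return is_dactyl(first) and is_dactyl(last)


# Stress patterns annotated by hand for illustration (not from CMUdict).
candidates = {
    "Benedict Cumberbatch": ("100", "102"),
    "Brandywine Crumplepuss": ("102", "102"),
    "Captain Kirk": ("10", "1"),
}
for name, (first, last) in candidates.items():
    print(name, fits_cumberbatch_meter(first, last))
```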

Second, the replacements often have the same kinds of vowels in the same places. Most important seems to be that the last vowel be an /æ/ as in "batch."

Third, people often, but not always, use replacements that start with B and C.

Lastly, there's often a picture, or an introduction like "British actor ______." From here, it's clear who Battleship Crustybrunch refers to.

I don't have the time at the moment, but a true overkill analysis for the Hashtag SCIENCE fans out there would be something like:

collect a corpus of name replacements
have study participants rank them on felicity: how good are they at being "Benedict Cumberbatch names"?

Once you've got some large number of good ones:

count how many start with B--- C---, B---- X----, or X---- C---- (where X is any other letter).
count how many conform to a two dactyl pattern.
run them all through some tokenizer, and associate each part with a pronouncing dictionary pronunciation (e.g., the CMUDICT pronouncing dictionary).
evaluate how well each maps its vowels to those in the original name.
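The first counting step is the easiest to sketch. Here is a minimal Python version that classifies each replacement name by its initials. It only uses the handful of names quoted in this post, so the counts say nothing about the real distribution; a serious version would need the larger corpus and the felicity ratings described above.

```python
from collections import Counter


def initials_pattern(name: str) -> str:
    """Classify a two-word name as 'B C', 'B X', 'X C', or 'X X' (X = any other letter)."""
    first, last = name.split()
    a = "B" if first[0].upper() == "B" else "X"
    b = "C" if last[0].upper() == "C" else "X"
    return f"{a} {b}"


# Only the replacement names quoted in this post.
names = [
    "Cinnamon Thundercat",
    "Brandywine Crumplepuss",
    "Battleship Crustybrunch",
    "Enterprise Custardshirt",
]
counts = Counter(initials_pattern(n) for n in names)
for pattern in ["B C", "B X", "X C", "X X"]:
    print(pattern, counts[pattern])
```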

The real question is how his name, and the game people are playing with it, can be so distinctive that when I talk about Enterprise Custardshirt, you think of Khan:

...and not Kirk?!

For good measure, here's a Benedict Cumberbatch name generator. Enjoy!

-----

Why you probably didn't understand that one guy from Atlanta

February 17, 2017 by Taylor Jones

When the first few episodes of Donald Glover's show Atlanta on FX aired, a lot of people were blown away by the writing and acting on the show and praised the authenticity of the characters. Others, however, were blown away by the fact that they simply did not understand what some of the characters were saying.

Watching Atlanta, can't tell what's a character name and what's a slang word I don't understand.
— Cameron Chapman (@cameronchapman) September 8, 2016

I know ppl who not from Atlanta/Georgia don't understand a single word he saying 😂 https://t.co/3rasrFQdxv
— Mike 💂🏾🇬🇾💉 (@Rocky_Blu_) September 7, 2016

Watching the new comedy on the block, Atlanta, and even though I don't understand half the stuff in it, its amazing
— Usama (@usamazaki99) September 7, 2016

In particular, there's a short scene in which Donald Glover's character is in a detention center and is subjected to a short monologue by a man he met there, about how the man came to be in jail. I have included it here (under educational "fair use" --- FX still retains all rights. The scene is from Season 1, Episode 2. Season 1 can be purchased on YouTube, iTunes, etc.).

Much of my research is on regional variation, and on the extent to which people understand different accents and dialects. I should say first, and most importantly, that there is nothing wrong with this man. The character is not in any way intended as impaired (yes, I'm addressing this because it has been expressed to me). The way he speaks is a regional dialect, not any sort of personal idiosyncrasy. Plenty of other people from Atlanta, including some very famous ones, have the same accent. Some of them make a living as wordsmiths (Rich Homie Quan, Plies).

I have taken the liberty of transcribing the clip and analyzing the main triggers for misunderstanding. I've been led to believe that his speech is typical of a certain subset of the population in Atlanta, and while there are more and more characterizations of regional varieties of African American English, to my knowledge Atlanta AAE is still understudied and underdescribed.

The Transcript, for the curious:

I don’t believe this shit.
Ridiculous, man.
What'd you, uh, what'd you do to get in here?
[Um]
Damn, man!
I should've just went home, boy.
Instead I’m in here locked up 'cuz of this fool I ain’t seen in about eleven years, man.
Boy I was at Five Points, bout to catch the bus, you feel me?
and this nigga I ain’t seen in eleven years come here talkin' 'bout “man, hey, listen here, hey boy I ain’t seen you in about eleven years boy, let’s hang out. go get a beer.”
So I follow him to the god damn gas station.
We get two beers.
We aint get but two of them, but they was the big ones, though.
They were the big ones.
Mmm, anyway, so, nigga like “man, come on let’s go-- go-- go to the house and drink ‘em.”
So we get to the house he’s like “man. My old lady.”
And so we just gonna drink ‘em on the porch. Feel me?
I’m like “boy, APD be rollin through here boy.”
And he, and he done talked me into it.
So sure enough, APD done rolled up and seen the god damn two cans out there, locked me up for public intoxication.
You know what I’m talking about?
Man, I’m in here man cuz this nigga, man, I ain’t seen (in) eleven years. Man, I’m gonna be in here till tuesday cuz I ain’t cashed my check.
[That’s messed up]
Oh man, I should’ve went home, boy. Shit!
[Damn, man, I said I was sorry. I just ain’t seent you in like twelve years—]
Man! Fuck you Grady! Shut up!

While some of the things people may misunderstand are questions of morphology (what word forms he uses) and syntax (how they fit together), I think his accent matters far, far more. The US is still quite segregated, and, some rappers notwithstanding, much of the mainstream has almost no exposure to African American English from Atlanta. The triggers for misunderstanding are:

AAE vowels: He has a number of features that are common across many regions, including the PIN-PEN merger (both words sound like "pin", and this extends to all words with en or em), monophthongization of ay so five might be more like fahv, etc.
A shift in AAE vowels unique to the south: Just as white people from Chicago have vowels that have "rotated" from where many other Americans pronounce them ("I ride the boss to my jab!"), this guy's vowels are different than what you might expect if you're not familiar with his accent. For instance, his catch sounds like kitsch to most people, and his just sounds like jest to most people (the vowel is [ɛ], not [ʌ]).
A strong preference for "open syllables." If we represent consonants with C, vowels with V, and syllable breaks with ".", then there's a strong preference to reduce syllables to CV.CV.CV. This generally means deleting anything after the vowel, unless it's an n or m, in which case it ends up just making the vowel nasalized, as in French. This means that believe (CV.CVC) is pronounced belee (in IPA: [bəli:]), five is pronounced fa, let's is pronounced leh, and just (CVCC) is pronounced as jeh (CV) (in IPA: [d͡ʒɛ]). For many people this kills their ability to understand what they're hearing, although, interestingly, it shouldn't necessarily.
Deletion of unstressed syllables: public intoxication is pub toxication, eleven is lebm.
AAE specific syntax: talmbout to introduce quotes (this nigga I ain’t seen in eleven years come here talmbout “man, hey, listen here, hey boy I ain’t seen you in about eleven years boy, let’s hang out. go get a beer.” to mean "this guy I hadn't seen in 11 years was like..."); nigga to refer to specific people; habitual be to mark usual or habitual behavior (APD be rollin' through here meaning "APD often comes this way"); perfective done to mark completed actions (And he, and he done talked me into it meaning "he (successfully) talked me into it"); and so on.
Word-final devoicing: sounds like b, d, g, v, and z are realized as p, t, k, f, and s respectively if they are at the end of a word. Moreover, b, d, g and p, t, k can become a glottal stop (the sound in the middle of 'uh-oh') at the ends of words.
Atlanta specific knowledge: APD is the Atlanta Police Department, Five Points is a place. If you understand the rest, you can figure this out from context. If not, you're cooked.
The use of bwa ("boy") as a term of address, along with other (reduced) filler, like his very fast, very reduced "you know what I'm talking about."
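Two of the triggers above, the open-syllable preference and word-final devoicing, are regular enough to state as rules. Here's a minimal Python sketch, assuming already-syllabified IPA input given as lists of segments (the toy segment inventory is an assumption for illustration). The rules are applied categorically, whereas the text describes preferences, so real speech is more variable than this.

```python
# Toy segment inventory (an assumption for illustration). Each segment is one
# string, so affricates such as "dʒ" and diphthongs such as "aɪ" are one unit.
VOWELS = {"a", "aɪ", "æ", "ɑ", "ɛ", "ə", "ɪ", "i", "ɔ", "ɔɪ", "ʊ", "u", "ʌ"}
NASALS = {"m", "n", "ŋ"}
COMBINING_TILDE = "\u0303"  # IPA nasalization diacritic
FINAL_DEVOICING = {"b": "p", "d": "t", "ɡ": "k", "v": "f", "z": "s"}


def open_syllable(syllable: list[str]) -> list[str]:
    """Reduce a syllable to (C)V: drop everything after the last vowel.
    A nasal right after the vowel survives only as nasalization."""
    last_vowel = max(i for i, seg in enumerate(syllable) if seg in VOWELS)
    kept = syllable[: last_vowel + 1]  # a copy, so the input is untouched
    coda = syllable[last_vowel + 1 :]
    if coda and coda[0] in NASALS:
        kept[-1] += COMBINING_TILDE
    return kept


def reduce_word(syllables: list[list[str]]) -> str:
    """Apply open_syllable to each syllable and join the results."""
    return "".join("".join(open_syllable(s)) for s in syllables)


def devoice_final(segments: list[str]) -> list[str]:
    """Devoice a word-final voiced obstruent: b d ɡ v z -> p t k f s."""
    if segments and segments[-1] in FINAL_DEVOICING:
        return segments[:-1] + [FINAL_DEVOICING[segments[-1]]]
    return segments


print(reduce_word([["b", "ə"], ["l", "i", "v"]]))  # bəli  (believe)
print(reduce_word([["f", "a", "v"]]))              # fa    (five)
print(reduce_word([["dʒ", "ɛ", "s", "t"]]))        # dʒɛ   (just)
print(devoice_final(["f", "aɪ", "v"]))             # ['f', 'aɪ', 'f']
```

Note that the two rules compete: once a coda is deleted, there is nothing left to devoice, which is part of why the reduced forms drift so far from the spelling.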

All of these factors interact, so boy, I was at five points about to catch the bus ends up sounding to many white folks like bwa awa a' fapoi bouda kitschabuh yafih me? So many viewers who have never been exposed to Atlanta AAE could not even begin to figure out where the word boundaries are, let alone what the words themselves were. And even if you do figure out the word boundaries, many people might still be confused: I should a jeh went home bwa is just different enough for some people to think "man, I'm not sure what that was."

Some Notes

Syllable Codas:

Lots of research on AAE discusses deletion or reduction of things that happen at the ends of syllables or the ends of words, but they're all taken (justifiably) as different phenomena. So there's a rich literature on AAE that discusses:

possessive -s deletion: this is how you get things like baby mama for baby's mama. Or my best friend apartment door for my best friend's apartment's door. Basically, sometimes word final -s is deleted.
consonant cluster reduction: if a word ends in consonants that are pronounced with the tongue in the same location, you can drop the second if both are voiced or both are unvoiced: e.g., hand -> han', just -> juss. Basically, clusters of consonants sometimes lose some of those consonants.
deletion or vocalization ( = making a vowel) of r after vowels: The speaker above does this a lot. Vocalization is most clear in how he pronounces beer as biyuh. Basically, r sometimes is deleted.
deletion or vocalization of l after vowels: The speaker above does this a lot as well. An example is his pronunciation of fool as foow. Basically, l is sometimes deleted.
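The consonant cluster reduction rule, exactly as stated above (same place of articulation, same voicing), fits in a few lines. A minimal Python sketch, assuming a small hand-built table of places and voicing for a handful of English consonants:

```python
# Hypothetical feature table: consonant -> (place of articulation, voiced?)
CONSONANTS = {
    "p": ("bilabial", False), "b": ("bilabial", True), "m": ("bilabial", True),
    "f": ("labiodental", False), "v": ("labiodental", True),
    "t": ("alveolar", False), "d": ("alveolar", True), "n": ("alveolar", True),
    "s": ("alveolar", False), "z": ("alveolar", True), "l": ("alveolar", True),
    "k": ("velar", False), "ɡ": ("velar", True), "ŋ": ("velar", True),
}


def reduce_cluster(segments: list[str]) -> list[str]:
    """Drop the last consonant of a word-final two-consonant cluster when
    both consonants share a place of articulation and agree in voicing."""
    if len(segments) < 2:
        return segments
    first, second = segments[-2], segments[-1]
    if first in CONSONANTS and second in CONSONANTS:
        place1, voiced1 = CONSONANTS[first]
        place2, voiced2 = CONSONANTS[second]
        if place1 == place2 and voiced1 == voiced2:
            return segments[:-1]
    return segments


print(reduce_cluster(["h", "æ", "n", "d"]))   # hand -> han'
print(reduce_cluster(["dʒ", "ʌ", "s", "t"]))  # just -> juss
print(reduce_cluster(["dʒ", "ʌ", "m", "p"]))  # jump stays: m is voiced, p is not
```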

HOWEVER, there are a ton of phenomena I've noticed but which are practically absent in the literature on AAE. For instance, the deletion of /v/ after vowels, which to my knowledge is only mentioned in one sentence in one article on AAE (Thomas, 2007). Most AAE speakers I know do this all the time, and the guy above is no exception: five points is fa poi (for the linguists: [fa.pɔ͡ɪʔ ]), believe is belee, etc.

Moreover, the discussion of the above syllable "coda phenomena" does not explain a lot of what the above speaker does. Entire syllable codas just disappear. The current literature on AAE states that people may delete the /t/ in just, but there's no real account for people who say things like jeh for just or gluh for gloves (in this case, I'm thinking of a famous-to-sociolinguists speaker from Philadelphia, recorded in 1981), or krima lih for Christmas list (e.g., everyone's favorite rapper, Plies). Often, what disappears spans multiple morphemes (meaningful word 'pieces'): in gluh, both the v of the stem and the plural -s are gone.

This is a topic I'm currently working on, and hope to have more to say later about the seeming dis-preference for codas in some varieties of AAE. For many, many words, it does not affect your ability to recover exactly what word was uttered. For instance, my fingers are cold because I forgot my gluh should be really easy to parse, because (1) there's no word gluh that would make you have to choose between possible words, and (2) context. We do this kind of thing all the time, since we don't always hear (or say) all the sounds in words. Spoken language does a really good job with a "noisy channel."
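Why gluh is easy to recover can be sketched as a lookup: apply coda deletion to every word in a lexicon and see which words collapse onto what was heard. A minimal Python sketch; the four-word lexicon is hypothetical, and the rule (delete everything after the last vowel, leaving nasalization before a nasal) is a simplification for illustration.

```python
VOWELS = {"ʌ", "u", "ɪ", "i", "æ", "ɛ", "ə", "ɑ"}
NASALS = {"m", "n", "ŋ"}
COMBINING_TILDE = "\u0303"

# Hypothetical mini-lexicon: one-syllable words as lists of IPA segments
LEXICON = {
    "gloves": ["ɡ", "l", "ʌ", "v", "z"],
    "glove": ["ɡ", "l", "ʌ", "v"],
    "glue": ["ɡ", "l", "u"],
    "glum": ["ɡ", "l", "ʌ", "m"],
}


def drop_coda(segments: list[str]) -> str:
    """Delete everything after the last vowel; a nasal leaves nasalization."""
    last_vowel = max(i for i, seg in enumerate(segments) if seg in VOWELS)
    kept = segments[: last_vowel + 1]
    coda = segments[last_vowel + 1 :]
    if coda and coda[0] in NASALS:
        kept[-1] += COMBINING_TILDE
    return "".join(kept)


def candidates(heard: str) -> list[str]:
    """Every lexicon word whose coda-less form matches what was heard."""
    return sorted(w for w, segs in LEXICON.items() if drop_coda(segs) == heard)


print(candidates("ɡlʌ"))  # ['glove', 'gloves']
print(candidates("ɡlu"))  # ['glue']
```

Even in this toy lexicon, the reduced form narrows things to one stem; the leftover glove/gloves choice is exactly what context (my fingers are cold) settles.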

(For the linguists: While I'm writing about it, I might as well be the first to claim that: All obstruents higher on the sonority hierarchy than stops can be deleted syllable or word finally, and stops can all be realized as a glottal stop alone, for some varieties of AAE. Today, for instance, I heard [bli:ʔɪn] for bleeding)
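The claim above can be written as a two-branch rule on a word's last segment. A minimal Python sketch, restricted to word-final position, treating "can be" as "is" (actual speech is variable), and leaving affricates out; the obstruent sets are illustrative, not exhaustive.

```python
# Word-final obstruents by manner (illustrative, not exhaustive).
# Fricatives sit above stops on the sonority hierarchy.
STOPS = {"p", "t", "k", "b", "d", "ɡ"}
FRICATIVES = {"f", "v", "θ", "ð", "s", "z", "ʃ", "ʒ"}
GLOTTAL_STOP = "ʔ"


def final_obstruent(segments: list[str]) -> list[str]:
    """Apply the claimed pattern to a word's last segment: a fricative is
    deleted, a stop surfaces as a bare glottal stop, anything else is kept."""
    if not segments:
        return segments
    last = segments[-1]
    if last in FRICATIVES:
        return segments[:-1]
    if last in STOPS:
        return segments[:-1] + [GLOTTAL_STOP]
    return segments


print(final_obstruent(["f", "a", "v"]))       # ['f', 'a']
print(final_obstruent(["b", "l", "i", "d"]))  # ['b', 'l', 'i', 'ʔ']
```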

The above speaker has pretty extreme reduction of codas, so let's hang out is leheygao:

[lɛ.hæ̃.gɑ͡ʊ]

but many viewers might be listening for something more like:

[lɛts.hæŋ.ɑ͡ʊt]

Vowels:

There seems to be a further vowel shift in progress in Atlanta AAE which has not been discussed much in the literature on AAE. Beyond what you would expect from southern AAE, a lot of Atlanta speakers have a couple of vowels that differ from what might be expected. A lot of linguists use what are called Lexical Sets to discuss accents. What this means is that we can talk about an entire class of words that all have the same vowel, and then state "the vowel in all of those words is thus-and-such in thus-and-such accent." For instance, in most varieties of American English, the STRUT vowel (the vowel in words like strut, just, cuss, bus, cub, rub, hum, lunch) is written in IPA as [ʌ]. In the above clip:

the STRUT vowel is sometimes [ɛ]. So words like just and shut up sound a little like jest and shet epp. But bus is still [ʌ].
the TRAP vowel is sometimes [ɪ], which for most Americans is the vowel in words like ship, rip, dim. This is most pronounced in catch from catch the bus.
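Lexical sets work like a lookup table: one statement per set covers every word in it, and word-level exceptions are listed separately. A minimal Python sketch using only the words and vowels mentioned above; the "Atlanta AAE (clip)" row records the shifted variants heard in this one clip and, like the text, treats the shift as something that happens sometimes rather than always.

```python
# Keywords per lexical set, taken from the examples in the text
LEXICAL_SETS = {
    "STRUT": {"strut", "just", "cuss", "bus", "cub", "rub", "hum", "lunch", "shut", "up"},
    "TRAP": {"trap", "catch"},
}

# One vowel per lexical set per accent
ACCENT_VOWELS = {
    "General American": {"STRUT": "ʌ", "TRAP": "æ"},
    "Atlanta AAE (clip)": {"STRUT": "ɛ", "TRAP": "ɪ"},
}

# Word-level exceptions observed in the clip
EXCEPTIONS = {"Atlanta AAE (clip)": {"bus": "ʌ"}}


def lexical_set(word: str) -> str:
    """Name of the lexical set a word belongs to."""
    for name, words in LEXICAL_SETS.items():
        if word in words:
            return name
    raise KeyError(f"{word!r} is not in any listed lexical set")


def vowel(word: str, accent: str) -> str:
    """Vowel for a word in an accent: exception first, else its set's vowel."""
    exceptions = EXCEPTIONS.get(accent, {})
    if word in exceptions:
        return exceptions[word]
    return ACCENT_VOWELS[accent][lexical_set(word)]


print(vowel("just", "General American"))     # ʌ
print(vowel("just", "Atlanta AAE (clip)"))   # ɛ
print(vowel("bus", "Atlanta AAE (clip)"))    # ʌ
print(vowel("catch", "Atlanta AAE (clip)"))  # ɪ
```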

Overall difference:

There is a wealth of research on how we parse accents, and a couple of factors are at play here. First, AAE is heavily stigmatized in the US. The more it differs from middle-class, 'standard', white speech, the more stigmatized it is. Second, because of the segregation in this country, many white folks simply do not understand AAE, even when we think we do (e.g., Rickford 1998, Rickford & King 2016, Jones & Kalbfeld 2017). Third, regardless of the accent, when it's perceived to be difficult to understand, experiments show that rather than improving with more exposure, people basically shut down and stop trying to parse it. Lastly, given racial/ethnic cues, people perceive accents where there aren't any. Here, there is clearly an accent, but the relevance of the last point is that people may already be predisposed to consider a black man in a jail detention center "hard to understand," or even "impossible to understand," and "not worth the effort."

A handful of my non-black friends assumed that the point of the scene was basically a gag: that the guy was incomprehensible. I don't think that was the case, and that doesn't seem to be the impression my AAE-speaking friends had either. He's just real Atlanta. That's part of why people love the show: there are tons of types of people that viewers know from their daily lives but just don't see on TV, and Atlanta gives them a spotlight, if only for a minute.

More broadly, though, the above points to a lot of interesting historical and sociological phenomena. Language change occurs when populations are separated. Generally, the way this is taught is by giving examples of European villages separated by mountains, where one town speaks differently than the next town over, because they don't interact often. However, as I'm going to argue in my dissertation, some populations in the US are separated by invisible mountains: residential and educational segregation. For some people, popular music, film, and television (including Atlanta) are now providing limited contact with people from "the other side of the mountain."

-----

©Taylor Jones 2017


Linguists have been discussing "Shit Gibbon." I argue it's not entirely about gibbons.

February 09, 2017 by Taylor Jones

BACKGROUND: LINGUISTS CARE ABOUT SHITGIBBONS TOO

Earlier this week a Pennsylvania state senator called Donald Trump a "fascist, loofa-faced shit-gibbon."

There was an excellent post on Strong Language, a blog about swearing, discussing what makes "shit gibbon" so arresting, so fantastic, so novel, and yet... so right (for English swearing. Whether you believe "shit gibbon" is "right" as a characterization of Donald Trump is a personal assessment each person must make for themselves).

The post, The Rise of the ShitGibbon, can be found here. I highly recommend reading it.

Most of the post was dedicated to tracing the origins and rise of "shitgibbon." The end of the post, however, catalogues insults in the same vein:

wankpuffin, cockwomble, fucktrumpet, dickbiscuit, twatwaffle, turdweasel, bunglecunt, shitehawk

And some variants: cuntpuffin, spunkpuffin, shitpuffin; fuckwomble, twatwomble; jizztrumpet, spunktrumpet; shitbiscuit, arsebiscuits, douchebiscuit; douchewaffle, cockwaffle, fartwaffle, cuntwaffle, shitwaffle (lots of –waffles); crapweasel, fuckweasel, pissweasel, doucheweasel.

I've actually been thinking about insults like this a surprising amount. Ben Zimmer points out about "Shitgibbon" that "...Metrically speaking, these words are compounds consisting of one element with a single stressed syllable and a second disyllabic element with a trochaic pattern, i.e., stressed-unstressed. As a metrical foot in poetry, the whole stressed-stressed-unstressed pattern is known as antibacchius."

I argue that this is correct, but that (1) there's a little bit more to say about it, and (2) there are exceptions.

HOW TO MAKE A SHITGIBBON IN TWO EASY STEPS

First: I argue that the rule for making a novel insult of this type is a single-syllable expletive (e.g., dick, cock, douche, cunt, slut, fart, spunk, splooge, piss, jizz, vag, fuck, etc.) plus a trochee. A trochee, as a reminder, is a word that's two syllables with stress on the first. Examples are puffin, womble, trumpet, biscuit, waffle, weasel, and of course, gibbon. Tons of words in English are trochees (have a relevant XKCD! In fact, have two! Wait, no, three! No one expects the Spanish Inquisition!). Because so many words are trochees, you'll have to pick wisely: something like ninja might not be as humorously insulting as waffle.

That said, in principle, monosyllable expletive + trochee seems to give really good results. Behold:

fart basket, shit whistle, turd helmet, cock bucket, douche blanket, vag weasel, (I'm gonna be so much fun when I get old and have dementia. Good luck grandkids!), shit mandrill, piss gopher, jizz weevil, etc. etc. I can do this all day.
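The formula behind that list reduces to a template match on stress patterns. Here's a minimal Python sketch, assuming hand-coded stress marks (1 = stressed, 0 = unstressed) for a handful of words; a pronouncing dictionary could supply real stress data. The filter checks meter only: whether the result is actually funny still needs a human.

```python
from itertools import product

# Hand-coded stress patterns for a few words (illustrative only)
EXPLETIVES = {"shit": "1", "fart": "1", "turd": "1", "jizz": "1"}
SECOND_ELEMENTS = {
    "gibbon": "10",     # trochee
    "weasel": "10",     # trochee
    "basket": "10",     # trochee
    "baboon": "01",     # iamb
    "macaque": "01",    # iamb
    "marmoset": "100",  # dactyl
}
TROCHEE = "10"


def fits_template(first: str, second: str) -> bool:
    """Monosyllabic expletive followed by a trochee."""
    return EXPLETIVES.get(first) == "1" and SECOND_ELEMENTS.get(second) == TROCHEE


insults = [f + s for f, s in product(EXPLETIVES, SECOND_ELEMENTS) if fits_template(f, s)]

print(len(insults))  # 12
print(insults[:3])   # ['shitgibbon', 'shitweasel', 'shitbasket']
```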

So, it's not the fact of being a gibbon per se. Various other primates would work: vervet, mandrill, etc. However, crucially, baboons, macaques, black howlers, and pygmy marmosets are out.

Moreover, it's not completely unlimited. Some words fit but don't make much sense as an insult: cock bookshelf, fart saucepan (which I quite like, actually), dick pension, belch welder.

Others sound like the kind of thing a child would say: fart person! poop human! turd foreman!

Yet others are too Shakespearean: fart monger! piss weasel!

Clearly some words (waffle, weasel, gibbon, pimple, bucket) are better than others (bookshelf, doctor, ninja, icebox), and some just depend on delivery (e.g., ironic twat hero, turd ruler, spunk monarch, dick duchess).

VOWELS MATTER

For a while, I've been discussing vowels in insults with fellow linguist Lauren Spradlin. Note that when we talk about vowels, we mean sounds, not letters. Don't worry about the spelling; try saying the examples aloud. Spradlin has drawn my attention to how repeating a vowel increases the viability of a new insult of this form: crap rabbit, jizz biscuit, shit piston, spunk puffin, cock waffle, etc.

I would argue that having the right vowels actually gives you some leeway, so you can get away with following the first word with (gasp!) a non-trochee! Be it an iamb (remember iambic pentameter?), as in douche-canoe, splooge caboose, or the delightfully British bunglecunt (h/t Jeff Lidz), or even more syllables: Kobey Schwayder's charming mofo-bonobo.
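The proposed leeway can be stated as a rule: a trochee always fits, and a non-trochee fits if its stressed vowel repeats the expletive's vowel. A minimal Python sketch, assuming hand-coded stress patterns and stressed vowels in IPA (since it's sounds, not letters, that matter); this is a simplification of the idea above, not a tested generalization.

```python
# Hand-coded (stress pattern, stressed vowel) for a few words; illustrative only
WORDS = {
    "crap": ("1", "æ"), "rabbit": ("10", "æ"),
    "jizz": ("1", "ɪ"), "biscuit": ("10", "ɪ"),
    "shit": ("1", "ɪ"), "piston": ("10", "ɪ"),
    "douche": ("1", "u"), "canoe": ("01", "u"),
    "splooge": ("1", "u"), "caboose": ("01", "u"),
    "baboon": ("01", "u"),
}


def viable(first: str, second: str) -> bool:
    """A monosyllable plus a trochee always fits; a non-trochee fits only
    if its stressed vowel repeats the first word's vowel."""
    first_stress, first_vowel = WORDS[first]
    second_stress, second_vowel = WORDS[second]
    if first_stress != "1":
        return False
    return second_stress == "10" or second_vowel == first_vowel


print(viable("douche", "canoe"))   # True: iamb, but the vowels match
print(viable("shit", "baboon"))    # False: iamb with no vowel match
print(viable("crap", "rabbit"))    # True: trochee (and a vowel match)
```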

As you can see, this is a hot topic in the hallowed halls of the ivory tower. If the above simple formulae have motivated even one person to go out and exercise their own creativity to make a novel contribution to the English language, then I've done my job here as a linguist. Different people get into linguistics for different reasons, but this, this is what I live for. Get out there and make a difference!

-----

©Taylor Jones 2017


Linguistically, why does it sound like Trump thinks Frederick Douglass is still alive?

February 02, 2017 by Taylor Jones

Normally, when I write a blog post, I am pretty confident I know exactly what I'm talking about. I think of this as a place to communicate interesting findings or facts about linguistics to a lay audience. However, today something happened that I can't quite explain, but I want to discuss. Today, in honor of the first day of Black History Month, Trump gave a talk in which, among other things, he said:

Frederick Douglass is an example of somebody who's done an amazing job and is being recognized more and more, I notice.

Plenty of people have already remarked that the phrasing sounds odd. It seems as though Trump is unaware that Frederick Douglass is not still alive. His press secretary made similar remarks:

I think he wants to highlight the contributions that he has made

My goal here is not to suss out whether Trump and Spicer know that Frederick Douglass is dead, and has been for over a century. Rather, I want to discuss why many people have the intuition from the above utterances that Trump and Spicer think Frederick Douglass is alive. The linguistosphere (note: not a real thing) is abuzz right now, and there's quite a bit of discussion on my facebook and twitter about this.

First things first: there's something about the fact that he used the present perfect instead of the simple past. That is, "Frederick Douglass has done an amazing job," not "Frederick Douglass did an amazing job."

The present perfect indicates that something is completed, now. However, the fact that it's not morphologically past can't quite be it, because there are plenty of things that are over and done with and not likely to continue that can be marked this way. For instance:

I have eaten the plums that were in the icebox

There's something about the relation to the present, though, that makes it so it sounds very strange to use when discussing someone who died a century ago. But there's more to it. For instance, we could rebut what we take Trump's assumptions to be with:

Frederick Douglass has died (though). He did so in 1895.

Even that is odd, though. We'd expect something like "Frederick Douglass is dead."

Perhaps it's a requirement that the doer of the action marked with the present perfect be theoretically capable of still doing things in the present? My current working hypothesis is that a (non-passive) clause marked with the present perfect requires a subject that exists in the present. For a passive, that goes out the window ("the plums which were in the icebox have been eaten").

BUT, and this is a huge caveat: I am very much not a semanticist.

I'm hoping for linguists who focus on semantics to weigh in, and will update here. There's clearly some structural presupposition of relevance to the present that makes statements of historical fact sound bizarre with the present perfect. See for yourself, with other sentences:

Napoleon has taken over Europe.

Muhammad has had a vision.

Julius Caesar has been stabbed.

Cleopatra has taken a lover.

...Frederick Douglass has done an amazing job.

It sounds like he either has just now done an amazing job, or he's done an amazing job and will probably continue to do so.

Thoughts?

-----

©Taylor Jones 2016


New working paper on Quotative "Talkin' 'bout"/"talmbout" published

January 09, 2017 by Taylor Jones

I have a new paper out, on talkin' 'bout or talmbout in African American English, used as a verb of quotation similar to quotative be like. The paper can be found here. A blog post (or blog posts) on it will be forthcoming -- once I've finished my much more pressing dissertation proposal.

-----

©Taylor Jones 2016


Why the International Phonetic Alphabet (IPA) is the best thing ever

December 24, 2016 by Taylor Jones

In this post, I'm going to explain why knowing the International Phonetic Alphabet is like seeing the matrix.

A common misconception that linguists often have to deal with, be it from students in Intro to Linguistics or from family members at holiday gatherings, is that Language (capital L) is basically written language. This has all kinds of ramifications, from people thinking that stylistic conventions for writing are somehow "rules" that languages follow, to thinking that people whose pronunciation differs significantly from what we think of as being "how it's spelled" are somehow dumb.

But linguists know a secret: writing is a secondary technology, and spoken language (as opposed to signed language) is all about sounds.

There are tons of different writing systems, and languages can be represented (well or poorly) by many different systems. For instance, Turkish was historically written using the Arabic abjad (basically an alphabet that only has consonants), but is now written with roman letters. Tajik is the same language as Farsi, but one's written in Cyrillic and one in a modified abjad. Hell, people get tattoos in English that are written in Tengwar (Tolkien's Elvish script). All these different scripts represent sounds in ways that are sometimes more and sometimes less faithful to the (abstract) sound units that a given language uses. For instance, standard varieties of English have two sounds that we totally suck at writing, so we just slap two letters together. For both of them. When you blow air between your tongue and front teeth, whether your vocal cords are vibrating or not, we just write it <th> in English and call it a day, even though <t> refers to one sound and <h> refers to another and this combination makes no damn sense.

Rather than wade into the ridiculous morass of writing systems every time we want to talk about a language, even if it's just to give one example before moving on to other data from other languages, linguists have developed basically the best tool ever: The International Phonetic Alphabet.

How is it different than other alphabets? First, it's based on all the different sounds that human mouths and vocal tracts can make, and second, it's got one (and only one) character for each sound.

The beauty of it is that it is independent of accents, and in fact, you can represent different accents clearly using the IPA. So every linguist's pet peeve is reading language-learning books that say things like "i as in the i in kit." Well, that's great, except that different accents/dialects/varieties of English pronounce the word "kit" differently.

So how does it work? Let's take a brief tour of SOUNDS YOU MAKE WITH YOUR MOUTH AND STUFF:

Sounds You Make With Your Mouth (and Stuff)!

I'm going to simplify a lot of this discussion, so expect nitpicky comments from fellow linguists about what a consonant really is, but basically, if you don't think too hard about it, you can divide the sounds we make into two main classes:

Consonants
Vowels

Consonants, for our purposes, are basically any sound where the flow of air out from your lungs is obstructed in some way. Vowels are sounds where air flows freely.

Notice: I did not say "vowels are [insert list of letters (and sometimes y!)]." I said "vowels are sounds where air flows freely."

The cool thing is that each of these two classes can be completely described (for our purposes, again), using 3 parameters.

For Consonants:

place
manner
voicing

Let's take these one at a time.

Place is the location of the obstruction of airflow. This can be closure at the lips, the tongue at the teeth, at the alveolar ridge, at the hard palate, the back of the tongue at the velum, etcetera.

Manner is the way airflow is obstructed. If it's completely blocked off, it's a "stop." If it's just partially blocked and creates a turbulent airflow, it's a "fricative" (think "friction").

Voicing refers to whether your vocal cords are vibrating.

Armed with these three, we can start specifying sounds. For instance, the <t> in <stop> is an unvoiced coronal stop (it's not voiced, it's made with the "crown" of the tongue, that is, the tip, and airflow is completely stopped for a second. Technically, less than a second, but whatever).

Since we can characterize all the meaningful units of sound that a language uses in this way (#thatsThePoint), wouldn't it be nice if there were one and only one symbol for that sound? GOOD NEWS: THERE IS! And, since the IPA was made up by a bunch of Europeans, it's exactly what we'd expect: <t>. If it were voiced? <d>. If it was in the same place, unvoiced and a fricative? <s>. Voiced? <z>. Same manner and voicing (stop, unvoiced), but made at the lips? <p>.

What's really amazing about this, and will be the subject of a different post, is that the way languages work seems to be by reference not to spelling and stuff, but to these sub-classifications of sound, which we call distinctive features because (1) they're features that (2) distinguish sounds from one another. In fact, languages tend to change based on natural classes of these features. For instance, some sound change might affect all stops, but not fricatives.
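Since every symbol corresponds to exactly one bundle of place, manner, and voicing, the IPA behaves like a lookup table in both directions, and natural classes are just filters on it. A minimal Python sketch covering only the consonants mentioned above plus their obvious partners; "alveolar" is the specific coronal place used for <t>, and the chart's symbol for the voiced velar stop is the single-story ɡ (U+0261).

```python
# (voicing, place, manner) for a handful of IPA consonants
CONSONANTS = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "k": ("voiceless", "velar", "stop"),
    "ɡ": ("voiced", "velar", "stop"),
    "f": ("voiceless", "labiodental", "fricative"),
    "v": ("voiced", "labiodental", "fricative"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
}
FEATURE_NAMES = ("voicing", "place", "manner")


def symbol_for(voicing: str, place: str, manner: str) -> str:
    """Return the one IPA symbol with exactly these three features."""
    for symbol, features in CONSONANTS.items():
        if features == (voicing, place, manner):
            return symbol
    raise KeyError((voicing, place, manner))


def natural_class(**wanted: str) -> list[str]:
    """All symbols matching every given feature, e.g. manner='stop'."""
    return [
        symbol
        for symbol, features in CONSONANTS.items()
        if all(dict(zip(FEATURE_NAMES, features))[k] == v for k, v in wanted.items())
    ]


print(symbol_for("voiceless", "alveolar", "stop"))       # t
print(symbol_for("voiceless", "alveolar", "fricative"))  # s
print(natural_class(manner="stop"))  # ['p', 'b', 't', 'd', 'k', 'ɡ']
```

A sound change that "affects all stops, but not fricatives" is, in these terms, a rule applied to whatever natural_class(manner="stop") returns.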

The IPA chart for consonants can be found on Wikipedia here, and you can click on each symbol and hear the audio. In fact, each sound has its own wikipedia article.

Similarly, vowels can be classified along three parameters:

For Vowels:

tongue height
tongue backness
lip rounding

Height refers to how high the body of your tongue is in your mouth.

Backness refers to how far front or back the highest point of your tongue is (again, a simplification, but basically right).

Rounding refers to whether you round your lips or not.

Vowel chart, and schematic of tongue height for front vowels.

So for instance, linguists interested in dialects of American English may talk about whether a particular variety, say California English, is fronting /u/ to [ʉ] or even [y] and this is meaningful -- we're describing an accent, but doing so in a more precise way than if you were to say "it sounds like 'kyewl' when they say 'cool'!"
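Fronting /u/ to [ʉ] or [y] changes exactly one of the three parameters, backness, while height and rounding stay put. A minimal Python sketch over the six high vowels of the IPA chart (the official chart calls these "close" vowels; "high" is the plain-English label used here):

```python
# (height, backness, rounded) for the six high vowels of the IPA chart
HIGH_VOWELS = {
    "i": ("high", "front", False),
    "y": ("high", "front", True),
    "ɨ": ("high", "central", False),
    "ʉ": ("high", "central", True),
    "ɯ": ("high", "back", False),
    "u": ("high", "back", True),
}
BACKNESS_ORDER = ["front", "central", "back"]


def front_one_step(symbol: str) -> str:
    """Move a vowel one step toward the front, keeping height and rounding."""
    height, backness, rounded = HIGH_VOWELS[symbol]
    step = max(BACKNESS_ORDER.index(backness) - 1, 0)
    target = (height, BACKNESS_ORDER[step], rounded)
    for candidate, features in HIGH_VOWELS.items():
        if features == target:
            return candidate
    raise KeyError(target)


print(front_one_step("u"))  # ʉ
print(front_one_step("ʉ"))  # y
```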

Wikipedia, again, has a super useful chart, where you can hear the sounds. It's the same as the above chart: "front" vowels are on the left, "high" vowels are on the top, and they come in pairs with the unrounded form on the left and rounded on the right. Add a little tilde on top of the vowel (literally it's just a little n above the vowel) and you have a nasalized vowel: a vowel where your velum is dropped a bit and you allow airflow out through your nose. French has a ton of these, so what's written as <on> is pronounced /ɔ̃/. Without the little squiggle, it's /ɔ/, the vowel in a New York accent's "coffee" (/kɔfi/), as opposed to a Canadian or Californian's "coffee" (/kɑfi/).
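In digital IPA text, "add a little tilde" is literal: a nasalized vowel is the plain vowel followed by a separate combining character, U+0303 COMBINING TILDE. A short Python sketch using only the standard library's unicodedata module:

```python
import unicodedata

COMBINING_TILDE = "\u0303"


def nasalize(vowel: str) -> str:
    """Mark a vowel as nasalized by appending the combining tilde."""
    return vowel + COMBINING_TILDE


nasal_o = nasalize("ɔ")  # the vowel of French <on>
print(len(nasal_o))      # 2: base letter + diacritic
print([unicodedata.name(ch) for ch in nasal_o])
# ['LATIN SMALL LETTER OPEN O', 'COMBINING TILDE']
```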

So how many vowels does English have? Well, most accents have 20, not 5. And you can discuss them all using the IPA: bead = /bid/, bed = /bɛd/, bad = /bæd/, and so on. If I boo someone, I /bu/ them, but someone from California might /bʉ/ them, and we have a precise way of telling, and discussing, the difference.

Some helpful observations

The IPA was designed to be intuitive, and useful. So most of the symbols are exactly what you would expect. The vowels are a little more complicated, but think about what would make sense for Europeans: /i/ is the vowel in 'beat', and we have a special character for the sound in 'bid' (it's /bɪd/). Consonants are basically what you'd expect, except <y> is /j/ (like in German Ja!). That shitty combination in English of <ng> for a voiced velar nasal is just one symbol, an <n> with the tail of a <g>, called engma: /ŋ/.

Brackets and slashes: Linguists use slashes to indicate a more abstract level, and brackets to deal with the sounds that are actually made. So in English, /t/ can have a bunch of different realizations in speech:

[t] in 'stop' [stɑp]
[tʰ] in 'tap' [tʰæp]
[ɾ] in American English 'butter' [bʌɾəɹ]
[ʔ] (a glottal stop, the sound in the middle of 'uh-oh!') in Cockney 'butter' [bʌʔə]

and so on. Most normal people are just not aware of these differences at all (especially the first two, but put your hand directly in front of your mouth and see how different the airflow is!).
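The slash/bracket distinction maps neatly onto a lookup: one abstract /t/, several concrete [realizations] chosen by dialect and environment. Here's a deliberately tiny Python sketch covering only the environments in the list above; the environment names and the classifier are simplifications for illustration, not a full account of English /t/.

```python
VOWELS = {"ʌ", "ə", "æ", "ɑ"}

# Realizations of the single abstract phoneme /t/ from the list above,
# keyed by dialect and (simplified) environment.
ALLOPHONES_OF_T = {
    "General American": {
        "after_s": "t",                    # 'stop'   [stɑp]
        "stressed_onset": "tʰ",            # 'tap'    [tʰæp]
        "between_vowels_unstressed": "ɾ",  # 'butter' [bʌɾəɹ]
    },
    "Cockney": {
        "between_vowels_unstressed": "ʔ",  # 'butter' [bʌʔə]
    },
}


def environment(prev, nxt, next_stressed):
    """Classify where /t/ sits; only the three cases above are covered."""
    if prev == "s":
        return "after_s"
    if prev in VOWELS and nxt in VOWELS and not next_stressed:
        return "between_vowels_unstressed"
    if nxt in VOWELS and next_stressed:
        return "stressed_onset"
    raise ValueError("environment not covered by this sketch")


def realize_t(dialect, prev, nxt, next_stressed):
    """Map the phoneme /t/ to the [phone] used in this dialect and slot."""
    return ALLOPHONES_OF_T[dialect][environment(prev, nxt, next_stressed)]


print(realize_t("General American", "ʌ", "ə", False))  # ɾ
print(realize_t("Cockney", "ʌ", "ə", False))           # ʔ
```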

Notice, too, that you can clearly represent accents using this tool. 'butter' is [bʌɾəɹ] in General American, but [bʌʔə] in Cockney, and [bəˈtœʁ] in a bad French accent. With the slashes, we can just talk about things that happen to (abstract) segments (of sound) in a language without having to specify all the different little things that happen in specific environments, and with the brackets we can get as specific as we want to, so I might say 'specific' /spəsɪfɪk/ as [spəsiɪfɪˀ]. And once you know the IPA, you can tell from that HOW I SOUND WHEN I SAY IT.

YOU CAN FREAKING SPELL ACCENTS.

With the IPA, we don't have to resort to all kinds of weird ways of talking about things ("the vowel in a New York accent when they say 'coffee'" or "the ü sound in German, if you know German," or "That thing that some French people do when they say 'oui' in a nonstandard way.") We can just use the IPA and talk about place, manner, and voicing, or height, backness, and rounding (/ɔ/, /y/, and /ɕ/, for the previous long-winded and confusing examples).

While it takes a little (let's be real, very little) effort to learn the IPA, the payoff is immense for anyone who wants to learn another language, learn another accent, or understand any discussion of sounds that humans make in a clear and concise way. It's basically seeing the matrix.

-----

©Taylor Jones 2016


Gender, Gender, Gender

December 01, 2016 by Taylor Jones

A good Question:

I'm still getting a surprising number of comments and emails about the short post I wrote on David Peterson's slip up with grammatical gender. While most are incoherent and silly (and have a seasonally and statistically unlikely preponderance of the use of the word "snowflake"), there is one in particular that seemed earnest, and that I think warrants a full response. Brele asked:

Taylor, can u help me understand how there are more than two genders? I ask just having watched J. Peterson on a YouTube show and hearing his thoughts on the matter.

I think it's important here to distinguish three related phenomena, and where they do and don't overlap: biological sex, gender (and gender expression), and grammatical gender. The conflation of biological sex with gender, and the subsequent conflation of grammatical gender with both, is where most of the confusion and anger comes from, I think.

Biological Sex:

Biological sex is what it sounds like: the biological properties we associate with sexual reproduction in a species. We assume that there are two sexes in humans: male, and female. This is not strictly true, as biological sex is determined by a constellation of factors, among them:

chromosomes: while we're familiar with XX as female and XY as male, there are people with XXY, or other unusual (but extant!) combinations. Roughly 1 in 1,666 births involve atypical chromosomal combinations. That's roughly 210,000 Americans.
gonads: most people have reproductive organs that fall broadly into one of the two expected categories, but again, not all people do. Roughly 1 in 1,500 births involve atypical gonads (which means about 230,000 Americans).
hormones: some people have atypical hormonal patterns. For instance, there is the Sikh woman who has polycystic ovarian syndrome and therefore has a full beard.

The vast majority of people will have phenotypes that 'line up', but a sizeable minority don't. So what we think of as physically binary (male/female) is, in reality, a bit more complicated: generally true, but not always true.

Gender and Sexuality:

Gender, in the social sciences, is distinct from biological sex. It is also a complicated constellation of factors, including:

who you are attracted to.
how you physically present yourself, and how you behave, according to (or going against) culturally defined patterns of behavior. For instance "boys wear blue, girls wear pink" is a completely arbitrary, culturally defined dichotomy with no basis in biology, and which is absolutely not universal.
how masculine or feminine (or neither, or both, or whatever) you personally feel. That is, maybe I feel really girly (whatever that means, just go with it), but I don't present myself in accordance with that because it's easier to just follow my culture's rules about What Men Do than it is to deal with people's reactions if I start wearing dresses.
a bunch of other stuff I'm probably leaving out.

The key here is that gender is about how you feel, how you behave, and who you are attracted to, and is not about your chromosomes, gonads, and hormones.

For a more science-y take: there are multiple parameters, which may be either binary or have multiple levels, along which people can vary. This is a high-dimensional space that we generally try to collapse to a single-dimensional sub-space and then classify with a binary score. Increasingly, people who are hard to classify on that one dimension (studs, bears, beardos, genderqueer, agender, genderfluid [do we need a time dimension?], flannel-heads, balloon-poppers; yes, I made some of these up, but not the balloon one) are saying you can't collapse things to a single binary parameter, but instead need a higher-dimensional space to accurately categorize people without losing important information.
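As a toy illustration of that collapse (not a model of any real measurement), the Python sketch below describes two hypothetical people by three invented parameters on a 0 to 1 scale. It projects each person onto a single number by simple averaging, which is one arbitrary choice of projection among many, and then applies a binary cut-off. The two profiles differ on every parameter, yet they end up with the same one-dimensional score and the same binary label. That is the information loss described above.

```python
# Toy sketch: the parameters, their values, and the averaging projection are
# all hypothetical, chosen only to show how collapsing dimensions loses information.
from typing import Tuple

Profile = Tuple[float, float, float]

person_a: Profile = (0.9, 0.2, 0.7)  # hypothetical values for three parameters
person_b: Profile = (0.3, 0.9, 0.6)  # a different hypothetical profile

def collapse(profile: Profile) -> float:
    """Project a multi-parameter profile onto one dimension (here: the mean)."""
    return sum(profile) / len(profile)

def binary_label(profile: Profile, threshold: float = 0.5) -> bool:
    """Classify the collapsed one-dimensional score with a single cut-off."""
    return collapse(profile) >= threshold

if __name__ == "__main__":
    for name, p in [("A", person_a), ("B", person_b)]:
        print(name, p, round(collapse(p), 3), binary_label(p))
```

Any linear projection from several dimensions down to one maps some distinct profiles to the same score. Averaging just makes this easy to see.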

Grammatical Gender:

This is how languages group nouns. The name is an unfortunate misnomer, given the conflation of the above two things; it's etymologically related to genre, and that's probably a much better way to think of it. Some languages have two genders, which are called masculine and feminine because noun classes in those languages sort of line up with how actual masculine people and feminine people are classified grammatically. That said, in one such language, French, there's no clear reason a table should be semantically feminine. The genre of the noun just happens to be the same as for women, but in this case it's largely a phonological thing, not a semantic (i.e., meaning) thing. Moreover, some words come in gendered pairs: le tour 'the tour' (as in the Tour de France), versus la tour 'the tower' (as in the Eiffel Tower).

In other languages, there are two genders, but they don't line up with sex: Dutch has two genders, but they're common and neuter. Both man 'man' and vrouw 'woman' are common, and meisje 'girl' is neuter (along with all other diminutives, so mannetje 'little man' is also neuter).

In other languages, there are more than two genders. German and Russian have masculine, feminine, and neuter.

In yet other languages, there are many more genders: Zulu has 14, and none of them have anything to do with sex. Some are for humans, some are for long, stick-y things (although there are arguments about this), and one is for abstract concepts: umu-ntu is a person, aba-ntu is 'people' (whence "Bantu"), and ubu-ntu is the quality of being human (personhood, or humanity).

Finally, many languages mark all nouns and noun-y things with gender, but many don't. English, for instance, only explicitly marks gender on some pronouns (he and she, but not you), and a handful of nouns for kinds of people ("actress").

The Takeaway:

"Gender" is often used interchangeably to mean any of three things: biological sex, gender (and sexuality) in the social sense, and grammatical gender. Moreover, each of these things is complex and non-binary (although biological sex comes close to being binary in everyday life for most people).

English obligatorily marks gender on third person singular pronouns (and that's about it). This gender marking generally overlaps with biological sex and 'mainstream' gender expressions related to cultural assumptions about biological sex. People who do not feel like they are necessarily well described by he or she have been asking to be referred to with a different term -- many ask that we use they, which has the benefits of (1) already existing in English, and (2) being gender neutral already. Others ask for ze or something else.

The point is, marking gender on third person singular pronouns (only) is a weird quirk of the grammatical structure of English, and not representative of objective biological reality, and certainly not reflective of culture. My comments on Jordan Peterson's remarks were solely to laugh at the irony of someone claiming they refused to use gender-neutral pronouns while using gender-neutral they to express that contrarianism.

Hopefully, the above answered the question of how there could be 'more than two genders.'

-----



More on Pronouns: Are Gender Creative People Really All That Creative?

November 02, 2016 by Taylor Jones

Controversy is still swirling around trans, non-binary, and other "gender creative" people's occasional insistence on being referred to with pronouns of their choice. I have been thinking about this lately, and while some people are very upset that others are asking for specific pronouns the speaker may disagree with ("But you're a he, not a she!"), I've come to the conclusion that the gender creatives are really not being all that creative at all.

As far as I can tell, the vast majority of people who are requesting "special" pronouns are doing one of two things:

Asking to be referred to with the pronouns appropriate to the gender they identify as (whether it's immediately apparent to others or not). That is, hypothetically, someone born female asking to be referred to in the third person with he, him, his, himself. No other changes to the pronominal system.
Asking to be referred to with a gender-neutral third person pronoun, usually either they (which has a long history of use as a gender-neutral, but nonspecific, third person pronoun), or some variation on Xe, Ze, or something else pronounced with a voiced coronal sibilant (/z/). No other changes to the pronominal system.

The thing is, the languages of the world do a lot of really interesting things with pronouns, and these so-called gender creatives are clearly not being creative enough. It's almost as though they're not playing with language at all, but are actually trying to conform to the rules of English while insisting others respect their gender identity.

Here are some things they could be doing, and places where I think they're really dropping the ball:

Gendering pronouns other than the third person. Arabic has gendered second person singular and plural pronouns. Instead of just "you" referring to anyone you're talking with, Modern Standard Arabic has anta, anti, antum, antunna, for "you (male)," "you (female)," "you men," and "you women" respectively.
Proximal and Distal third person pronouns. Algonkian languages tend to differentiate between, say, 'he (who is nearby)' and 'he (who is far from us),' which can then send social signals -- if I talk about you in front of you, but use the him (distal) form, I'm pretty rudely implying that this is an A/B conversation (and you can C your way out).
More case marking. English really only has nominative/oblique/possessive pronouns. Other languages do a lot more. I'd love to be able to say that I identify as male, but my pronouns are he/him/his/hig/hif/hird for nominative, oblique, possessive, ablative (motion away from me), instrumental (using me to do something or doing something accompanied by me), and locative (doing something where I am). Russian and Latin have us beat by like 3 cases, and Hungarian is blowing us out of the water.
Marking tense on the pronoun. Wolof, for instance, marks differences in tense not on the verb, but on the pronoun. This allows meeting the gender...uncreative?... halfway: "You can use the pronoun for males when referring to me only if it's got past tense morphology on it."

So yes, at your request I will call you ze/zim/zis, but know that I'm silently judging you for your cliché, unimaginative pronouns, and wishing you'd give me a real challenge. It's almost like this isn't about language at all, but just about asking for me to respect your life choices and identity.

-----



Fast speech and hallucinated sounds

November 02, 2016 by Taylor Jones

We humans are really good at language. If you think about it, what we hear is often quite messy: there can be other noises interfering with the speech signal we're listening to and decoding, and that signal is made by a bunch of sloppy moving parts that tend to make sounds overlap in a number of ways rather than keeping phonemes crisply and clearly separated.

This is a short post about one such interaction, this time in Mandarin Chinese. Just over a year ago, Aletheia Cui and I wrote a paper that was accepted for a talk at the International Congress of Phonetic Sciences and published as a working paper, in which we examined casual speech reduction in Mandarin. You can read the whole thing here, if you're interested.

The impetus for the study was a remark by one of my professors that we tend to do an excellent job of filling in sounds that weren't really heard, and that when students first learn to read a spectrogram (a transformation of an audio wave that gives much more precision in interpreting sounds), they are often surprised. Either they're shocked at how messy it all is, or, more interestingly, they're shocked at what's absent. We "hear" sounds that weren't said, and we swear up and down we heard them. He told us a story about a former student, a native Chinese speaker, who thought there was a bait and switch going on with a recording of the word bijiao ('kind of' as in "that's kind of interesting").

So Aletheia and I, for a final project, went through a corpus of telephone speech and investigated how the sound in the middle of bijiao, and in related words, like jia, is pronounced. In words like bijiao, the middle is only actually pronounced about 10 percent of the time. The rest of the time, it's something like bshao or even biao (perhaps 'kinda' is a better translation...).

It may not be earth-shattering, but it's kind of a neat finding, especially since deletion of the -ji- sound in bijiao is a little less expected than deletion of the /v/ sound in kind of (and yes, it's a /v/, not an /f/). Also, there's a literature on Taiwanese Mandarin speakers replacing some words, where it's argued that a casual form of bijiao is just biao. Our findings suggest that this is not the case for mainland speakers: they're mostly just saying bijiao really fast, so the middle gets reduced, but it's not clear that they're trying to say a different word.

I think most of us English speakers are aware that we reduce things (e.g., gonna, or my go-to: Imunna), but when you start learning foreign languages, it becomes immediately clear the first time you speak with a native speaker that they are definitely not saying the words your teachers taught you. This is in part what's happening: they're not speaking "too fast" or saying the words "wrong." It's just that native speakers are really good at filling in what's missing, and it's something we all do, all the time.

-----



U of Toronto professor Jordan Peterson on preferred pronouns: idiot, or troll genius?

September 29, 2016 by Taylor Jones

[EDIT: This post has gotten me a surprising amount of hate mail and 'splaining. The point is that JP has the grammatical competence to 'be nice' but chooses not to, while claiming that it's just too hard to intentionally do what he clearly does unintentionally with ease. It's not a big jump from nonspecific to specific referent.]

I've been meaning to write a post about pronouns and gender issues in language, because there's a wide range of interesting phenomena to discuss and I think that much of the current discussion around gender in language, and especially "preferred pronouns," boils down to people not really knowing much about how language works. For instance, does gender in your language shape how you think about sexuality? What if your language has two genders, but they're "neuter" and "common"? What if your language has 17 genders that fall into categories like "humans," "animals," "plants," "long objects," and "abstract concepts"?

But...this is not that post.

Instead, I want to talk about what happens when people try to legitimize prejudice using flimsy linguistic pretexts, but know so little about language that the result is hilarious.

Enter: Jordan Peterson.

Jordan Peterson is a professor who is stirring up controversy at my alma mater, the University of Toronto. The National Post ran an article about him entitled "U of T professor attacks political correctness, says he refuses to use genderless pronouns." I highly recommend reading the article. The gist is that Canada is considering banning discrimination against people based on their gender expression, and the professor is against this, thinks that it's political correctness, and thinks that political correctness is a Bad Thing. Now, here's the kicker. Jordan Peterson actually said the following, and the National Post published this absolute gem of a paragraph:

Peterson said that if a student asked him to be referred to by a non-binary pronoun, he would not recognize their request: “I don’t recognize another person’s right to determine what pronouns I use to address them. I won’t do it.”

Did you catch it? I'm not talking about the obvious fact that all the pronouns he uses are determined by other people -- he didn't emerge from the womb and declare "wumpus shall only call all of tharble 'wiggle-waggle'," using pronouns of his own invention. No, let's try again:

Peterson said that if a student asked him to be referred to by a NON-BINARY PRONOUN, he would not recognize their request: “I don’t recognize another person’s right to determine what pronouns I use to address THEM. I won’t do it.”

Did you catch it this time? He used a gender neutral third person pronoun (they/them/their), derived from the third person plural in English. It is clear from the context that he is referring to a single, hypothetical person, of unknown (or unstated) gender. And he did this fluently, fluidly, naturally, and evidently without conscious awareness.

He didn't say "I don't recognize another person's right to determine what pronouns I use to address him or her." Not only that, but he correctly declined it for case. That is, he didn't say "I don't recognize another person's right to determine what pronouns I use to address they."

Now, as someone who occasionally indulges in Recreational Feigning of Extreme Stupidity for Personal Entertainment (RecFeESFPE, pronounced "reckfessfapee"), I am willing to entertain that this is all a hilarious ruse. For instance, an acquaintance told me once that they were planning a stay-cation and I enthusiastically asked "Where to?" But this. This is spectacular. Next level. Really brilliant stuff. If anyone asks me about Jordan Peterson, I have no choice but to respond "they're a comedic genius."

The rest of the article is also hilarious, and I'm not entirely convinced Jordan Peterson isn't actually Andy Kaufman. For instance he goes on to say, "if someone asked me to take anti-bias training, I think I am agreeing that I am sufficiently racist or biased to need training." That logic is unassailable, which is why I never wear a seatbelt, because wearing one is conceding that I drive poorly enough to need one. Keep fighting the good fight against "over-inclusive" (his words) legislation, Jordan Peterson! We (weird how everything but 3rd person is gender neutral, isn't it?)...where was I...oh yeah: We salute you!

EDIT: In the comments, a colleague pointed out that singular they has existed in English for hundreds of years when discussing referents of unknown gender, but argued that using singular they for a known referent is uncommon, and made claims about what most people would find ungrammatical. The morning I wrote this piece, I had an entire conversation with a friend about a known third friend, in which we both referred to that friend using they (not out of gender considerations, but for the sake of plausible deniability). I really do not believe it is as hard to parse as some are claiming, and think that it boils down to whether Professor Peterson is willing to accept that people can "be" different "genders" than he chooses to assign them in his head. That is, the whole point is that it is not inherently linguistic, as it would be if, for instance, they insisted on having pronouns that have the same declensions as English, but also an ablative ("My pronouns are Ze, Zim, Zis, and, when the conversation involves motion away from me, Zab."). Jordan Peterson is making a "linguistic" argument to justify non-linguistic behavior, and I argue it doesn't hold up.

-----



Refining "Microaggression": a linguistic perspective

September 14, 2016 by Taylor Jones

For a few years I’ve been very interested in a phenomenon that is the topic of a lot of research in the social psychology literature, and which is increasingly discussed on social media and in the news media, but which has not yet gained much attention in linguistics. The phenomenon, microaggression, is somewhat controversial. I think this is, in part, because it is a social phenomenon in which language and language use are often central, yet the discussion around microaggression is often predicated on a number of (sometimes easily challenged) assumptions, and the common definitions of microaggression, while intuitive, are not terribly formally rigorous. As a result, interpretation of what is and is not a microaggression is not satisfyingly ‘objective’.

Here I’m going to discuss some of my ongoing research project; specifically, I want to propose a tighter definition of microaggression in a linguistic domain and then argue that the unsatisfying ‘subjectivity’ of microaggression is precisely why it’s an interesting topic of study and a valid, and necessary, concept.

WHAT IS MICROAGGRESSION?

There are a number of varying, slightly different definitions, but microaggression is usually defined as “brief, everyday exchanges that send denigrating messages to certain individuals because of their group membership” (from this book).

Earlier formulations were a bit more explicit, so people may be familiar with microaggression as the “…brief and commonplace daily verbal, behavioral, or environmental indignities, whether intentional or unintentional, that communicate hostile, derogatory, or negative racial slights and insults toward people of color.” Pretty quickly, people challenged this formulation, and I think that it's generally accepted that microaggression is about power structures and not inherently a bad thing white people do to people of color (as was more or less the original formulation by Sue et al.). So now, people will accept that microaggression is something that, say, white women can experience. I think most would accept that white men can (and do) experience microaggression in certain contexts (for instance, when I'm a non-native Chinese speaker in an exclusively Chinese space). The point is that whenever there is a power dynamic at play, microaggression is possible.

There are a number of different ways of constraining the idea, but as a linguist, I'm naturally more inclined to look at linguistic behavior in conversation, in part because that's what I feel most capable of analyzing. So here, I won't be discussing facial expressions or institutional decisions like naming a dormitory for a slave owner and insisting black students call their residence advisors "master." Rather, I will discuss verbal behavior only.

So that's a lot of words, and it's still not that clear what microaggression actually is. Most discussions of it fall back on (1) giving a bunch of examples and (2) interpreting those examples. The problem is, most take the interpretation to be obvious, uncontroversial, and the only way to interpret what was said. This is a mistake, not just because it opens up the theory to criticism like "Racial Microaggression? How Do You Know?", but because it misses what makes microaggression a valuable concept in the first place.

But first, some classic examples. One of the most common examples of microaggression is when an Asian American is asked "where are you from," and then pressed "where are you really from." The traditional explanation is that it's clear, especially from the second sentence, that the meaning is you can't be from here, and more broadly you're not American and even more broadly you don't belong. I'll take another example from my own experience: I went to a bookstore with a black friend, who was asked 3 times in 5 minutes to perform employee tasks at the store ("can you change the music please?", "where do I find...", "Can you just reshelve this for me, I'm not going to buy it."). The staff at that bookstore wear uniforms, and he was dressed nothing like their uniform. The traditional microaggression interpretation would be that these people mistook him for an employee, despite his clearly shopping in street clothes, because he is black, and that they acted out of a number of assumptions based in stereotypes about black people (e.g., that they don't read, and possibly even that they shirk responsibility, so the fact that he was obviously not working didn't necessarily suggest he wasn't an employee). Derald Sue and his colleagues have a typology of microaggressions and their interpretation in the article I linked above.

REFINING THE DEFINITION:

Many people, especially white people and especially men, react to these kinds of stories with what I (as a white man...) think is a completely reasonable question: but couldn't it have just been a misunderstanding? This is often brushed aside or ridiculed, which I think is a mistake.

The key strength of microaggression, and what gives the concept its 'bite', is that yes, it always could have just been a misunderstanding.

That is, my argument is that microaggression is a valid and useful tool for discussion precisely because it is not cut-and-dry. Otherwise, it's just overt aggression. Nothing 'micro' about it. If I say "your outfit looks dumb" that's just an insult. If I say "oh, your outfit is so...ethnic," that's much trickier to categorize. Is "ethnic" a codeword? Am I insinuating it's inappropriate? That I don't like it? That I think it's unprofessional or looks bad? That I think it doesn't belong? Well...maybe.

So the first part of my argument is:

Microaggression deals with a class of utterances that, given the context of their production, are ambiguous: they are potentially insulting or invalidating, but the insult is plausibly deniable.

What this means is that microaggression is strategic.

More importantly, it also means that listeners who encounter potential verbal microaggression are faced with a classification problem: is this an insult, or a mistake?

Elsewhere I've written about Gricean implicature, a concept from the field of Pragmatics, and specifically compared it to the Semantic notions of presupposition and entailment. Implicature is when you say something that, when listeners assume you're trying to maximize quality, relevance, etc., insinuates something beyond the literal content of the utterance. Presupposition and entailment, however, are things you can read directly off the utterance.

For example, take the sentence:

The first democratically elected leader of the Congo was assassinated.

The presuppositions are things like "there was a first democratically elected leader of the Congo" (namely, Patrice Lumumba). Presuppositions can cause problems, like when you say things like "the king of France is bald." It presupposes a king of France, who doesn't actually exist in the real world. The entailments are things like that the first democratically elected leader of the Congo is dead, and that they did not commit suicide.

Now think about:

Congrats, you've won five dollars!

The implicature is that you've won exactly five dollars, but this is cancellable. That means I can say: "Congrats, you've won five dollars! In fact, you've won ten!" (note that it's a "scalar" implicature, meaning I can cancel the implicature that you won exactly five dollars, but I can't say "in fact, you've won only four!" without canceling the entire utterance).

I review all this so I can now claim:

Microaggressions are unambiguously identifiable when the offensive material is encoded structurally via presupposition or entailment.

This has a more interesting inverse:

Microaggressions are NOT unambiguously identifiable when the offensive material is merely implicated.

CONVERSATION AS STRATEGY:

I am a huge fan of using the tools of Game Theory to investigate (some) linguistic phenomena. Game Theory grew out of Decision Theory, and in a nutshell is a formal (mathematical) approach to figuring out how rational, thinking 'agents' should behave when the value of their decisions depends on both (1) their own choice, and more importantly (2) the choices of other agents.

Applied to pragmatics, this means that we, if we were totally rational, would think about both what we want to say and how it will be interpreted. I think we generally do this, but tons of research backs up the idea that we're only moderately good at thinking about what other people are thinking. That means conversation is strategy, but we kind of suck at strategizing.

I've got a whole complicated flow chart for speaker decisions that models the process, but it basically boils down to: you have a choice of being nice or not being nice, and if you're not nice you have a choice of either saying it outright ("your outfit looks dumb"), encoding your meanness directly in the utterance through presupposition or entailment ("Only idiots wear herringbone"), or encoding it through cancellable implicature ("your outfit is so...interesting.")

The benefit of the latter is that it's deniable and you can save face if you're challenged: "what do you mean 'interesting'?" "I mean I love how intricate it is!"

But here's a problem: as I mentioned above, we kind of suck at figuring out what other people are thinking. We're also kind of lazy. So it's entirely possible you choose "be nice" at the first branch of your decision-making process, and then basically misfire, and say "you look interesting." You meant something like "you look good," but it just kind of came out wrong because you're socially awkward.

The other possibility is that you misfire because of an implicit bias that you're unaware of, but which affects your worldview and which other people can recover from your speech. That is, you choose to be nice, and you say something like:

You're a credit to your race.

Right there you've got presuppositions that could be challenged (e.g., "race is biologically real"), and entailments that are perhaps unsavory (e.g., "people of your race are usually not good in some way."). And here you thought you were being nice.

[EDIT: Stephan Hurtubise points out in the comments below that the latter is actually implicature! The entailment is that you are a credit; the implicature we take away from that is that your race needs 'a credit,' but this is, in fact, cancellable. Another post, for another day, is going to be about the conventionalization of so-called 'dog whistles.' I would argue that using terms that are conventionally used to insult (e.g., "welfare queen") affects how we interpret the utterance, even when, as was the case a few years back with a friend of mine, the speaker does not know the conventional meaning.]

Unfortunately, here we're moving away from Game Theory, since GT is all about agents who are completely rational, so having unconscious bias doesn't fit well. At best, you can model it as some sort of error (these are usually called "trembling hand" models).
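To make the "misfire" idea concrete, here is a minimal Python sketch. It assumes a simplified version of the speaker flow chart above: some share of speakers choose to be nice, some share deliberately encode an insult as cancellable implicature, and the rest choose outright or structural (presupposition/entailment) insults, which are not ambiguous. A well-meaning speaker's "shaky hand" (in game theory the usual term is "trembling hand") is modelled as a single misfire rate, epsilon, at which a nice choice comes out as an ambiguous remark. All proportions are invented, and the function name is hypothetical. The sketch only shows that, whenever both sources exist, an ambiguous remark really could have come from either.

```python
from fractions import Fraction as F

def ambiguous_remark_sources(p_nice: F, p_implicature: F, epsilon: F) -> tuple:
    """Split ambiguous (implicature-only) remarks by where they came from.

    p_nice:        share of speakers who choose "be nice" (hypothetical)
    p_implicature: share who deliberately insult via cancellable implicature (hypothetical)
    epsilon:       hypothetical trembling-hand rate at which a nice choice misfires
                   into an ambiguous remark
    Returns (share from well-meaning misfires, share from deliberate implicature).
    """
    from_misfire = p_nice * epsilon   # nice intent, ambiguous output
    from_strategy = p_implicature     # deliberate ambiguity (always ambiguous here)
    total = from_misfire + from_strategy
    if total == 0:
        raise ValueError("no ambiguous remarks are produced under these parameters")
    return from_misfire / total, from_strategy / total

# Invented numbers: 80% nice, 8% deliberate implicature, 12% unambiguous insults,
# and nice speakers misfire 1 time in 10.
misfire_share, strategic_share = ambiguous_remark_sources(F(8, 10), F(8, 100), F(1, 10))
print(f"misfires: {misfire_share}, deliberate: {strategic_share}")
```

With these particular made-up numbers, half of the ambiguous remarks come from each source. Exact fractions are used only so that the arithmetic is easy to check by hand.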

THE DECISION PROBLEM:

So let's say someone says something that doesn't stand out as negative or hostile due to any presuppositions or entailments you can point to that are inherent to the utterance, but you still feel like it's a negative remark because of some implicature. You are then faced with a classification problem: was this comment aggression toward me?

This is where most previous analyses of microaggression really fall apart, and why people get into shouting matches when the subject of microaggression comes up. Most explanations of microaggression boil down to "that was obviously an aggressive statement, are you blind or are you playing dumb?!" The other side, of course, says "that was obviously NOT an aggressive statement, why are you playing the victim?!"

My argument is that classifying an ambiguous utterance as either "microaggression" or "not microaggression" is a Bayesian classification problem.

Without getting into Bayes' Rule in depth, the basic idea is that you classify the statement by situating it in context and by evaluating your prior beliefs about the speaker and the situation. These priors inform your final classification.
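Stated compactly, Bayes' Rule for the two-way choice between "insult" and "not insult" looks like the following. Here u stands for the utterance and c for everything the listener knows about the speaker and the situation. This notation is introduced for illustration and does not come from the original discussion.

```latex
P(\text{insult} \mid u, c) =
  \frac{P(u \mid \text{insult}, c)\, P(\text{insult} \mid c)}
       {P(u \mid \text{insult}, c)\, P(\text{insult} \mid c)
        + P(u \mid \neg\text{insult}, c)\, P(\neg\text{insult} \mid c)}
```

The prior, P(insult | c), is where past experience enters. The likelihoods, P(u | insult, c) and P(u | ¬insult, c), capture how expected the utterance would be under each reading in that context.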

As an example, there's a video going around this week of a black mother eviscerating a cashier at Target for insulting her children's clothing. When I described the situation to two colleagues, one a black woman and one a white man (note: this is a gross oversimplification of their complex, intersectional identities), they interpreted it differently, because they had different prior beliefs about the context.

So here's what happened: one of the children was wearing traditional African clothing, and was evidently shopping without their mother present. The cashier asked the child:

"Are you trick or treating?"

Some potentially relevant context: This took place in early September. Target has Halloween decorations for sale already.

In order to determine how that statement was meant, without further context, we have to start evaluating a handful of possibilities: was she using the present progressive to talk about the future (i.e., did she mean "are you going to go trick or treating?")? Did she mean something like "are you trick or treating this year?" so she could helpfully point out the Halloween section? Or was she implicating "your clothes look like a costume," and therefore "you look ridiculous"?

From the utterance alone, it is impossible to tell ... but people have very strong intuitions about what was meant and what is plausible. These are formed, in part, by past experience, so my black female colleague immediately reacted with the interpretation that it was probably intended as an insult (this is my intuition, too), while my white male colleague thought it was an odd thing to say, but potentially wholly unrelated to the child's current outfit. The intuitions are also formed by context. I would expect that on October 31st the statement is minimally offensive, and on April 30th it's approaching maximally insulting. But September 12th in a store that has put up Halloween decorations absurdly early -- that's harder to argue objectively. I have my intuitions, but other people may disagree.
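Here is a minimal Python sketch of that reasoning, and every number in it is invented. Two hypothetical listeners, labelled only listener_a and listener_b, start with different prior beliefs that such a remark is meant as an insult. The date changes how likely the remark would be under the innocent reading. The threshold of one half for calling something an insult is likewise an arbitrary choice for illustration. With these numbers, the listeners agree on October 31 and on April 30 but disagree in early September, which mirrors the intuitions described above.

```python
from fractions import Fraction as F

def posterior_insult(prior: F, p_u_if_insult: F, p_u_if_innocent: F) -> F:
    """Bayes' Rule for a two-way choice: P(insult | utterance, context)."""
    weighted_insult = p_u_if_insult * prior
    weighted_innocent = p_u_if_innocent * (1 - prior)
    return weighted_insult / (weighted_insult + weighted_innocent)

# Hypothetical likelihoods of a cashier asking "Are you trick or treating?",
# as (P(utterance | insult), P(utterance | innocent)). All values are invented.
LIKELIHOODS = {
    "October 31": (F(1, 10), F(1, 2)),        # innocent reading very plausible
    "early September": (F(1, 10), F(1, 20)),  # decorations are up, but it is early
    "April 30": (F(1, 10), F(1, 100)),        # innocent reading implausible
}

# Hypothetical priors that such a remark is meant as an insult, shaped by past experience.
PRIORS = {"listener_a": F(1, 2), "listener_b": F(1, 10)}

def classify(listener: str, date: str, threshold: F = F(1, 2)) -> tuple:
    """Return (posterior probability of insult, whether it exceeds the threshold)."""
    p_insult, p_innocent = LIKELIHOODS[date]
    posterior = posterior_insult(PRIORS[listener], p_insult, p_innocent)
    return posterior, posterior > threshold

for date in LIKELIHOODS:
    for listener in PRIORS:
        posterior, is_insult = classify(listener, date)
        print(f"{date:>15} {listener}: P(insult) = {posterior} -> "
              f"{'insult' if is_insult else 'not an insult'}")
```

The two listeners are given identical likelihoods, and only their priors differ. This is the sense in which prior beliefs do the work in this account: the same utterance in the same context can reasonably be classified differently.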

If it was an insult, though, what makes it an effective microaggression is the fact that the cashier can say "I don't know what you're talking about -- I was just going to point out our Halloween section!" Or even slyly double down on the microaggression with "I just thought you were in the spirit!" If it wasn't intended as an insult, the cashier made an unfortunate mistake: while completely well-meaning, she accidentally said something that looks like an insult. And there's an old saying about ducks that basically sums up Bayes' Rule in this case (if it looks like a duck, and quacks like a duck...).
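To make the duck saying a bit more concrete, here is Bayes' Rule written out for the cashier's question. This is a sketch of how I'd frame it, not a formal model, and the symbols are shorthand introduced just for this illustration: I stands for "the remark was intended as an insult," U for the utterance ("Are you trick or treating?"), and C for the context (the date, the store, the child's clothing, and the listener's past experience).

```latex
P(I \mid U, C) = \frac{P(U \mid I, C)\, P(I \mid C)}{P(U \mid I, C)\, P(I \mid C) + P(U \mid \neg I, C)\, P(\neg I \mid C)}
```

The prior, P(I | C), is where my two colleagues' different experiences come in. The likelihoods compare how probable this exact remark is if an insult was intended versus if it wasn't: the more a remark "quacks" like an insult, the more P(U | I, C) outweighs P(U | ¬I, C), and the higher the posterior climbs. As long as the listener's prior isn't already 0 or 1 and the remark is plausible under both readings, the posterior stays strictly between 0 and 1, which is another way of saying the question can't be settled with certainty.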

THE TAKEAWAY:

In summary:

I propose we limit verbal 'microaggression' to things that cannot be clearly classified as overt aggression.
I suggest that ambiguity is not a bug, but is in fact a key feature.
I argue that people think strategically, albeit imperfectly, about how they communicate with others.
I argue that when confronted with an ambiguous and potentially 'microaggressive' utterance, the listener faces a classification problem.
I argue we use Bayesian thinking -- where our prior beliefs about the speaker and the context inform our interpretation -- to figure out what was meant.
Ultimately, I argue the above is probabilistic and, infuriatingly for those who may have been insulted, can never be 100% proven when the microaggression is an implicature rather than a 'structural' microaggression (that is, one where a presupposition or entailment can be clearly demonstrated).

The key here is that context matters: whenever we get lazy, we can assume someone will understand "how we meant it" and be wrong about that assumption. It's also important to note that context is constantly evolving, and that each interaction adds to people's prior beliefs for the next one. This is why, when Donald Trump says something ambiguous, people assume it is probably a microaggression, but when anti-racist activist Tim Wise says something ambiguous, he may get a pass.

Moreover, I think it's hugely important that we don't throw the baby out with the bathwater, so to speak, when challenging less rigorous discussions of microaggression. People are experiencing a real phenomenon. Coded racist speech exists. Dog whistles exist. But their strength is precisely that they are so maddeningly hard to prove.

In general, my advice would be to listen when people say they've experienced a microaggression, and to think about what informs their interpretation. And if you want to avoid accusations of microaggression, you're going to have to think about context, history, and the other person's perspective (yes, this is difficult. Sorry.) every time you speak.

-----


Kids these days are getting "mindblown"

August 02, 2016 by Taylor Jones

This is going to be a very short post, but it's too good not to share. Yesterday, I heard someone describe another person as mindblown. It was clearly a single prosodic word, and was used as a predicate adjective. It seems to have arisen the way some other adjectives do, from a participle (like "burnt" or "downtrodden"): "He was totally mindblown by what I showed him."

In keeping with English's historical roots, it's also delightfully Germanic. Phrasal verbs (verbs like "wake up" or "sit down") are sometimes separate words and sometimes "smushed" together in Germanic languages. An example from Dutch is the word for "participate," which can be literally translated as take part: deelnemen, from deel 'part' and nemen 'take.' In the present tense the two parts split up (ik neem deel, 'I take part'), but the past participle is a single word: deelgenomen. And here's a new English one.

There are a bunch of ways this could have happened. My pet hypothesis is that people took memes captioned "MIND BLOWN," which descend from the older (in the internet sense of old) "mind: blown," meaning "my mind is blown," and reinterpreted the caption as a single adjective describing the image rather than as a statement.

Not content to just love the shit out of this new word, I decided to look for its obvious relatives. And, lo and behold, people are saying things on social media like:

"What mindblew me the most was..."

and:

"I've got video that will mindblow all of y'all."

But don't just take my word for it! See. It. For. Yourself!

Interestingly, I see a lot of people on Twitter using #mindblown in an ambiguous way -- for many, it seems most natural to posit that it's an adjective and not a whole phrase.

It's probably also important to note that people coin and learn new words all the time. What makes this particularly interesting to me is that it's a total grammatical reanalysis, and it's at least plausibly because of ambiguous input from -- Dun dun DUN! -- THE INTERNET.

All of this has left me feeling more than a little, well:

-----


The linguistics of #BLM: Scalar Implicature and Social Controversy

July 08, 2016 by Taylor Jones

A linguistic controversy is raging in the US, with arguments taking place on the news, on Facebook and Twitter, and at uncomfortable family dinners across the country. I'm talking, of course, about the interpretation of the statement "Black Lives Matter," and various responses to it -- "all lives matter," "blue lives matter," and even the more aggressive "black lives don't matter" that occasionally pops up in some recesses of the internet. I think that part of this controversy is purely social, but part of it is linguistic in nature. I've been seeing well-meaning people talking at cross purposes, and I think it arises from a fundamental misunderstanding of starting assumptions. I'm going to make a linguistic claim, and then attempt to justify it. The claim:

Some of the confusion, and animosity, over the statements "black lives matter" and "all lives matter" comes from different assumptions about scalar implicature and about the context of the utterance.

Obviously, the first step in justifying this is explaining what the hell Scalar Implicature is. It's two words, and we'll start with the second.

Implicature:

Linguistics is traditionally divided into sub-fields, and the relevant ones here are Semantics (the study of meaning and how we use language to 'mean' things), and Pragmatics (the study of how we do things with language -- this covers everything from making promises, christening ships, and declaring all men born equal to sarcasm, irony, and shade). Implicature is a technical term used in both. The term was coined by H.P. Grice, and it refers to a way of communicating something without it being strictly entailed by what was said. It will be helpful to compare implicature with entailment. Let's look at a sentence:

Andre owns three dogs.

The entailments are ALL the logical things that 'fall out' from this sentence: Andre owns something. Andre owns three of something. Andre owns one dog. Andre owns two dogs. Andre owns three dogs. (There are also presuppositions: that there is a person named Andre, who is known to the listener, etc.).

In some instances, an utterance is straightforward, and all that matters is the logical structure of the utterance and its surface meaning. However, we sometimes use language in indirect ways to communicate something beyond the obvious surface meaning. Let's put that sentence in a context:

[Tyrone]: Is Andre a cat person?
[Erykah]: Andre owns three dogs.

Did you catch it? Erykah is basically saying no. But she's doing so with an utterance that isn't directly answering the question. Rather, Tyrone has to figure out that what Erykah is saying is relevant to his question. And this isn't just applicable for negation:

[Tyrone]: Does Andre like dogs?
[Erykah]: Andre owns three dogs!

Here it's an affirmation.

[Tyrone]: Is Andre coming to the party tonight?
[Erykah]: Andre owns three dogs...

Here, it's plausibly simultaneously an answer (no) and an explanation (he has to be home to walk his dogs).

Implicature is a way of using language to communicate something without stating it directly in your utterance.

An important element of implicature is that it is cancellable. This means you can explicitly amend what you said, which is not the case with entailment. So in the third example, the conversation could continue:

[Tyrone]: So he's not coming?
[Erykah]: No, he is. He'll just be a little late.

Note that you can't cancel entailment: Erykah can't then deny that Andre has three dogs, and claim that she never said that. But she can amend or cancel the implicature.

Great -- so what's scalar implicature? Well, the general idea is that a lot of things fall along a scale. For instance, <freezing, cold, cool, comfy, warm, hot, sweltering> are ordered with respect to one another. The idea is that when you say something that falls on a scale, you entail everything below and up to that point, and simultaneously implicate that nothing higher on the scale holds. An easy example is money. If I say:

Congratulations, you've won five dollars!

You take that to mean that you've won all amounts up to and including five dollars (because those things are directly entailed!). But you don't take it to mean you've won twenty. However, this is purely implicature, and it can be cancelled:

Congratulations, you've won five dollars! In fact, you've won twenty!

Similarly, if I say:

I did half of my assigned reading.

...It entails that I did up to and including half, but also implicates that I didn't do more. Again, this is cancellable.

I did half of my assigned reading. In fact, I did two-thirds of it.
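For readers who like to see things spelled out, here's a toy model of the money and reading examples. It's a deliberate simplification under my own assumptions: a scale is just a Python list ordered from weakest to strongest, the entailments are everything up to and including the asserted term, and the scalar implicature is that nothing stronger holds. The function names interpret and follow_up are made up for illustration, not taken from any linguistics toolkit.

```python
# Toy model of scalar implicature (a simplification of Horn scales):
# a scale is a list ordered from weakest to strongest term.

def interpret(scale, asserted):
    """Split a scale into what an assertion entails and what it implicates."""
    i = scale.index(asserted)
    return {
        # Everything up to and including the asserted term is entailed.
        "entailed": scale[: i + 1],
        # Every stronger term is implicated NOT to hold (cancellable).
        "implicated_not": scale[i + 1:],
    }

def follow_up(scale, first, second):
    """Model a correction like 'In fact, you've won twenty!'."""
    i, j = scale.index(first), scale.index(second)
    if j < i:
        # Retreating below the original assertion would deny an entailment,
        # which this toy model treats as a contradiction.
        raise ValueError("cannot cancel an entailment")
    # A stronger (or equal) follow-up cancels the old implicature and
    # becomes the new reading.
    return interpret(scale, second)

money = [1, 5, 20]  # dollars
reading = ["some", "half", "two-thirds", "all"]
print(interpret(money, 5))
print(follow_up(money, 5, 20))
print(interpret(reading, "half"))
```

Here interpret(money, 5) gives {'entailed': [1, 5], 'implicated_not': [20]}, and follow_up(money, 5, 20) models "In fact, you've won twenty!" by cancelling the implicature. A follow-up lower on the scale raises an error instead, mirroring the point that Erykah can't take back that Andre has three dogs.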

[EDIT: the same is true of the dog example above: it implicates that Andre has exactly three dogs and no more.] So let's think about implicature and entailment with some simple examples:

Pinot Noir is a red wine
Red wines are delicious
Wines are made from grapes
Grapes are fruit

Taken as simple statements, these raise a couple of points. First, the obvious one: saying "Pinot Noir is a red wine" does not entail that Beaujolais is therefore not a red wine. Saying grapes are fruit does not entail that kiwis are not fruit.

Now for something more controversial. Saying:

Black Lives Matter

...does not entail that other lives don't. Second, it is not saying that black lives matter more than other lives. Saying "Pinot Noir is a red wine" does not entail that it is inherently more red wine-y than Beaujolais, although there are contexts where this interpretation -- via implicature -- could be valid:

[Virginia Madsen]: Beaujolais is a great red.
[Paul Giamatti]: No, Pinot Noir is a red wine.

The implicature is that Beaujolais is lacking some property that therefore makes it not really a red wine, or a defective one at best. Among the people I know who have good intentions, the reactions to Black Lives Matter and All Lives Matter seem to be about what kind of context you put these utterances into. So my black friends are all claiming:

Black Lives Matter [too!]

...but some of my white friends are interpreting that as:

Black Lives Matter [more than others/white lives/your life!]
[Only] Black Lives Matter!

Already, there's a fundamental misunderstanding here, which is exacerbated by the response:

All Lives Matter!

Often, I think they're trying to respond to a perceived "black lives matter more than others" with "all lives matter equally!" But it misses the point, because they're having two different conversations. More importantly, given the context -- black people being executed by agents of the state, with a complete disregard for due process -- it's hard to understand why people leap immediately to the interpretation that there's a "[more]" there.

The way most people use it, there's a (silent) scalar implicature: Black Lives Matter [As Much As Others]. This does not make for a good chant, and is hard to fit on t-shirts. Note, though, that the most natural reading is not "more than others," absent a context that would suggest that implicature.

Perhaps most interesting to me is that people are having a reasonable conversation with normal use of pragmatics -- but backwards. That is, it would make sense if the order were:

all lives matter!
no, (just) black lives matter
[angry grumbling]

But that's not what's happening. That's precisely backwards. BLM is not restricting which lives matter; it's focusing on precisely those that are not treated like they matter as much as others.

Unfortunately, some people, often those who have limited real life contact with black folks, are taking the statement "black lives matter" to be restricting the class of lives that matter. That is, they're interpreting it as:

In the set of all lives, black lives, and only black lives, matter.

Even more unfortunately, the most natural interpretation of the response "All Lives Matter" is that it is dismissive. If I really, really liked the movie Ferris Bueller's Day Off and had the following conversation:

Ferris Bueller's Day Off was a fun movie!
All movies are fun.

It would seem like my interlocutor was challenging what I was saying. They're saying, in effect, it was not particularly fun. They might even be implying I'm silly for getting so excited about it, or I have bad taste in films, or that I should shut up about movies and read a book for once.

Ultimately, I think the response "All Lives Matter" comes from assuming a different focus for scalar implicature:

Black Lives Matter [as much as others], versus:
Black Lives Matter (and by implicature, not White ones or Asian ones).

The reasons for these differing starting assumptions are, of course, not linguistic. If you believe that affirmative action is just black folks "getting one over" on white America, as roughly 30% of Americans do, or you think that black people receive more government services like food stamps than other groups (they don't), then you may think that the implicature in "Black Lives Matter" is that "[only] black lives matter" or "black lives matter [more than others]." If you are outraged at police brutality against black people in particular, and your context for the statement is 400 years of state-sanctioned violence against black people, then you may be more inclined to interpret the statement as "black lives matter [as much as any other, damn it]." And of course, if you think that it's "[only] black lives matter," and you're a cop who isn't black, then you may reflexively respond "Blue lives matter," assuming an implicit "too" at the end, and maybe even feeling like you've made it more inclusive.

When 3/4 of white Americans have friend groups that are 99% white, however, is anyone really surprised that what we've got here is a failure to communicate?

So let me end this by saying that [while, yes, all lives, including police officers', matter] Black Lives Matter [too]. Or for short:

Black Lives Matter.

---

[EDIT: I was asked why "Andre has three dogs" entails that he has three dogs, but "you've won five dollars" only implicates winning exactly five. The answer is that "Andre has three dogs" entails that he has three dogs, and implicates that he has exactly three dogs (not, say, five dogs). This difference between entailment and implicature is the fodder for stereotypical jokes about software engineers, or Data on Star Trek, Dr. Sheldon Cooper, or any other overly linear/literal thinkers: they'll miss how we use implicature and give technically correct but misleading answers, like the classic "does your dog bite?" joke.]

----

On Stank Face

June 30, 2016 by Taylor Jones

Normally I can keep a cool head about language and maintain scientific, descriptive detachment. However, today I saw something deeply perplexing to me, as a musician who plays music in the tradition of Black American Music (jazz, funk, etc.): a kitty litter company has taken the term stank face, redefined it, and used it as the main hook for a large-scale advertising campaign. Elsewhere, I've written about imagined Black English, and borrowing of terms. This is not borrowing. Rather, whoever does marketing for Tidycat has chosen to simply take and redefine an existing term.

So today, on television, I saw this:

Now, why was this horrifying to me? Well, musicians have used stank face for decades to refer to something completely different. More importantly, this is not natural borrowing and reinterpretation; this is corporate.

First, it may be helpful to discuss the origin: stank comes from a stereotypically black/southern pronunciation of "stink." What stinks? Things that are funky. Things that are nasty. Filthy McNasty. There's a long tradition of reacting to something particularly funky with stank face. It's a sign of respect, and for musicians like me, a sign that you're doing it really right. When I first heard Vulfpeck's "Funky Duck," the musician who put the recording on for me knew I liked it because of my reflexive stank face (not duck face!).

Let me reiterate: he knew I LIKED it because of stank face. LIKED IT!

The thing is, the above ad is so close as to be almost right, and then it's just very, very wrong. And it's wrong at the expense of a community of musicians who tend to come from marginalized backgrounds. It literally takes something that musicians -- in primarily black styles -- use, and declares it to be something completely different, for the purpose of selling kitty litter to middle-class white women. And make no mistake, it's very explicitly targeted at middle-class, white, female cat owners:

Moreover, it is asking if they're at risk of doing something associated with black cultural styles, assuming they have ever heard of stank face before, which is hard to avoid. Outkast's André 3000 is stanky enough to say "stank you smelly much" on the regular. I might be wrong, but I'm pretty sure that even if you aren't the kind of fan who saw the movie Idlewild in theaters, you probably still know who Outkast is. And of course, A3K's use is a nod to George Clinton and Parliament Funkadelic, which brings me to my second point, one which is way more hilarious to me than the first.

**Another thing that's stereotypically stank ain't kitty litter, it's marijuana:**

So we've got a cultural mode of expression tied to musical styles (and often associated with marijuana) that's now being (1) explicitly portrayed as negative (no surprises there), and (2) used to sell kitty litter to the group least likely to participate in the culture it's from (although equally likely to participate in drug use -- see, for instance, this ACLU report. The sociology of drug use and arrest rates is a fascinating topic, but one for another day, and maybe another blog). Note that this is not to say there aren't some funky white housewives (great band name!); they're just not really the target audience here.

So, I've given two examples of not-quite-stank-face above. What's real stank face look like? This keyboard player listening to another member of Snarky Puppy tear up the keys:

In fact, if you'd like to experience stank face firsthand, you can listen to the track here, and stank along.

Another example is the drummer for the Roy Hargrove Quintet, listening to the pianist while they're playing "Strasbourg St. Denis":

And here's the full recording:

I mentioned "Funky Duck," and it turns out I'm not the only one who finds it funky enough to trigger stank face -- the singer is so nasty he can't help but react to his own funk:

The most ironic thing about all of this, for me, is that it's taking (sometimes) drug-related and always funk-related slang and using it so innocently and so wrongly, while trying to be cool, or as cool as you can be while still being a suburban cat owner. It's reminiscent of the Kyle & Kyle (Kyle from SNL) YouTube sketch in which the character desperately wants to be seen as a stoner, but doesn't know quite how to use any of the words, so claims his toddler is dealing, and suggests "we should box hot the place" (instead of "hot box").

Now, it should be noted that stank face is distinct from a number of other possible faces, including the ill grill:

or the mean mug:

Lastly, there's something ironic about using stank face to sell kitty litter, since it's the cats (yes, musicians still say this) who like it stanky. So while I won't go so far as saying "somethin' stank and I want some," since I'm not about that life, I will say that I wish they'd stop marketing me whatever funk comes with their kitty litter, and just make my funk the P-Funk.

-----
