A different way of looking at the Atlas of North American English

For my dissertation I am investigating regional variation in African American English. The key baseline for comparison is the Atlas of North American English (from here on: ANAE) by Labov, Ash, and Boberg.

The original analysis in the ANAE was done by taking individual (point) observations and dividing them into classes. For example, looking at fronting of the tongue in /uw/ as in goose, they might divide the (normalized) observed frequencies in Hertz into five classes, plot the points, and color them one of five colors based on those divisions. Then, they’d draw a line around apparent clusters (the procedure elaborated in the Atlas is more complicated, but that’s the gist of it). This is not directly comparable with my data, since I have a different number of points in different locations, from a different ethnic group that is spread out with different population centers.
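To make the "divide into five classes" step concrete, here is a minimal sketch in Python. It is not the ANAE procedure: the equal-width bins, the function name `classify_into_classes`, and the F2 values are all assumptions made up for illustration.

```python
import numpy as np

def classify_into_classes(values_hz, n_classes=5):
    """Assign each value to one of n_classes equal-width bins (0 = lowest)."""
    values = np.asarray(values_hz, dtype=float)
    # Interior cut points between the minimum and maximum observed value.
    # Equal-width bins are an assumption; the Atlas may cut the scale differently.
    edges = np.linspace(values.min(), values.max(), n_classes + 1)[1:-1]
    return np.digitize(values, edges)

# Made-up normalized F2 values in Hz, one per speaker.
f2 = [900, 1000, 1250, 1480, 1720, 1900]
classes = classify_into_classes(f2)
```

Each class index would then map to one of the five point colors on the map. Where the cut points go is exactly the kind of researcher decision the rest of this post tries to reduce.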

So the first step to make my data and the ANAE — the “gold standard” — comparable, is to do some statistical transformations that (1) take the researcher decisions out of it a bit more, and (2) interpolate values to create something like a heatmap.

After a few months of thinking and coding, I’ve finally got a procedure that does this. Here, I’ll compare the ANAE maps and my maps of the same data (the TelSur, or “telephone survey” data). I’m looking at the second formant (that is, “F2”) of /uw/ before non-coronals — meaning how far front or back the tongue is in the mouth in words like goose, school, fool, pool, etc., but not words like to, dew, toot, sue, etc. The best I can exaggerate it in writing is to say that it’s the difference between “cool” and “kyewl” but if you’re from the southeast, that may not be a meaningful distinction to you.

First, here’s the ANAE data (mapped using the viridis color palette, so colorblind people can read it too). Warmer colors are higher values in Hertz, corresponding (roughly) to the high point of the tongue being further forward in the mouth:


And this is the ANAE analysis of fronting of /uw/, based on their division of point values into categories and visual inspection and demarcation of clusters. Note that the point locations in the ANAE are very inaccurate; this was a deliberate choice, made for readability.


What I did next was to interpolate, assigning missing values based on their 10 nearest neighbors. This won’t change the outcome of subsequent steps, but makes it easier to run the same algorithm to process all the different measurements. So here’s the same data, but with missing values filled in based on their nearest neighbors:
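A minimal sketch of nearest-neighbor filling is below. It assumes each missing value is replaced by the plain mean of its k nearest observed neighbors, and that straight-line distance between coordinate pairs is good enough. Neither detail is stated above, and `knn_fill` is a hypothetical name.

```python
import numpy as np

def knn_fill(coords, values, k=10):
    """Fill NaN entries with the mean of the k nearest observed points."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    observed = ~np.isnan(values)
    obs_coords, obs_values = coords[observed], values[observed]
    k = min(k, len(obs_values))  # cannot use more neighbors than exist
    for i in np.flatnonzero(~observed):
        # Straight-line distance from the missing point to every observed point.
        dist = np.linalg.norm(obs_coords - coords[i], axis=1)
        nearest = np.argsort(dist)[:k]
        filled[i] = obs_values[nearest].mean()
    return filled

# Toy example: the second point is missing and sits between the first and third.
toy_coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
toy_values = [1.0, np.nan, 3.0, 100.0]
toy_filled = knn_fill(toy_coords, toy_values, k=2)
```

For real longitude and latitude pairs, a great-circle distance or a projected coordinate system would be a better choice than the raw Euclidean distance used here.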


Then, what I did was use Getis-Ord Gi* to smooth the data and highlight hot (and cold) spots. This gives a much clearer (but more abstract) picture of regional patterns. I used the 25 nearest neighbors in this map (I tested 5, 10, 25, and 50, and 25 gives the clearest picture without oversmoothing).
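For readers who want to see what the statistic computes, here is a sketch of Gi* written directly from its standard formula. It assumes binary weights (1 for a point and each of its k nearest neighbors, 0 otherwise) and straight-line distances. The function name is hypothetical, and this is not the code that produced the map.

```python
import numpy as np

def getis_ord_gi_star(coords, values, k=25):
    """Gi* z-score for every point, using binary k-nearest-neighbor weights."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = len(x)
    if k + 1 >= n:
        raise ValueError("k must be smaller than the number of points minus one")
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, -1.0)  # guarantees each point sorts first in its own row
    # Gi* (unlike Gi) includes the point itself, hence k + 1 columns.
    order = np.argsort(dist, axis=1)[:, : k + 1]
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], order] = 1.0
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Toy example: 20 points on a line, with high values clustered at one end.
line_coords = [(i, 0) for i in range(20)]
line_values = [10.0] * 5 + [0.0] * 15
z = getis_ord_gi_star(line_coords, line_values, k=2)
```

Large positive scores mark hot spots and large negative scores mark cold spots. A larger k pools more neighbors into each score, which is why the choice among 5, 10, 25, and 50 changes how smooth the map looks.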


Finally, I smoothed the map using Kriging (a method originally used in mining, and commonly used in weather maps — it is the best linear unbiased interpolation of intermediate values) and then assigned kriged values to counties.

Because the county values are interpolated rather than directly observed, caution should be exercised, and the reader should not overinterpret the map. However, it gives a very readable, high level picture of regional variation.
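The kriging step can be sketched as ordinary kriging with a fixed variogram. Everything specific here is an assumption for illustration: the exponential variogram model, its `sill`, `range_`, and `nugget` values (which would normally be fitted to the data), and the idea of using something like county centroids as the target locations.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, range_=1.0, nugget=0.0):
    """Semivariance at distance h under an exponential model (assumed, not fitted)."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    # The semivariance of a point with itself is zero by definition.
    return np.where(h == 0, 0.0, gamma)

def ordinary_kriging(coords, values, targets, **variogram_params):
    """Predict values at target locations from point observations."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(z)
    pair_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system: semivariances between observations, bordered by a row and
    # column of ones that force the weights to sum to one (unbiasedness).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exponential_variogram(pair_dist, **variogram_params)
    a[n, n] = 0.0
    target_dist = np.linalg.norm(targets[:, None, :] - coords[None, :, :], axis=2)
    b = np.ones((n + 1, len(targets)))
    b[:n, :] = exponential_variogram(target_dist, **variogram_params).T
    weights = np.linalg.solve(a, b)[:n, :]  # drop the Lagrange multiplier row
    return weights.T @ z

# Toy example: two observations and one target halfway between them.
obs_coords = [(0.0, 0.0), (2.0, 0.0)]
obs_values = [1.0, 3.0]
midpoint_prediction = ordinary_kriging(obs_coords, obs_values, [(1.0, 0.0)])
```

With a zero nugget this sketch reproduces the observed value at an observed location, and every prediction is a weighted combination of the observations. That is one way to see why values in sparsely sampled areas deserve less trust than values near many data points.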


What’s great about this is that it looks very similar to the findings in the ANAE, but it takes researcher decision making a little further out of it, and it’s now theoretically comparable to a map from a dataset that has a different number of point observations from different locations. So as I finish cleaning and analyzing my African American English accent survey, I now have a way of comparing regional variation in AAE with the dialect regions in white varieties of North American English.


©Taylor Jones 2019


Testifying While Black

[content warning: language] [co-authored with Jessica Kalbfeld]

For the last four years I've been working on a large-scale project distinct from writing my dissertation that my family and friends know I refer to as my "shadow dissertation." It's a co-authored paper, with Jessica Kalbfeld (Sociology, NYU), Ryan Hancock (Philadelphia Lawyers for Social Equity, WWDLaw), and my advisor, Robin Clark (Linguistics, University of Pennsylvania), and we just received word that it has been accepted for publication in Language. Many of my other projects, including my work on the verb of quotation talkin' 'bout, on first person use of a nigga, and on the spoken reduction of even to "eem", among others, were all in service of this project.

Simply put: court reporters do not accurately transcribe the speech of people who speak African American English at the level of their industry standards. They are certified as 95% accurate, but when you evaluate sentence-by-sentence only 59.5% of the transcribed sentences are accurate, and when you evaluate word-by-word, they are 82.9% accurate. The transcriptions changed the who, what, when, or where 31% of the time. And 77% of the time, they couldn't accurately paraphrase what they had heard.

Let me be clear: I am not saying that all court reporters mistranscribe AAE. However, the situation is dire. For this project, we had access to 27 court transcriptionists who currently work in the Philadelphia courts -- fully a third of the official court reporter pool. All are required to be certified at 95% accuracy; however, the certification is based primarily on the speech of lawyers and judges, and they are tested for speed, accuracy, and technical jargon.

We recruited the help of 9 native speakers of African American English (if you're new to my blog, African American English is a rule-governed dialect as systematic and valid as any other), from West Philadelphia, North Philadelphia, Harlem, and Jersey City (4 women and 5 men). Each of these speakers was recorded reading 83 different sentences, all of which were taken from actual speech (that is, we didn't just make up example sentences). These sentences each had specific features of AAE, 13 in total, as well as combinations of features. Examples of sentences included:

  • When you tryna go to the store?

  • what he did?

  • where my baby pacifier at?

  • she be talkin’ ‘bout “why your door always locked?”.

  • Did you go to the hospital?

  • He been don’t eat meat.

  • It be that way sometimes.

  • Don’t nobody never say nothing to them.

Features we tested for included: 

  • null copula (deletion of conjugated is/are, as in he workin’ for “he is working”).

  • negative concord (also known as multiple negation or “double negatives”).

  • negative inversion (don’t nobody never say nothing to them meaning “nobody ever says anything to them”).

  • deletion of possessive s (as in his baby mama for his baby’s mama).

  • habitual be (an invariant grammatical marker that indicates habitual action, as in he be workin’ for “he is usually working”).

  • stressed been (this marks completion in the subjectively distant past, as in I been did my homework meaning “I completed my homework a long time ago”).

  • preterite had (this is the use of had where it does not indicate prior action in the past tense, but rather often indicates emotional focus in the narrative, as in what had happened was… for “what happened was…”).

  • question inversion in subordinate clauses (this is when questions in subordinate clauses invert the same way as in matrix clauses in standard English, as in I was wondering did you have trouble for “I was wondering whether you had trouble”).

  • first person use of a nigga (This is where a nigga does not mean any person, but rather indicates the speaker, as in a nigga hungry for “I am hungry”).

  • spoken reduction of negation (this is the reduction of ain’t even to something that sounds like “eem”, or the reduction of don’t to something that sounds like “ohn”).

  • quotative talkin’ ‘bout (this is the use of talkin’ ‘bout, often reduced to sounding like “TOM-out” to introduce direct or indirect quotation, as in he talkin’ ‘bout “who dat?” meaning “he asked ‘who’s that?’”).

  • modal tryna (this is the use of tryna to indicate intent or futurity, as in when you tryna go for “when do you intend to go?”).

  • perfect done (this is a perfect marker, indicating completion or thoroughness, as in he done left meaning “he left”).

  • be done (this is a construction that can mark a combination of habitual and completed actions, or can mark resultatives, as in I be done gone home when they be acting wild for “I’ve usually already gone home when they act wild”).

  • Expletive it (this is replacing standard English “there” with it, as in it’s a lot of people outside for “there are a lot of people outside”).

  • combinations of the above, as in she be talkin’ ‘bout “why your door always locked?” meaning “she often asks ‘why is your door always locked?’”

These are by no means all the patterns of syntax unique to AAE, but we thought they were a decent starting point. However, not only does AAE have different grammar than other varieties of English, but more often than not, African Americans have different accents from their white counterparts within the same city. Think about it: Kevin Hart's Philadelphia accent is not the same as Tina Fey's (it's also why Kenan Thompson's Philly accent is so weird in that sketch). 

All of the court reporters we tested were given a 220Hz warning tone to tell them a sentence was coming, followed by the same sentence played twice, followed by 10 seconds of silence. We asked them to 1) transcribe what they heard (their official job) and 2) paraphrase what they heard in "classroom English" as best as they could (not their job!). The audio was at 70-80 decibels at 10 feet (that is, very loud). The sentences and voices were randomized so they heard a mix of male and female voices, and they didn't hear the same syntactic structures all at the same time. All of the court reporters expressed that what they heard was:

  • better quality audio than they're used to in court

  • consistent with the types of voices they hear in court (more specifically, they often volunteered "in criminal court").

  • spaced with more than enough time for them to perform the task (they often spent the last 5 seconds just waiting -- they write blisteringly fast).

What was the result? None of them performed at 95% accuracy, no matter how you choose to define accuracy, when confronted with everyday African American English spoken by local speakers from the same speech communities as the people they are likely to encounter on the job. If you choose to measure accuracy in terms of full sentences -- either the sentence is correct or it is not -- the average accuracy was 59.5%. If you choose to measure accuracy in terms of words -- how many words were correct -- they were 82.9% accurate on average. Race, gender, education, and years on the job did not have a significant effect on performance, meaning that black court reporters did not significantly outperform white court reporters (we think this is likely because of the combination of neighborhood, language ideologies, and stance toward the speakers -- black court reporters distanced themselves from the speakers and often made a point of explaining they "don't speak like that."). Interestingly, the kinds of errors did seem to vary by race: there's weak evidence that black court reporters did better understanding the accents, but still struggled with accurately transcribing the grammar associated with more vernacular AAE speakers.
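The gap between the two numbers is easier to see with a small sketch of both measures. This is not the scoring procedure from the paper. It assumes sentence accuracy means an exact word-for-word match after light normalization, and word accuracy means the share of spoken words that show up, in order, in the transcription. The function names are hypothetical. The first pair comes from the examples later in this post, and the second pair is an invented perfect transcription.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and split into words, dropping surrounding punctuation."""
    words = (w.strip(".,?!\"'") for w in text.lower().split())
    return [w for w in words if w]

def sentence_accuracy(pairs):
    """Share of (spoken, transcribed) pairs whose word sequences match exactly."""
    correct = sum(normalize(s) == normalize(t) for s, t in pairs)
    return correct / len(pairs)

def word_accuracy(pairs):
    """Share of spoken words that reappear, in order, in the transcription."""
    matched = total = 0
    for spoken, transcribed in pairs:
        ref, hyp = normalize(spoken), normalize(transcribed)
        blocks = SequenceMatcher(None, ref, hyp, autojunk=False).get_matching_blocks()
        matched += sum(block.size for block in blocks)
        total += len(ref)
    return matched / total

pairs = [
    # Spoken sentence and transcription taken from the examples in this post.
    ("he don't be in that neighborhood", "We going to be in this neighborhood"),
    # Invented pair: a transcription with no errors, for contrast.
    ("It be that way sometimes", "It be that way sometimes"),
]
by_sentence = sentence_accuracy(pairs)
by_word = word_accuracy(pairs)
```

A single wrong word sinks a whole sentence under the first measure but costs only one word under the second, so sentence-level accuracy will generally come out lower.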

For all the court reporters, their performance was significantly worse when we asked them to paraphrase (although individual court reporters did better or worse with individual features. For example, one white court reporter nailed stressed been every time -- something we did not expect). Court reporters correctly paraphrased on average 33% of the sentences they heard. There was also not a strong link between their transcription and paraphrase accuracy -- in some cases they even transcribed all the words correctly, but paraphrased totally wrong. In a few instances, they paraphrased correctly, but their official transcription was wrong! The point here is that while the court reporters did poorly transcribing AAE, they did even worse understanding it -- which makes it no surprise they had difficulty transcribing.

In the linguistics paper, we go into excruciating detail cataloguing the precise ways accent and grammar led to error. However, the takeaway for the general public is that speakers of African American English are not guaranteed to be understood by the people transcribing them (and they're probably even less likely to be understood by some lawyers, judges, and juries), and not guaranteed that their words will be transcribed accurately. Some examples of sentences together with their transcription and paraphrase include (sentence in italics, transcription in angle brackets <>, and paraphrase in quotes):

  • he don’t be in that neighborhood — <We going to be in this neighborhood> — “We are going to be in this neighborhood”

  • Mark sister friend been got married — <Wallets is the friend big> — (no paraphrase)

  • it’s a jam session you should go to — <this [HRA] jean [SHA] [TPHAO- EPB] to> — (no paraphrase)

  • He don’t eat meat — <He’s bindling me> — “He’s bindling me”

  • He a delivery man — <he’s Larry, man> — “He’s a leery man”

Why does this matter?

First and foremost, African Americans are constitutionally entitled to a fair trial, just like anyone else, and the expectation of comprehension is fundamental to that right. We picked the "best ears in the room" and found that they don't always understand or accurately transcribe African American English. And crucially, what the transcriptionist writes down becomes the official FACT of what was said. For 31% of the sentences they heard, the transcription errors changed the who, what, when, or where. Some were squeamish about writing the "n-word" and chose to replace it with other words; however, those who did often failed to understand who it referred to (for instance, changing a nigga been got home "I got home a long time ago" to <He got home>, or in one instance <Nigger Ben got home>, evidently on the assumption it was a nickname).

And it's not just important for when black folks are on the stand. Transcriptions of depositions, for instance, can be used in cross-examination. In fact, it was seeing Rachel Jeantel defending herself against claims she said something she hadn't that sparked the idea for this project. (And she really hadn't said it -- I've listened to the deposition tape independently, and two other linguists -- John Rickford and Sharese King -- came to the conclusion the transcription was wrong, and have published to that effect). Transcriptions are also used in appeals. In fact, one appeal was decided based on a judge's determination of whether "finna" is a word (it is) and whether "he finna shoot me" is admissible in court as an excited utterance. The judge claimed, wrongly, that it is impossible to determine the "tense" of that sentence because it does not have a conjugated form of "to be", claiming that it could have meant "he was finna shoot me." If you know AAE, you know that you can drop "to be" in the present but not in the past. That is, you can drop "is" but not "was". The sentence unambiguously means "he is about to shoot me," that is, in the immediate future.

This is excluding misunderstanding like with the recent "lawyer dog" incident in which a defendant said "I want a lawyer, dawg" and was denied legal counsel because there are no dogs who are lawyers.

All of this suggests a way that African Americans do not receive fair treatment from the judicial system; one that is generally overlooked. Most of us learn unscientific and erroneous language ideologies in school. We are explicitly taught that there is a correct way to speak and write, and that everything else is incorrect. Linguists, however, know this is not the case, and have been trying to tell the public for years (including William Labov’s “The Logic of Nonstandard English,” Geoffrey Pullum’s “African American English is Not Standard English with Mistakes,” the Linguistic Society of America statement on the “ebonics” controversy, and much of the research programs of professors like John Rickford, Sonja Lanehart, Lisa Green, Arthur Spears, John Baugh, and many, many others). The combination of these pervasive language attitudes and anti-black racism leads to linguistic discrimination against people who speak African American English — a valid, coherent, rule-governed dialect that has more complicated grammar than standard classroom English in some respects. Many of the court reporters assumed criminality on the part of the speakers, just from hearing how the speakers sounded — an assumption they shared in post-experiment conversations with us. Some thought we had obtained our recordings from criminal court. Many also expressed the sentiment that they wish the speakers spoke "better" English. That is, rather than recognizing that they did not comprehend a valid way of speaking, they assumed they were doing nothing wrong, and the gibberish in their transcriptions (see above examples) was because the speakers were somehow deficient.

Here, I think it is very important to point out two things: first, many people hold these negative beliefs about African American English. Second, the court reporters do not have specific training on part of the task they are required to do, and they all expressed a strong desire to improve, and frustration with the mismatch between their training and their task. That is, they were not unrepentant racist ideologues out to change the record to hurt black people; they were professionals, both white and black, who had training that didn't fully line up with their task and who held common beliefs many of us are actively taught in school.

What can we do about it?

There is the narrow problem we describe of court transcription inaccuracy, and there is the broader problem of public language attitudes and misunderstanding of African American English. For the first, I believe that training can help at least mitigate the problem. That's why I have worked with CulturePoint to put together a training suite for transcription professionals that addresses the basics of "nonstandard" dialects, and gives people the tools to decode accents and unexpected grammatical constructions. Anyone who has ever looked up lyrics on genius.com or put the subtitles on for a Netflix comedy special with a black comic knows that the transcription problem is widespread. For the second problem, bigger solutions are needed. Many colleges and universities have undergraduate classes that introduce African American English (in fact, I've been an invited speaker at AAE classes at Stanford, Georgetown, University of Texas San Antonio, and UMass Amherst), but many, even those with linguistics departments, do not (including my current institution!). Offering such classes, and making sure they count for undergraduate distribution requirements, is an easy first step. Offering linguistics, especially sociolinguistics, in high schools as part of AP or IB course offerings could also go a long way toward alleviating linguistic prejudice, and toward helping with cross-dialect comprehension. Within the judicial system more specifically, court reporters should be encouraged to ask clarifying questions (currently, it's officially encouraged but de facto strongly discouraged). Lawyers representing AAE-speaking clients should make sure that they can understand AAE and ask clarifying questions to prevent unchecked misunderstanding on the part of judges, juries, and yes, court reporters. Linguists and sociologists can, and should, continue public outreach so that the general public has an informed idea about what science tells us about language and discrimination.

This is a disturbing finding that has strong implications for racial equality and justice. And there's no evidence that the problem of cross-dialect miscomprehension is only limited to this domain (in fact, we have future studies planned already, in medical domains). This study represents a first step toward quantifying the problem and what the key triggers are. Unfortunately, the solutions are not all clear or easy to enact, but we can chip away at the problem through careful scientific investigation. On the heels of the 19th national observance of Martin Luther King Jr. Day (It has only been observed in all 50 states since 2000(!)), it seems appropriate to reaffirm that “No, no, we are not satisfied, and we will not be satisfied until justice rolls down like waters and righteousness like a mighty stream.”


©Taylor Jones 2019


Another quick look at AAE variation

I’m still gathering and analyzing spoken data for my dissertation, studying regional variation in African American accents. In the meantime, I’ve been analyzing a data set of ~59 million AAE(-like) tweets. Another pattern consistent with anecdotal experience has emerged from this Twitter data. The image below is two maps, both using Gi* hot spot detection. I compared use of nothing/nothin against variants with either th-stopping (pronouncing it like “nuttin”) or th-fronting (pronouncing it like “nuffin”). I used all the spellings I could think of: nuttin, nutin, nutn, nuin, nun, versus nuffin, nufin, nuffn, nufn, nuffen, etc. Areas that are redder are areas where a given variant is more likely to be used; blue areas are places where it is less likely to be used.

In the image below, the left is the map for nuffin (more accurately, all the th-fronting variants) and the right is the map for nuttin (all the th-stopping variants).
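A rough sketch of the counting behind maps like these: tokenize each tweet, look each word up in a table of spellings, and tally the variant it belongs to. The spelling lists below are just the ones named in this post (the real lists were longer), the function names are hypothetical, and the example tweets are invented.

```python
import re
from collections import Counter

# Spellings listed in the post; illustrative, not exhaustive.
# Note that "nun" can also be the ordinary English word, a source of noise.
VARIANTS = {
    "standard": {"nothing", "nothin"},
    "th_stopping": {"nuttin", "nutin", "nutn", "nuin", "nun"},
    "th_fronting": {"nuffin", "nufin", "nuffn", "nufn", "nuffen"},
}
SPELLING_TO_VARIANT = {s: v for v, spellings in VARIANTS.items() for s in spellings}

def count_variants(tweets):
    """Count how often each variant class appears across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase alphabetic tokens, so "nothin'" is read as "nothin".
        for token in re.findall(r"[a-z]+", tweet.lower()):
            variant = SPELLING_TO_VARIANT.get(token)
            if variant:
                counts[variant] += 1
    return counts

def variant_shares(counts):
    """Each variant's share of all tokens that matched any spelling."""
    total = sum(counts.values())
    return {v: counts[v] / total for v in VARIANTS} if total else {}

# Invented tweets, standing in for all the tweets from one location.
example_tweets = ["I ain't say nuttin", "aint nuffin to it", "Nothing at all", "nun of that"]
example_counts = count_variants(example_tweets)
example_shares = variant_shares(example_counts)
```

Running this per location gives one share per variant per place, which is the kind of value a hot spot statistic like Gi* can then be computed over.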


Briefly, the above shows that in this dataset of AAE tweets, th-stopping is more common in the deep south (almost perfectly picking out the historical slave states), and th-fronting is more common along the east coast, especially the DMV, but following the east coast all the way up to destination cities for the Great Migrations. It looks like th-stopping is more popular around NYC and maybe Philadelphia, also consistent with what I hear on the street (a mix of both, but with more th-stopping in NYC).



©Taylor Jones 2018

A different look at AAE dialect regions

While it’s still far too early for me to have results from my (audio) dialect survey for African American English Accents, I recently got a nice replication of some previous findings. I was at a computational sociolinguistics workshop a few weeks ago, and was graciously allowed access to some 2013 data, to complement my 2015 work. Anyway, looking at a million tweets, I found the same pattern, South to North, of use of sholl for “sure”. (This is comparing all spelling variants, so sholl, shol, showl, showll, shole, etc. versus sure, shurr, shur, etc.)

Using a Getis-Ord Gi* statistic for hotspot analysis, much the same pattern as I’d argued for in 2015 appears in this different dataset. Note, however, that I did play fast and loose with the K nearest-neighbors, and this is a very preliminary analysis. I’m also still working out how best to represent the “envelope of variation” — so this is an analysis where I’m comparing sholl to sure as a proportion of total tweets.


As I’ve written about elsewhere, this feature and others do not pattern at all like regional variation in (white) English described in the Atlas of North American English.



©Taylor Jones 2018

African American English and Cross Dialect Comprehension

A while back, I wrote a handful of tweets in response to someone describing a linguist giving students a test on their comprehension of African American English. I explained that I am a linguist and part of what I study is cross-dialect comprehension between AAE and mainstream, “classroom” (white) English. Or really, the lack of comprehension on the part of the mainstream speakers. The tweet was seen by over 50,000 people (!) and a lot of people asked for DMs with more information about AAE. I figured it was easier to put some information all in one place here.

I’ve written elsewhere about what AAE is, and about borrowing and appropriation, especially those based on not quite understanding what is being borrowed, but here I want to dig a little more into whether and to what extent people who don’t speak AAE actually understand it.

I have a co-authored paper under review right now that I won’t discuss further here, that investigates to what extent court reporters understand and accurately transcribe AAE, which I will blog about once it’s published (spoilers: it’s bad out there). Below is a primer on AAE, a handful of things that are not understood by non-AAE-speakers, and some recommended readings.

A quick primer on AAE:

AAE is a dialect spoken primarily but not exclusively by black Americans, and is the language associated primarily with the descendants of slaves in the American South. It is a systematic, rule-governed, logical, fully-formed language variety, and it differs significantly from other varieties of English, across all levels of the language (that is, the phonology, or sound system, is different, it has different grammatical rules, etc.). It is important to note that AAE has different grammatical rules than standard English, and not that it has no grammatical rules. Therefore, it is absolutely possible to speak it wrong — something white people who are ignorant of the rules do often when imitating black people who speak AAE.

The accent of AAE is different from white accents, and because of segregation, people in the same city often have very different accents depending on race. Take Chicago for instance. The stereotypical white Chicago accent exhibits what’s called the Northern Cities Vowel Shift, which SNL made fun of with their sketch about “da bears.” But that’s not the only Chicago accent. Think about it: does Kanye West sound like that?

It’s actually not fair to say the accent of AAE, since there’s regional differences (Michael B. Jordan (Philly) sounds nothing like Ryan Coogler (Bay area)). In fact, my dissertation research is on regional variation in AAE accents (if you identify as black and grew up in the US, please think about participating in my anonymous survey — it takes 3-4 minutes and can be found here: www.languagejones.com/aaes).

The grammar:

When I talk about cross-dialect comprehension, different accents definitely play a part, but so does very different grammar. There’s not much research on how well non-AAE speakers understand or don’t understand AAE, but what there is does not look good.

Labov 1972 found that white teachers in Harlem did not understand habitual be or stressed been. When given the scenario “you ask a child if he did his homework, and he replies ‘I been did my homework’” most incorrectly interpreted that to mean the child had not completed their homework (see #2 below). Similarly, Rickford 1975 mentions an informal survey in which white participants took “they been got married” to mean a number of different things, all of them wrong.

Arthur Spears coined the term “camouflage construction” for constructions in AAE that look like they mean something in standard English, but really mean something else. He did this initially when describing “indignant come”, which is a marker of indignation, not a verb of motion. John Rickford and a few of his students did work on the use of had in preterite, not perfective, constructions. Christopher Hall and I have written on first person use of a nigga, and have a paper under review right now dealing with more than 10 different uses of “the n-word” in AAE that are distinct from those available to speakers of other dialects. I’ve written about “talkin’ ‘bout” as a verb of quotation.

But beyond a handful of papers on individual morphosyntactic features of AAE, there’s not really any research on how well other people actually understand it. We know they don’t always understand habitual be, but not at what rate they do or don’t. Same for a ton of other features. The court reporter paper I mentioned above is, to my knowledge, the first quantitative test of cross-dialect comprehension for almost all of the features mentioned in it.

What is unique to AAE? What is not understood by others?

Keeping in mind that there’s not much quantitative research on this, I can at least point to a handful of differences between AAE and other language varieties that lead to confusion or miscomprehension. Here’s a partial list:

  1. Habitual be: he be workin’ does not mean “he is at work” or “he is working.” It means he works, usually or often. In fact, a sentence like this can imply he’s not currently at work. I wrote a short post about it here, comparing hiring ads for fast food restaurants. This is one of the earliest features that sociolinguists focused on. Bill Labov, Walt Wolfram, and John Rickford, as well as many, many others have written about this.

  2. stressed been: This refers to actions completed in the distant past. So I been did my homework means I finished it a long time ago. I been told you that means I told you a long time ago. They been got married means they got married a long time ago, and still are. It does not mean the same thing as standard English “have been” as in I have been doing my homework — which implies I didn’t finish yet. John Rickford has written extensively about this.

  3. Preterite had: This is use of “had” for past events, but not to situate them before others. I had went to the store means the same thing as “I went to the store”, although it may have a different function in terms of emotion in a narrative. John Rickford has written extensively about this.

  4. Quotative “talkin’ ‘bout”: This is “talkin’ ‘bout” used the same way white people use “like” as in “he was all like ‘oh my god’”. It’s often used with indignant come, and often used in a mocking context. I wrote a paper about it available here. It’s also touched on in Arthur Spears’ work on indignant come, and in Patricia Cukor-Avila’s work on verbs of quotation.

  5. First person a nigga: this is where a nigga means the same thing as “me” or “I”. I have blogged about it here, I have a paper in conference proceedings about it here, and Christopher S. Hall and I have a paper about it (and other n-words) under review right now.

  6. Negative Auxiliary Inversion: This is don’t nobody never instead of “nobody (n)ever does”. Interestingly, there’s some evidence that without context, people who don’t speak AAE interpret these as commands. Lisa Green has written about the grammar of this construction.

  7. Question Inversion in subordinate clauses: instead of “I was wondering whether you did it,” you may hear I was wondering did you do it. Lisa Green has written about this. There’s some evidence that it’s below the level of consciousness even for middle class speakers of what Arthur Spears calls AASE (African American Standard English).

  8. The associative plural nem (“an’ them”): to my knowledge, there’s only one sentence on this in the sociolinguistics literature, in a book chapter written by Salikoko Mufwene (in African American English: Structure, History, and Use). This functions the same as associative plurals in other languages (like Zulu). Saying Malik nem (or “Malik an’ ‘em”) means “Malik and the people associated with him” and from context it’s clear who that means. Could be family, could be friends, could be the people he’s sitting with right now. I have an aunt (in the African American family-by-choice-not-blood kind of way) named M., and stay asking about M nem.

  9. Stay for regular or repeated action: He stay acting stupid does not mean “he’s still acting stupid” or “he remains acting stupid” but rather, he consistently, repeatedly acts stupid.

  10. It instead of there: it’s a lot of people means “There are a lot of people”…

  11. Deletion of the subject relative pronoun: Standard English can delete “who” when referring to a person in a subordinate clause only if the person is the direct object (“That’s the man who I saw yesterday” or “That’s the man I saw yesterday”). AAE can delete the subject version (That’s the man saw me yesterday). I recently heard 10 and 11 combined, on the radio: It’s a lot of people don’t go there (meaning, there are a lot of people who don’t go there).

  12. finna and tryna as immediate future markers: There’s one conference paper written by an undergrad (who I think didn’t continue to grad school in linguistics) about tryna as marking intent or immediate future action. There’s an entire court case where the appeal decision hinged on whether finna was a word and what it means. Both can be used to mean you’re about to do something.

  13. be done: White folks often know done as in “he done hit him!” but don’t know be done as in “I be done gone to bed when he be getting off work” meaning “I’ve usually already gone to bed when he is getting off work”. There’s also the be done familiar from the crows in Dumbo: I’ll be done seen most everything when I seen an elephant fly, which is a slightly different construction.

  14. Set expressions, idioms, clichés: Things like it be that way sometimes, or what had happened was are not always understood, or even recognized as set expressions.

There are plenty of others, but these are the main ones (in my opinion). And of course, these can all combine with each other in longer sentences (“it be a lot of people talkin’ ‘bout ‘why she always be hanging out with Malik nem?’”). Combine that with a completely different accent, even (especially?) in the same city, and you have a recipe for total miscomprehension.

The interesting thing for me, though, is that from both personal anecdotal experience and some limited research, it appears that people who don’t speak AAE, especially white folks, generally assume (1) black folks are speaking “broken” English, and (2) that they understand it even when they don’t. So people will hear I been told you that and assume it means “I have been telling you that” and that the speaker just…said that wrong. Both sentence structures exist in AAE, and they mean different things. But only one exists in “classroom” English.

Some good readings:

There’s not a lot of material aimed at regular people instead of linguists; however, I highly recommend a few books:

  • Spoken Soul (Rickford and Rickford)

  • African American English: A Linguistic Introduction (Lisa Green)

  • Language and the Inner City (William Labov — this one is from 1972, at the beginning of AAE being taken seriously as an object of study).

  • African American English: Structure, History, and Use (ed. Salikoko Mufwene)

  • The Oxford Handbook of African American Language (ed. Sonja Lanehart. This one is massive and new, but a lot of it is very technical).



©Taylor Jones 2018

"Also, dude, 'Chinaman' is not the preferred nomenclature": Game Theory and the Euphemism Treadmill

A few years ago, I had a good talk with NPR's Gene Demby about why we have so many terms for people of color, and what linguists refer to as the euphemism treadmill. (He ended up writing this article). I've been thinking about this topic off and on since then. As with so much of language, the term a speaker chooses for other people signals something to listeners about the person speaking.

What this means is that we can think of the use of terms for groups of people as a signaling game, and we can situate it within the kind of discussion of strategic thinking that happens in Game Theory. What makes this instance particularly interesting is that it’s a coordination game on a massive scale. Basically, we’re all sometimes confronted with having a choice of words, and that choice of words may tell other people something about us (whether we want it to or not!). That, in turn, may affect how they react to us or treat us. Therefore, when confronted with such a choice, say between “colored” and “African American”, we have no choice but to strategize about what word we choose. (Technically, this is not strictly true, as we could just say whatever, but this is generally a losing strategy, one that results in people thinking less of you and adjusting how they treat you. However, as with everything linguistic, it’s complicated; some people are willing to make an allowance for, say, the elderly, as when older relatives of mine asked after my groomsmen, and wanted to know how “the Canadian”, “the negro”, and “the oriental” were doing, and followed up with “send them [our] love.”)

It turns out these kinds of patterns look very similar to those from evolutionary models originally designed to capture gene flow, and predator-prey dynamics, with some minor tweaks. I won’t get into the math here, since it’s more complicated than I’m willing to dive into in a blog post, but the general idea is that there are a few factors that can all affect the euphemism treadmill. First is random drift — the creation of a new word or repurposing of an existing word can be thought of as analogous to mutation. Sometimes these forms disappear, and sometimes they completely take over; in the long run, in strict competition, it’s one or the other. Second is a predator-prey dynamic: we can think of these forms as being in competition for the same ecological niche, or we can think of the new form as literally preying on the old one(s). The second metaphor isn’t perfect, but it captures something about the ecology of word use.
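To make the random drift piece a little more concrete, here is a minimal sketch in Python. It is not the model behind any of the charts in this post; it's a bare-bones neutral drift simulation (Wright-Fisher style) under assumptions I'm making up for illustration: a fixed population of speakers, exactly two competing forms, and each speaker in a new generation copying a randomly chosen speaker from the previous one. The function name and all parameter values are hypothetical.

```python
import random


def simulate_drift(pop_size=100, new_form_count=10, max_generations=10_000, seed=0):
    """Neutral drift between an old word form and a new one.

    Each generation, every speaker copies the form used by a randomly
    chosen speaker from the previous generation. There is no social
    pressure in this sketch: the only force is chance.
    Returns (outcome, generations_elapsed).
    """
    rng = random.Random(seed)
    count = new_form_count  # speakers currently using the new form
    for generation in range(1, max_generations + 1):
        freq = count / pop_size
        # Each speaker independently ends up with the new form with probability freq.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        if count == 0:
            return "lost", generation    # the new form died out
        if count == pop_size:
            return "fixed", generation   # the new form took over completely
    return "undecided", max_generations


# Run the same scenario many times with different random seeds and tally outcomes.
outcomes = [simulate_drift(seed=s)[0] for s in range(200)]
print({outcome: outcomes.count(outcome) for outcome in sorted(set(outcomes))})
```

Even with no one strategizing at all, individual runs tend to end with the new form either gone or universal, which is the "one or the other" behavior described above. The signaling and predator-prey pieces would have to be layered on top of something like this.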


On a very large scale, we can think of this as a coordination game. If “bad” people say “oriental” then when we hear “oriental” we can’t strictly determine that the speaker isn’t bad. So we go out of our way as speakers to signal that we’re “good” by picking a different word. However, in large populations, in the long run, these kinds of signaling games have interesting properties for a few reasons. First, we tend to be lazy, so if we can get away with saying “oriental” and it’s easy for us (say, it’s what we’ve always said), then we’ll do that. If we think there’s no cost, we go with the easy option. Second, sometimes bad people don’t want us to know they’re bad people. So if they know that everyone else assumes one is bad if one says “oriental” when referring to a person, they may hop on the new word bandwagon to avoid being “outed” as a bad person!

Eventually, two related things happen. First, people who use the new words just because it’s what everyone else expects of “good” people sometimes give themselves away, and the words they use may become associated with people who have their views. It doesn’t matter if I avoid saying “oriental” if I instead say “those damn Asian-Americans are ruining our cities” or some such nonsense. Second, things that people think of negatively are still associated with the new words to describe them, so eventually euphemisms become taboo themselves (see, for example, “toilet”). So for two related reasons, as a euphemism or a new term gains traction, especially if it becomes the main word people use, it leads to the need for another new word to separate out who means what. If I have negative beliefs about black people, but want to be thought of in a positive light by others, I might say “African American” … but it will be abundantly clear that I still mean something vaguely negative by it. If you want to signal virtue [sidenote: “virtue signalling” was a term from evolutionary game theory that has now been adopted by some regressive cranks, and is now slowly becoming something I had to think twice about using here…because of what it now signals by association with racist and sexist groups], you have to find a new way to differentiate yourself.

And so the wheel turns again.

For each of the above charts, we can generally think of social movements that relate to the terms used (for instance, the shift from “negro” to “black” coincides with the Black Power movement), but this doesn’t invalidate the game theory approach here. Rather, knowing about these social movements adds to our understanding of just how this kind of massive signaling game plays out in society, and the kind of real world repercussions involved. Obviously, there’s a lot more involved here, and the above is a gross simplification that just scratches the surface of the kinds of strategies involved, but I find it fascinating that the models developed to better understand gene flow and animal competition do a pretty good job of also capturing how words change in society in the long run. When I’m deciding whether to say “black” or “African American” based on what I think I know about the listener (or reader), millions of other people are making the same strategic calculations every day, and our individual decision-making (and the fallout from our decisions) in part drives massive social changes on a much larger time scale. It’s related to both how signaling strategy plays out in large groups in general, and to how other kinds of words change.




My work was just cited by a crank, here's a response

I recently came across an article written for Quillette by Heather Mac Donald which uses a research paper of mine published in American Speech in 2015 to defend a frankly stupid position. The article was shared by Steven Pinker, which means increased visibility, so naturally I want to make sure the record is straight as far as my research is concerned. The position she uses my work to justify is a position I disagree with not on political grounds, but on empirical grounds. I'm going to contextualize all of this for those unfamiliar with the players involved, before adding my response. [Note, this post uses a racial slur, a sex/gender slur, and some colorful Quebecois in citation form.]

Some Context:


Quillette

Quillette is a for-profit 'safe space' from 'political correctness' and 'leftist bias' created by a grad school dropout (one who, in interviews, claims explicitly that she is "actually trained as a psychologist" despite not, you know, actually having finished her training). You may have come across it recently when they published an article written by an undergrad that purported to challenge Ta-Nehisi Coates' work (among others).

This article serves as a pretty good explanation of what Quillette is, and what it's trying to be. (Highlights include: "Quillette makes tired alt-right talking points sound erudite", and "Instead of writing off the academic left — and, generally speaking, women and people of color — as crybabies or social justice warriors, Quillette’s writers use the classical liberal tradition of 'mature debate' to dismiss marginalized voices".)

Heather Mac Donald

The author of this particular piece, Heather Mac Donald, is most notable for authoring such works as The War on Cops, The Illegal Alien Crime Wave, In Defense of Fascism and The Diversity Delusion. (Ok, one of those is fake, but the other three are real). I think her works speak for themselves.


Steven Pinker

Steven Pinker is a well-known cognitive psychologist who does some work in linguistics, and who has been relatively influential. He also has gone off the rails on Twitter lately, so, for instance, in tweeting the link to this Quillette article, he complains about "PC/SJW." That is, Political Correctness (which is, more or less, trying not to intentionally say mean, hurtful, or offensive things by thinking about your choice of words before speaking) and Social Justice Warriors. I'm not 100% clear on what's wrong with social justice, but it's clear from use that SJW is intended as derisive, and directed toward people who --- I don't know. Want equality? Anyway, the point is Pinker is well known and is amplifying Quillette's signal, using in-group signals for the alt-right (whether intentionally or not). In this case, it's the writings of a woman who believes that "phantom police racism" is a cover to keep people from discussing the "uncomfortable problem" of "black on black crime". One who then cites my research out of context, evidently to defend her desire to say nigger (no, really, this is not an embellishment; see below).

The Quillette Article

I will reiterate that Quillette is for profit, so keep that in mind when deciding to click through. The article in question can be found here, if you, dear reader, wish to read it for yourself (perhaps use an ad blocker?). 

The article is ostensibly a defense of the poet Anders Carlson-Wee, who was the subject of a minor online tiff last week, after The Nation published a poem of his written in an approximation of African American English. John McWhorter, with whom I do not always agree, wrote an excellent, thoughtful piece in defense of Carlson-Wee, which can be found here.

Heather Mac Donald, however, has taken the controversy as a jumping-off point to dive into her feelings. In this case, her feelings about censorship. My goal here is not to catalogue all the things wrong with her article, as I simply don't have the time to do so, and others have done so better (especially with regard to her bizarre reading of Plato). I do want to touch, however, on a few points.

First, she refers to African American English as "black street dialect". I object to this not on "SJW" grounds (that is, that it is a clearly offensive dogwhistle: what is the function of "street" in this description? It's not location; it's judgment. Is whatever Heather speaks only spoken indoors?), but rather I object to it on scientific grounds. There is a wealth of literature on the speech of African Americans going back at least 60 years, and that is simply not the term used by anyone who knows even the slightest bit about the subject. You may have feelings about AAE versus AAL versus AAVE, but if you're discussing a language variety it would behoove you to use really any of the actual names for it. It would be like me discussing "Iranian town dialect" instead of Persian/Farsi. I would just look dumb and unnecessarily prejudiced.

Second, she argues strongly that there is some boogeyman mob that will ruin your life if you ever mention a taboo word, in citation form or otherwise. As a linguist who researches and says taboo words, I can tell you this is total nonsense. People are generally extremely good at, well, context. I am a cis/het white man, and part of my job is to discuss taboo words publicly. And you know what? No ill has come of it yet, because I do so (1) in appropriate contexts, (2) with academic rigor, and (3) with respect for both the communities that hold those taboos and the people described by those words (when those words describe people).

It's the third point that's going to take a little work. The paragraph that cites my work is, well, absurd. In that paragraph, Mac Donald writes a lot of garbage:

"The elaborate rituals around the ‘n-word’ evince the same double standard regarding authorial intention. According to existing conventions, whites may never use the full word without elision, even if they are doing so not to refer to anyone but as reported speech. Its mere presence in the mouth of a white person launches a nuclear bomb against blacks; the transgressor will be punished accordingly, as the founder of Papa John’s pizza discovered after using the full word as an embedded quote from chicken impresario Colonel Sanders. Blacks, however, can use the word in toto to refer to actual people, because their intentions matter and it is assumed that blacks are incapable of racist intent. Black Twitter users used the n-word 6.2 million times in one month, according to a 2015 study; it is ubiquitous in urban vernacular and in rap music, with black entertainers like Jay Z, Beyoncé, and Kanye West tossing it off with impunity."

Let's unpack this.

  1. "According to existing conventions..."  --- What conventions? In what contexts? This has the appearance of social science without any of the social science. 
  2. "whites may never use the full word without elision, even if they are doing so not to refer to anyone but as reported speech."  --- This is untrue, but as I've written elsewhere, not a bad rule of thumb if you want to avoid pissing people off. 
  3. "Its mere presence in the mouth of a white person launches a nuclear bomb against blacks; the transgressor will be punished accordingly, as the founder of Papa John’s pizza discovered after using the full word as an embedded quote from chicken impresario Colonel Sanders" --- This is patently, obviously untrue, and just wildly hyperbolic. A nuclear bomb? As I've written elsewhere, there's context for when it is possible to say the n-word (and that's a separate question from whether you should say it). ALSO, it's important to note that while the founder of Papa John's did use "nigger" in citation form, he did so while complaining about how he can't say it, but someone else got away with it! That's like getting mad that people call you misogynist when you complain that 'feminazis' are preventing you from calling women 'bitch'. It's just a sneaky way of trying to say it anyway. It's like me saying "Why can't I tell everyone that 'Heather Mac Donald is an idiot.'?" Just because it's embedded doesn't mean it loses its force, right Heather? (this was actually the subject of an academic talk at this year's Annual Meeting of the Linguistic Society of America).
  4. "Blacks,"  --- really? Listen. You can call people that, but I'm pretty sure it pisses off black people to be called "blacks" just as much as it pisses off white people to be called "whites". In general, taking an adjective and then using it as a noun for a group of people you think it describes is not well received. This is basic stuff here.
  5. "Blacks, however, can use the word in toto to refer to actual people, because their intentions matter and it is assumed that blacks are incapable of racist intent. " --- Justify this statement. According to whom? Under what circumstances? This is the sentence before citing my work, and the implication is that my work in some way justifies this stupid statement. If you want to draw on arguments that prejudice and racism are different and that black people can be prejudiced, but not systemically racist against white people, then make that explicit and attribute the argument. This is weak writing that I wouldn't tolerate from undergrads (but then again, we know where Quillette stands on publishing academically lazy, poorly written articles by undergrads).
  6. "Black Twitter users used the n-word 6.2 million times in one month, according to a 2015 study" --- This is a bait-and-switch using my work to justify something other than what it says. As Christopher S. Hall and I have written extensively about: there is not just one n-word in African American English (note: NOT Christopher J. Hall, although I assume he's lovely). Mac Donald, here, is attempting to justify using a racist slur in one dialect by saying a similar word exists in another dialect. It's like saying tabernak is not a swear word in Quebec French because "tabernacle" is a totally mundane word in Quebec English. They're not pronounced the same, they refer to different things, and they're not used in the same linguistic or social contexts.
  7. "it is ubiquitous in urban vernacular and in rap music, with black entertainers like Jay Z, Beyoncé, and Kanye West tossing it off with impunity." --- Define "urban vernacular". More importantly, again, you're comparing apples to slurs. Also, Beyoncé? When?

You've cherry picked a line from my research that isn't actually applicable to your argument in the hopes that my academic reputation will somehow add a veneer of respectability to your weak reasoning. 

More broadly, the point is that reasonable people generally don't have a problem with other reasonable people discussing a slur when it's clear that they are doing so with rigor and from a place of respect. When you disingenuously demand to know why "blacks" get to say "the n-word" but you don't, it's clear you just want to say offensive shit. Then people (correctly) call you an asshole and tell you to stop. When they do things like protest your speaking engagements, or say mean things to you on Twitter, THAT'S NOT CENSORSHIP. That's other people also exercising their freedom of speech, and is a natural result of your exercise of your freedom of speech. You are playing the victim in an attempt to silence other people's free speech, because evidently you want to say "nigger" at people without repercussions.

The takeaway:

You can say "the n-word" and nobody can stop you. However, there will be social ramifications. That's how pretty much all of language works. The real question is why do you want to say it so badly, Heather?






A Malefactive in African American English

This is a quick post about something I've heard all my life in AAE speech communities but haven't seen discussed, well, really anywhere. 

Benefactives and Malefactives in English

A lot of languages can take a verb and mark whether it was done with kind or harmful intent toward someone else. In ('standard') English, the benefactive marker is a separate word that introduces the recipient, and that word is for. For example:

  • She baked a cake for me. (meaning either, she baked a cake with the intention that I eat it, or she baked a cake so I wouldn't have to).
  • He made a phone call for me. (meaning he made a phone call so I wouldn't have to, or on my behalf).

Other languages may mark this differently (for instance, Zulu adds the infix -el- just before the end of the verb). 

English also has a very limited malefactive marker: on. For instance:

  • She hung up on me
  • She walked out on me
  • He told on me

But you can't just use it with anything:

  • ??? She baked a cake on me

That said, some non-standard varieties allow for much more productive use of malefactive on. For instance, my (somewhat Southern) grammar lets me say it so long as the verb is prefaced with up and, as in:

  • She up and baked a cake on me (meaning: She surprised me by baking a cake, contrary to my expectations and possibly with some negative effect on me...but not physically on me in any sense).

An AAE only Malefactive

I've been thinking about this recently, and noticed something that's not grammatical in other varieties of English: to (tell a) lie on someone. Examples:

  • She told a lie on him
  • He would never tell a lie on her

I've asked a few people who use this, and they agree it's equivalent in meaning (but not in mood!) to telling a lie about someone, and doesn't mean to tell a lie to someone. 





What, really, is a word?

I just received the new issue of Language in the mail, and the first paper is a paper I'd read a draft of a while back, and loved. Not only for its snazzy title ("The Lexicalist Hypothesis: Both Superfluous and Wrong"), but because it gets at the heart of a really interesting issue in linguistics.

The core argument Benjamin Bruening proposes is that if we assume there's such a thing as a "word" and that words are fundamentally different from "phrases" in the grammar (that is, the rules in the mind of a native speaker), then we end up with a lot of difficulty and more assumed grammatical structure that doesn't really give us much. Not only that, but we have to explain some weird observed behavior that this model doesn't predict. This contention, that wordhood is assumed by people who speak European languages, has been an issue for people working on agglutinating languages (like Zulu, Iroquois, etc.) for a while. The genius of his paper is, in part, that he uses all English examples.

So the argument is more-or-less that "word" is a phonological object, and not a grammatical/syntactic object. That is, we manipulate syntactic structure in our minds, but where we put the breaks in speech is really a property of sound and not the structure we are manipulating. Some examples should help clarify.

First, however, I want to point out that we already all know that words (in the traditional sense) themselves often have structure. So for instance, we can look at a word like:

  • unlockable

and know that it has three pieces that carry meaning: un-, lock, and -able. One of those can stand on its own, the others can't. AND, interestingly, we can think of them as representing two different structures with two different meanings:

  • [un [ lock-able ] ]  == not able to be locked
  • [ [ un-lock] able ] == able to be unlocked
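For the programmers in the audience, the two bracketings above can be sketched as nested tuples in Python. This is only my own illustration, not anything from the paper: the variable and function names are made up, and a tuple stands in very loosely for a syntactic constituent.

```python
# Each structure is a nested pair (left, right); morphemes are plain strings.
NOT_LOCKABLE = ("un", ("lock", "able"))   # [un [lock-able]]  == not able to be locked
CAN_UNLOCK = (("un", "lock"), "able")     # [[un-lock] able]  == able to be unlocked


def spell_out(tree):
    """Flatten a structure into the string that actually gets pronounced."""
    if isinstance(tree, str):
        return tree
    return "".join(spell_out(part) for part in tree)


def bracket(tree):
    """Render a structure with square brackets, mirroring the notation above."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{bracket(left)} {bracket(right)}]"


for tree in (NOT_LOCKABLE, CAN_UNLOCK):
    print(spell_out(tree), bracket(tree))
```

Both structures spell out as the same string of sounds, unlockable, but they are different objects with different groupings. The structure is one thing, and where the "word" begins and ends in speech is another.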

So what Bruening does is give a ton of examples where the adjective modifying a noun is really an entire phrase:

  • she gave me a don't-you-dare! look

He argues that if you have a model of syntax that assumes the existence of words as atomic pieces, and these words have categories (like "adjective," "noun", "verb") then you can't really account for don't-you-dare! looking both like an imperative and an adjective. 

The paper itself is a lot more complicated, since he gets into some really thorny issues in syntax that are probably not appropriate for a blog post, but the paper itself is delightful and its example sentences are great. 

One of my favorite things in reading linguistics literature is the special schadenfreude I get when reading someone point out that another linguist's example sentences are wrong, and it's even got that, where he points out the grammaticality (contra another linguist's analysis) of utterances like:

  • I have to go re-tuck in my kids.
  • he was re-sworn in as governor.

I have been thinking about this word/phrase distinction for a long time, but evidently not on the same level as Bruening. I have, however, been collecting examples of sentences like these for years, and now have a good reason to share them. I have generally put the phrase in brackets, and in some instances if there's an unsaid element (like "I would wear it" in example 2), I have left an underscore where we might expect another syntactic element. So without further ado:

  1. The one I had at tale was [I can't even handle it] sweet.
  2. It was totally an [I would wear____] style.
  3. You put your computer in the [my computer] spot.
  4. It's a really [hard to open] door.
  5. It's not entirely a [nigga, we made it] moment. (Childish Gambino in an interview)
  6. Did you swallow a [too big] piece?
  7. Sometimes when I cough it sound like it's a sickness cough but it's really a [my-lungs-aren't-ok] cough.
  8. It's always bad when it's *too* [too big].
  9. Please return for a [left behind] item. (over the intercom at JFK airport)
  10. Is 250 texts a good number, or a [not enough____] number?
  11. As a [[i don't have to be there for very long] ____] I don't really mind it.
  12. It was almost [knock you over] wind.
  13. I don't have a specific [it has to look like this] idea.
  14. I'm sure you'll be past the [a thousand] mark.
  15. It's a vacation house, it's not a [___ live there] house.
  16. It was a [my lungs are tight] kind of cough.
  17. I don't mind the [making my own lunches] part of it...
  18. I'll find, like, *old* [my hair], and be like, "how did this happen?"
  19. It's a writing desk, not a [leave a pile of books and papers on it] desk.
  20. Go see Ailey. It's [change your life] good. (Advertisement in the subway for Alvin Ailey Dance Theater)
  21. It was too [not enough time].
  22. Wow, this is a really [___ sink into it] couch!
  23. It's [if you're desperate you'll eat it] bread.
  24. Now we have a [thank you] reason to send that card.
  25. I need an overnight flight, not a [during the day] flight.
  26. It's stupid. I wore a pair of boots on a [slightly too warm] day and it gave me a rash.
  27. That's the [be careful because if you sit on it wrong the chair might break] chair.

I really, thoroughly enjoyed this paper, which can be found [here].





Know it all

Trump recently tweeted something that was linguistically interesting (shocking, right?). Tweeting about James Comey, he wrote:

"Comey knew it all, and much more!"

This is the kind of thing that would be starred as an infelicitous utterance in an intro to pragmatics. The reason is what we call "scalar implicature." If I said:

"Comey knew some of it."

That carries the implicature that he, well, knew some, but not all. I can cancel that implicature by stating:

"...in fact, he knew all of it!"

The same goes for everything less than all. For instance:

"Comey knew most of it. In fact, he knew all of it."

However, because all inherently means "everything" it makes no sense to say he knew all "and more".
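The reasoning so far can be sketched in a few lines of Python. This is a toy illustration only, assuming a three-term scale (some, most, all) ordered from weakest to strongest; the scale and the function names are my own simplification, not a full account of the pragmatics.

```python
# Assumed scale, ordered from weakest to strongest claim.
SCALE = ["some", "most", "all"]


def stronger_terms(term):
    """Terms that make a strictly stronger claim than `term` on the scale."""
    return SCALE[SCALE.index(term) + 1:]


def can_strengthen(first, second):
    """True if "he knew <first> of it; in fact, he knew <second> of it" works,
    i.e. the follow-up moves up the scale and cancels the 'not more' implicature."""
    return second in stronger_terms(first)


print(can_strengthen("some", "all"))   # moving up the scale
print(can_strengthen("most", "all"))   # moving up the scale
print(stronger_terms("all"))           # nothing sits above "all"
```

Since "all" is the top of the scale, there is nothing left to strengthen it with, which is why "all, and much more" has nowhere to go.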

It's the kind of thing you might see as an insult with negation:

"Comey knew nothing. In fact, he knew less than nothing."

But it still doesn't quite make sense in that sentence frame:

*"Comey knew none (of it), and much less!"

I'm not entirely sure what to make of this, since it's not an off the cuff remark that can be attributed to a speech production error or brainfart. Perhaps it's further evidence that people tend to "tweet how they speak."






New Working Paper on Zulu published

I recently gave a talk on Zulu morphosyntax in which I (hopefully politely and respectfully) challenged some of the mainstream approaches to Zulu syntax. The working paper is now out, in the Proceedings of the Linguistic Society of America, available here (pdf download under "full text").

It's not a fun read for a layperson, but the general gist is that (1) a lot of previous syntax work doesn't pay enough attention to the phonology, (2) the justifications for arguing that the noun augment is really a determiner are a little shaky, and (3) if we just treat the 'linking vowel' as a determiner, everything is simpler. This has the unexpected outcome of also suggesting that Zulu has construct state, something known (and controversial) in Semitic languages, but not known to exist in Bantu languages. To paraphrase a colleague at Penn, I've reduced a seemingly unique thorny problem to an already known thorny problem, which is about as good as you can hope for in syntax.






What would Wakanda sound like?

Today, Marvel's Black Panther is released. The Black Panther, aka T'Challa (played by Chadwick Boseman), is the king of Wakanda, a fictional country in Africa (neighbored by other fictional countries like Azania and Narobia, but not Nambia). While I'm extremely excited for the movie (NO SPOILERS PLEASE), I don't have high hopes for a surprise fictional language in the movie, given the pre-film hype about the inspiration for design elements, costume, and even T'Challa's accent. In previous films, T'Challa's father was played by a Xhosa speaking actor, and it now seems that Xhosa being spoken in Wakanda is Marvel Cinematic Universe head-canon.

Geographic improbability aside, I don't have a problem with this, as Chadwick Boseman does a great Xhosa accent --- far better than, say, Morgan Freeman in Invictus. But, given that Wakanda is supposedly 5,000km away from South Africa (where the non-Wakandan Xhosa people are), what would the languages of Wakanda sound like? This is just a short blog post to (shallowly) explore that question with some links for the interested.

Location, Location, Location

Wakanda is situated somewhere in East Africa, by either Lake Victoria, or Lake Turkana. That means it's somewhere around Uganda, Kenya, Rwanda, Ethiopia, and South Sudan. What's great about this is that it's an area where a lot of languages from different language families are spoken. So the five major ethnic groups in Wakanda could all potentially have their own very different languages.

What about the comics?

The character and country were created by Stan Lee and Jack Kirby in 1966. Both white guys, neither linguists. So there are a lot of elements of the Black Panther mythos that have names that sound, well, like what a white guy would make up to sound exotic and African (or look it on the page). That said, certain things are just part of the canon. So the kings have an (evidently) ejective /t'/ as the first part of their names. The all-female fighting force, the Dora Milaje, are called what they're called. Anyone contracted to construct languages for the MCU will have to work with the existing material, much like how Marc Okrand developed Klingon by building around what was already uttered on-screen in Star Trek. And that will have an effect on the backstory and character development. To my knowledge, Ta-Nehisi Coates and other recent writers have not done a deep dive into the linguistic side of Wakanda, but we can't really expect Ta-Nehisi to solve everything for us.

What's spoken in that area?

As I mentioned above, that particular (vague) part of East Africa has representation from a few of the major families: Afro-Asiatic, Niger-Congo A, Niger-Congo B ("Bantu" languages), and Nilo-Saharan languages.

In Kenya, there is, of course, Swahili --- a Bantu language spoken by 50 to 100 million people and a lingua franca for the region. Swahili's huge number of speakers means you can hear it on internet radio if you want. It also means that it has lost lexical tone (when the pitch of a word or syllable changes the meaning), and because it's used for trade by so many people who speak so many other languages as their native language, it is relatively regular, meaning there's not a lot of unpredictable grammatical stuff.

But there's also a lot else spoken there. Kenya alone is home to 68 languages, the most prominent of which are Kikuyu, with 8 million speakers, and Dholuo, or Luo, with 4 to 5 million speakers.

The latter, Dholuo, is not a Bantu language, but a Nilo-Saharan language. What's the difference? The main difference is that all the Bantu languages group nouns into types (think gender in European languages, except there's 10-17 of them). Every noun has a prefix for its noun class, and the prefixes generally come in pairs (singular vs plural). So in Zulu the base form for the noun 'person' is ntu (Swahili has the cognate tu, as in mtu 'person' and watu 'people'). But this doesn't just show up on its own. Rather, it has one of these noun class prefixes, as in:

  • umuntu 'person'
  • abantu 'people' (hence the name for the languages...they all call people some form of "bantu")
  • ubuntu 'humanity, humanness' (whence also the operating system).

So you can get phrases like umuntu ngumuntu ngabantu: "a person is a person through other people".
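The prefix-plus-root pattern in the list above can be sketched in a few lines of Python. This is only an illustration: the class labels are informal names chosen for this example (not standard Bantu noun class numbers), and only the three Zulu prefixes mentioned above are included.

```python
# Toy sketch of Zulu noun class prefixes attaching to the root "ntu".
# The dictionary keys are informal labels made up for this example.
PREFIXES = {
    "singular": "umu",  # umuntu 'person'
    "plural": "aba",    # abantu 'people'
    "abstract": "ubu",  # ubuntu 'humanity, humanness'
}


def inflect(root: str, noun_class: str) -> str:
    """Attach the prefix for the given (informally named) class to a root."""
    return PREFIXES[noun_class] + root


for label in PREFIXES:
    print(label, inflect("ntu", label))
```

The point of the sketch is just that the root never surfaces bare: every form is a prefix plus ntu.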

Bantu languages also generally have a LOT of sounds, but simple syllable types, almost always CV --- Consonant Vowel. You'll never see a word like English strengths. This is obscured by the writing a bit, so for instance, <ng> in Zulu is one sound, not two (the sound of <ng> in sing). Swahili also has syllabic nasals, so for instance, the <m> in mzungu 'white person' is its own syllable: m-zu-ngu.

Back to Luo: Luo has vowel harmony, meaning all the vowels in a word have to share the same feature. What's the separating factor? How advanced your tongue root is. So words with the vowels in (an American pronunciation of) bean, bait, bot, boat, and boot, are one class, and words with the vowels in bin, bet, bat, bought, and foot are in another. A single word will not have vowels from both groups, only one.
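As a rough sketch of what "a single word will not have vowels from both groups" means, here is a toy harmony check in Python. The IPA symbols are assumptions standing in for the English approximations above (bean, bait, bot, boat, boot versus bin, bet, bat, bought, foot), not a transcription of actual Luo vowels, and the test words are made up.

```python
# Toy check for tongue-root vowel harmony. The two sets mirror the
# English approximations in the text; they are NOT real Luo vowels.
ADVANCED = set("ieɑou")   # bean, bait, bot, boat, boot
RETRACTED = set("ɪɛæɔʊ")  # bin, bet, bat, bought, foot


def is_harmonic(word: str) -> bool:
    """True if all vowels in the word come from a single class."""
    vowels = [ch for ch in word if ch in ADVANCED or ch in RETRACTED]
    all_advanced = all(v in ADVANCED for v in vowels)
    all_retracted = all(v in RETRACTED for v in vowels)
    return all_advanced or all_retracted


# Made-up strings, purely to exercise the function.
for word in ("kido", "kɪdɔ", "kidɔ"):
    print(word, is_harmonic(word))
```

Only the last made-up string mixes the two classes, so it is the only one the check rejects.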

Even cooler, Luo grammatically distinguishes between alienable and inalienable possession, so for instance, the word for a dog's bone has different forms depending on whether you mean the bone is part of the dog's skeleton, or a cow bone it's chewing on. If it can be taken away, it's got a suffix marking that fact.

Wakanda is also close to Ethiopia and South Sudan, where Afro-Asiatic languages are spoken. The most well-known subset of these are the Semitic languages, which include Arabic and Hebrew, but also languages Americans are often less familiar with, like Amharic, spoken in Ethiopia.

Amharic, like other Semitic languages, has what's called non-concatenative morphology, meaning that words aren't always built by adding prefixes, suffixes, or infixes, but are instead built with a system of (unpronounceable) roots that combine with vowels in between. The standard example linguists use is from Arabic (also spoken in that region), where k-t-b is always in things related to books and writing, but the vowels make it mean different things: kitaab 'book', kataba 'he wrote', kutib 'was written', etc. Amharic, like Swahili, has a massive number of speakers: roughly 22 million. It also has an objectively cool writing system.
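The root-and-vowel idea can be sketched as slotting root consonants into a template. In this minimal Python illustration, writing templates with a capital C for each consonant slot is a notation assumed for the sketch, and the function name is hypothetical; only the three Arabic forms quoted above are used.

```python
def interdigitate(root: str, template: str) -> str:
    """Fill each 'C' slot in the template with the next root consonant."""
    consonants = root.split("-")
    pieces = []
    next_consonant = 0
    for ch in template:
        if ch == "C":
            pieces.append(consonants[next_consonant])
            next_consonant += 1
        else:
            pieces.append(ch)  # vowels are copied through unchanged
    return "".join(pieces)


# 'book', 'he wrote', 'was written'
for template in ("CiCaaC", "CaCaCa", "CuCiC"):
    print(interdigitate("k-t-b", template))
```

The same three consonants appear in every output; only the vowel pattern wrapped around them changes.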

Semitic languages like Amharic and Ge'ez are not the only Afro-Asiatic languages, though. To the south of Lake Victoria (so, somewhere sort of near Wakanda?) Iraqw, a Cushitic language, is spoken by approximately 460,000 people (because it's spoken by a much smaller number of people, the best video I could find was about porcine cysticercosis --- tapeworm in pigs).

And of course, we've established that Xhosa is MCU head canon (I really want to know the back story of how they first arrived in Wakanda, reversing the Bantu Migration, and how they rose to power!), which means that one could expect to hear clicks in Wakanda, too.

Wakanda Forever!

Given pre-release ticket sales alone, it seems like Hollywood has been sleeping on Black Panther's type of pan-African magic just the way the rest of the world has been sleeping on Wakanda's advanced technological civilization. If we're lucky, BP is going to be a smash hit with future films, TV series, spinoffs...and maybe we'll get to hear the sounds of Wakanda just as we hear the sounds of Essos and Valyria, Middle Earth, and Qo'noS.

A great resource for the IPA

One of the best tools a linguist uses is the International Phonetic Alphabet; however, learning it can feel daunting. I have historically referred students to the Wikipedia page on the IPA, because it has links to individual pages for each sound, with descriptions of how the sound is produced, and audio recordings.

Now, there's another tool: an interactive IPA chart with a cross-sectional MRI so you can see the position of the tongue, lips, velum, etc. while a sound is being produced.

It's courtesy of the UCLA Speech Production and Articulation Knowledge Group, and can be found here.

One caveat: of the five available speakers, John Esling is the only one who pronounces the alveolar click /!/ correctly. Everything else seems to be great across all speakers.



Fun With Morphology!

Causative Smallening

Friends and family members have recently said some morphologically interesting things, and I decided to take a quick second to put them down here, for posterity, because they're so freaking cool.

The context for the first was manipulating images for a slideshow. The sentence used was:

I smallened it

Everyone clearly understood it as "I made it smaller," and also knew that it was non-standard. But why?

Well, some adjectives can be made into inchoative verbs. This means if you have some adjective X you can make a verb that means 'to become X'. It's super easy: just add an -en to the word:

  • darken: to become dark
  • redden: to become red
  • liven: to become more alive/lively
  • quicken: to become quick
  • leaven: to raise (from an older word in English we no longer have, ultimately from Latin levare 'raise')
  • toughen: to become tough
  • smarten (up): to become smart

These can also then be made transitive and are then causative verbs, meaning someone causes something to become X.

The thing is, it's normally taken to apply only to what linguists call a "closed set" which is a fancy way of saying you can only do it to some adjectives and not others. That is, it sounds weird to say "dumben it" (instead of "dumb it down") or "absurden the story" or "spicen the food."

And yet, we all have the grammatical competence to be able to (playfully) generalize to new instances, so everyone knew what "I smallened it" meant.


When linguists get to the morphology segment of Intro to Linguistics, we teach "bracketing" as a tool for recognizing the internal structure of words. It's literally drawing brackets around word-pieces (let's call them morphemes). For example:

  • [ nation ]
  • [[ nation ] al ]
  • [ inter [[ nation ] al ]]

Some kinds of ambiguity are then easy to explain, as in, the door is:

  • [ un [ [ lock ] able] ]  == unable to be locked ~ un-lockable
  • [ [ un [ lock ] ] able ] == able to be unlocked ~ unlock-able
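One way to see why the two bracketings are "the same word" on the surface but different underneath is to treat each bracket as a nested tuple. This is a simplified Python sketch: the variable and function names are made up here, and the innermost bracket around the root lock is dropped for brevity.

```python
# A morpheme is a plain string; a bracketed constituent is a tuple of parts.
not_lockable = ("un", ("lock", "able"))  # unable to be locked
can_unlock = (("un", "lock"), "able")    # able to be unlocked


def spell(tree) -> str:
    """Concatenate the morphemes, ignoring the bracketing."""
    if isinstance(tree, str):
        return tree
    return "".join(spell(part) for part in tree)


def brackets(tree) -> str:
    """Render the nesting in square-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[ " + " ".join(brackets(part) for part in tree) + " ]"


for tree in (not_lockable, can_unlock):
    print(spell(tree), brackets(tree))
```

Both structures spell out as unlockable, but the trees are not equal, which is exactly the ambiguity the bracketing is meant to expose.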

Similarly, we can bracket words that go together in sentences:

  • [ [that ridiculous man ] [ looks [ dumb ] ] ]

Sometimes, though, things break free. A classic example is the suffix -ish, which for many people now can modify much more than adjectives:

  • It was a yellowish color.
  • I guess I was excited about it, ish

All of that was to get to a family member recently saying:

There's no point in waiting to leave, it's not going to get any not dark er

That is, it's not going to get any [ [ not [ dark ] ] er ], where -er is modifying the complex structure not dark.

Often, linguists will treat these kinds of examples as mistakes, play, or somehow not part of the object of study (and make pronouncements like "inchoatives derived from adjectives are a closed set" and sometimes even claim that words like smallen are "impossible"). I think it's important that we take these kinds of novel forms --- forms that sometimes challenge theory we've learned in grad school --- seriously. In part, because if you start listening for them, they happen all. the. time.

Happy listening!



©Taylor Jones 2017



Habitual hiring

I recently came across a couple of images of African American English use in hiring signs. I think they could be an excellent tool for teaching about AAE in Introduction to Linguistics, or Intro to Sociolinguistics, since

  1. neither is 'standard' English
  2. they have a difference in meaning
  3. the difference in meaning affects strategy for someone on a job hunt.

So without further ado, let's say you've been out hunting for work, and you're down to your last resume. Which of these two places do you take your last resume?


If you don't speak AAE and don't know about its system of tense and aspect (which is more complex than mainstream American English), you may think it's a toss-up between the two.

However, you'd be wrong.

  • we hiring features what's called copula deletion, which is common in many languages (including Russian, Arabic, and others). It means "we're hiring (right now)".
  • we be hiring makes use of habitual 'be' which is a grammatical marker of, well, habitualness. It means "we are usually/habitually/often hiring."

Therefore, if we're to trust the signs, you've got a better chance of being hired right now going to the first store.



©Taylor Jones 2017



Bare Subject Relatives and the Sophisticated Complexity of AAE

[Trigger warning: My focus here is on a syntactic phenomenon, but a video example I'll be focusing on includes the threat of police violence and a man hypothesizing about his death at the hands of the police. The man in question is alive and unharmed.]

I've been thinking a lot lately about the complexity and sophistication of AAE syntax. Much of the work and outreach around AAE in the last 50 years has been trying to demonstrate that AAE is neither deficient nor wrong. There's a big jump, however, between not wrong, which is the end goal for many linguists, and marvelously rich, which seemingly hasn't percolated through the field much beyond AAE specialists. Christopher Hall, a colleague and friend, often implores lay people to flip the script and think about language with the starting point that AAE is the default against which other dialects should be judged.

My focus here is a syntactic feature of some varieties of African American English that doesn't get as much attention, but that is surprisingly common (especially in the South): what are referred to as null subject relatives, or bare subject relative (clauses).

A recent, salient example can be found in this video a motorist took of himself chastising a cop for approaching his car with his (the officer's) service weapon drawn, about 15 seconds from the end. It probably goes without saying, but the video may be triggering for some.

[Link to video here]


The subtitles say "Dad shot dead by a cop who made a mistake," however this is yet another case of reporters "translating" AAE. The gentleman actually said:

"Dad shot dead by a cop made a mistake".

What's going on here?

Well, first, a relative clause is like a little mini sentence or sentence fragment that adds more information about a part of the main sentence. For instance:

  • That is the man [who I saw yesterday]
  • That is the man [who saw me yesterday]
  • The book [that I recommended to you] is on sale now.

In most varieties of English, you can delete -- that is, not say -- the relative marker (who, which, that), if it refers to the object of the relative clause.

To take the first example, the man who I saw yesterday, we can rework the relative clause as meaning something like "I saw him yesterday." In fact, many varieties of English make use of such resumptive pronouns, so it would be perfectly natural to say "That's the man who I saw him yesterday." And unsurprisingly, this kind of thing is cross-linguistically common, and in some languages it's obligatory.

So if it's:

  • I (subject) saw him (object)

Most varieties of English allow you to do away with the relative marker:

  • That's the man who I saw yesterday
  • That's the man ___ I saw yesterday

AAE is interesting in that it also allows deletion of the relativizer if it marks the subject.

  • That's the man who saw me yesterday.
  • That's the man ___ saw me yesterday.

This is pretty well described in the literature, so for instance, Stefan Martin and Walt Wolfram have a chapter in Salikoko Mufwene's book African American English: Structure, History, and Use that gives a ton of excellent examples:

  • He the man ___ got all the old records
  • Wally the teacher ___ wanna retire next year
  • Jill like the man ___ met her brother last week

The above example in the video was particularly interesting because the syntactic structure of the full utterance is extremely complex.

There's a pernicious and widespread view that AAE, or "ebonics" is somehow inferior or defective. It's widely regarded as both "simpler" than "standard" English, and simpler in ways that are "broken" or "wrong." However, not only does it have more complex grammar in some respects, but AAE speakers deploy sophisticated combinations of syntactic structures even under extreme stress. The sentence the motorist in the above video uttered makes use of:

  1. An "imposter" construction in which the speaker is understood to mean himself when using a name/title ("Daddy") instead of a first person pronoun ("I").
  2. Copula deletion ("Daddy shot" instead of "Daddy was shot"). This is very common cross-linguistically, and is standard in Arabic, Chinese, Russian, etc.
  3. A resultative complement to the verb ("shot dead")
  4. Passive voice --- with copula deletion --- which we understand because of the resultative. Compare "Daddy shot a gun" vs. "Daddy shot dead."
  5. A bare subject relative ("a cop ___ made a mistake").

This is a sophisticated interlocking clockwork of syntactic structures, produced under extreme stress. A tree diagram of this sentence would show all kinds of movement and deletion. And there's some evidence that people who speak other dialects do not have the complex grammatical knowledge to correctly parse this kind of utterance. And yet, people like this motorist are routinely treated as though their language is deficient.

It's a starting point for us linguists to point out that AAE is rule-governed and syntactically well-formed. However, I don't think this goes nearly far enough. "Technically not inferior" is a far cry from the truth: AAE is a varied, complex, sophisticated language variety that makes use of many complex grammatical rules that "standard" English lacks. AAE speakers are doing things other people don't understand, and not because the AAE speakers are wrong, but because they have a fuller syntactic toolbox.




©Taylor Jones 2017



Lately, I've been noticing a particular phenomenon in speech, in which the word "particularly" is pronounced more or less as "partickerly" or "partickly."

It turns out I'm not the only one to notice this, as Mark Liberman has an excellent, and much more in-depth description of the phenomenon at Language Log, with a ton of excellent audio.



©Taylor Jones 2017


A linguist's take on the Great GIF Controversy

The Conflict:

For years, the English-speaking internet has been divided. We cannot agree on how to pronounce gif, the acronym for graphics interchange format. Much as with the dress, each side thinks their own position is the only correct one, and that the other side is absolutely crazy. And much as with the dress, it's probably a little more complicated.

People write articles with titles like you are 100 percent wrong about how to pronounce gif. People share mocking gifs with arguments bolstering their point of view. People yell at one another. Things get entirely too heated.

I intend to shed some light on this situation.

The Options:

There are technically three ways you could pronounce gif in English, although the conflict is over the first two. The three are:

  1. so-called "hard g" which linguists represent with /g/. This is <g> as in "gift".
  2. so-called "soft g" which linguists represent with /d͡ʒ/. This is <g> as in "George." It is also sometimes represented with <j> as in "Jazz".
  3. The "French" or "super soft g", which linguists represent with /ʒ/. It is in (some) pronunciations of "rouge". (Note that some English speakers "nativize" words with this to have the /d/ sound in the "soft g", so what I call "baton rouge" they may call "baton roudge".)

While I relish in ironically using the third option and watching people on both sides of the hard/soft g debate lose their minds, I recognize that nobody is going to take seriously the argument that "French g" is correct.

The Arguments:

Arguments for "hard g":

  1. It's an acronym, and the word the <g> comes from is one where it is pronounced "hard" (namely, "graphics").
  2. We often pronounce acronyms differently than we would pronounce a word spelled the same way (CIA is "see eye ay" and not "kia").
  3. Feelings. People have really strong feelings that this is the only correct way.

Arguments for "soft g":

  1. Lots of words spelled with <gi> are pronounced with a "soft g": ginger, gin, giraffe, giant...
  2. It's easier to pronounce gif as a word and not as an acronym. Nobody is actually saying "gee eye eff". If you're going to make it a word, then make it a word!
  3. "Foreign" words often have a "soft g" (giraffe...).
  4. Feelings. People have really strong feelings this is the only correct way.

A dash of science:

I decided to take a look at this list of over 58,000 (relatively common) English words, and see what the patterns are for g-words.

There are 1836 words that start with <g> in this list, and there's not a clear rhyme or reason to the choice of "hard" versus "soft" g, so one would have to look at each of them to get a sense of the overall pattern. That's a pain in the ass. However, there is a helpful fun fact from linguistics that can constrain this problem a bit more:

"soft g" often comes from a combination of sounds, historically: a "hard g" followed by a non-low front vowel. What does that mean? That means that for the vowels /i/ "bead", /e/ "bade", /ɪ/ "bid", and /ɛ/ "bed", your tongue is actually higher in your mouth, and closer to the front of the mouth than it is for the vowels /u/ "booed", /o/ "bode", etc. The "hard g" sound is made by the back of the tongue forming a closure at the back of your mouth. These non-low front vowels tend to cause people to move their tongues slightly forward, and over time (we're talking hundreds of years) the sound changes to one made intentionally further forward. "Soft g" is created by a tongue closure further forward in your mouth than "hard g". Try saying words with them and pay attention to where your tongue is. (Try it! It's fun!)

This fact is part of why Italian spelling is so weird, for anyone who's tried to learn Italian.

All of that means I don't need to bother with words like "goof" because nobody is going to pronounce that with a "soft g."

So I chose to limit myself to words that start with <gi>. It turns out there are 102 of them, which meant I could simply read them and split them into "hard" and "soft". Of those, 30 are "soft" and almost all of these are of foreign origin.
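For anyone who wants to repeat the filtering step, something like the following Python sketch would do it. The function name and the tiny sample list are stand-ins invented for illustration (the real list has over 58,000 entries), and the hard/soft split still has to be done by reading the words, since the spelling alone doesn't settle it.

```python
def starts_with(words, prefix):
    """Return the words that begin with the given spelling (case-insensitive)."""
    return [w for w in words if w.lower().startswith(prefix)]


# Tiny stand-in sample, not the actual word list.
sample = ["gift", "giraffe", "goof", "gene", "Ginger", "give", "apple"]

gi_words = starts_with(sample, "gi")
print(len(gi_words), gi_words)
```

Swapping the prefix for "ge" gives the second group of words counted in this post.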

30/102 (29.4%) of words that start with <gi> have a "soft g."

It's not entirely unreasonable then to think that gif should perhaps be pronounced with a "soft g." People will argue that there are more with a hard g, and that's true, but the same people will say that "soft g" is crazy, which is clearly not true.

BUT WAIT. What about words with <ge> you ask? I'm glad you asked. There were 223 of those. Of them, 197 were pronounced with a "soft g" (e.g., gene, gender, geriatric, geology, gelatinous...).


197/223 (88.3%) of words that start with <ge> have a "soft g."

This means that:

Of all of the words with <g> where it could be pronounced hard or soft, 227/325 (69.8%) are pronounced with a "soft g".
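The arithmetic behind those three percentages is easy to check. Here is a short Python sketch using only the counts given in this post; the variable and function names are just for illustration.

```python
# (soft-g count, total count) for each spelling, as reported in the post.
counts = {"gi": (30, 102), "ge": (197, 223)}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


soft_total = sum(soft for soft, _ in counts.values())
word_total = sum(total for _, total in counts.values())

for spelling, (soft, total) in counts.items():
    print(spelling, f"{soft}/{total}", pct(soft, total))
print("combined", f"{soft_total}/{word_total}", pct(soft_total, word_total))
```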

It's also worth noting that in the particular list I have, fully 38% of the words are <g> either <i> or <e> and then <n>. This is important, because many people have what is referred to as the PIN-PEN merger, meaning that <i> and <e> before <n> are pronounced the same. That means Jim and gem are both pronounced the same (namely, as Jim). This is a feature of Southern American English, pretty much the entirety of the West, most of Canadian English, and most of African American English. A LOT of people do this.

This means that even if they're limiting themselves to words they pronounce with the "ih" vowel after <g>, people with the PIN-PEN merger have 109 more such words in this list than people without it.


For people with the PIN-PEN merger, 139/211 (65.9%) of <gi> words are pronounced with a "soft g."

The Takeaway:

Even if people are being completely rational about their decision about how to pronounce gif, it's informed by their dialect, and their personal pronunciations of other words. While it is rational to say "it's from graphics, which has a 'hard g'," nobody is saying "gee eye eff" (which, coincidentally, has a "soft g"). While it's rational to say that foreign words are often nativized with a "soft g" (like giraffe), nobody says "gift" with a "soft g".

Finally, even if people are thinking statistically about it (even if it's sort of "fuzzy" math based on what they have heard in their life and not hard numbers), the conclusions they come to are dependent on their dialect, speech community, and vocabulary.

This is why I ironically go with the "French g": if you have strong feelings about the pronunciation of gif, no matter what they are, you're probably wrong. And if you're having the argument, it's because someone tried to share an image with you. Why not just be nice, instead of pedantically (and no matter what side you choose, wrongly) lecturing your acquaintances on how to say words?




©Taylor Jones 2017
