African American English and Cross Dialect Comprehension

A while back, I wrote a handful of tweets in response to someone describing a linguist giving students a test on their comprehension of African American English. I explained that I am a linguist and part of what I study is cross-dialect comprehension between AAE and mainstream, “classroom” (white) English. Or really, the lack of comprehension on the part of the mainstream speakers. The tweet was seen by over 50,000 people (!) and a lot of people asked for DMs with more information about AAE. I figured it was easier to put some information all in one place here.

I’ve written elsewhere about what AAE is, and about borrowing and appropriation, especially borrowing based on not quite understanding what is being borrowed, but here I want to dig a little more into whether and to what extent people who don’t speak AAE actually understand it.

I have a co-authored paper under review right now investigating to what extent court reporters understand and accurately transcribe AAE. I won’t discuss it further here, but I will blog about it once it’s published (spoilers: it’s bad out there). Below is a primer on AAE, a handful of things that are not understood by non-AAE-speakers, and some recommended readings.

A quick primer on AAE:

AAE is a dialect spoken primarily but not exclusively by black Americans, and is the language associated primarily with the descendants of slaves in the American South. It is a systematic, rule-governed, logical, fully-formed language variety, and it differs significantly from other varieties of English, across all levels of the language (that is, the phonology, or sound system, is different, it has different grammatical rules, etc.). It is important to note that AAE has different grammatical rules from standard English, not that it has no grammatical rules. Therefore, it is absolutely possible to speak it wrong, something white people who are ignorant of the rules do often when imitating black people who speak AAE.

The accent of AAE is different from white accents, and because of segregation, people in the same city often have very different accents depending on race. Take Chicago for instance. The stereotypical white Chicago accent exhibits what’s called the Northern Cities Vowel Shift, which SNL made fun of with their sketch about “da bears.” But that’s not the only Chicago accent. Think about it: does Kanye West sound like that?

It’s actually not fair to say “the” accent of AAE, since there are regional differences (Michael B. Jordan (Philly) sounds nothing like Ryan Coogler (Bay Area)). In fact, my dissertation research is on regional variation in AAE accents (if you identify as black and grew up in the US, please think about participating in my anonymous survey; it takes 3-4 minutes and can be found here: www.languagejones.com/aaes).

The grammar:

When I talk about cross-dialect comprehension, different accents definitely play a part, but so does very different grammar. There’s not much research on how well non-AAE speakers understand or don’t understand AAE, but what there is does not look good.

Labov 1972 found that white teachers in Harlem did not understand habitual be or stressed been. When given the scenario “you ask a child if he did his homework, and he replies ‘I been did my homework’”, most incorrectly interpreted that to mean the child had not completed their homework (see #2 below). Similarly, Rickford 1975 mentions an informal survey in which white participants took “they been got married” to mean a number of different, all wrong things.

Arthur Spears coined the term “camouflage construction” for constructions in AAE that look like they mean something in standard English, but really mean something else. He did this initially when describing “indignant come”, which is a marker of indignation, not a verb of motion. John Rickford and a few of his students did work on the use of had in preterite, not perfective, constructions. Christopher Hall and I have written on first person use of a nigga, and have a paper under review right now dealing with more than 10 different uses of “the n-word” in AAE that are distinct from those available to speakers of other dialects. I’ve written about “talkin’ ‘bout” as a verb of quotation.

But beyond a handful of papers on individual morphosyntactic features of AAE, there’s not really any research on how well other people actually understand it. We know they don’t always understand habitual be, but not at what rate they do or don’t. Same for a ton of other features. The court reporter paper I mentioned above is, to my knowledge, the first quantitative test of cross-dialect comprehension for almost all of the features mentioned in it.

What is unique to AAE? What is not understood by others?

Keeping in mind that there’s not much quantitative research on this, I can at least point to a handful of differences between AAE and other language varieties that lead to confusion or miscomprehension. Here’s a partial list:

  1. Habitual be: he be workin’ does not mean “he is at work” or “he is working.” It means he works, usually or often. In fact, a sentence like this can imply he’s not currently at work. I wrote a short post about it here, comparing hiring ads for fast food restaurants. This is one of the earliest features that sociolinguists focused on. Bill Labov, Walt Wolfram, and John Rickford, as well as many, many others have written about this.

  2. stressed been: This refers to actions completed in the distant past. So I been did my homework means I finished it a long time ago. I been told you that means I told you a long time ago. They been got married means they got married a long time ago, and still are. It does not mean the same thing as standard English “have been” as in I have been doing my homework — which implies I didn’t finish yet. John Rickford has written extensively about this.

  3. Preterite had: This is the use of “had” for past events, but not to situate them before others. I had went to the store means the same thing as “I went to the store”, although it may have a different function in terms of emotion in a narrative. John Rickford has written extensively about this.

  4. Quotative “talkin’ ‘bout”: This is “talkin’ ‘bout” used the same way white people use “like” as in “he was all like ‘oh my god’”. It’s often used with indignant come, and often used in a mocking context. I wrote a paper about it available here. It’s also touched on in Arthur Spears’ work on indignant come, and in Patricia Cukor-Avila’s work on verbs of quotation.

  5. First person a nigga: this is where a nigga means the same thing as “me” or “I”. I have blogged about it here, I have a paper in conference proceedings about it here, and Christopher S. Hall and I have a paper about it (and other n-words) under review right now.

  6. Negative Auxiliary Inversion: This is don’t nobody never instead of “nobody (n)ever does”. Interestingly, there’s some evidence that without context, people who don’t speak AAE interpret these as commands. Lisa Green has written about the grammar of this construction.

  7. Question Inversion in subordinate clauses: instead of “I was wondering whether you did it,” you may hear I was wondering did you do it. Lisa Green has written about this. There’s some evidence that it’s below the level of consciousness even for middle class speakers of what Arthur Spears calls AASE (African American Standard English).

  8. The associative plural nem (“an’ them”): to my knowledge, there’s only one sentence on this in the sociolinguistics literature, in a book chapter written by Salikoko Mufwene (in African American English: Structure, History, and Use). This functions the same as associative plurals in other languages (like Zulu). Saying Malik nem (or “Malik an’ ‘em”) means “Malik and the people associated with him” and from context it’s clear who that means. Could be family, could be friends, could be the people he’s sitting with right now. I have an aunt (in the African American family-by-choice-not-blood kind of way) named M., and stay asking about M nem.

  9. Stay for regular or repeated action: He stay acting stupid does not mean “he’s still acting stupid” or “he remains acting stupid” but rather, he consistently, repeatedly acts stupid.

  10. It instead of there: it’s a lot of people means “There are a lot of people”…

  11. Deletion of the subject relative pronoun: Standard English can delete “who” when referring to a person in a subordinate clause only if the person is the direct object (“That’s the man who I saw yesterday” or “That’s the man I saw yesterday”). AAE can delete the subject version (That’s the man saw me yesterday). I recently heard 10 and 11 combined, on the radio: It’s a lot of people don’t go there (meaning, there are a lot of people who don’t go there).

  12. finna and tryna as immediate future markers: There’s one conference paper written by an undergrad (who I think didn’t continue to grad school in linguistics) about tryna as marking intent or immediate future action. There’s an entire court case where the appeal decision hinged on whether finna was a word and what it means. Both can be used to mean you’re about to do something.

  13. be done: White folks often know done as in “he done hit him!” but don’t know be done as in “I be done gone to bed when he be getting off work” meaning “I’ve usually already gone to bed when he is getting off work”. There’s also the be done familiar from the crows in Dumbo: I’ll be done seen most everything when I seen an elephant fly, which is a slightly different construction.

  14. Set expressions, idioms, clichés: Things like it be that way sometimes, or what had happened was are not always understood, or even recognized as set expressions.

There are plenty of others, but these are the main ones (in my opinion). And of course, these can all combine with each other in longer sentences (“it be a lot of people talkin’ ‘bout ‘why she always be hanging out with Malik nem?’”). Combine that with a completely different accent, even (especially?) in the same city, and you have a recipe for total miscomprehension.

The interesting thing for me, though, is that from both personal anecdotal experience and some limited research, it appears that people who don’t speak AAE, especially white folks, generally assume (1) black folks are speaking “broken” English, and (2) that they understand it even when they don’t. So people will hear I been told you that and assume it means “I have been telling you that” and that the speaker just…said that wrong. Both sentence structures exist in AAE, and they mean different things. But only one exists in “classroom” English.

Some good readings:

There’s not a lot of material aimed at regular people instead of linguists; however, I highly recommend a few books:

  • Spoken Soul (Rickford and Rickford)

  • African American English: A Linguistic Introduction (Lisa Green)

  • Language and the Inner City (William Labov — this one is from 1972, at the beginning of AAE being taken seriously as an object of study).

  • African American English: Structure, History, and Use (ed. Salikoko Mufwene)

  • The Oxford Handbook of African American Language (ed. Sonja Lanehart. This one is massive and new, but a lot of it is very technical).

-----

 

©Taylor Jones 2018

"Also, dude, 'Chinaman' is not the preferred nomenclature": Game Theory and the Euphemism Treadmill

A few years ago, I had a good talk with NPR's Gene Demby about why we have so many terms for people of color, and what linguists refer to as the euphemism treadmill. (He ended up writing this article.) I've been thinking about this topic off and on since then. As with so much of language, when it comes to terms for other people, the word a speaker chooses signals something to listeners about the speaker.

What this means is that we can think of the use of terms for groups of people as a signaling game, and we can situate it within the kind of discussion of strategic thinking that happens in Game Theory. What makes this instance particularly interesting is that it’s a coordination game on a massive scale. Basically, we’re all sometimes confronted with a choice of words, and that choice may tell other people something about us (whether we want it to or not!). That, in turn, may affect how they react to us or treat us. Therefore, when confronted with such a choice, say between “colored” and “African American”, we have no choice but to strategize about which word we choose. (Technically, this is not strictly true, as we could just say whatever, but this is generally a losing strategy, one that results in people thinking less of you and adjusting how they treat you. However, as with everything linguistic, it’s complicated; some people are willing to make an allowance for, say, the elderly, as when older relatives of mine asked after my groomsmen, wanting to know how “the Canadian”, “the negro”, and “the oriental” were doing, and followed up with “send them [our] love.”)

It turns out these kinds of patterns look very similar to those from evolutionary models originally designed to capture gene flow and predator-prey dynamics, with some minor tweaks. I won’t get into the math here, since it’s more complicated than I’m willing to dive into in a blog post, but the general idea is that a few factors can affect the euphemism treadmill. First is random drift: the creation of a new word or repurposing of an existing word can be thought of as analogous to mutation, and chance then plays a large role in whether the new form spreads. Sometimes these forms disappear, and sometimes they completely take over; in the long run, in strict competition, it’s one or the other. Second is a predator-prey dynamic: we can think of these forms as being in competition for the same ecological niche, or we can think of the new form as literally preying on the old one(s). The second metaphor isn’t perfect, but it captures something about the ecology of word use.
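To make the “one or the other” point about random drift concrete, here is a minimal sketch of a neutral Wright-Fisher drift model in Python. This is an illustrative assumption, not the specific models referred to above: the function name `wright_fisher_drift`, the population of 50 speakers, and the starting count of 5 are hypothetical, and the sketch ignores the predator-prey and signaling pressures entirely. Each generation, every speaker copies the term of a randomly chosen speaker from the previous generation, and neither term has any advantage.

```python
import random
from typing import Optional, Tuple


def wright_fisher_drift(pop_size: int, initial_new: int,
                        seed: Optional[int] = None,
                        max_gens: int = 100_000) -> Tuple[str, int]:
    """Hypothetical sketch: neutral drift between an 'old' and a 'new' term.

    Each generation, every one of pop_size speakers adopts the term used by a
    randomly chosen speaker of the previous generation (Wright-Fisher sampling).
    Neither term is favored, so only chance decides the outcome.
    Returns ("new" | "old" | "unfixed", number of generations simulated).
    """
    rng = random.Random(seed)
    count_new = initial_new
    gens = 0
    # Resample until one term is used by everyone (fixation) or we give up.
    while 0 < count_new < pop_size and gens < max_gens:
        p_new = count_new / pop_size
        # Each speaker independently picks the new term with probability p_new.
        count_new = sum(rng.random() < p_new for _ in range(pop_size))
        gens += 1
    if count_new == pop_size:
        return "new", gens
    if count_new == 0:
        return "old", gens
    return "unfixed", gens


if __name__ == "__main__":
    trials = 1000
    # Assumed starting point: 5 of 50 speakers use the new term.
    wins = sum(wright_fisher_drift(50, 5, seed=i)[0] == "new" for i in range(trials))
    print(f"new term took over in {wins}/{trials} runs "
          f"(neutral drift predicts about 5/50 = 0.10)")
```

In this simplified model, every run ends with one term used by everyone, and the new term wins in roughly the same proportion of runs as its starting share. Giving both terms equal fitness is a deliberate simplification: it isolates chance from the social pressures that push speakers toward one term or the other.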

 

On a very large scale, we can think of this as a coordination game. If “bad” people say “oriental” then when we hear “oriental” we can’t strictly determine that the speaker isn’t bad. So we go out of our way as speakers to signal that we’re “good” by picking a different word. However, in large populations, in the long run, these kinds of signaling games have interesting properties for a few reasons. First, we tend to be lazy, so if we can get away with saying “oriental” and it’s easy for us (say, it’s what we’ve always said), then we’ll do that. If we think there’s no cost, we go with the easy option. Second, sometimes bad people don’t want us to know they’re bad people. So if they know that everyone else assumes one is bad if one says “oriental” when referring to a person, they may hop on the new word bandwagon to avoid being “outed” as a bad person!

Eventually, two related things happen. First, people who use the new words just because it’s what everyone else expects of “good” people sometimes give themselves away, and the words they use may become associated with people who have their views. It doesn’t matter if I avoid saying “oriental” if I instead say “those damn Asian-Americans are ruining our cities” or some such nonsense. Second, things that people think of negatively are still associated with the new words to describe them, so eventually euphemisms become taboo themselves (see, for example, “toilet”). So for two related reasons, as a euphemism or a new term gains traction, especially if it becomes the main word people use, it leads to the need for another new word to separate out who means what. If I have negative beliefs about black people, but want to be thought of in a positive light by others, I might say “African American” … but it will be abundantly clear that I still mean something vaguely negative by it. If you want to signal virtue [sidenote: “virtue signalling” was a term from evolutionary game theory that has since been adopted by some regressive cranks, and it has become something I had to think twice about using here, because of what it now signals by association with racist and sexist groups], you have to find a new way to differentiate yourself.

And so the wheel turns again.

For each of the above charts, we can generally think of social movements that relate to the terms used (for instance, the shift from “negro” to “black” coincides with the Black Power movement), but this doesn’t invalidate the game theory approach here. Rather, knowing about these social movements adds to our understanding of just how this kind of massive signaling game plays out in society, and the kind of real world repercussions involved. Obviously, there’s a lot more involved here, and the above is a gross simplification that barely scratches the surface of the kinds of strategies involved, but I find it fascinating that the models developed to better understand gene flow and animal competition do a pretty good job of also capturing how words change in society in the long run. When I’m deciding whether to say “black” or “African American” based on what I think I know about the listener (or reader), millions of other people are making the same strategic calculations every day, and our individual decision-making (and the fallout from our decisions) in part drives massive social changes on a much larger time scale. It’s related to both how signaling strategy plays out in large groups in general, and to how other kinds of words change.

-----

 


My work was just cited by a crank, here's a response

I recently came across an article written for Quillette by Heather Mac Donald which uses a research paper of mine published in American Speech in 2015 to defend a frankly stupid position. The article was shared by Steven Pinker, which means increased visibility, so naturally I want to make sure the record is straight as far as my research is concerned. The position she uses my work to justify is a position I disagree with not on political grounds, but on empirical grounds. I'm going to contextualize all of this for those unfamiliar with the players involved, before adding my response. [Note, this post uses a racial slur, a sex/gender slur, and some colorful Quebecois in citation form.]

Some Context:

Quillette

Quillette is a for-profit 'safe space' from 'political correctness' and 'leftist bias' created by a grad school dropout (one who, in interviews, claims explicitly that she is "actually trained as a psychologist" despite not, you know, actually having finished her training). You may have come across it recently when they published an article written by an undergrad that purported to challenge Ta-Nehisi Coates' work (among others). 

This article serves as a pretty good explanation of what Quillette is, and what it's trying to be. (Highlights include: "Quillette makes tired alt-right talking points sound erudite", and "Instead of writing off the academic left — and, generally speaking, women and people of color — as crybabies or social justice warriors, Quillette’s writers use the classical liberal tradition of 'mature debate' to dismiss marginalized voices".)

Heather Mac Donald

The author of this particular piece, Heather Mac Donald, is most notable for authoring such works as The War on Cops, The Illegal Alien Crime Wave, In Defense of Fascism and The Diversity Delusion. (Ok, one of those is fake, but the other three are real). I think her works speak for themselves.

Pinker

Steven Pinker is a well-known cognitive psychologist who does some work in linguistics, and who has been relatively influential. He also has gone off the rails on Twitter lately, so, for instance, in tweeting the link to this Quillette article, he complains about "PC/SJW." That is, Political Correctness (which is, more or less, trying not to intentionally say mean, hurtful, or offensive things by thinking about your choice of words before speaking) and Social Justice Warriors. I'm not 100% clear on what's wrong with social justice, but it's clear from use that SJW is intended as derisive, and directed toward people who --- I don't know. Want equality? Anyway, the point is Pinker is well known and is amplifying Quillette's signal, using in-group signals for the alt-right (whether intentionally or not). In this case, it's the writings of a woman who believes that "phantom police racism" is a cover to keep people from discussing the "uncomfortable problem" of "black on black crime". One who then cites my research out of context, evidently to defend her desire to say nigger (no, really, this is not an embellishment; see below). 

The Quillette Article

I will reiterate that Quillette is for profit, so keep that in mind when deciding to click through. The article in question can be found here, if you, dear reader, wish to read it for yourself (perhaps use an ad blocker?). 

The article is ostensibly a defense of the poet Anders Carlson-Wee, who was the subject of a minor online tiff last week, after The Nation published a poem of his written in an approximation of African American English. John McWhorter, with whom I do not always agree, wrote an excellent, thoughtful piece in defense of Carlson-Wee, which can be found here.

Heather Mac Donald, however, has taken the controversy as a jumping-off point to dive into her feelings. In this case, her feelings about censorship. My goal here is not to catalogue all the things wrong with her article, as I simply don't have the time to do so, and others have done so better (especially with regard to her bizarre reading of Plato). I do want to touch, however, on a few points. 

First, she refers to African American English as "black street dialect". I object to this not on "SJW" grounds (that is, that it is a clearly offensive dog whistle: what is the function of "street" in this description? It's not location; it's judgment. Is whatever Heather speaks only spoken indoors?), but rather on scientific grounds. There is a wealth of literature on the speech of African Americans going back at least 60 years, and that is simply not the term used by anyone who knows even the slightest bit about the subject. You may have feelings about AAE versus AAL versus AAVE, but if you're discussing a language variety it would behoove you to use really any of the actual names for it. It would be like me discussing "Iranian town dialect" instead of Persian/Farsi. I would just look dumb and unnecessarily prejudiced.

Second, she argues strongly that there is some boogeyman mob that will ruin your life if you ever mention a taboo word, in citation form or otherwise. As a linguist who researches and says taboo words, I can tell you this is total nonsense. People are generally extremely good at, well, context. I am a cis/het white man, and part of my job is to discuss taboo words publicly. And you know what? No ill has come of it yet, because I do so (1) in appropriate contexts, (2) with academic rigor, and (3) with respect for both the communities that hold those taboos and the people described by those words (when those words describe people). 

It's the third point that's going to take a little work. The paragraph that cites my work is, well, absurd. In that paragraph, Mac Donald writes a lot of garbage: 

"The elaborate rituals around the ‘n-word’ evince the same double standard regarding authorial intention. According to existing conventions, whites may never use the full word without elision, even if they are doing so not to refer to anyone but as reported speech. Its mere presence in the mouth of a white person launches a nuclear bomb against blacks; the transgressor will be punished accordingly, as the founder of Papa John’s pizza discovered after using the full word as an embedded quote from chicken impresario Colonel Sanders. Blacks, however, can use the word in toto to refer to actual people, because their intentions matter and it is assumed that blacks are incapable of racist intent. Black Twitter users used the n-word 6.2 million times in one month, according to a 2015 study; it is ubiquitous in urban vernacular and in rap music, with black entertainers like Jay Z, Beyoncé, and Kanye West tossing it off with impunity."

Let's unpack this.

  1. "According to existing conventions..."  --- What conventions? In what contexts? This has the appearance of social science without any of the social science. 
  2. "whites may never use the full word without elision, even if they are doing so not to refer to anyone but as reported speech."  --- This is untrue, but as I've written elsewhere, not a bad rule of thumb if you want to avoid pissing people off. 
  3. "Its mere presence in the mouth of a white person launches a nuclear bomb against blacks; the transgressor will be punished accordingly, as the founder of Papa John’s pizza discovered after using the full word as an embedded quote from chicken impresario Colonel Sanders" --- This is patently, obviously untrue, and just wildly hyperbolic. A nuclear bomb? As I've written elsewhere, there's context for when it is possible to say the n-word (and that's a separate question from whether you should say it). ALSO, it's important to note that while the founder of Papa John's did use "nigger" in citation form, he did so while complaining about how he can't say it, but someone else got away with it! That's like getting mad that people call you misogynist when you complain that 'feminazis' are preventing you from calling women 'bitch'. It's just a sneaky way of trying to say it anyway. It's like me saying "Why can't I tell everyone that 'Heather Mac Donald is an idiot.'?" Just because it's embedded doesn't mean it loses its force, right Heather? (this was actually the subject of an academic talk at this year's Annual Meeting of the Linguistic Society of America).
  4. "Blacks," --- really? Listen. You can call people that, but I'm pretty sure it pisses off black people to be called "blacks" just as much as it pisses off white people to be called "whites". In general, taking an adjective and then using it as a noun for a group of people you think it describes is not well received. This is basic stuff here.
  5. "Blacks, however, can use the word in toto to refer to actual people, because their intentions matter and it is assumed that blacks are incapable of racist intent. " --- Justify this statement. According to whom? Under what circumstances? This is the sentence before citing my work, and the implication is that my work in some way justifies this stupid statement. If you want to draw on arguments that prejudice and racism are different and that black people can be prejudiced, but not systemically racist against white people, then make that explicit and attribute the argument. This is weak writing that I wouldn't tolerate from undergrads (but then again, we know where Quillette stands on publishing academically lazy, poorly written articles by undergrads).
  6. "Black Twitter users used the n-word 6.2 million times in one month, according to a 2015 study" --- This is a bait-and-switch using my work to justify something other than what it says. As Christopher S. Hall and I have written about extensively, there is not just one n-word in African American English (note: NOT Christopher J. Hall, although I assume he's lovely). Mac Donald, here, is attempting to justify using a racist slur in one dialect by saying a similar word exists in another dialect. It's like saying tabernak is not a swear word in Quebec French because "tabernacle" is a totally mundane word in Quebec English. They're not pronounced the same, they refer to different things, and they're not used in the same linguistic or social contexts.
  7. "it is ubiquitous in urban vernacular and in rap music, with black entertainers like Jay Z, Beyoncé, and Kanye West tossing it off with impunity." --- Define "urban vernacular". More importantly, again, you're comparing apples to slurs. Also, Beyoncé? When?

You've cherry-picked a line from my research that isn't actually applicable to your argument in the hopes that my academic reputation will somehow add a veneer of respectability to your weak reasoning. 

More broadly, the point is reasonable people generally don't have a problem with other reasonable people discussing a slur when it's clear that they are doing so with rigor and from a place of respect. When you disingenuously demand to know why "blacks" get to say "the n-word"  but you don't, it's clear you just want to say offensive shit. Then people (correctly) call you an asshole and tell you to stop. When they do things like protest your speaking engagements, or say mean things to you on Twitter, THAT'S NOT CENSORSHIP. That's other people also exercising their freedom of speech, and is a natural result of your exercise of your freedom of speech. You are playing the victim in an attempt to silence other people's free speech, because evidently you want to say "nigger" at people without repercussions.

The takeaway:

You can say "the n-word" and nobody can stop you. However, there will be social ramifications. That's how pretty much all of language works. The real question is why do you want to say it so badly, Heather?

-----

 


 

A Malefactive in African American English

This is a quick post about something I've heard all my life in AAE speech communities but haven't seen discussed, well, really anywhere. 

Benefactives and Malefactives in English

A lot of languages can take a verb and mark whether it was done with kind or harmful intent toward someone else. In ('standard') English, the benefactive marker is a separate word that introduces the recipient, and that word is for. For example:

  • She baked a cake for me. (meaning either, she baked a cake with the intention that I eat it, or she baked a cake so I wouldn't have to).
  • He made a phone call for me. (meaning he made a phone call so I wouldn't have to, or on my behalf).

Other languages may mark this differently (for instance, Zulu adds the infix -el- just before the end of the verb). 

English also has a very limited malefactive marker: on. For instance:

  • She hung up on me
  • She walked out on me
  • He told on me

But you can't just use it with anything:

  • ??? She baked a cake on me

That said, some non-standard varieties allow for much more productive use of malefactive on. For instance, my (somewhat Southern) grammar lets me say it so long as the verb is prefaced with up and, as in:

  • She up and baked a cake on me (meaning: She surprised me by baking a cake, contrary to my expectations and possibly with some negative effect on me...but not physically on me in any sense).

An AAE only Malefactive

I've been thinking about this recently, and noticed something that's not grammatical in other varieties of English: to (tell a) lie on someone. Examples:

  • She told a lie on him
  • He would never tell a lie on her

I've asked a few people who use this, and they agree it's equivalent in meaning (but not in mood!) to telling a lie about someone, and doesn't mean to tell a lie to someone. 

-----

 



What, really, is a word?

I just received the new issue of Language in the mail, and the first paper is a paper I'd read a draft of a while back, and loved. Not only for its snazzy title ("The Lexicalist Hypothesis: Both Superfluous and Wrong"), but because it gets at the heart of a really interesting issue in linguistics.

The core argument Benjamin Bruening proposes is that if we assume there's such a thing as a "word" and that words are fundamentally different from "phrases" in the grammar (that is, the rules in the mind of a native speaker), then we end up with a lot of difficulty and more assumed grammatical structure that doesn't really give us much. Not only that, but we have to explain some weird observed behavior that this model doesn't predict. This assumption of wordhood, which comes naturally to people who speak European languages, has been an issue for people working on agglutinating languages (like Zulu, Iroquois, etc.) for a while. The genius of his paper is, in part, that he uses all English examples.

So the argument is, more or less, that "word" is a phonological object, and not a grammatical/syntactic object. That is, we manipulate syntactic structure in our minds, but where we put the breaks in speech is really a property of sound and not of the structure we are manipulating. Some examples should help clarify.

First, however, I want to point out that we already all know that words (in the traditional sense) themselves often have structure. So for instance, we can look at a word like:

  • unlockable

and know that it has three pieces that carry meaning: un-, lock, and -able. One of those can stand on its own; the others can't. AND, interestingly, we can think of them as representing two different structures with two different meanings:

  • [un [ lock-able ] ]  == not able to be locked
  • [ [ un-lock] able ] == able to be unlocked

So what Bruening does is give a ton of examples where the adjective modifying a noun is really an entire phrase:

  • she gave me a don't-you-dare! look

He argues that if you have a model of syntax that assumes the existence of words as atomic pieces, and these words have categories (like "adjective," "noun", "verb") then you can't really account for don't-you-dare! looking both like an imperative and an adjective. 

The paper is a lot more complicated than this, since he gets into some really thorny issues in syntax that are probably not appropriate for a blog post, but it's delightful, and its example sentences are great.

One of my favorite things about reading linguistics literature is the special schadenfreude I get when someone points out that another linguist's example sentences are wrong, and this paper even has that: he points out the grammaticality (contra another linguist's analysis) of utterances like:

  • I have to go re-tuck in my kids.
  • he was re-sworn in as governor.

I have been thinking about this word/phrase distinction for a long time, though evidently not on the same level as Bruening. I have, however, been collecting examples of sentences like these for years, and now have a good reason to share them. I have generally put the phrase in brackets, and where there's an unsaid element (like the "it" of "I would wear it" in example 2), I have left an underscore where we might expect another syntactic element. So without further ado:

  1. The one I had at tale was [I can't even handle it] sweet.
  2. It was totally an [I would wear____] style.
  3. You put your computer in the [my computer] spot.
  4. It's a really [hard to open] door.
  5. It's not entirely a [nigga, we made it] moment. (Childish Gambino in an interview)
  6. Did you swallow a [too big] piece?
  7. Sometimes when I cough it sound like it's a sickness cough but it's really a [my-lungs-aren't-ok] cough.
  8. It's always bad when it's *too* [too big].
  9. Please return for a [left behind] item. (over the intercom at JFK airport)
  10. Is 250 texts a good number, or a [not enough____] number?
  11. As a [[i don't have to be there for very long] ____] I don't really mind it.
  12. It was almost [knock you over] wind.
  13. I don't have a specific [it has to look like this] idea.
  14. I'm sure you'll be past the [a thousand] mark.
  15. It's a vacation house, it's not a [___ live there] house.
  16. It was a [my lungs are tight] kind of cough.
  17. I don't mind the [making my own lunches] part of it...
  18. I'll find, like, *old* [my hair], and be like, "how did this happen?"
  19. It's a writing desk, not a [leave a pile of books and papers on it] desk.
  20. Go see Ailey. It's [change your life] good. (Advertisement in the subway for Alvin Ailey Dance Theater)
  21. It was too [not enough time].
  22. Wow, this is a really [___ sink into it] couch!
  23. It's [if you're desperate you'll eat it] bread.
  24. Now we have a [thank you] reason to send that card.
  25. I need an overnight flight, not a [during the day] flight.
  26. It's stupid. I wore a pair of boots on a [slightly too warm] day and it gave me a rash.
  27. That's the [be careful because if you sit on it wrong the chair might break] chair.

I really, thoroughly enjoyed this paper, which can be found [here].

-----

©Taylor Jones 2018

Know it all

Trump recently tweeted something that was linguistically interesting (shocking, right?). Tweeting about James Comey, he wrote:

"Comey knew it all, and much more!"

This is the kind of thing that would be starred as an infelicitous utterance in an intro to pragmatics course. The reason is what we call "scalar implicature." If I said:

"Comey knew some of it."

That carries the implicature that he knew some of it, but not all of it. I can cancel that implicature by stating:

"...in fact, he knew all of it!"

The same goes for everything weaker than all. For instance:

"Comey knew most of it. In fact, he knew all of it."

However, because all inherently means "everything" it makes no sense to say he knew all "and more".
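The scale logic can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not a formal semantics: it assumes a three-item scale ("some" < "most" < "all"), and the names `SCALE`, `implicature`, and `and_more_is_felicitous` are hypothetical. Using a term implicates that the stronger terms above it don't hold, and "X, and more" only makes sense if there is something stronger than X to move up to.

```python
# Toy Horn scale (an illustrative assumption): terms ordered weakest to strongest.
SCALE = ["some", "most", "all"]


def implicature(term):
    """Return the stronger scale-mates that saying `term` implicates are false."""
    position = SCALE.index(term)
    return [f"not {stronger}" for stronger in SCALE[position + 1:]]


def and_more_is_felicitous(term):
    """'knew TERM, and more' needs something stronger on the scale to point to."""
    return SCALE.index(term) < len(SCALE) - 1


for term in SCALE:
    print(term, implicature(term), and_more_is_felicitous(term))
# some ['not most', 'not all'] True
# most ['not all'] True
# all [] False
```

On this toy model, "all" sits at the top of the scale, so "knew it all, and much more" has nowhere left to go.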

It's the kind of thing you might see as an insult with negation:

"Comey knew nothing. In fact, he knew less than nothing."

But it still doesn't quite make sense in that sentence frame:

*"Comey knew none (of it), and much less!"

I'm not entirely sure what to make of this, since it's not an off-the-cuff remark that can be attributed to a speech production error or brain fart. Perhaps it's further evidence that people tend to "tweet how they speak."

-----

©Taylor Jones 2018

New Working Paper on Zulu published

I recently gave a talk on Zulu morphosyntax in which I (hopefully politely and respectfully) challenged some of the mainstream approaches to Zulu syntax. The working paper is now out, in the Proceedings of the Linguistic Society of America, available here (pdf download under "full text").

It's not a fun read for a layperson, but the general gist is that (1) a lot of previous syntax work doesn't pay enough attention to the phonology, (2) the justifications for arguing that the noun augment is really a determiner are a little shaky, and (3) if we just treat the 'linking vowel' as a determiner, everything is simpler. This has the unexpected outcome of also suggesting that Zulu has construct state, something known (and controversial) in Semitic languages, but not known to exist in Bantu languages. To paraphrase a colleague at Penn, I've reduced a seemingly unique thorny problem to an already known thorny problem, which is about as good as you can hope for in syntax.

 

-----

©Taylor Jones 2018

What would Wakanda sound like?

Today, Marvel's Black Panther is released. The Black Panther, aka T'Challa (played by Chadwick Boseman), is the king of Wakanda, a fictional country in Africa (neighbored by other fictional countries like Azania and Narobia, but not Nambia). While I'm extremely excited for the movie (NO SPOILERS PLEASE), I don't have high hopes for a surprise fictional language in the movie, given the pre-film hype about the inspiration for design elements, costume, and even T'Challa's accent. In previous films, T'Challa's father was played by a Xhosa-speaking actor, and it now seems that Xhosa being spoken in Wakanda is Marvel Cinematic Universe head canon.

Geographic improbability aside, I don't have a problem with this, as Chadwick Boseman does a great Xhosa accent --- far better than, say, Morgan Freeman in Invictus. But, given that Wakanda is supposedly 5,000km away from South Africa (where the non-Wakandan Xhosa people are), what would the languages of Wakanda sound like? This is just a short blog post to (shallowly) explore that question with some links for the interested.

Location, Location, Location

Wakanda is situated somewhere in East Africa, by either Lake Victoria or Lake Turkana. That means it's somewhere around Uganda, Kenya, Rwanda, Ethiopia, and South Sudan. What's great about this is that it's an area where a lot of languages from different language families are spoken. So the five major ethnic groups in Wakanda could all potentially have their own very different languages.

What about the comics?

The character and country were created by Stan Lee and Jack Kirby in 1966. Both white guys, neither linguists. So there are a lot of elements of the Black Panther mythos that have names that sound, well, like what a white guy would make up to sound exotic and African (or look it on the page). That said, certain things are just part of the canon. So the kings have an (evidently) ejective /t'/ as the first part of their names. The all-female fighting force, the Dora Milaje, are called what they're called. Anyone contracted to construct languages for the MCU will have to work with the existing material, much like how Marc Okrand developed Klingon by building around what was already uttered on-screen in Star Trek. And that will have an effect on the backstory and character development. To my knowledge, Ta-Nehisi Coates and other recent writers have not done a deep dive into the linguistic side of Wakanda, but we can't really expect Ta-Nehisi to solve everything for us.

What's spoken in that area?

As I mentioned above, that particular (vague) part of East Africa has representation from a few of the major families: Afro-Asiatic, Niger-Congo A, Niger-Congo B ("Bantu" languages), and Nilo-Saharan languages.

In Kenya, there is, of course, Swahili --- a Bantu language spoken by 50 to 100 million people and a lingua franca for the region. Swahili's huge number of speakers means you can hear it on internet radio if you want. It also means that it has lost lexical tone (when the pitch of a word or syllable changes the meaning), and because it's used for trade by so many people who speak so many other languages as their native language, it is relatively regular, meaning there's not a lot of unpredictable grammatical stuff.

But a lot else is spoken there, too. Kenya alone is home to 68 languages, the most prominent of which are Kikuyu, with 8 million speakers, and Dholuo, or Luo, with 4 to 5 million speakers.

The latter, Dholuo, is not a Bantu language, but a Nilo-Saharan language. What's the difference? The main difference is that all the Bantu languages group nouns into types (think gender in European languages, except there are 10 to 17 of them). Every noun has a prefix for its noun class, and the prefixes generally come in pairs (singular vs plural). So in Zulu (and Swahili!) the base form for the noun 'person' is ntu. But this doesn't just show up on its own. Rather, it has one of these noun class prefixes, as in:

  • umuntu 'person'
  • abantu 'people' (hence the name for the languages...they all call people some form of "bantu")
  • ubuntu 'humanity, humanness' (whence also the operating system).

So you can get phrases like umuntu ngumuntu ngabantu: "a person is a person through other people".
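The prefix system can be sketched as a lookup from noun class to prefix. The table here is a hypothetical mini-version covering only the three Zulu prefixes from the examples above, keyed by the conventional Bantu class numbers (1, 2, and 14); real Zulu has many more classes, and this sketch ignores the sound changes that happen when prefixes meet stems.

```python
# Hypothetical mini-table of Zulu noun class prefixes, keyed by the
# conventional Bantu class numbers. Only the classes from the examples are included.
NOUN_CLASS_PREFIXES = {
    1: "umu",   # singular, people:  umu-ntu 'person'
    2: "aba",   # plural of class 1: aba-ntu 'people'
    14: "ubu",  # abstract nouns:    ubu-ntu 'humanity'
}


def noun(stem, noun_class):
    """Attach the class prefix to a bare stem (no sound changes modeled)."""
    return NOUN_CLASS_PREFIXES[noun_class] + stem


for cls in (1, 2, 14):
    print(cls, noun("ntu", cls))
# 1 umuntu
# 2 abantu
# 14 ubuntu
```

The point of the design is that the stem stays constant and only the class prefix changes, which is exactly what the umuntu/abantu/ubuntu set shows.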

Bantu languages also generally have a LOT of sounds, but simple syllable types, almost always CV (Consonant Vowel). You'll never see a word like English strengths. The spelling obscures this a bit, so for instance, <ng> in Zulu is one sound, not two (the sound of <ng> in sing). Swahili also has syllabic nasals, so for instance, the <m> in mzungu 'white person' is its own syllable: m-zu-ngu.

Back to Luo: Luo has vowel harmony, meaning all the vowels in a word have to share the same feature. What's the separating factor? How advanced your tongue root is. So words with the vowels in (an American pronunciation of) bean, bait, bot, boat, and boot, are one class, and words with the vowels in bin, bet, bat, bought, and foot are in another. A single word will not have vowels from both groups, only one.
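The harmony requirement amounts to a simple check: collect the vowels in a word and confirm they all come from one set. This sketch assumes a simplified ten-vowel split written in IPA ([+ATR] /i e ə o u/ versus [-ATR] /ɪ ɛ a ɔ ʊ/) as a stand-in for the two classes described above; it is not a full description of Dholuo, and it tests made-up strings rather than real Luo words.

```python
# Assumed, simplified vowel sets (IPA) for a two-way tongue-root harmony system.
ADVANCED = set("ieəou")    # [+ATR]
RETRACTED = set("ɪɛaɔʊ")   # [-ATR]


def is_harmonic(word):
    """True if every vowel in `word` comes from a single harmony set."""
    vowels = {ch for ch in word if ch in ADVANCED or ch in RETRACTED}
    return vowels <= ADVANCED or vowels <= RETRACTED


# Made-up forms, for illustration only.
print(is_harmonic("tiedo"))   # True: all [+ATR]
print(is_harmonic("tɪɛdɔ"))   # True: all [-ATR]
print(is_harmonic("tiɛdo"))   # False: mixes the two sets
```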

Even cooler, Luo grammatically distinguishes between alienable and inalienable possession, so for instance, the word for a dog's bone has different forms depending on whether you mean the bone is part of the dog's skeleton, or a cow bone it's chewing on. If it can be taken away, it's got a suffix marking that fact.

Wakanda is also close to Ethiopia and South Sudan, where Afro-Asiatic languages are spoken. The most well-known subset of these are the Semitic languages, which include Arabic and Hebrew, but also languages Americans are often less familiar with, like Amharic, spoken in Ethiopia.

Amharic, like other Semitic languages, has what's called non-concatenative morphology, meaning that words aren't always built by adding prefixes, suffixes, or infixes, but are instead built with a system of (unpronounceable) roots that combine with vowels in between. The standard example linguists use is from Arabic (also spoken in that region), where the root k-t-b shows up in words related to books and writing, but the vowels make it mean different things: kitaab 'book', kataba 'he wrote', kutib 'was written', etc. Amharic, like Swahili, has a massive number of speakers: roughly 22 million. It also has an objectively cool writing system.
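The root-and-pattern idea can be sketched as slot filling. This is a simplified model under stated assumptions: the templates (with "C" marking a consonant slot) and the function name `interdigitate` are a notation invented for this sketch, and they only reproduce the three forms given above. Real Arabic and Amharic morphology also involves affixes, consonant doubling, and more.

```python
def interdigitate(root, template):
    """Fill each 'C' slot in `template` with the next consonant of `root`."""
    if template.count("C") != len(root):
        raise ValueError("template needs exactly one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in template)


ROOT = "ktb"  # the root associated with books and writing
for template, gloss in [("CiCaaC", "book"), ("CaCaCa", "he wrote"), ("CuCiC", "was written")]:
    print(interdigitate(ROOT, template), gloss)
# kitaab book
# kataba he wrote
# kutib was written
```

Keeping the root and the vowel pattern as separate inputs is the whole point: the same three consonants yield different words depending only on the template.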

Semitic languages like Amharic and Ge'ez are not the only Afro-Asiatic languages, though. To the south of Lake Victoria (so, somewhere sort of near Wakanda?) Iraqw, a Cushitic language, is spoken by approximately 460,000 people (because it's spoken by a much smaller number of people, the best video I could find was about porcine cysticercosis, or tapeworm in pigs).

And of course, we've established that Xhosa is MCU head canon (I really want to know the back story of how they first arrived in Wakanda, reversing the Bantu Migration, and how they rose to power!), which means that one could expect to hear clicks in Wakanda, too.

Wakanda Forever!

Given pre-release ticket sales alone, it seems like Hollywood has been sleeping on Black Panther's type of pan-African magic just the way the rest of the world has been sleeping on Wakanda's advanced technological civilization. If we're lucky, BP is going to be a smash hit with future films, TV series, spinoffs...and maybe we'll get to hear the sounds of Wakanda just as we hear the sounds of Essos and Valyria, Middle Earth, and Qo'noS.

A great resource for the IPA

One of the best tools a linguist has is the International Phonetic Alphabet, but learning it can feel daunting. I have historically referred students to the Wikipedia page on the IPA, because it has links to individual pages for each sound, with descriptions of how the sound is produced, and audio recordings.

Now, there's another tool: an interactive IPA chart with a cross-sectional MRI so you can see the position of the tongue, lips, velum, etc. while a sound is being produced.

It's courtesy of the USC Speech Production and Articulation Knowledge Group, and can be found here.

One caveat: of the five available speakers, John Esling is the only one who pronounces the alveolar click /!/ correctly. Everything else seems to be great across all speakers.

Enjoy!

Fun With Morphology!

Causative Smallening

Friends and family members have recently said some morphologically interesting things, and I decided to take a quick second to put them down here, for posterity, because they're so freaking cool.

The context for the first was manipulating images for a slideshow. The sentence used was:

I smallened it

Everyone clearly understood it as "I made it smaller," and also knew that it was non-standard. But why?

Well, some adjectives can be made into inchoative verbs. This means if you have some adjective X you can make a verb that means 'to become X'. It's super easy: just add an -en to the word:

  • darken: to become dark
  • redden: to become red
  • liven: to become more alive/lively
  • quicken: to become quick.
  • leaven: to raise (from an older word in English we no longer have, ultimately from Latin levare 'raise')
  • toughen: to become tough
  • smarten (up): to become smart

These can also be made transitive, in which case they are causative verbs, meaning someone causes something to become X.

The thing is, it's normally taken to apply only to what linguists call a "closed set," which is a fancy way of saying you can only do it to some adjectives and not others. That is, it sounds weird to say "dumben it" (instead of "dumb it down") or "absurden the story" or "spicen the food."

And yet, we all have the grammatical competence to be able to (playfully) generalize to new instances, so everyone knew what "I smallened it" meant.

Rebracketing

When linguists get to the morphology segment of Intro to Linguistics, we teach "bracketing" as a tool for recognizing the internal structure of words. It's literally drawing brackets around word-pieces (let's call them morphemes). For example:

  • [ nation ]
  • [[ nation ] al ]
  • [ inter [[ nation ] al ]]

Some kinds of ambiguity are then easy to explain, as in, the door is:

  • [ un [ [ lock ] able] ]  == unable to be locked ~ un-lockable
  • [ [ un [ lock ] ] able ] == able to be unlocked ~ unlock-able

Similarly, we can bracket words that go together in sentences:

  • [ [that ridiculous man ] [ looks [ dumb ] ] ]

Sometimes, though, things break free. A classic example is the suffix -ish, which for many people now can modify much more than adjectives:

  • It was a yellowish color.
  • I guess I was excited about it, ish

All of that was to get to a family member recently saying:

There's no point in waiting to leave, it's not going to get any not darker.

That is, it's not going to get any [ [ not [ dark ] ] er ], where -er is modifying the complex structure not dark.
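The bracketing idea can be sketched with nested tuples, where each tuple is one bracketed constituent and each string is a morpheme. This is a minimal illustration under stated assumptions: the representation and the helper names `bracket` and `spell_out` are hypothetical, and the output uses a slightly simplified version of the notation above (bare morphemes are not bracketed on their own). It shows how one string can correspond to two different structures, and how -er can attach to the complex unit "not dark".

```python
def bracket(node):
    """Render a structure in bracket notation: tuples are constituents, strings are morphemes."""
    if isinstance(node, str):
        return node
    return "[ " + " ".join(bracket(child) for child in node) + " ]"


def spell_out(node):
    """Concatenate the morphemes left to right, discarding the structure."""
    if isinstance(node, str):
        return node
    return "".join(spell_out(child) for child in node)


not_lockable = ("un", ("lock", "able"))    # unable to be locked
able_unlocked = (("un", "lock"), "able")   # able to be unlocked

print(bracket(not_lockable))    # [ un [ lock able ] ]
print(bracket(able_unlocked))   # [ [ un lock ] able ]
print(spell_out(not_lockable) == spell_out(able_unlocked))  # True: same string, different trees

# Rebracketing: -er attaching to the phrase "not dark" rather than to "dark".
print(bracket((("not", "dark"), "er")))  # [ [ not dark ] er ]
```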

Often, linguists will treat these kinds of examples as mistakes, play, or somehow not part of the object of study (and make pronouncements like "inchoatives derived from adjectives are a closed set" and sometimes even claim that words like smallen are "impossible"). I think it's important that we take these kinds of novel forms --- forms that sometimes challenge theory we've learned in grad school --- seriously. In part, because if you start listening for them, they happen all. the. time.

Happy listening!

-----

©Taylor Jones 2017

Habitual hiring

I recently came across a couple of images of African American English use in hiring signs. I think they could be an excellent tool for teaching about AAE in Introduction to Linguistics, or Intro to Sociolinguistics, since

  1. neither is 'standard' English
  2. they have a difference in meaning
  3. the difference in meaning affects strategy for someone on a job hunt.

So without further ado, let's say you've been out hunting for work, and you're down to your last resume. To which of these two places do you take your last resume?

[Images: two hiring signs, one reading "we hiring" and the other reading "we be hiring"]

If you don't speak AAE and don't know about its system of tense and aspect (which is more complex than that of mainstream American English), you may think it's a toss-up between the two.

However, you'd be wrong.

  • we hiring features what's called copula deletion, which is common in many languages (including Russian, Arabic, and others). It means "we're hiring (right now)".
  • we be hiring makes use of habitual 'be' which is a grammatical marker of, well, habitualness. It means "we are usually/habitually/often hiring."

Therefore, if we're to trust the signs, you've got a better chance of being hired right now going to the first store.

-----

©Taylor Jones 2017

Bare Subject Relatives and the Sophisticated Complexity of AAE

[Trigger warning: My focus here is on a syntactic phenomenon, but a video example I'll be focusing on includes the threat of police violence and a man hypothesizing about his death at the hands of the police. The man in question is alive and unharmed.]

I've been thinking a lot lately about the complexity and sophistication of AAE syntax. Much of the work and outreach around AAE in the last 50 years has been trying to demonstrate that AAE is neither deficient nor wrong. There's a big jump, however, between not wrong, which is the end goal for many linguists, and marvelously rich, which seemingly hasn't percolated through the field much beyond AAE specialists. Christopher Hall, a colleague and friend, often implores lay people to flip the script and think about language with the starting point that AAE is the default against which other dialects should be judged.

My focus here is a syntactic feature of some varieties of African American English that doesn't get as much attention, but that is surprisingly common (especially in the South): what are referred to as null subject relatives, or bare subject relative clauses.

A recent, salient example can be found about 15 seconds from the end of this video, which a motorist took of himself chastising a cop for approaching his car with his (the officer's) service weapon drawn. It probably goes without saying, but the video may be triggering for some.

[Link to video here]

The subtitles say "Dad shot dead by a cop who made a mistake," but this is yet another case of reporters "translating" AAE. The gentleman actually said:

"Dad shot dead by a cop made a mistake".

What's going on here?

Well, first, a relative clause is like a mini sentence or sentence fragment that adds more information about a part of the main sentence. For instance:

  • That is the man [who I saw yesterday]
  • That is the man [who saw me yesterday]
  • The book [that I recommended to you] is on sale now.

In most varieties of English, you can delete -- that is, not say -- the relative marker (who, which, that), if it refers to the object of the relative clause.

To take the first example, the man who I saw yesterday, we can rework the relative clause as meaning something like "I saw him yesterday." In fact, many varieties of English make use of such resumptive pronouns, so it would be perfectly natural to say "That's the man who I saw him yesterday." And unsurprisingly, this kind of thing is cross-linguistically common, and in some languages it's obligatory.

So if it's:

  • I (subject) saw him (object)

Most varieties of English allow you to do away with the relative marker:

  • That's the man who I saw yesterday
  • That's the man ___ I saw yesterday

AAE is interesting in that it also allows deletion of the relativizer if it marks the subject.

  • That's the man who saw me yesterday.
  • That's the man ___ saw me yesterday.
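The contrast can be summarized as a small rule table. This is a deliberately coarse sketch: the variety labels, `OMISSIBLE_GAPS`, and `allowed_forms` are hypothetical, "mainstream" stands in for "most varieties of English", and real relativizer omission depends on more than the position of the gap. Omission is treated as optional, so the form with the relativizer is always listed.

```python
# Hypothetical summary of the pattern described above: for each variety,
# the gap positions where the relativizer (who/which/that) may go unsaid.
OMISSIBLE_GAPS = {
    "mainstream": {"object"},
    "AAE": {"object", "subject"},
}


def allowed_forms(head, relativizer, clause, gap, variety):
    """List the relative-clause forms a variety permits, per the table above."""
    forms = [f"{head} {relativizer} {clause}"]  # saying the relativizer is always fine
    if gap in OMISSIBLE_GAPS[variety]:
        forms.append(f"{head} {clause}")        # leaving it out is optional where allowed
    return forms


print(allowed_forms("That's the man", "who", "I saw yesterday", "object", "mainstream"))
print(allowed_forms("That's the man", "who", "saw me yesterday", "subject", "mainstream"))
print(allowed_forms("That's the man", "who", "saw me yesterday", "subject", "AAE"))
```

Only the last call produces "That's the man saw me yesterday", mirroring the subject-relative deletion that AAE allows.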

This is pretty well described in the literature, so for instance, Stefan Martin and Walt Wolfram have a chapter in Salikoko Mufwene's book African American English: Structure, History, and Use that gives a ton of excellent examples:

  • He the man ___ got all the old records
  • Wally the teacher ___ wanna retire next year
  • Jill like the man ___ met her brother last week

The example in the video above was particularly interesting because the syntactic structure of the full utterance is extremely complex.

There's a pernicious and widespread view that AAE, or "ebonics" is somehow inferior or defective. It's widely regarded as both "simpler" than "standard" English, and simpler in ways that are "broken" or "wrong." However, not only does it have more complex grammar in some respects, but AAE speakers deploy sophisticated combinations of syntactic structures even under extreme stress. The sentence the motorist in the above video uttered makes use of:

  1. An "imposter" construction in which the speaker is understood to mean himself when using a name/title ("Daddy") instead of a first person pronoun ("I").
  2. Copula deletion ("Daddy shot" instead of "Daddy was shot"). This is very common cross-linguistically, and is standard in Arabic, Chinese, Russian, etc.
  3. A resultative complement to the verb ("shot dead")
  4. Passive voice --- with copula deletion --- which we understand because of the resultative. Compare "Daddy shot a gun" vs. "Daddy shot dead."
  5. A bare subject relative ("a cop ___ made a mistake").

This is a sophisticated interlocking clockwork of syntactic structures, produced under extreme stress. A tree diagram of this sentence would show all kinds of movement and deletion. And there's some evidence that people who speak other dialects do not have the complex grammatical knowledge to correctly parse this kind of utterance. And yet, people like this motorist are routinely treated as though their language is deficient.

It's a starting point for us linguists to point out that AAE is rule-governed and syntactically well-formed. However, I don't think this goes nearly far enough. "Technically not inferior" is a far cry from the truth: AAE is a varied, complex, sophisticated language variety that makes use of many complex grammatical rules that "standard" English lacks. AAE speakers are doing things other people don't understand, and not because the AAE speakers are wrong, but because they have a fuller syntactic toolbox.

 

-----

©Taylor Jones 2017

Partickly

Lately, I've been noticing a particular phenomenon in speech, in which the word "particularly" is pronounced more or less as "partickerly" or "partickly."

It turns out I'm not the only one to notice this: Mark Liberman has an excellent and much more in-depth description of the phenomenon at Language Log, with a ton of audio examples.

-----

©Taylor Jones 2017

A linguist's take on the Great GIF Controversy

The Conflict:

For years, the English-speaking internet has been divided. We cannot agree on how to pronounce gif, the acronym for Graphics Interchange Format. Much as with the dress, each side thinks their own position is the only correct one, and that the other side is absolutely crazy. And much as with the dress, it's probably a little more complicated.

People write articles with titles like "You are 100 percent wrong about how to pronounce gif." People share mocking gifs with arguments bolstering their point of view. People yell at one another. Things get entirely too heated.

I intend to shed some light on this situation.

The Options:

There are technically three ways you could pronounce gif in English, although the conflict is over the first two. The three are:

  1. so-called "hard g" which linguists represent with /g/. This is <g> as in "gift".
  2. so-called "soft g" which linguists represent with /d͡ʒ/. This is <g> as in "George." It is also sometimes represented with <j> as in "Jazz".
  3. The "French" or "super soft g", which linguists represent with /ʒ/. It is in (some) pronunciations of "rouge". (Note that some English speakers "nativize" words with this sound by adding the /d/ of the "soft g", so what I call "baton rouge" they may call "baton roudge".)

While I relish ironically using the third option and watching people on both sides of the hard/soft g debate lose their minds, I recognize that nobody is going to take seriously the argument that "French g" is correct.

The Arguments:

Arguments for "hard g":

  1. It's an acronym, and the word the <g> comes from is one where it is pronounced "hard" (namely, "graphics").
  2. We often pronounce acronyms differently than we would pronounce a word spelled the same way (CIA is "see eye aye" and not "kia").
  3. Feelings. People have really strong feelings that this is the only correct way.

Arguments for "soft g":

  1. Lots of words spelled with <gi> are pronounced with a "soft g": ginger, gin, giraffe, giant...
  2. It's easier to pronounce gif as a word and not as an acronym. Nobody is actually saying "gee eye eff". If you're going to make it a word, then make it a word!
  3. "Foreign" words often have a "soft g" (giraffe...).
  4. Feelings. People have really strong feelings this is the only correct way.

A dash of science:

I decided to take a look at this list of over 58,000 (relatively common) English words, and see what the patterns are for g-words.

There are 1836 words that start with <g> in this list, and there's not a clear rhyme or reason to the choice of "hard" versus "soft" g, so one would have to look at each of them to get a sense of the overall pattern. That's a pain in the ass. However, there is a helpful fun fact from linguistics that can constrain this problem a bit more:

"soft g" often comes from a combination of sounds, historically: a "hard g" followed by a non-low front vowel. What does that mean? That means that for the vowels /i/ "bead", /e/ "bade", /ɪ/ "bid", and /ɛ/ "bed", your tongue is actually higher in your mouth, and closer to the front of the mouth than it is for the vowels /u/ "booed", /o/ "bode", etc. The "hard g" sound is made by the back of the tongue forming a closure at the back of your mouth. These high front vowels tend to cause people to move their tongues slightly forward, and over time (we're talking hundreds of years) the sound changes to one made intentionally further forward. "Soft g" is created by a tongue closure further forward in your mouth than "hard g". Try saying words with them and pay attention to where your tongue is. (Try it! It's fun!)

This fact is part of why Italian spelling is so weird, for anyone who's tried to learn Italian.

All of that means I don't need to bother with words like "goof" because nobody is going to pronounce that with a "soft g."

So I chose to limit myself to words that start with <gi>. It turns out there are 102 of them, which meant I could simply read them and split them into "hard" and "soft". Of those, 30 are "soft", and almost all of these are of foreign origin.

30/102 (29.4%) of words that start with <gi> have a "soft g."

It's not entirely unreasonable, then, to think that gif should perhaps be pronounced with a "soft g." People will argue that there are more words with a "hard g," and that's true, but the same people will say that "soft g" is crazy, which is clearly not true.

BUT WAIT. What about words with <ge>, you ask? I'm glad you asked. There were 223 of those. Of them, 197 were pronounced with a "soft g" (e.g., gene, gender, geriatric, geology, gelatinous...).

So...

197/223 (88.3%) of words that start with <ge> have a "soft g."

This means that:

Of all of the words starting with <gi> or <ge>, where the <g> could be pronounced hard or soft, 227/325 (69.8%) are pronounced with a "soft g".

It's also worth noting that in the particular list I have, fully 38% of the words begin with <g>, then <i> or <e>, then <n>. This is important, because many people have what is referred to as the PIN-PEN merger, meaning that <i> and <e> are pronounced the same before nasal consonants like <n> and <m>. That means Jim and gem are both pronounced the same (namely, as Jim). This is a feature of Southern American English, pretty much the entirety of the West, most of Canadian English, and most of African American English. A LOT of people do this.

This means that even if they're limiting themselves to words they pronounce with <gi>, there are 109 more words in this list that they pronounce with the "ih" vowel than a speaker without the PIN-PEN merger would.

So...

For people with the PIN-PEN merger, 139/211 (65.9%) of the words they pronounce as <gi> have a "soft g."
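The percentages can be recomputed directly from the counts reported above. This is only arithmetic on those numbers, not a re-analysis of the word list; the names `COUNTS` and `percent` are illustrative. Note that the 139/211 figure implies that all 109 extra words for merged speakers have a "soft g" (30 + 109 = 139, 102 + 109 = 211). Rounded to one decimal place, 139/211 comes out to 65.9%.

```python
# (soft-g words, total words) as reported in the text. The merger row adds
# the 109 extra words to both counts, which is what 139/211 implies.
COUNTS = {
    "<gi>": (30, 102),
    "<ge>": (197, 223),
    "<gi> + <ge>": (30 + 197, 102 + 223),
    "<gi> with PIN-PEN merger": (30 + 109, 102 + 109),
}


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


for label, (soft, total) in COUNTS.items():
    print(f"{label}: {soft}/{total} = {percent(soft, total)}% soft g")
# <gi>: 30/102 = 29.4% soft g
# <ge>: 197/223 = 88.3% soft g
# <gi> + <ge>: 227/325 = 69.8% soft g
# <gi> with PIN-PEN merger: 139/211 = 65.9% soft g
```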

The Takeaway:

Even if people are being completely rational about how to pronounce gif, their decision is informed by their dialect and their personal pronunciations of other words. While it is rational to say "it's from graphics, which has a 'hard g'", nobody is saying "gee eye eff" (which, coincidentally, has a "soft g"). While it's rational to say that foreign words are often nativized with a "soft g" (like giraffe), nobody says "gift" with a "soft g".

Finally, even if people are thinking statistically about it (even if it's sort of "fuzzy" math based on what they have heard in their life and not hard numbers), the conclusions they come to are dependent on their dialect, speech community, and vocabulary.

This is why I ironically go with the "French g": if you have strong feelings about the pronunciation of gif, no matter what they are, you're probably wrong. And if you're having the argument, it's because someone tried to share an image with you. Why not just be nice, instead of pedantically (and no matter what side you choose, wrongly) lecturing your acquaintances on how to say words?

 

-----

 

©Taylor Jones 2017


Bill Maher, the N-word, and that pesky R

[Trigger warning: n-words]

Bill Maher is in the news right now for dropping the n-bomb on his show in a context that many, many people found offensive. Predictably, people are coming to his defense with two arguments: (1) he was referring to himself, and (2) he "didn't say the /r/."

As a linguist, and as one of the handful of linguists who have given serious thought to the n-word(s) (shout out to Christopher Hall, Arthur Spears, and Geneva Smitherman), I want to weigh in with a (socio)linguistic perspective. My argument is:

  1. It was not ok for him to say either, and,
  2. White folks (in general) should not say either if they don't want to offend, because
  3. It is an artificial distinction for most white people, if they are borrowing from a dialect they do not speak...and the vast majority of white people do not speak (or understand) African American English, natively or otherwise. And also,
  4. In most white people's native dialect, the only n-word is a slur.

Elsewhere, Christopher Hall and I have written about the grammatical and social functions of the n-words in some varieties of AAE. We argued that there are multiple words that all include the "n-word" and that fulfill various grammatical and pragmatic functions: from first person pronouns to social distance markers, to politeness (yes, politeness) forms. If you are not a native speaker of AAE, it is easy to misunderstand these uses because they are what Arthur Spears calls "camouflage constructions," a term he coined. That is, they look like they might mean something else, and so people assume they understand when they don't. Recent pilot work on cross-dialect comprehension that I did with a team at UPenn and NYU confirms that, in general, white folks don't understand the range of uses of the n-words.

More importantly, these are uses that occur in African American English, which is a dialect that has its own accent (really, range of accents, but we'll set that aside for now). Crucially, most forms of AAE are what linguists call non-rhotic, meaning /r/s after vowels are often not pronounced. Many white varieties, including Bill Maher's normal speech, are rhotic: they do pronounce those /r/s. So Maher will argue that nigga and nigger are different words, and that he said the "acceptable" one.

HOWEVER, Maher, I would argue, only has nigga in his vocabulary as a taboo deformation of the word nigger. It's the same as claiming he didn't call someone bitch, he called them betch, or bish. The point is to say "I technically didn't say the word" while still saying the word.

Here's the crux of my argument: if you don't speak AAE, whether or not you borrow AAE sounds to say nigger doesn't change what you're saying. For people to be comfortable (or less uncomfortable) with Maher's use of nigga, he'd have to (1) use it in the appropriate social context, which this was not, and (2) back it up with literally any other features of AAE... and even this would probably not make it ok. As is, he was just "being edgy" by saying a taboo word he knew would offend.

That is, we white folks don't get to say "I was using that word like you people do!" without actually being able to use any other words the way AAE speakers do. If the accent is right, if the word choice is right, if the grammar is right (yes, you can butcher AAE grammar; it is as systematic and rule-governed as any other language variety), and if the cultural context is right, you can maybe get away with speaking AAE as a white person. Notice I didn't say "saying the n-word." That's still pretty much off the table, even if you understand the grammar, social function, and pragmatics of use.

Here are some tips and general rules of thumb around the n-words if you don't want to offend, and you're white in America:

When you can say "nigger" without offending:

  1. maybe in citation, either directly quoting old racist stuff, or discussing the word itself, best if at a linguistics conference or conference on race, and even then you might encounter pushback.
  2. never in casual conversation.

So basically, you can't.

When you can say "nigga" without offending:

  1. To a POC who has specifically said to you "yo, we cool, you can call me nigga. You get a pass." To that person ONLY. Probably not within earshot of anyone else. I've never heard of this situation occurring, but who knows. Also, even if you find yourself in that situation, if you actually do it, I'm not saying it's gonna go great, or that I endorse that path. 
  2. discussing the word nigga in citation form at a linguistics conference. And even then, not everyone will agree.
  3. Never in casual speech.

So theoretically it's possible, but maybe just don't.

The distinction between r-full and r-less forms has a long history, and linguists are not remotely settled as to the history of the word (for instance, Hiram Smith argues the semantically neutral r-less form goes back 200 years or more). While it's interesting, it's completely orthogonal to the question of whether it's appropriate for white people to say it. Because it has been a slur in white English from its beginnings to literally right now, in both r-full and r-less varieties of white English, people like Bill Maher don't get to decide that it no longer has all that historical baggage.

And even if you deeply understand its use in AAE speaking communities, and participate in those communities, if you actually care about the people in those communities, you still won't say it.  Even when it's linguistically appropriate. Because our language use is culturally and socially situated.

 

-----

 

©Taylor Jones 2017


I ran all Trump's tweets through a neural net to try and figure out the meaning of Covfefe. Here is what I learned.

Unless you've been living under a rock, by now you probably know that just after midnight two days ago, Donald Trump tweeted:

Despite the negative press covfefe

Twitter went wild, predictably. For two days now, there has been heated debate about (1) how to pronounce covfefe, and (2) what covfefe means. Yesterday, Trump's press secretary Sean Spicer declared that Trump meant to type covfefe, and that its meaning was known to Trump and a select few others.

In an attempt to get to the bottom of this mystery, I decided that semantic word embedding models might be useful. I have written about such models elsewhere. The general gist (in extreme oversimplification) is that if you treat some document or documents as a big bag of words, you can start to treat individual words as being related to one another by their position in a high-dimensional space. Similar words cluster together, and dissimilar words are far apart in this vector space. The actual implementation is technically a "feed forward neural net" that "fine tunes through back propagation," but this is all linear algebra and code and ignores the fun of it.
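"Near each other in the space" usually means high cosine similarity between word vectors. Here is a minimal sketch of that lookup, assuming the vectors already exist. Training them (the word2vec step) is not shown, and the 3-dimensional vectors below are made up for illustration rather than taken from the model trained on Trump's tweets.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, n=3):
    """Return the n words whose vectors point most nearly the same way as `word`'s."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:n]]


# Made-up 3-dimensional vectors; a trained model would have hundreds of dimensions.
TOY_VECTORS = {
    "hillary": [0.90, 0.10, 0.00],
    "clinton": [0.95, 0.05, 0.00],
    "crooked": [0.80, 0.20, 0.10],
    "golf": [0.00, 0.90, 0.30],
    "scotland": [0.10, 0.85, 0.40],
}

print(nearest("hillary", TOY_VECTORS, n=2))  # ['clinton', 'crooked']
```

Cosine similarity ignores vector length and compares only direction, which is the usual choice for "closest to" queries in embedding models.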

In order to try to get at the Mystery of Covfefe, I decided to train a word2vec (that is, word to vector) model using Ben Schmidt's wonderful package in R. To do so, I first needed to gather all 30,999 Trump tweets (at the time I gathered them). I did so by cloning the Trump Twitter Data Archive (note: if you have a cool coding idea, chances are someone has already done most of the work. I'm learning that half of coding well and fast is just finding the appropriate already-collected data, already-written module or library, or already-worked recipe).

Once I gathered all 30,999 Trump tweets, I needed to clean them. I did minimal cleaning on the data set: I made all words lowercase, eliminated punctuation, and eliminated common "stopwords," words like "and, are, in, at, be, there, no, such," etc. This has the effect of normalizing a bit, so sad and SAD! are treated as the same word. I have not yet gotten around to lemmatization (grouping words like ran, run, and running all under "run"), but I'm not sure to what extent that would really affect the output.
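The cleaning steps above are simple enough to sketch. The original work was done in R; this is a Python approximation under stated assumptions: the stopword set contains only the examples named in the text (a real list is much longer), and "punctuation" means anything that is not a letter, digit, underscore, or whitespace.

```python
import re

# Only the stopwords named in the post; a real stopword list is much longer.
STOPWORDS = {"and", "are", "in", "at", "be", "there", "no", "such"}


def clean_tweet(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # keeps letters, digits, underscores, whitespace
    return [token for token in text.split() if token not in STOPWORDS]


print(clean_tweet("So sad and SAD! Thank you @realDonaldTrump"))
# ['so', 'sad', 'sad', 'thank', 'you', 'realdonaldtrump']
```

Stripping punctuation this way also turns @realDonaldTrump into realdonaldtrump and O'Donnell into odonnell, which matches the tokens that show up in the results.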

Having run the results through Word2Vec, I did some quick sanity checks by investigating which words are the closest to a handful of given words. Closest to could? would, honestly, and can. Closest to america? safe, again, outsider, make, lets. Closest to new? york, hampshire, albany, yorkers

Clearly, it's working the way we would want it to, but are these really Trump's tweets? Closest to hillary? clinton, email, unfit, crooked, judgement, 33000, temperament. Closest to rosie? odonnell, theview, unprofessional, rude, bully.  IT WORKS!

As I did before, I chose to visualize the word embedding space by using t-SNE (for t-distributed stochastic neighbor embedding). This does not preserve relationships exactly, but keeps near things near to one another and far things far. I present the full results for your enjoyment:

Tremendous.

Some really fun/interesting/hilarious clusters emerge. There's read book art deal. There's barackobama obama obamas china iran. There's my favorite: totally sad bad terrible wrong. There's the small cluster of bush cruz. There's scotland golf course.

What's missing? Covfefe.

So I decided to up the size of the model and include more words. Normally, you want word vectors with 200-500 dimensions in a model like this. I gave it 1000. The results are even better.


wonderful.

This model results in a cluster: realdonaldtrump mr awesome 2016. And, as a quality check, crooked is still right next to hillary.

But where's covfefe?

STILL not in the model. When I manually search for it, it shows up as excluded from these findings, and is returned next to realdonaldtrump, you, and i. Which is, frankly, perfect. Perhaps Covfefe is the word for all of us together with realdonaldtrump.

I know that's kind of a cop-out, but in the process, I learned a few other interesting things. In no particular order:

First, pick almost any word and the top 10-20 nearest words in either of the resulting vector spaces will include some negative sentiment. GOP? Establishment. Christian? Jailed. Beheading. Media? Fake.   He's even hard on Russia in tweets: Russia? Traitor, laughs, taunting.

Second, closest to Ivanka? Daughter. For Barron, you have to wait till number 5 for "son" (most of the top 10 are family related words, or the names of family members).

Third, the closest words to usa are miss, pageant, missuniverse, and, perplexingly, moscow. If you subtract pageant, the closest word to usa is...balls. Checks out. Also further down the list: needs, trump, and businessman.

This brings me to one of my favorite findings. A classic example of word embeddings capturing something about semantics is that on other data sets these models have been trained on, you can add and subtract vectors meaningfully. So for instance,

paris - france + italy = rome

...which is intuitively correct. The classic example is:

king - man + woman = queen

Trump doesn't use the words man or woman all that much, actually, so in Trump's world:

king - man + woman = larry
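Here is a minimal sketch of how "king - man + woman" is answered: do the arithmetic on the vectors, then return the word (other than the three inputs) closest to the result by cosine similarity. All vectors are made up. The second set is deliberately built so that "man" and "woman" are nearly zero, to show how rare words make the query collapse to "whatever is nearest to king." It illustrates the mechanism and does not reproduce the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors):
    """Solve a - b + c = ? by returning the word closest to that point (inputs excluded)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Made-up 2-d vectors: dimension 0 ~ "royalty", dimension 1 ~ "female".
GENERAL = {
    "king": [1.0, 0.0],
    "queen": [1.0, 1.0],
    "prince": [0.9, 0.0],
    "man": [0.1, 0.0],
    "woman": [0.1, 1.0],
}
print(analogy("king", "man", "woman", GENERAL))  # queen

# Rare words carry almost no signal: near-zero "man" and "woman" vectors
# leave the target sitting right on top of "king". Vectors again made up.
SPARSE = {
    "king": [1.0, 0.0, 0.0],
    "man": [0.01, 0.0, 0.0],
    "woman": [0.0, 0.01, 0.0],
    "larry": [0.9, 0.1, 0.2],
    "queen": [0.2, 0.9, 0.0],
}
print(analogy("king", "man", "woman", SPARSE))  # larry
```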

I'm certain there are other relationships in the data that I've missed, but if anything is clear from the above, it's that word embedding models really, really, really work (even if adding or subtracting "man" and "woman" is basically adding and subtracting zero in Trump's tweets). I love the examples from cookbooks, historical newspapers, and RateMyProfessor reviews, but there's something really validating about these results, in part because Trump's speech (and Twitter speech) is so colorful, and the above so clearly captures it.

Finally, it looks like covfefe is off the charts, even for the surprisingly regular logic of Trump's twitter.

-----

 

©Taylor Jones 2017


Cablese and Wirespeak

I'm always interested in jargons, cants, patois(es?) and codes, and recently learned that my father-in-law, a career newsman, didn't just have a problem with his throat this whole time, but rather has been communicating with me in Cablese (and movie references). I knew it wasn't that he had a problem with his throat, but I had no idea what was going on. Allow me to explain:

For about a hundred years after the invention of the telegraph, the main way news was shared was by telegraph. Converting a story written in regular old English to Morse code was time consuming and expensive, and, crucially, it was charged by the word. And you couldn't just stick words together, say by turning the phrasal verb "write up" into WRITEUP; that's obviously two words smushed together to get around being charged by the word. However, you could get away with making a new word like UPWRITE. Newsmen developed a complex system they used on the cables ("Cablese") as well as their own codes for the news wire ("wirespeak"), and each news agency also had its own secret codes (so they couldn't get scooped). Even though they don't use the telegraph anymore, Cablese and Wirespeak live on.

This last week, my father-in-law gave me the Rosetta Stone: the book Wirespeak: Codes and Jargon of the News Business. It is fantastic.

The book has chapters on Cablese, Wirespeak, and various news agency codes. Its chapter on Cablese is entitled "backwards run the words."

So how does it work? Well, first, anything that can be joined is, but backwards, so it's clearly a different word: for instance, DOWNHOLD for "hold down." There's a story that when British writer Evelyn Waugh was asked to investigate a rumor that a British nurse had been killed in an air raid, he received the cable from his editor: SEND TWO HUNDRED WORDS UPBLOWN NURSE. Waugh investigated, found the rumors were untrue, and wrote back NURSE UNUPBLOWN.
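A toy model of why the backwards compounds pay off: fuse the phrasal verb particle-first, then count billable words. The tariff rule here is an assumption (every whitespace-separated token counts as one word; real tariffs had their own rules about what counted as a word), and the "plain" wording of the editor's request is my own paraphrase.

```python
def cablese_compound(verb, particle):
    """Fuse a phrasal verb into one word, particle first: ("hold", "down") -> "DOWNHOLD"."""
    return (particle + verb).upper()


def billable_words(cable):
    """Simplified tariff: every whitespace-separated token counts as one word."""
    return len(cable.split())


plain = "SEND TWO HUNDRED WORDS ON NURSE WHO WAS BLOWN UP"  # hypothetical paraphrase
cable = "SEND TWO HUNDRED WORDS " + cablese_compound("blown", "up") + " NURSE"
reply = "NURSE UN" + cablese_compound("blown", "up")

print(cable, billable_words(plain), billable_words(cable))
# SEND TWO HUNDRED WORDS UPBLOWN NURSE 10 6
print(reply)  # NURSE UNUPBLOWN
```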

That brings me to the second part of how it works: prefixes. Everywhere. Most of them are Latin, but some are French, or come from elsewhere.

Examples:

  1. CUM = with
  2. EX = from
  3. ET = and (e.g., MOM ETDAD)
  4. PAR = by
  5. PRO = for
  6. AD = to
  7. ANTI = against
  8. DANS = in (e.g., DANSRIVER 'in the river')
  9. UN = no, not
  10. POST = after
  11. PRE = before
  12. SUPER = on, over
  13. OMNI = all (e.g., OMNICHEERED 'everyone cheered')
  14. UNI = one
  15. SANS = without
  16. SUR = on

There are also suffixes:

  1. WARD = toward
  2. WISE = manner adverb
  3. EST = most. (Why we don't just say "est" for all superlatives will get its own blog post, to come later)
  4. ING = makes a verb from a noun, or light verb construction (This will also get its own post).
  5. SOME = full of (e.g., GLADSOME TIDINGS)

There is an apocryphal story that an international correspondent quit their job with the cable:

UPSHOVE JOB ASSWISE

There are also a ton of one-offs, like SMORNING for "this morning" and SNIGHT for "last night."

So you get cables like: SOS ETWIFE HEADS TOKYOWARD SMORNING SANSTOP. MUCHLY APC EYEBALL ARRIVAL. URGENTEST NEED THUMBSUCKER CUM ART.

(taken from the excellent blog post on the subject Onwriting: Unearthing a lost language, which explains some of the more specific terms as well, like "thumbsucker" for "news analysis" and "art" for "photographer".)

So when my wife etme downwent DCward sweek advisit mother etfather inlaw, it outturns father unupmade weirdtalking. It was preupmade parnewsmen prehim. And now, postwise, I understand his texts meward.

-----

 

©Taylor Jones 2017


Benedict Cumberbatch: Ye Shall Know Him By His Dactyls

[EDIT: I've been beaten to the punch, unsurprisingly by Gretchen McCulloch. I'm only 4 years late to this party, apparently.]

For some time, people on the internet have been playing with British actor Benedict Cumberbatch's name. They call him all manner of other names. And yet we all know who is being discussed. This post is about the simple reason why.

First, an example:


This image is legit saved to my computer as "Crindlesnatch.jpg"

Now, a few more examples. And why not throw in Reddit's favorites?

So what is going on here? A few things:

  1. Meter
  2. Vowels
  3. Bs and Cs???
  4. Context?

Meter is the most important. Cinnamon Thundercat has a distinctive name in that (1) he's almost always referred to by both first and last name, and (2) both his first and last names are dactyls. That is, each is a stressed syllable followed by two unstressed syllables. Once you know the context, any string of STRESSED-unstressed-unstressed STRESSED-unstressed-unstressed referred to as a person can be easily recovered as actually referring to Brandywine Crumplepuss.
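The double-dactyl pattern can be checked mechanically from ARPAbet stress digits (1 = primary, 2 = secondary, 0 = unstressed). A minimal sketch, assuming hand-written transcriptions rather than dictionary lookups. It also counts secondary stress as "unstressed," a simplification that lets the final syllable of Cumberbatch fit the dactyl.

```python
def stress_pattern(pron):
    """One symbol per vowel: 'S' for primary stress (1), 'u' for 0 or 2."""
    return "".join("S" if ph.endswith("1") else "u" for ph in pron.split() if ph[-1] in "012")


def is_dactyl(pron):
    """A dactyl here is exactly three syllables: stressed, unstressed, unstressed."""
    return stress_pattern(pron) == "Suu"


def double_dactyl_name(first, last):
    """True if both the first and last name scan as dactyls."""
    return is_dactyl(first) and is_dactyl(last)


# Hand-written ARPAbet transcriptions (assumptions, not looked up in a dictionary).
BENEDICT = "B EH1 N AH0 D IH2 K T"
CUMBERBATCH = "K AH1 M B ER0 B AE2 CH"
JOHN, SMITH = "JH AA1 N", "S M IH1 TH"

print(double_dactyl_name(BENEDICT, CUMBERBATCH), double_dactyl_name(JOHN, SMITH))  # True False
```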

Second, the replacements often have the same kinds of vowels in the same places. Most important seems to be that the last vowel be an /æ/ as in "batch."

Third, people often, but not always, use replacements that start with B and C.

Lastly, there's often a picture, or an introduction like "British actor ______." From here, it's clear who Battleship Crustybrunch refers to.

I don't have the time at the moment, but a true overkill analysis for the Hashtag SCIENCE fans out there would be something like:

  1. collect a corpus of name replacements
  2. have study participants rank them on felicity: how good are they at being "Benedict Cumberbatch names"?

Once you've got some large number of good ones:

  1. count how many start with B--- C---, B---- X----, or X---- C---- (where X is any other letter).
  2. count how many conform to a two dactyl pattern.
  3. run them all through some tokenizer, and associate each part with its pronunciation in a pronouncing dictionary (e.g., the CMU Pronouncing Dictionary, CMUdict).
  4. Evaluate how well each maps its vowels to those in the original name.
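Steps 1 and 4 might look something like the sketch below. It is a hypothetical scoring scheme, not an established method: initials are classified by letter as in step 1, and "how well the vowels map" is taken to be the share of vowel slots that match position by position (stress digits ignored). Pronunciations are hand-written ARPAbet, where AE is the /æ/ of "batch."

```python
def vowels(pron):
    """ARPAbet vowels (phones carrying a stress digit), with the digit removed."""
    return [ph[:-1] for ph in pron.split() if ph[-1] in "012"]


def vowel_match(candidate, original):
    """Share of vowel slots where candidate and original have the same vowel."""
    cand, orig = vowels(candidate), vowels(original)
    longest = max(len(cand), len(orig))
    if longest == 0:
        return 0.0
    return sum(a == b for a, b in zip(cand, orig)) / longest


def initials_pattern(first, last):
    """Classify a name as BC, BX, XC, or XX by its first letters (step 1 above)."""
    return ("B" if first[0].upper() == "B" else "X") + ("C" if last[0].upper() == "C" else "X")


def ends_in_ae(pron):
    """Does the last vowel match the /ae/ of "batch" (ARPAbet AE)?"""
    vs = vowels(pron)
    return bool(vs) and vs[-1] == "AE"


ORIGINAL = "B EH1 N AH0 D IH2 K T K AH1 M B ER0 B AE2 CH"  # Benedict Cumberbatch
CANDIDATE = "S IH1 N AH0 M AH0 N TH AH1 N D ER0 K AE2 T"   # Cinnamon Thundercat

print(initials_pattern("Cinnamon", "Thundercat"),
      round(vowel_match(CANDIDATE, ORIGINAL), 2),
      ends_in_ae(CANDIDATE))  # XX 0.67 True
```

Position-by-position matching is crude (one inserted syllable shifts everything); an alignment such as edit distance over the vowel sequences would be a natural next step.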

The real question is how his name, and the game people are playing with it, can be so distinctive that when I talk about Enterprise Custardshirt, you think of Khan:

The guy on the right...

...and not Kirk?!

For good measure, here's a Benedict Cumberbatch name generator. Enjoy!

-----

 

©Taylor Jones 2017
