What Nobody's Discussing about Rachel Doležal, Dishonesty, Dialect, and Strategy

EDIT: This looks worse than I originally thought. The linguistic observations in this post are based on the video linked in the post (from early 2014), however, Rachel Doležal speaks differently in her recent interview with Melissa Harris-Perry. Looks like my new summer project is a more rigorous comparison.

By now, everyone I know has heard about Rachel Doležal, the former president of the Spokane, WA chapter of the NAACP: specifically, that she is a white woman who has been successfully posing as black for over a decade, and was recently outed by her white parents. Many of my friends have written thoughtful and interesting comments on the situation (e.g., Brianna Cox's take on it) and a few professors I know are discussing how to go about using it as a teaching moment (and what to take away from it).

While people have discussed power structures and privilege, passing and assimilation, whether "trans-racial" is really "a thing," and what Doležal's motivations could possibly have been, there are a few interrelated aspects of the increasingly ridiculous spectacle that have been overlooked, and which I find fascinating. Really, they're all facets of a single observation:

Rachel Doležal didn't bother to attempt ANY use of African American English. AT ALL.

Don't believe me? Here are videos of a 2014 interview with her.

This is incredible to me -- not in the sense of 'unbelievable' but in the sense that it's just astounding that she didn't use any AAE features and that she was right that she didn't need to in order to pass. For a decade.

First, I will explain what I mean when I say she didn't use any AAE features. Then I will discuss two interrelated possibilities: (1) she couldn't hear it well enough to even know she wasn't doing it, and (2) she made the decision that she didn't need to bother with speech after changing her appearance.

As a linguist, and as a white person who speaks AAE by virtue of my childhood speech communities, I am as aware as anyone could be that race and dialect are not intrinsically related. There are white people who speak AAE, there are black people who don't, there are people of both races who think they do but really don't (like this guy!), and ultimately, though race is even more difficult to pin down than dialect, both resist simple description. In the US, though, because of our history, there is an ethnolect spoken by many people who are raced (i.e., "perceived by most people") as black, and while it varies from speech community to speech community, it has overarching features that we can describe -- just like how we can talk about "French" despite the fact that what's spoken in Quebec is very different from what's spoken in Côte d'Ivoire.

What Doležal pulled off was the equivalent of successfully posing as a Parisian without having so much as a French accent, let alone speaking French. A black turtleneck, a beret, the occasional Gauloise cigarette, and voilà: you're French. Except in this example Doležal wouldn't have needed to say voilà.

While no black American will necessarily speak with all the distinctive features of AAE, and some have none of the features of AAE, I'm flabbergasted by the fact that Doležal didn't bother with any. An excellent introduction to all the things she didn't do is Dr. John Rickford's article Phonological and Grammatical Features of African American Vernacular English. Listening to her speak, even accounting for the fact that it's a semi-formal interview and she's speaking in her capacity as a professor, it's still surprising how few features she exhibits. There are zero grammatical features: no habitual be, no stressed been, no preterite had, no AAE patterns of wh-operator and do-support inversion in main and relative clauses (relative to white North American Englishes), no done, no finna, no tryna, no typically AAE use of ain't, and so on and so forth. There's not even negative concord!

There are also very few to no phonological markers: she does not generally reduce consonant clusters (as in I was juss confuse for 'I was just confused'). She tends to fully release stops, even word-finally, and she has no secondary glottalization on unvoiced stops. There's no word-final devoicing of anything, let alone deletion. Words ending with -ing don't get reduced to -in'.

Finally, she doesn't use any AAE-specific words or phrases. This is bizarre, especially given that she went to Howard. Even white non-speakers of AAE pick up words and phrases when they live in AAE-speaking communities (it's called "accommodation"). I'm not expecting her, as a Howard grad, to use bama, lunchin', cised, press, or jont after leaving DC, but...like...give us something. Say bison after mentioning Howard (even though you sued them for 'reverse racism' when you identified as white, and lost). There's no stress shift, even in completely enregistered words, like police. There's literally nothing. It. Ain't. No. NOTHIN. What's perplexing about this is that such features can often serve as in-group signals that reinforce shared community, and so it would seem reasonable that she would employ some AAE features (1) to demonstrate she is actually 'down', and (2) to connect with her interviewer.

So the question here is: why no AAE?! I have two theories. The first is that she couldn't hear it. More precisely, she could hear something, but didn't know exactly what it was that made people speak differently, and if she tried to imitate it, found she did so poorly. I did my undergrad in Canada, and while I knew there was something going on with Torontonians' accents, I couldn't imitate it successfully until I learned about Canadian Raising. I knew words: sore-E for 'sorry,' and when to use "eh", but saying "ooot" for "out" would have just oooted me as faking the accent. It may be that, having grown up in the Midwest and only encountering speakers of AAE after she went to Howard, at 18 years old, she just couldn't successfully acquire the phonology, and knew that if she tried it would sound like caricature. A friend of mine from Georgia can hear the New Yorkers around her saying /ɔ/ in coffee and boss, but can't reliably produce it, and doesn't know which words to put it in. Others in a similar position try, but say "kwafie" and think they're succeeding. Maybe Doležal can hear AAE phonology, but isn't sure when to use it, and knows it sounds 'off' when she does.

However, this can't be the full story. She could have taken classes on African American English at Howard, and in an immersion environment for four years, she could have gotten good at it.

The other aspect to this situation is that she did the equivalent of a cost-benefit analysis and decided she didn't need to fake AAE to pass. One of my academic interests is Game Theory, and specifically Bayesian Signaling Games, which are surprisingly applicable here. In the simplest of this class of games, you have two possible types, and at least two possible signals. Agents try to infer the type of other agents by the signals those agents send. In pooling equilibria, there's no way of telling: agents of both types send the same signal and you can't glean anything from it. In separating equilibria, it's the opposite: you can completely categorize agents by the type of signal they send. Of course, the interesting class of games is the one with semi-separating equilibria: this opens up the possibility of dishonesty.

In the Game Theory literature, there's also a distinction between cheap talk and costly signaling. The basic idea is that a signal that costs the sender something is potentially more trustworthy. If it costs me nothing to tell you something, it costs me nothing to lie, and if our interests are not aligned, you should be wary. Conversely, if our interests aren't aligned and I send a costly signal, that signal might tell you something useful, otherwise, why would I incur the cost of sending it?

If you were very white and chose to lie and pose as black, you would have to make decisions about what gives you the most bang for your buck, so to speak. It may be that Doležal made the evaluation that trying to use AAE features in her speech was already past the point of diminishing returns. In order to be immediately raced as black, you have to do something about appearance, although appearance alone isn't enough (the literature on passing and assimilation is relevant here). If you're going to pull off the deception, and if you're being rational about it, you want to do the minimum necessary to lie effectively, assuming going out of your way to lie incurs cost. She's trying to get the most value out of tricking people into believing she's black (being paid to teach classes, paid to be interviewed about black women's experiences, presiding over a chapter of the NAACP -- she's definitely getting value, even if we limit ourselves to financial terms only). She does so by doing the least: change her hair, think carefully about wardrobe, spray-tan but not too much. Monitoring your speech all day every day to imitate a dialect you did not acquire in the critical period? That's WAY harder.

In this respect, Doležal is reminiscent of various animals that successfully invade ecological niches, like bird species that replace other birds' eggs with their own. She found the one niche where black women might have a slight advantage, and she did the minimum necessary to successfully signal that she was a black woman. And it worked for a decade. What's weird is that there are white professors of African American Studies, and white presidents of NAACP chapters, so it's not clear how much more advantage she got from posing as black, and she obviously incurred a cost (the enormous cost of being raced as black in America), although it may be that her values were such that she got some perverse benefit out of experiencing that cost (in a discussion of Bayesian Games in his excellent textbook on Game Theory, Steven Tadelis refers to a type that derives perverse pleasure from what should be a losing strategy as the "crazy" type).

It seems that the obvious cost of being black in America was strong enough that, in combination with the minimum necessary changes to plausibly look some kind of black, everyone just went with it, since it is a priori bat-shit crazy to pose as a disenfranchised type rather than posing as the type with the significantly higher expected value (in everything from educational outcomes, to interview call-backs and job prospects, to heart disease and life expectancy, to interactions with the police. EDIT: Here's a clue to the expected value). So, for a decade, people made the Bayesian calculation "what is the likelihood she's not black and just faking it, given (1) her appearance and overall bearing, and (2) the relative costs and benefits of being black versus being white in America?"...and came to the rational, but wrong, conclusion that she was being truthful.
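To make that hand-wavy calculation a bit more concrete, here is a toy version of the Bayesian update in Python. Every number below is made up purely for illustration; the point is the structure of the inference, not the values.

```python
# Toy Bayesian update: how likely is it that someone presenting as black is
# actually white and faking it? All of these numbers are invented for illustration.
p_faking = 0.0001            # prior: posing as black is assumed a priori extremely unlikely
p_evidence_if_faking = 0.8   # hair, wardrobe, spray tan: the minimum needed to pass
p_evidence_if_black = 0.95   # most black women, unsurprisingly, present as black

# Bayes' rule: P(faking | evidence) = P(E|F)P(F) / [P(E|F)P(F) + P(E|~F)P(~F)]
numerator = p_evidence_if_faking * p_faking
denominator = numerator + p_evidence_if_black * (1 - p_faking)
print(f"Posterior probability of faking: {numerator / denominator:.5f}")  # about 0.00008
```

Even if the evidence is only so-so, the posterior stays tiny, because the prior does all the work: nobody expects anyone to fake downward.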

If the above is right, it has an upsetting implication for the trans-racial camp. She claims she feels black, and that she's really black, whether her ancestry is or is not. If she has such an affinity for blackness, why then do the bare minimum to pass?  I'm not black, but I'm a white ally with positive feelings toward a number of black cultures, and I use AAE not just because I am natively able to speak it, but because I like it and I respect it. What is most disturbing about what's coming to light about Doležal is that she seems to have a love-hate relationship with the idea of blackness that tends surprisingly toward hate, and tends toward caricature where it's love. She sued Howard University for being pro-black at her (white) expense (and lost), and then did the minimum to take on the mantle of blackness to benefit in precisely the ways she claimed actual black people were benefiting at her expense, and she did so in a place where there are the fewest actual black people around to compare against or to call her bluff. And, now, Black Twitter has called her bluff precisely in this arena, with #AskRachel, giving multiple choice questions 'any' black person should know the answer to, where the answers are [SPOILERS] things like "they smell like outside," or the word for "remote control" is C) Moken Troll.

I'm not sure what more to say about this other than: I can'eem deal right now.

-----

©Taylor Jones 2015

Have a question or comment? Share your thoughts below!

 

 

SoCal is Getting Fleeked Out

For anyone who's been living under a rock for the past few months, there is a term, "on fleek," that has been around since at least 2003, but which caught on like wildfire on social media after June 21, 2014, when Vine user Peaches Monroe made a video declaring her eyebrows "on fleek."

Since then, the apparently non-compositional phrase on fleek has been wildly popular, and has generated the usual discussion: both declarations that it is literally the worst and "should die," and heated debates about what exactly on fleek even means. People seem to be divided on the question of whether it's synonymous with "on point." There is also a great deal of disagreement as to what can and cannot be on fleek, with "eyebrows" now the prototype against which things are measured.

After a conversation with Mia Matthias, a linguistics student at NYU, I decided to look at other syntactic constructions, thinking it possible -- in principle -- to generalize from on fleek to other constructions. Lo and behold, there is a minority of negative-minded people who describe others, snarkily, as "off fleek" (haters). More interestingly, Southern California is getting fleeked out.
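(For the curious: once you have a pile of tweet text, pulling out the variants is nothing fancier than a regular expression. The tweets below are invented stand-ins, not data.)

```python
import re

# Hypothetical tweets standing in for a geocoded corpus.
tweets = [
    "eyebrows on fleek",
    "her contour game is off fleek smh",
    "we finna get fleeked out tonight",
    "outfit on point",
]

# Catch "on fleek," "off fleek," and extensions like "fleeked out" or "fleekin".
pattern = re.compile(r"\b(on|off)\s+fleek\b|\bfleek(ed|in|t)?(\s+out)?\b", re.IGNORECASE)

for t in tweets:
    m = pattern.search(t)
    print(f"{t!r:40} -> {m.group(0) if m else 'no fleek'}")
```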

 

Geocoded tweets using variations of fleek. Toronto, you're not fooling anyone.


This is interesting because it suggests that "on fleek" is being re-interpreted, and that it is not necessarily rigidly fixed for all speakers as an idiom. Moreover, it looks like LA is leading the first move away from strictly adhering to the idiom "on fleek," by extending the use of "fleek" to the stereotypically Californian construction of [x]-ed out.

Geocoded tweets using "fleek" in California. Las Vegas, you're not fooling anyone.


I'm looking forward to watching this develop, just as we can watch bae developing (one can now be baeless, for instance). I'm also looking forward to the day one can get a fleek over, or get one's fleek on.

-----

©Taylor Jones 2015

Have a question or comment? Share your thoughts below!

 

"Eem" Negation in AAVE

I recently found out that another paper of mine was accepted for a talk at the Penn Linguistics Conference (PLC39).

This time, I'll be discussing a phenomenon in some registers of African American Vernacular English that I recently noticed, and have dubbed "eem negation." As with much of my research, this is a phenomenon that is not used by everyone, but which suggests a possible syntactic change that may or may not catch on. The basic idea is that Jespersen's Cycle is progressing for some speakers of AAVE, such that for some people, "eem" is available as a negative marker. If the last sentence sounded like gibberish to you, don't worry: the rest of this post will unpack it.

Jespersen's Cycle: two negatives make one positive

Jespersen's Cycle is a phenomenon named for this handsome fellow:

Otto 'Great Dane' Jespersen


First things first: Jespersen's Cycle is about Negation. It's very important to start by noting that so-called double negation (e.g., "I don't owe nobody nothin'!") is not inherently wrong or bad, as some style guides, high school English teachers, or annoying relatives who correct your speech obnoxiously at family functions might have you believe. In English, multiple negation is stigmatized, but there is nothing inherent to the grammar of English that makes it bad, as evidenced in part by how many varieties of English use it. That is, what makes it 'bad' is that it is socially evaluated, not anything about the structure of the language. When people say "two negatives make a positive" they are both demonstrably wrong and also weirdly trying to force other people's speech to conform to half-baked mathematical assumptions. Two negatives make a positive when multiplying numbers, but there's no reason language shouldn't be, say, additive instead of multiplicative (e.g., -1 + -1 = -2). Moreover, 'operations on real numbers' is a really bizarre way to think of language.

Instead, many dialects of English, and many other languages have what's called negative concord. That is, elements of a sentence should agree in negation -- meaning if one thing is negative, everything is. English has this ("It don't never mean nothing to no one nohow."), but so do French, Spanish, Italian, Russian, Chinese, and many, many other languages. That is, two negatives make one positive -- if by that you mean two negatives make a person certain you really meant it.

The Cycle that Jespersen observed was about how some languages change how they encode negatives over time. There are a few stages: in the first, you have one negative word. People, for whatever reason, choose to intensify the force of their sentence with another word (I didn't walk a step, I didn't drink a drop, etc.). Later, the intensifier is learned by children learning the language, and interpreted as obligatory. So then you have two words doing the same thing -- encoding negation. Then, the original word may become reanalyzed as optional - the word that was the intensifier becomes the word that 'means' negation. Finally, the original negation may disappear (and the new word takes its place as the sole marker of negation). 

English has gone through this process, so that you get something like:

I ne say >> I ne say not >> I say not >> I do not say

For the moment, we'll ignore the can of worms that is the introduction of "do". Similarly, French has undergone this process. Negation used to be indicated with "ne" and is now indicated with "pas" (which used to mean "step" as in I don't walk a step). Many textbooks and fuddy-duddy teachers will claim that you should say something like "je ne dis pas" to mean "I don't say"; however, in modern spoken French, the ne is just...not there.

Note: the "jeo" is not a typo; that's just an older form.


Many, many languages have undergone or are undergoing such a process, including Greek (6 times!), a number of varieties of Arabic, French, and English. I'm arguing that in one dialect of English, it may be happening again.

U.O.E.N.O it

Here, I'm assuming a basic level of knowledge about AAVE -- that it exists, that it is a valid dialect, etc. (A quick primer can be found here). 

So. There exists a word we'll call eem. If you search for it on Twitter, you'll notice a few things: it's used between 500 and 1,000 times a day. It shows up in the context of negation 98% of the time. It looks a lot like the word even. It can sometimes be spelled een.

My argument is that eem is not the same word as even. This is not a trivial thing to posit, since variations on even are common, and people often tweet how they speak. Moreover, that other 2% is made up almost entirely of people saying "eem = even."

Now, it's important to note that while I used Twitter to compile a lot of data quickly, it is by no means my only source of data -- it's a quick way to get a lot of data when you have the right kind of question. Other sources of data are sociolinguistic interviews, TV, movies, and music (there are 50-100 tokens in the extended cut of the song UOENO it (i.e., you don't eem know it), and Childish Gambino uses it in his song sweatpants, among others.).

Why claim that eem is not even? Well, for one, it only shows up as eem when there's negation, or some sort of counterfactual. That is, you can say:

"I don't eem know" or "he stopped before I eem noticed,"

...but you almost never see:

"eem Jamal was at the party"

...and you don't ever see:

*"2, 4, and 6 are all eem numbers." (the asterisk means 'ungrammatical').

In fact, in all instances I've seen of the second example ("eem Jamal was at the party") I haven't been convinced it was a native speaker of AAVE, and not someone who came across an "eem = even" tweet. That said, it's roughly ~1% of tweets that have that (almost exclusively young white women, for what that's worth), and I have never come across it in speech.

So, eem is not just a phonological reduction of "even," (like sebm for "seven", etc.), although that's likely where it came from, nor is it just a new orthographic convention on Twitter.

Now for the cool part:

Not only do you get a lot of negation with eem, but you get a number of cool other things. Note, examples below have the original first, a rough gloss below that, and a more colloquial 'translation' below that.

(1) There are people who use eem and also then intensify their sentence with even, as in:

  • "I ain't eem even feelin' it."
  • I am NEG NEG even feeling it.
  • 'I don't like it.'

(2) There are people who only negate with eem. I cannot overstate how cool this is. It is trivially easy to find example after example of tweets where the only negation is eem, as in:

  • "Ya'll some troublemakers, but I eem mad tho."
  • You PL  some troublemakers, but I NEG mad, though.
  • You all are some troublemakers, but I'm not mad, though.

 

  • "I'm da shit, I eem care."
  • I'm the shit, I NEG care
  • 'I'm great, I don't care (about anything/anyone/etc.)'

 

  • "Irony is: in most states, strippers can eem get naked! Dey literally dancin in bathing suits rackin da fck up..."
  • Irony is: in most states, strippers can NEG get naked...
  • Irony is: in most states, strippers can not get naked

(3) There are people who use eem as the only marker of negation, and then intensify it with even:

  • "I'ma act like I eem even read that."
  • I FUT act like I NEG even read that
  • I am going to act like I didn't even read that.

 

  • "You...eem even know it."
  • you NEG even know it
  • you don't even know it.

This all suggests that eem is not the same as 'even' (although it's very likely descended from 'even'), and that eem is a marker of negation -- in some cases, the only one.

Phonology

There are a number of interesting phonological processes around eem, but the discussion is pretty arcane and thorny, so I'm saving it for my conference talk. The basic gist is that eem can be pronounced in a number of ways, including een, and just a long, nasalized vowel. Because of the patterns in the audio examples I have of it (linguists: /m/ before labials, /n/  or /m/ before coronals, nasalized vowel before vowels, but also /m/, not engma, before velars), I argue that it is underlyingly eem.

The Big Question: Where's this going?

The thing about language change is that we often only know about it in hindsight. Given the tidal wave of data the Internet era has ushered in, we're now able to see trends like this in real time -- but I don't know of a way to use this to predict the future. 25,000-30,000 tokens of eem per month on Twitter is simultaneously massive -- in fact, so massive it's too much to deal with, since we still need to read each hit to determine syntactic function, presence of other negation, etc. -- and weirdly way too little, given that it's probably much less than 1% of total use of negation in AAVE. Think about how many possible negative sentences could be uttered on any given day by ~40 million people, and 1,000 tweets a day of eem is piddling. For the sake of comparison, there were more than 183,000 tokens of "not" tweeted in the last hour alone. Moreover, the last time this kind of thing happened in English, it took centuries.
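Just to put those numbers side by side, with the figures quoted above (extrapolating from a single hour of "not" is crude, but it makes the scale obvious):

```python
# Scale check using the figures cited in this post.
eem_per_day = 1_000                    # top of the observed 500-1,000 tokens/day range
eem_per_month = eem_per_day * 30       # ~30,000, matching the monthly figure above

not_per_hour = 183_000                 # tokens of "not" in a single recent hour
not_per_day = not_per_hour * 24        # crude extrapolation to a full day

print(f"eem per month: {eem_per_month:,}")
print(f"'not' per day: {not_per_day:,}")
print(f"eem as a share of daily 'not': {eem_per_day / not_per_day:.4%}")
```

That last line comes out to a few hundredths of a percent, which is why the corpus feels enormous and tiny at the same time.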

What this means, though, is that we are possibly able to track this kind of change in real time, for the first time in history. Either way -- whether it fizzles out and disappears or spreads to completion such that, after a few hundred years, the standard way of negating a sentence becomes eem -- we stand to learn something about language change. I can eem hardly contain my excitement.

-----

©Taylor Jones 2014

Have a question or comment? Share your thoughts below!

The Problem With Twitter Maps

Twitter is trending

I'm a huge fan of dialect geography, and a huge fan of Twitter (@languagejones), especially as a means of gathering data about how people are using language. In fact, social media data has informed a significant part of my research, from the fact that "obvs" is legit, to syntactic variation in use of the n-words. In less than a month, I will be presenting a paper at the annual meeting of the American Dialect Society discussing what "Black Twitter" can tell us about regional variation in African American English (AAVE). So yeah, I like me some Twitter. (Of course, I do do other things: I'm currently looking at phonetic and phonological variation in Mandarin and Farsi spoken corpora).

Image of North America, entirely in Tweets, courtesy of Twitter Visual Insights: https://blog.twitter.com/2013/the-geography-of-tweets



Moreover, I'm not alone in my love of Twitter. Recently, computer scientists claimed to have found regional "super-dialects" on Twitter, and other researchers have made a splash with their maps of vocatives in the US:


More and more, people are using social media to investigate linguistics. However, there are a number of serious dangers inherent to spatial statistics, which are exacerbated by the use of social media data.

Spatial statistics is developing rapidly as a field, and there are a number of excellent resources on the subject I've been referring to as I dig deeper and deeper into the relationship between language and geography. Any of these books (I'm partial to Geographic Information Analysis) will tell you that people can, and do, fall prey to the ecological fallacy (assuming that some statistical relationship that obtains at one level, say, county level, holds at another level -- say, the individual). Or they ignore the Modifiable Areal Unit Problem -- which arises out of the fact that changing where you draw your boundaries can strongly affect how the data are distributed within those boundaries, even when the change is just in the size of the unit of measurement.

The statistical consideration that most fascinates me, however, and the one that seems most likely to be overlooked in dealing with exciting social media data, is the problem of sampling.

Spatial Statistics aren't the same as Regular Statistics.

In regular statistics, more often than not, you study a sample. You can almost never study an entire population of interest, but it's not generally a problem. Because of the Law of Large Numbers, the bigger the sample, the more likely you are to be able to confidently infer something about the population the sample came from (I'm using the day-to-day meanings of words like "confidence" and "infer"). However, in the crazy, upside down world of spatial statistics, sampling can bias your results.

In order to draw valid conclusions about some kinds of spatial processes, it is necessary to have access to the entire population in question. This is a huge problem: If you want to use Twitter, there are a number of ways of gathering data that do not meet this requirement, and therefore lead to invalid conclusions (for certain questions). For instance, most people use the Twitter API to query Twitter and save tweets. There are a few ways you can do this. In my work on AAVE, I used code in Python to interact with the Twitter API, and asked for tweets containing specific words -- the API returned tweets, in order, from the last week. I therefore downloaded and saved them consecutively. This means, barring questionable behavior from the Twitter API (which is not out of the question -- they are notoriously opaque about just how representative what you get actually is), I can claim to have a corpus that can be interpreted as a population, not a sample. In my case, it's very specific -- for instance: All geo-tagged tweets that use the word "sholl" during the last week of April, 2014. We should be extremely careful about what and how much we generalize from this.
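For the curious, the collection step looks roughly like the sketch below. It uses Twython (the Python Twitter library we used for the LSA paper discussed elsewhere on this blog); the credentials and the query word are placeholders, and exactly what the API decides to hand back is, again, the caveat at issue.

```python
from twython import Twython

# Placeholder credentials -- you need your own keys from Twitter's developer site.
APP_KEY, APP_SECRET = 'your-app-key', 'your-app-secret'
OAUTH_TOKEN, OAUTH_TOKEN_SECRET = 'your-token', 'your-token-secret'

twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

# Ask the search endpoint for recent tweets containing a specific word.
results = twitter.search(q='sholl', count=100, result_type='recent')

# Keep only the geo-tagged tweets, since those are the ones that end up on maps.
geotagged = []
for tweet in results['statuses']:
    if tweet.get('coordinates'):
        lon, lat = tweet['coordinates']['coordinates']
        geotagged.append({'text': tweet['text'], 'lon': lon, 'lat': lat})

print(f"kept {len(geotagged)} geo-tagged tweets of {len(results['statuses'])} returned")
```

In practice you page through the results (and babysit rate limits) until the API runs out of the week it is willing to give you; that exhaustiveness is what lets me treat the result as a population rather than a sample for that word and that week.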

Many other researchers use either the Twitter firehose or gardenhose. The former is a real-time stream of all tweets. Because such a thing is massive and unmanageable, and requires special access and a supercomputer, others use the gardenhose. However, the gardenhose is a(n ostensibly random) sample of 10% of the firehose. Depending on what precisely you want to study, this can be fine, or it can be a big problem.

Why is sampling such a problem?

Put simply, random noise starts to look like important clusters when you sample spatial data. To illustrate this, I have created some random data in R.

I first created 1,000 random x and 1,000 random y values, which I combined to make points with random longitudes (x values) and latitudes (y values). For fun, I made them all with values that would fit inside a box around the US (that is, x values from -65 to -118, and y values from 25 to... Canada!). I then made a matrix combining the two values, so I had 1,000 points randomly assigned within a box slightly larger than the US. That noise looked like this:

" Sir, the possibility of successfully navigating an asteroid field is approximately 3,720 to 1. " "Never tell me the odds!"

" Sir, the possibility of successfully navigating an asteroid field is approximately 3,720 to 1. " "Never tell me the odds!"

Before we even continue, it's important to note two things. First, the above is random noise. We know this because I totally made it up. Second, before even doing anything else, it's possible to find patterns in it:

A density contour plot of random noise. Sure looks like something interesting might be happening in the upper left.


Even with completely random noise, some patterns threaten to emerge. What we can do if we want to determine if a pattern like the above is actually random is to compare it to something we know is random. To get technical, it turns out random spatial processes behave a lot like Poisson distributions, so when we take Twitter data, we can determine how far it deviates from random noise by comparing it to a Poisson distribution using a Chi-squared test. For more details on this, I highly recommend the book I mentioned above. I've yet to see anyone do this explicitly (but it may be buried in mathematical appendices or footnotes I overlooked).
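Here is a minimal sketch of what such a check could look like: quadrat counts compared against a uniform expectation with a chi-squared test. This is my own illustration of the textbook technique applied to simulated points, not a reconstruction of what any of the studies above did.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(42)

# 1,000 "tweets" scattered uniformly at random in a box roughly around the lower 48.
x = rng.uniform(-118, -65, 1000)
y = rng.uniform(25, 49, 1000)

# Quadrat counts: carve the box into a 5 x 5 grid and count the points in each cell.
counts, _, _ = np.histogram2d(x, y, bins=5)
observed = counts.ravel()

# Under complete spatial randomness every cell has the same expected count,
# so a chi-squared test against that uniform expectation is a first-pass check
# on whether apparent clusters are anything more than Poisson noise.
stat, p_value = chisquare(observed)
print(f"chi-squared = {stat:.1f}, p = {p_value:.3f}")
```

With genuinely random points you should fail to reject randomness most of the time; a tiny p-value on real Twitter data is the hint that the clustering is worth taking seriously (or that a first-order effect like population density is doing the clustering for you).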

This is what happens when we sample 100 points, randomly. That's 10%; the same as the Twitter gardenhose:

a 100 point sample.


And this is what happens when we take a different 100 point random sample:

Another random 100 point sample from the same population.


The patterns are different. These two tell different stories about the same underlying data. Moreover, the patterns that emerge look significantly more pronounced.
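If you want to reproduce the problem at home, here is a rough Python equivalent of the simulation I described doing in R: generate 1,000 uniform random points, draw two different 10% "gardenhose" samples, and see where the apparent hotspot lands each time.

```python
import numpy as np

rng = np.random.default_rng(2014)

# 1,000 points uniformly at random in a box slightly larger than the US
# (columns are longitude and latitude), mirroring the R simulation above.
population = np.column_stack([rng.uniform(-118, -65, 1000),
                              rng.uniform(25, 49, 1000)])

def hottest_cell(points, bins=5):
    """Return the densest cell of a bins x bins grid and its count."""
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins,
                                  range=[[-118, -65], [25, 49]])
    return np.unravel_index(counts.argmax(), counts.shape), int(counts.max())

# Two different 10% samples -- the "gardenhose" -- from the same random population.
for i in range(2):
    sample = population[rng.choice(len(population), size=100, replace=False)]
    cell, n = hottest_cell(sample)
    print(f"sample {i + 1}: densest cell is {cell} with {n} of 100 points")
```

Run it a few times with different seeds and the "hotspot" wanders around the map, which is exactly the point: the stories the two sample maps tell are artifacts of sampling, not of the population.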

To give a clearer example, here is a random pattern of points actually overlaying the United States, which I made after much wailing, gnashing of teeth, and googling of error codes in R. I didn't bother to choose a coordinate projection (relevant XKCD):

And here are four intensity heat maps made from four different random samples drawn from the population of random point data pictured above:

This is bad news. Each of the maps looks like it could tell a convincing story. But contrary to map 3, Fargo, North Dakota is not the random point capital of the world; that's just an artifact of sampling noise. Worse, this is all the result of a completely random sample, before we add any other factors that could potentially bias the data (applied to Twitter: first-order effects like uneven population distribution, uneven adoption of Twitter, biases in the way the Twitter API returns data, etc.; second-order effects like the possibility that people are persuaded to join Twitter by their friends, in person, etc.).

What to do?

The first thing we, as researchers, should all do is think long and hard about what questions we want to answer, and whether we can collect data that can answer those questions. For instance, questions about frequency of use on Twitter, without mention of geography, are totally answerable, and often yield interesting results. Questions about geographic extent, without discussing intensity, are also answerable -- although not necessarily exactly. Then, we need to be honest about how we collect and clean our data. We should also be honest about the limitations of our data. For instance, I would love to compare the use of nuffin  and nuttin (for "nothing") by intensity, assigning a value to each county on the East Coast, and create a map like the "dude" map above -- however, since the two are technically separate data sets based on how I collected the data, such a map would be completely statistically invalid, no matter how cool it looked. Moreover, if I used the gardenhose to collect data, and just mapped all tokens of each word, it would not be statistically valid, because of the sampling problem. The only way that a map like the "dude" map that is going around is valid is if it is based on data from the firehose (which it looks like they did use, given that their data set is billions of tweets). Even then, we have to think long and hard about what the data generalizes to: Twitter users are the only people we can actually say anything about with any real degree of certainty from Twitter data alone. This is why my research on AAVE focuses primarily on the geographic extent of use, and why I avoid saying anything definitive about comparisons between terms or popularity of one over another.

Ultimately,  as social media research becomes more and more common, we as researchers must be very careful about what we try to answer with our data, and what claims we can and cannot make. Moreover, the general public should be very wary of making any sweeping generalizations or drawing any solid conclusions from such maps. Depending on the research methodology, we may be looking at nothing more than pretty patterns in random noise.

 

-----

©Taylor Jones 2014

Have a question or comment? Share your thoughts below!

 

 

LSA talk preview: Semantic Bleaching of Taboo Words, and New Pronouns in AAVE

Note: this post was coauthored with Christopher Hall.

TRIGGER WARNING: this post will discuss profanity, obscenity, taboo language, slurs, and racially charged terms.

I recently received word that an abstract Chris and I submitted to the Linguistic Society of America was accepted for a 30-minute talk at the LSA annual meeting in January of 2015. While exciting, this is also somewhat terrifying, because our research involves not just syntax, but taboo words, dialect divergence, and America's ugly racial history (and present). Outside of academia, there's an enormous amount of potential for misunderstanding, offense, hostility, and other ill feelings. Even among academics there's the potential for hurt feelings.

In brief, our research takes both recent work in syntax and recent work in sociolinguistics, and couples it with good, old-fashioned fieldwork and new computational methods (read: tens of thousands of tweets). However, the subject matter involves the emergence of a new class of pronouns in one (sub-)dialect of English from words that are considered offensive or taboo in other varieties of English. As such, it's potentially quite charged.

Before describing the research, it is absolutely crucial to note that:

  1. We work as descriptive linguists: this means we observe a real-world phenomenon and describe it.
  2. We neither condone nor disapprove of the data. Our job is simply to describe and analyze natural language as it is used in the world.
  3. Both authors are native speakers of the variety of English in question.

So what's the big deal? Well, we argue that there is an emerging class of words that function as pronouns (remember elementary school English class? A pronoun is a word that stands in for another noun or noun phrase) in some varieties of African American Vernacular English (AAVE), built out of the grammatical reanalysis of phrases including the n-word. Well, sort of the n-word, because there's excellent evidence that there are actually at least two n-words, and that some speakers of AAVE differentiate between them and use them in different contexts.

WARNING:  from here out, we will be discussing the use of words some deem extremely offensive. Seriously, just stop here if such discussion will offend you despite the above points. We will be using the actual words, not variants like b-h and n-. You've been warned!

Some preliminaries:

Pronunciation

One of the most potent slurs in American English is the racial epithet nigger (we warned you!). However, many white people oblivious to history and privilege don't hesitate to muse, "why can they [read: "black" people] use it, then?" Their observation - that some black Americans use what sounds like the same word - is valid, although insisting that makes the use of slurs OK is not valid.

AAVE is (generally) what can be called r-less and l-less. That is, in some contexts, especially at the end of words or syllables and when not followed by a vowel, words that may have an r or l are pronounced as though they do not. The stereotypical Boston accent is r-less: "pahk the car in Hahvahd yahd." (Note: "car" comes before a vowel, and therefore the r is pronounced!).

So when some speakers of AAVE use the word nigga, it is understandably interpreted as an r-less variant of a word that underlyingly has an r. However, the supposed r never shows up, not even intervocalically (jargon for "between vowels").

When people maintain that they're two different words, there seems to be good evidence for that. Note to white people: This does not give you license to use either. If you do not speak AAVE, and chances are you don't, you don't get to use either word. You WILL offend people, and no one will like you.

Semantic Bleaching

This is a term that has existed in linguistics for a long time, which we did not invent, so there is actually no pun intended. It means that a word, over time, loses shades of meaning. For our purposes, there is excellent research on "obscenity" in AAVE, the main argument being that many things that are considered obscene in other dialects have been semantically bleached. Spears (1998), for instance, argues that nigga, shit, bitch, and ass have been semantically bleached. In fact, Collins and Postal have shown that there is a particular grammatical construction that relies on the semantic bleaching of ass: the Ass Camouflage Construction (ACC), as in:

  • how ya no-phone-havin'-ass gonna call me?

Not content to just rely on the previous literature, we collected data from our stomping grounds: Harlem and the South Bronx, as well as West Philadelphia (mostly, this required little more than going outside and paying attention, although we did take notes on time, place, and type of use). We also used the Twython library for Python to extract and store 10,000 tweets using the word nigga. While this is a huge sample by regular sociolinguistic norms (where 500 data points is impressive), it's worth keeping in mind that it's about 1/60th of what is tweeted in an average afternoon.

tweets containing nigga from August 19 - September 18, 2014. 16 MILLION tokens.


In none of the 10,000 we read was the word used as an epithet or slur (although there were some cheeky white people trying to test boundaries).

In fact, we argue that in this dialect, it is now human and male by default, but not always (an example of the not always: "I adopted a cat and I love that nigga like a person"). It is also not  inherently specified for race, like nigger and other epithets are. In fact, race is often added to it, so the authors may be referred to in our neighborhoods as "that white nigga" and "the black nigga who was with him." Others include "asian nigga," and even "African nigga."

Among those who use the term, it is now a generic term like guy.

This shift in meaning seems to have happened some time after 1972-ish, possibly in conjunction with the rise of the Black Power movement, as an attempt to reclaim the word, similar to some feminists reclaiming bitch, and cunt. It was a necessary prerequisite for the super cool grammatical change our paper is actually about.

Grammatical Change: Pronouns or ...Imposters?!

The real point of our paper is about grammatical change. There exists a class of phrases first described by Collins and Postal, called Imposters. These are phrases that grammatically behave as though they are third person (reminder: he, she, it), but actually have first person (I, we) meaning. Great examples are:

  • Daddy is going to buy you an ice cream!
  • This reporter has found himself behind enemy lines.
  • The authors have already used 3 imposters in this very article.

Where the meanings are:

  • I am going to buy you an ice cream!
  • I have found myself behind enemy lines.
  • We have already used 3 imposters in this very article.

The key here is that the noun phrases behave in the syntax of the sentence as though they are 3rd person, but the actual meaning is first person -- we just decode it.

What we do is argue that there are new pronouns in AAVE, but first we have to argue that they're not just imposters. This is not trivial! For instance, Zilles (2005) argues that Brazilian Portuguese is developing a new first person pronoun, a gente ("ah zhen-tshy"), but Taylor (2009) argues that no such thing is happening, and it's just a popular imposter.

The Paper

We argue that a nigga is becoming a pronoun, meaning "I". The corresponding plural is niggas or niggaz. We also argue that there are two second person vocatives (that is, "terms of address") which are used depending on the degree of social deference one wants to show: nigga, and my nigga.

Yes. You read that correctly: we are claiming that saying my nigga signals politeness (...among speakers of this and only this dialect!!! Don't go saying Jones & Hall gave you the green light to say "my nigga" to your black friends!!!).

What's the evidence for pronoun status?

  1. a nigga and my nigga are phonologically reduced. That is, there is a clear difference in pronunciation between the pronoun forms and the terms meaning "a person" and "my friend." To this end, we tend to use anigga and manigga, pronounced /ənɪgə/ and /mənɪgə/ (we leave the original spacing when quoting tweets, though).
  2. No other words can intervene while still retaining the first person meaning. "A friendly nigga said hello" does not  mean "I said hello," whereas "anigga said hello" can. The first means that some friendly guy said hello, but it wasn't the speaker.
  3. anigga binds anaphors. No, that's not some kind of Greek fetish; anaphors are words like "myself," "himself," "herself," etc. Binding in this case refers to which anaphors show up with the word. anigga patterns with the first person words, whereas imposters do not. For almost everyone "daddy is going to buy myself an ice cream" is either ungrammatical or sounds like daddy got lost in the middle of his sentence. anigga, on the other hand, is often used with myself, as in "anigga proud of myself."
  4. Other pronouns refer back to anigga. That is, "you read all a nigga's tweets but you still don't know me."
  5. Verbs are conjugated first person, not third person, with anigga. This is totally ungrammatical with imposters, and totally normal for actual pronouns. Example:
    "Finna make myself dinner. a nigga haven't eaten all day." Compare that to "Daddy haven't eaten all day; he's going to make myself dinner." Really, really, abysmally bad.

  6. anigga can be used in certain conditions that imposters - like "a brotha" - cannot. For instance, you can say "anigga arrived," with first person meaning, but the only interpretation available for "a brotha arrived" is third person. It's for this reason that we cannot simply substitute the much-less-likely-to-offend "a brotha" in our discussion of these terms.

That's basically it. In every conceivable grammatical test, anigga patterns with actual pronouns and not with imposters.

We then attempt to pinpoint the origin of it, and find that it must have happened some time between 1970 (The Last Poets) and 1992 (Wu-Tang). In 1993, it's already being used in puns in rap music, as in Wu-Tang Clan's "Shame on a nigga (that tries to run game on a nigga)", where the meaning is "shame on a guy who tries to run game on me." The first unambiguously pronominal appearance we can find in print is from a 1995 interview with ODB (Ol' Dirty Bastard) of the Wu-Tang Clan, followed shortly by use in a magazine interview with Slick Rick. This is over 100 years after the first records we can find of the use of anigga as an imposter -- all of which are from exceedingly racist old books from the 1880s.

With regards to the terms of address nigga and manigga, the difference seems to be social deference. When in a position of greater authority, nigga is the term of address used toward another person (As in the first minute of this video of possibly the best cooking show for chefs on a budget, and an excellent example for Spears, 1998). When showing deference, manigga is used. This is why there's a clear difference in meaning between "nigga, please," and "manigga, please." The first is dismissive, the second is pleading.

Non-linguists, feel free to skip this technical paragraph. Currently, we're in the process of tallying use in Urban Fiction as a way of getting at frequency of use. It's exceptionally difficult to get a large enough sample of material to be able to tally use of these new pronouns compared to other pronouns. If you try to compare to the frequency of "I" on Twitter, for instance, you're then comparing against all varieties of English, not just AAVE. If you use some other word as a proxy for AAVE use (hypothetically, tweets that contain the word nigga), you then have a number of other confounds, like potential bias in your data set, or in the case of using nigga possible lexical priming effects. If you try to do sociolinguistic interviews, you get observer effects that bias the data. Fiction is a good way to get at what the author of a given novel perceives as natural, which we can then compare against other authors and other datasets (e.g., Twitter). The goal right now is simply to get a baseline for comparison so we can begin to home in on a plausible range we can later refine.

Concluding thoughts

It's unlikely that this pronoun will ever replace or even truly rival the usual English pronouns; however, speakers of this variety of AAVE now have a new way of expressing themselves at their disposal. For the moment, the authors have the dubious distinction of potentially being the world's leading experts on the n-words. So we've got that going for us, which is nice.

 

GOOD NEWS!

I have been accepted to present at not one, but two conferences this January. I will be presenting at the annual meeting of the American Dialect Society and presenting a co-authored paper at the annual meeting of the Linguistic Society of America (which is evidently spectacularly huge). Conveniently, these are being held at the same time, in the same place, in Portland, Oregon.

Although my current research covers many other subjects, both papers are on topics in African American Vernacular English. Posts about them to come soon!

 

What is AAVE?

[UPDATE: This post now has a video companion, see below!]

AAVE is an acronym for African American Vernacular English. Other terms for it in academia are African American Varieties of English, African American English (AAE), Black English (BE) and Black English Vernacular (BEV). [EDIT: since I wrote this post in 2014, a new term has gained a lot of traction with academics: African American Language (AAL), as in the Oxford Handbook of African American Language edited by Sonja Lanehart (2015), or the Corpus of Regional African American Language (CORAAL). I now use either AAE or AAL exclusively, unless I’m specifically talking about an informal, vernacular variety, however “AAVE” has gained traction in social media just as AAL replaced it among academics]

In popular culture, it is largely misunderstood, and thought of as "bad English," "ebonics" (originally coined in 1973 by someone with good intentions, from "ebony" and "phonics," but now starting to become a slur), "ghetto talk" (definitely a slur), and the "blaccent" (a portmanteau word of "black" and "accent") that NPR seems to like using.

Why do I say it's misunderstood? Because it is emphatically not bad English. It is a full-fledged dialect of English, just like, say, British English. It is entirely rule-bound -- meaning it has a very clear grammar which can be (and has been) described in great detail. It is not simply 'ungrammatical'. If you do not conform to the grammar of AAVE, the result is ungrammatical sentences in AAVE.

That said, its grammar is different from that of many other dialects of English. In fact, it can do some really cool things that other varieties of English cannot. Without further ado, here's a quick run-down of what it is, what it do, and where it be:

Where does it come from?

AAVE was born in the American South, and shares many features with Southern American English. However, it was born out of the horrifically ugly history of slavery in the United States. Black Americans, by and large, did not voluntarily move to North America with like-minded people of a shared language and cultural background, as happened with waves of British, Irish, Italian, German, Swedish, Dutch, &c. &c. immigrants. Rather, people from different cultures and language backgrounds were torn from their homelands and sold into chattel slavery. Slaves in the US were systematically segregated from speakers of their own languages, lest they band together with other speakers of, say, Wolof (a West African language), and violently seize freedom.

There are two competing hypotheses about the linguistic origins of AAVE, neither of which linguists are ever likely to fully prove, because the history of the US has completely obscured the origins of the dialect. Because of historical racism, we're left with hypotheses instead of documentation.

The two hypotheses are the Creole Origin Hypothesis, and the Dialect Divergence Hypothesis. Both are politically charged (linguists are people too...). The first is that contact between English speakers and among speakers of other languages led to the formation of a Creole language with an English superstrate but strong pan-African grammatical influences -- meaning lots and lots of English words, but still a distinct language from English. Another example of such a language is Bislama. The second hypothesis is that it is basically a sister dialect of Southern American English which started to diverge in the 1700s and 1800s.

How is it different?

AAVE has a number of super cool grammatical features that non-speakers tend to mistake for 'bad grammar' or 'lazy grammar'. Here is a - by no means exhaustive - list of the key differences between it and the useful hypothetical construct "General American" (GA -- basically, how newscasters speak):

  1. Deletion of verbal copula (not as dirty as it sounds). This means that in some contexts, the word "is/are" can be left out. If you think this is "lazy grammar," speakers of Russian, Arabic, and Mandarin would like to have a word with you. example: "he workin'."

  2. A habitual aspect marker (known as habitual be, or invariant be). Aspect refers to whether an action is completed or on-going. Habitual aspect means that a person regularly/often/usually does a thing, but does not give any indication of whether they are currently in the process of doing that thing. example: "he be workin'" (meaning: he is usually working.)

  3. A remote present perfect marker (stressed been). This communicates that not only is something the case, and not only is it completed (i.e., perfective aspect), but it has been for a long time. example: "he been got a job." meaning: he got a job a long time ago.

  4. Negative concord. This means that negation has to all "match." If you've ever studied French, Spanish, Italian, Portuguese, Russian, or any of a whole slew of other languages, you've seen this. It is often stigmatized in English ("don't use double negatives!"), but is totally normal in many, many languages and in many varieties of English. example: He ain't never without a job! Can't nobody say he don't work.

  5. It for the dummy expletive there. What's a dummy expletive? It's that word that's necessary to say things when there isn't really an agent doing the thing in question -- like in "it's raining." Some languages can just say "raining," and be done with it. English is not one of them. In contexts where speakers of other dialects might say there, some AAVE speakers say it. example: "it's a man at the door here to see you." More famous example "Oh, Lord Jesus, It's a fire."

  6. Preterite had. This refers to grammatical constructions that in other dialects do not use had, but use the simple past. It's usually used in narrative. example: "he had went to work and then he had called his client." meaning: he went to work and then he called his client.

  7. Some varieties have 'semantic bleaching' of words that are considered obscenities in other dialects - this is where a word loses shades of meaning over time. Here's a famous example.

There are quite a few other cool grammatical features and quirks, but these are among the major innovations (yes, innovations). There's also tons of lexical variation (read: different words).

Sounds cool, what's the big deal?

Basically, racism and linguistic prejudice. We have a long cultural history of assuming that whatever black people in America do is defective. Couple this with what seems to be a natural predilection toward thinking that however other people talk is wrong, and you've got a recipe for social and linguistic stigma. For instance, in 1996 the Oakland school board took the sensible step of trying to use AAVE as a bridge to teach AAVE-speaking children how to speak and write Standard American English. They also took the less sensible step of declaring AAVE a completely different language. This was wildly misrepresented in the media, leading to a storm of racist, self-congratulatory "ain't ain't a word" pedantry from both white people and older middle-class black people who do not speak the dialect. (author's note: ain't been a word...for over 300 years.)

The use of ebonics as a derisive slur comes out of this national media shitstorm. Literally nobody even wanted to teach AAVE; they simply wanted to use the native dialect of pre-literate children as a bridge to teach the standard dialect and to teach reading and writing. Like this program, Academic English Mastery, in Watts. How awesome was that?! Instead, it was portrayed as Marxist nutjobs trying to force anarchist anti-grammar on helpless (white) American children instead of teaching them standard English.

There is absolutely nothing wrong with AAVE, but it is stigmatized for social and historical reasons, related to race, socioeconomic class, and prestige.

Who speaks it?

In general, black Americans; however, there are exceptions to every part of this. Not all black Americans speak it (e.g., Bill Cosby, who displays his ignorance of dialect variation often, and with gusto). Some black non-Americans speak it (e.g., Drake, who speaks it professionally, and is Canadian). Not all people who speak it are black (e.g., the author, Eminem, that white guy in the movie Barbershop). I even know a white linguist from Holland who speaks fluent AAE as a second language (it's a language like any other, after all, although that kind of speaker is super rare).

In general, it can be assumed that non-black Americans probably don't speak or understand it. You can't necessarily assume, however, that a given black American does speak it. I recently tried to do the math to get a rough idea of how many people speak it, and came up with something like 30 million people, plus or minus about 10 million. I did this by looking at census data, linguistics papers that make estimates about how many black folk do speak it (i.e., Rickford 1999), and guesses about how many non-black AAVE speakers might exist. So I basically pulled it out of ... a hat. (Note to self: this would make a good research topic. Note to other academics: I called dibs!). Many people who do speak it are extremely adept at code-switching: in the popular imagination, that's deftly switching between dialects or registers as the social situation calls for.
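For the morbidly curious, the arithmetic really is no fancier than the sketch below. Every input is a placeholder of the kind I guessed at, not a citation, which is exactly why the error bars are plus or minus 10 million.

```python
# Back-of-the-envelope estimate of AAVE speakers. All inputs are rough guesses.
black_americans = 40_000_000      # ballpark census figure
share_speaking_aave = 0.7         # hypothetical share, in the spirit of estimates like Rickford (1999)
nonblack_speakers = 2_000_000     # pure guess

estimate = black_americans * share_speaking_aave + nonblack_speakers
print(f"~{estimate / 1e6:.0f} million speakers")  # lands around 30 million
```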

As an aside, one common trope used by those against its recognition as a dialect is "no academic could ever teach a class or publish in it." The argument being that linguists are hypocritical for claiming it is a legitimate dialect, since they could never actually publish in it. This would be simply misguided if it weren't for the fact that linguists like Geneva Smitherman have published articles in AAVE.

Is it spoken the same everywhere?

Yes and no. Certain grammatical features seem to be universal in AAVE; however, there is regional variation in pronunciation. More on this in another post.

One key finding in sociolinguistics that was hard for me to wrap my head around is that a given dialect — Appalachian English, Philadelphia English, Yiddish English, African American English — may have 20 different distinctive features, but individual speakers might not use all 20. So someone who never uses habitual be can still be a native speaker of AAE.

My dissertation research demonstrated that there are at least ten distinctive accents in AAE. Other research shows that there may be regional variation in what syntactic structures are used. For instance, “be done” constructions (as in, “I be done went home when they be gettin’ wild”) used to be common in Philadelphia. We know this because we have recordings! But now some young people report having never heard such sentences, or that they’re the kind of thing their grandparents might say, but not them.

Linguists don’t all agree on what the core features are, although things like habitual be, stressed been, and consonant cluster simplification in syllable codas are good candidates. Features that only exist in AAE but aren’t universal in AAE are relevant too — like replacing the /d/ in words like bleeding with a glottal stop (that is, [bliʔɪn]). There are also different registers in AAE: Arthur Spears argues that African American Standard English (AASE) has different features from AAVE, and both are under the umbrella of AAE. (An example might be the pronunciation of /t/ in words like indemnity, where most white speakers of American English would pronounce that /t/ as an alveolar tap (that is, [ɪndɛmnɪɾi]), but many AAE speakers who are speaking formally might produce an aspirated t instead (that is, [ɪndɪmnɪtʰi]).) There are tons of other factors that affect whether someone speaks AAVE or AASE in a given circumstance.


Closing thoughts

AAVE is a dialect of English like any other, but suffers extreme stigma due to the history of race in America. It has a systematic, coherent, rule-bound grammar. It has some super cool grammatical features that allow it to communicate complex ideas in fewer words than other dialects of English. While the rise of hip-hop and some reintegration of our cities have exposed more of the mainstream to some varieties of AAVE, it is still, unfortunately, highly stigmatized. Regarding those who still think it is somehow not valid, Oscar Gamble said it best: "They don't think it be like it is, but it do."

For more on AAVE, check out this video, where I interview four Black scholars who speak and research African American language use.

 

-----

©Taylor Jones 2014

Have a question or comment? Share your thoughts below!

 

 

Bad Vibrations: the Bizarre Explanation Why the French 'Can't' Learn Languages

The French have a completely absurd myth about language learning that blows my mind.

Having family in France, I'm lucky that I can sometimes visit. In some ways I'm unlucky, in that my family is largely insane, but insane family is a relatively common affliction. So, when a family member asked about my studies and then used that as a segue into pontificating about a totally ridiculous theory of why the French are physiologically incapable of learning English, I just assumed this was another instance of crazy family being, well, crazy.

Then I heard the same theory from a friend of my mother. When we recounted it to her French tutor, expressing our surprise at how two people who were apparently unacquainted shared such a preposterous view, this woman -- an educator, no less -- also supported it. As more and more French people we meet volunteer that they know and believe it, we're realizing it is a well-known, culturally ingrained myth. So what is it?

Apparently, it's simply common knowledge in France that the French cannot learn English (or other foreign languages) because of...different...frequencies...and...stuff.

The general gist of the idea is that different languages occur at different frequencies, and that native speakers of one language are ill-equipped to hear and interpret, and even worse equipped to produce, those frequencies.

It's unclear whether Mercury going into retrograde also affects things.

Now, as someone who likes to play Devil's Advocate, I kept trying to find ways to understand this nonsense. I thought, perhaps they recognize that the building blocks of a spoken language are its phonemes and that those can be thought of as being defined stochastically, so each speaker has a mental target, but every utterance will miss it by some margin of error. Maybe they also know that one can represent a speaker's vowel space by using a graph of the first and second formants (that is, the resonant frequencies produced by the vocal tract during speech) plotted as the x and y axes. This seems unlikely, but whatever, maybe it's common knowledge in France. If they recognize that individual productions of a sound will, in the aggregate, cluster around this target, maybe they also know that the target could potentially vary from speaker to speaker.

 

The vowel space derived from acoustic measurement of the first and second formant midpoints of short medial vowels in 50 Northern Mao words. From Aspects of Northern Mao Phonology.

 

Perhaps, then, what they're trying to say is that the target is slightly different from language to language, so an English /i/ and a French /i/ are, on average, slightly different. Then, you can make a bit of a leap and say that the fact of slightly different phonemic targets, coupled with different phonemic inventories, makes it hard for an adult to learn a foreign language, because we're basically trained to separate sounds into different mental categories than those in our target language, and certain combinations of F1 and F2 frequencies are ambiguous and confusing.
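To make that charitable reading concrete, here is a minimal sketch of the kind of F1/F2 plot described above, using matplotlib. The two "targets" and all of the formant values are invented for illustration; they are not measurements of English or French vowels.

```python
# Sketch of a vowel-space plot: individual productions scatter around a
# mental target, plotted by first and second formant. All values are invented.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Hypothetical (F2, F1) targets in Hz for two slightly different /i/ vowels.
targets = {"Language A /i/": (2300, 300), "Language B /i/": (2400, 280)}

fig, ax = plt.subplots()
for label, (f2_target, f1_target) in targets.items():
    # Each utterance misses the target by some margin of error.
    f2 = rng.normal(f2_target, 80, size=50)
    f1 = rng.normal(f1_target, 25, size=50)
    ax.scatter(f2, f1, alpha=0.5, label=label)

# Vowel charts are conventionally drawn with both axes reversed.
ax.invert_xaxis()
ax.invert_yaxis()
ax.set_xlabel("F2 (Hz)")
ax.set_ylabel("F1 (Hz)")
ax.legend()
plt.show()
```

The point of the picture is that the two clouds overlap heavily: slightly different targets make perception and production harder at the margins, which is a far cry from whole languages vibrating at mutually inaudible frequencies.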

There's only one problem:

That's not what they mean.

No, they actually mean that the entire language as spoken by everyone who speaks any version of it, just...vibrates at a different frequency. That they can't hear. Or reproduce. In fact, there is website after website after website proposing to train aspiring polyglots (for a fee, of course) how to open their ears and minds to these different frequencies. They often have scientific-looking graphs, like this:

A stupid graph.

Nevermind the fact that there is a range that all human voices fall into, and that the vowel space is pretty well defined.

Nevermind that it's defined in two dimensions.

Nevermind that there are studies on vowel spaces across languages, and on differences among dialects of the same language (which should then be totally mutually incomprehensible).

Nevermind that women and men have different base frequencies, so according to this theory, women and men speaking the same language should be totally incomprehensible to one another [insert your own joke here].

Nevermind that "North American" isn't a language (seriously?!).

This bunk science is widely accepted as obviously true by the vast majority of French people I've interacted with. There's even a corollary: even if the French could hear and interpret those crazy foreign sounds, they can't make them because their mouths and vocal cords have become adapted to French in such a way that they are now malformed from the point of view of non-French languages (i.e., langues non-civilisées, "uncivilized languages").

When I ask how it's possible that I can speak and understand French, the consensus is that the frequency problem is one-way. That is, anyone can learn French (obviously the best, most expressive, most beautiful language -- ideally suited to the historical mission civilisatrice), but the French are uniquely ill-suited to learn any other language because of those pesky fréquences.

One of the most interesting aspects of this myth is that it is so (pseudo)scientific. Whereas in the US people will just say they have no need, or will say they're "too old," they don't then lecture about their half-remembered misconceptions about the critical period hypothesis. In France, however, it seems totally unacceptable to say "I tried and failed," or "I never felt much need to learn anything else," or even "I can't because of external factors like age, opportunity, etc." Rather, it is a fundamental flaw of other languages which has been scientifically demonstrated: they simply vibrate at unfortunate frequencies.

I'm not quite sure what I expected from a place where doctors prescribe homeopathy and public intellectual is totally a legitimate job, but this kind of absurdity is wholly, delightfully foreign to me. Now, to have a croissant and a grand crème while I ponder whether simply digitally adjusting acoustic frequencies could create a universal translator.

-----

©Taylor Jones 2014

Have a question or comment? Share your thoughts below!

 

Facebook's "Emotional Contagion" Study Design: We're Mad For All the Wrong Reasons

A new study in the Proceedings of the National Academy of Sciences has been receiving an enormous amount of negative press, as their study of 'emotional contagion' has been called 'secret mood manipulation,' 'unethical,' and a 'trampl[ing] of human ethics.' Researchers took 689,003 participants, and used the Linguistic Inquiry and Word Count (LIWC) software to manipulate the proportion and valence of 'positive' and 'negative' emotional terms that appeared on users' news feeds. They then argued that emotional contagion propagates across social networks. This study has a number of flaws, and the fact that it passed Institutional Review Board (IRB) review is the least of them.

Since there's so much wrong with it, let's start with why it's not as bad as everyone thinks: there is far more content generated by Facebook users' friends than is viewable, and so the news feed only presents users with a small sample of what their friends posted. All of their friends' posts were visible (that is, nothing was suppressed!) on their walls and timelines, as well as on news feed viewings before and after the one-week experiment. Facebook is very clear about the fact that they only present a subset of posts on any given user's news feed, and this experiment was simply tinkering with the algorithm for a week. A careful read of the study methodology reveals why it passed IRB review -- it's not massive, secret emotional manipulation, like some kind of Google-era attempt at a privately funded MK Ultra. Rather, it was slight tinkering with how Facebook filters posts that it already filters, and is clear about filtering in its terms of use. This is not, however, an attempt at Facebook apologetics. In fact, I think the article was absolutely terrible, but for different reasons. The thing people seem to be missing is that:

Facebook claims they demonstrated emotional contagion, but cannot show that they actually successfully manipulated emotions AT ALL.

That's right, the reason I'm upset is that they didn't manipulate emotions; not because I wanted them to -- as that would potentially be an enormous violation of ethics -- but because they claimed they did and published it in a peer-reviewed journal, without actually proving anything of the sort.

There are so many flaws with the methodology that I'm going to limit myself to bullet points covering the most glaring problems:

  • "Posts were determined to be positive or negative if they contained at least one positive or negative word, as defined by Linguistic Inquiry and Word Count software (LIWC2007) (9) word counting system, which correlates with self-reported and physiological measures of well-being, and has been used in prior research on emotional expression (7, 8, 10)."  -- I'm friends with a ton of jazz musicians. When they call something bad, this is not a negative term, but would be interpreted as such by the LIWC.
  • More generally, depending on the social circle, terms like bad, dope, stupid, ill, sick, wicked, killing, ridiculous, retarded, and terrible should be grouped differently. There is absolutely no indication that the researchers took slang or dialect variation in English into account.
  • This study does not -- and cannot -- demonstrate actual emotional contagion. They have a much better chance of demonstrating lexical priming than emotional contagion. Except, they can't demonstrate that either, because all of the terms are aggregated, so they only know that words with 'negative valence' are predictors of the use of other words with negative valence.
  • "people ’s emotional expressions on Facebook predict friends’ emotional expressions, even days later (7) (although some shared experiences may in fact last several days)" -- That is, there's no control for friends in social networks sharing a real-world experience and posting about it on Facebook using similar emotional terms.
  • "there is no experimental evidence that emotions or moods are contagious in the absence of direct interaction between experiencer and target."

In other words, the Facebook study does not control for shared experiences being described in similar terms, does not control for different semantic and pragmatic contexts (e.g., "those guys were BAD, son. [Piano player] was STUPID NASTY on the gig last night!" is extremely positive, but would be interpreted by LIWC as extremely negative), and conflates emotional contagion with lexical priming (simply, the increased likelihood of using a given term if it is 'primed' by previous use or by previous use of a related term).
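To see how brittle that kind of one-word-one-vote counting is, here is a toy sketch of an LIWC-style word counter. The word lists and the scoring function are mine, invented stand-ins rather than the actual LIWC lexicon or API:

```python
# Toy LIWC-style scoring: count hits against fixed word lists, ignoring
# context entirely. The lexicons below are invented for illustration.
POSITIVE = {"happy", "great", "love", "awesome", "wonderful"}
NEGATIVE = {"bad", "sad", "stupid", "nasty", "terrible", "awful"}

def naive_valence(post: str) -> str:
    words = [w.strip(".,!?").lower() for w in post.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

# A rave review in musician slang gets scored as strongly negative.
print(naive_valence("Those guys were BAD, son. He was STUPID NASTY on the gig!"))
# -> negative
```

Any pipeline built on this kind of counting inherits exactly the dialect and register blindness described above, no matter how many hundreds of thousands of users it is run over.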

In order for this study to say anything even remotely interesting, the researchers would first have to demonstrate that they can get at actual emotional state through social media posts at all. Then, they would have to demonstrate that they can do so reliably (what is the probability that a Facebook user is experiencing sadness given that they have used descriptive terms about sadness in their posts?). Next, they'd have to separate out confounds (e.g., "nasty" for "good"). Then they'd have to demonstrate that there is in fact a 'contagion' effect. Finally, they'd have to demonstrate that the apparent contagion effect was not just lexical priming (that is, me repeating "sad" because I was primed by another person's use of the word "sad," while not actually feeling sadness). If this post is any indication, they'd also have to figure out a way to control for discussion of emotion -- this post is chock full of negative terms, while being emotionally neutral, since I'm discussing emotional terms.
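As a back-of-the-envelope answer to that parenthetical question, here is a sketch with entirely hypothetical numbers. It isn't an estimate of anything real; it just shows why the reliability step matters once base rates are taken into account:

```python
# Bayes' rule with made-up numbers: P(actually sad | used sadness words).
p_sad = 0.10                  # assumed base rate of users actually feeling sad
p_words_given_sad = 0.50      # assumed: sad users use sadness terms half the time
p_words_given_not_sad = 0.10  # assumed: everyone else uses them sometimes too

p_words = p_words_given_sad * p_sad + p_words_given_not_sad * (1 - p_sad)
p_sad_given_words = (p_words_given_sad * p_sad) / p_words
print(round(p_sad_given_words, 2))  # ~0.36: most flagged users aren't actually sad
```

Under these (made-up) assumptions, roughly two out of three users whose posts contain sadness terms are not sad at all, which is the gap between "counted some words" and "measured an emotion."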

The real travesty is not that the Facebook study passed IRB; it's that it passed peer review.

This is indicative of a larger problem in the sciences: there is a bias toward dramatic findings, even if they're not terribly well supported. As a linguist, it feels like linguistics suffers more from this than other fields, since there have been a slew of recent dramatic articles published about linguistic topics by non-linguist dabblers who employ terrible methodology (for instance, making claims about linguistic typology predicting economic behavior, but getting all the typologies wrong!). Whether linguistics as a field suffers from this more than other fields remains to be proven by a well-designed study. That said, when people decide to do research that relies heavily upon understanding linguistic behavior, it behooves the researchers to, I don't know, maybe...consult a linguist.

Ultimately, the Facebook study was (just barely) within the realm of ethical study on human subjects, although its definition of informed consent was more than a little blurry. What's truly terrible about it is that the authors make very strong claims about emotional contagion on social networks that their research does not justify, and that those claims passed peer review.

 

-----

©Taylor Jones 2014

Have a question or comment? Share your thoughts below!

Obvs is Phonological, and it's Totes Legit

Recently, NPR ran a story called Researchers are Totes Studying how Ppl Shorten Words on Twitter. It was primarily focused on what they called 'clipping,' for which the author of the article provides the example "awks," for "awkward." As far as I know, aside from the researchers interviewed by NPR, no one has done any scholarly work on this phenomenon, and as far as I can find on JSTOR and Google Scholar, no one has published anything on it.

The general consensus among regular folk is that the phenomenon is:

  1. annoying
  2. associated with young white women
  3. the result of character limits on Twitter, or choices about spelling economy in text messages.

The first two are likely in some ways true: I don't have the data to prove it (yet!), but it does seem to be most deployed by young women (who are often the leaders of linguistic change), and -- as is often the case -- because of its association with young women, it is negatively socially evaluated by the general public. My issue is with the third point. Most people take it as so obvious as to be axiomatic that 'clippings' like "obvs," and "totes legit," are the result of spelling choices. Even the Dartmouth researchers interviewed by NPR are influenced by this assumption, and were perplexed to find that people still shorten their words on Twitter even when they have plenty of characters left to write.

Not only is the assumption that it's orthographically motivated wrong, but it's a perfect example of where linguistics can provide clearer insight than can be afforded by Big Data style data mining and statistical analysis without a grounding in the past 100 years or so of the scientific study of language. Perhaps it's confirmation bias that leads people to assume that this phenomenon originated in written communication. The fact is:

Truncations like "totes" for "totally" arise out of the spoken, not written, language.

They can be described entirely in phonological terms, without recourse to writing. Moreover, they are clearly sensitive to phonological environment: specifically, primary stress. It's not entirely clear why a written truncation should be sensitive to stress. If that weren't enough, sometimes what NPR calls 'clippings' are significantly longer than the word they're supposedly an abbreviation of. Case in point:

bee tee dubs is more than three times as long as "BTW."

So, what's really going on?

Let's break it down. There are a few key features:

  1. Words are truncated after their primary stress. A word like totally has three syllables, but its primary stress is on the first: tótally. The style of truncation under discussion is extremely productive, and can be used on new words. All of the truncations are sensitive to primary stress. When I asked women who use these forms, the consensus was that indécent becomes indeec, expósure becomes expozh, and antidisestablishmentárianism becomes antidisestablishmentairz. Note how spelling changes serve to preserve what remains of the pronunciation of the original word.
  2. As much material as possible from the syllable following the stressed syllable is incorporated into the end of the new word (that is, the onset of the following syllable is resyllabified as part of the coda of the stressed syllable).
  3. A final fricative is added if not already present (marv for marvellous). For most people who employ this kind of language play, there is actually a more restrictive rule: a final sibilant is added. This means that truncations can end with sh, zh, ch, s or z, and if there is no sibilant present, an s or z is added.

Voilà! An explanation that accounts for most of the data, explains forms that are not predicted by spelling rules, and makes correct predictions about novel forms.
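Here is a minimal sketch of those three rules in code, under some simplifying assumptions of my own: the input is already syllabified by hand with the stressed syllable marked (automatic syllabification and stress assignment are separate problems), and the output is a rough phonological skeleton rather than a conventional spelling.

```python
# Sketch of the truncation rules: keep everything through the stressed
# syllable, pull the next onset into its coda, and end on a sibilant.
VOWELS = set("aeiou")
SIBILANT_ENDINGS = ("s", "z", "sh", "zh", "ch")

def truncate(syllables, stressed_index):
    kept = list(syllables[: stressed_index + 1])
    # Rule 2: resyllabify the following syllable's onset as a coda.
    if stressed_index + 1 < len(syllables):
        onset = ""
        for ch in syllables[stressed_index + 1]:
            if ch in VOWELS:
                break
            onset += ch
        kept[-1] += onset
    form = "".join(kept)
    # Rule 3 (the strict version): add a sibilant if one isn't already there.
    if not form.endswith(SIBILANT_ENDINGS):
        form += "s"
    return form

print(truncate(["to", "tal", "ly"], 0))    # -> "tots" (i.e., totes)
print(truncate(["ob", "vi", "ous"], 0))    # -> "obvs"
print(truncate(["mar", "vel", "ous"], 0))  # -> "marvs"
```

A real implementation would operate on phonemes rather than letters, but the shape of the rules is the same, and nothing in them makes any reference to spelling, character limits, or keyboards.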

The astute, Twitter-savvy reader might not be totally satisfied with the above, however. Such a reader might ask, "But what about forms like legit? Soz (sorry)? Tommaz (tomorrow)? Bruh (brother)?"

First, it's necessary to point out that truncation is not a new phenomenon in English. Part of what motivated me to look into this phenomenon was outrage that anyone would suggest legit arose from Twitter or texting. Two words can disprove the 'Twitter hypothesis': MC Hammer.

Of course, a quick Google Ngram search will show that legit was in common use in the 1800s. Bumf, slang for tedious paperwork, is actually a truncation of 16th-century 'bumfodder' (i.e., 'toilet paper'). What's new here is the addition of the sibilant. Interestingly, it's now possible to find reanalyzed truncations on Twitter, so alongside legit, one may also see legits.

With regards to soz, appaz, tommaz, there is actually a very simple explanation: these forms are much more popular in the UK, and the speakers are non-rhotic. That is, they speak dialects that "drop the rs" (in point of fact, there is compensatory vowel lengthening in the contexts where r is not pronounced, so the r is not entirely absent). The above description actually perfectly describes how you get soz in a non-rhotic dialect. Underlyingly, it's still sorrs.

Finally, bruh, cuh, luh, and others. These are truncations, but in a different dialect of English: African American Vernacular English (although bruh has been borrowed into other dialects, like twerk, turnt, and shade have been recently). In these cases, the word is truncated after the primary stress, but subsequent material is not added to make a maximally large syllable coda.

This is where things get interesting. Truncation in both AAVE and other dialects of English leads to 'words' that are otherwise ill-formed. This may be part of why some people believe that such truncations are "annoying," or that their users are "ruining English." The /-bvz/ in obvs is not otherwise a permissible cluster in English (and most native speakers actually find it quite hard to say. Some 'fix' it by changing it to 'obv' or 'obvi,' the latter being the standard English diminutive or hypocoristic truncation). There are, as far as I know, only four words in English that end with /ʒ/: rouge, garage, homage, and luge -- all of which are borrowed words, and some speakers 'correct' them to /dʒ/ (as in "George"). That sound does occur, however, in the middle of words like pleasure, treasure, measure, leisure, and so on...and ends up word final in truncations like plezh, trezh, mezh, leezh, and so on.

So what's the takeaway from all of this? Well, I hope it goes without saying, but young women aren't ruining English, even if they maybe speak a little differently than, say, your high school English teacher. Moreover, truncated forms like 'obvs' have nothing to do with writing. If they were simply shorthand for texting and Twitter, it would be a lot easier to wrt smthng lk ths. Instead, truncated forms are the result of language games that follow specific rules and are based on native mastery of phonology. They're closer to Pig Latin (or French Verlan, or Arrunde Rabbit Talk) than the babbling of a "speech impaired halfwit."

So next time someone says it was totes a plezh to make your acquaints, or responds to your "how're things?" with "my sitch is pretty deec," recognize that they are playing a language game that requires total, intuitive, mastery of English...and maybs play along, rather than making things totes awks for everyone.

 

-----

©Taylor Jones 2014

Have a question or comment? Share your thoughts below!