A LOOK AT REGIONAL VARIATION IN AFRICAN AMERICAN ENGLISH ACCENTS

Last April, at the height of the first wave of the COVID-19 pandemic, I defended my dissertation. It will come as no surprise to anyone that I’m only now getting around to writing about it — everyone I know who has a PhD needed some distance from their dissertation before they could really condense it and get out of the weeds enough to talk to regular people about it.

My 2020 dissertation was the first-ever general description of regional variation in African American English accents. Plenty of researchers have studied individual phonological variables (like whether or how often you pronounce an /r/ after a vowel, or whether you pronounce words with a syllabic /r/, like Nelly saying “it’s getting hot in hurr”); others have studied differences between places (like whether you pronounce fewer /r/s if you’re from New York, or more hurrs for heres if you’re from St. Louis); and still others have studied entire vowel systems — roughly, how you pronounce all the vowels in English, so what does it really sound like when you say GOOSE and FOOT, and is the vowel sound you make there different from someone else’s? — but mostly in one place at a time. (Shoutout, though, to Charlie Farrington, who wrote an excellent dissertation, available here, that looked at a single understudied variable — replacement of /t/ or /d/ with a glottal stop — and how it varied across four cities. He used the growing Corpus of Regional African American Language, or CORAAL, and his diss has the excellent title Language Variation and the Great Migration: Regionality and African American Language.) My dissertation was the first work to look at the entire vowel system for African American English speakers across the entire country.


To do this, I used a standardized reading passage. But it’s not as simple as it sounds, because I had to write a new one (with help and input from lots of linguists who are also native speakers of AAE, to reduce regionalisms from my own personal experience with AAE — y’all know who y’all is). Existing reading passages were, to quote a friend I asked to read one, “wack” (check it out for yourself). The reading passage I used is a short story about Marcus Junior, AKA Junebug, going to the barber shop by himself for the first time, just before his 12th birthday. It was intentionally designed with lots of characters and quotes to encourage using AAE instead of formal classroom English. (I’ve actually been asked about illustrating it and making it into a children’s book — if you’re connected to that world, get at me.) I did some technological workarounds so people could go to my website, record themselves reading the passage, “Junebug Goes to the Barber,” and upload it from the comfort of their own homes. I solicited participation from friends, family, extended family, Facebook, Twitter — you name it. I got big pushes from connected people like Jon Jon Johnson, Lee Colston II, @afrothighty, and NPR’s Gene Demby. Ultimately, I got more than 200 recordings, about 180 of which I used for my analysis. That’s not a lot, but it’s also 12 full hours of audio and hundreds of thousands of vowels to measure. I asked people to change anything they felt was unnatural, which meant I also had to retranscribe and align each of the recordings manually. The biggest difference was that there is a near-universal preference in AAE for “everybody,” and the reading passage had a few “everyone”s in it — this word preference is not something that has been written about by any linguists, to my knowledge. Shout out to Gene Demby for getting that conversation started. The whole survey and reading passage are available here.

I wanted to compare to the gold standard, the Atlas of North American English (ANAE), but our data collection techniques were very different. To compare against the ANAE, I decided to use modern geostatistical methods (kriging, the Getis-Ord Gi* statistic, etc.), and I first had to show that these methods got results at least as good as the ANAE’s own methods on the ANAE data. So I did that, corroborating the ANAE findings, but also making some new observations about the Northern Cities Vowel Shift along the way, challenging the dominant interpretation of how that shift started and spread. Then I used the same techniques to map pronunciation patterns in AAE. Lastly, instead of drawing dialect region boundaries by hand and superimposing my hand-drawn maps to make dialect regions (a classic technique!), I used techniques from computational historical linguistics and from biology to let the data themselves tell me where the boundaries and clusters are. I used k-means clustering and hierarchical clustering analyses to determine how many regional varieties of AAE there are, and what their boundaries are.
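For readers who want to see what these methods look like in practice, here is a minimal Python sketch of the Getis-Ord Gi* statistic, one of the hotspot methods mentioned above. Everything in it (the hand-rolled implementation, the k-nearest-neighbor weights, the toy coordinates, and the GOOSE F2 measurement) is illustrative only; it is not the dissertation’s actual pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree

def getis_ord_gi_star(coords, values, k=8):
    """Gi* z-score for each point: how unusually high (or low) the values in
    its k-nearest neighborhood (self included, hence the 'star') are,
    relative to the global mean."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    xbar, s = x.mean(), x.std()

    # binary weights over the k nearest neighbors plus the point itself
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    z = np.empty(n)
    for i, neighbors in enumerate(idx):
        w_sum = len(neighbors)              # all weights are 1
        local_sum = x[neighbors].sum()
        denom = s * np.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z[i] = (local_sum - xbar * w_sum) / denom
    return z  # |z| > ~1.96 flags a hot (positive) or cold (negative) spot

# toy example: random speaker locations, one acoustic measurement per speaker
rng = np.random.default_rng(0)
coords = rng.uniform([-118, 25], [-65, 49], size=(200, 2))  # (lon, lat)
goose_f2 = rng.normal(1600, 150, size=200)                  # e.g. GOOSE F2 in Hz
print(getis_ord_gi_star(coords, goose_f2)[:5])
```

In practice, spatial statistics packages (PySAL’s esda, for example) do this bookkeeping, and the significance testing, for you.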

My participants were overwhelmingly young, female, and well-educated, which means that all of my findings about how AAE differs from white Englishes are conservative, and understate the differences. As any sociolinguist will tell you, in general, the higher the level of education we attain, the more work we do to erase our unique, local accents — insofar as the features of the accent are something we are consciously aware of.

A note on maps: Some of the maps below use a technique from mining and weather mapping to interpolate values for visualization. Do not over-interpret areas where there are no people. I don’t have any participants from Wyoming, so the values there are just a computer’s statistical best guess based on what’s nearby and what’s farther away. More research is definitely needed, and bigger projects with more people in each city (like the Corpus of Regional African American Language, or CORAAL) will shed more light on these nuanced differences. For all of the maps, the lighter color usually means more intensity of the shift under discussion, and the darker color usually means less intensity (or absence).
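For the curious, this kind of interpolation can be done with ordinary kriging, mentioned above. Here is a minimal sketch; the pykrige library, the spherical variogram, and the toy data are my choices for illustration, not a claim about the dissertation’s exact setup.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging  # pip install pykrige

rng = np.random.default_rng(1)
# observed speakers: longitude, latitude, and one measurement per speaker
lon = rng.uniform(-118, -65, 150)
lat = rng.uniform(25, 49, 150)
value = rng.normal(0.0, 1.0, 150)

# fit a variogram to the observations and predict onto a regular grid;
# cells far from any speaker are just the model's statistical best guess
ok = OrdinaryKriging(lon, lat, value, variogram_model="spherical")
grid_lon = np.linspace(-118, -65, 100)
grid_lat = np.linspace(25, 49, 60)
z_interp, variance = ok.execute("grid", grid_lon, grid_lat)

print(z_interp.shape)  # (60, 100): one interpolated value per grid cell
```

Helpfully, kriging also returns a variance surface, which is largest exactly where there are no observations; that is the “don’t over-interpret empty areas” caveat in quantitative form.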

A note on audio: Some audio examples here are from my dissertation, others are celebrities, and some are recordings from the street. If they are only labeled with a place, they are from my dissertation research, and I am protecting the participants’ identities.

So what did I find?

This is barely scratching the surface, since this is just the first in a series of blog posts, but my main findings were:

There is no one Black Accent. 

Black folks (and linguists) been knew this. AAE exhibits strong regional variation, so people from NYC sound different from Philly and they both sound different from Atlanta and Chicago. California is different from all of them (but has similarities to DC and Baltimore, by coincidence), and Kansas City is doing its own thing. This sometimes surprises people to hear, but think about celebrities’ voices: Jay-Z (NYC) doesn’t sound like Kevin Hart (Philadelphia), and you’d never confuse either for Ryan Coogler (Richmond, CA). 

This dramatic variation existed even among highly educated people who have a strong command of “classroom” “standard” English, and even during a reading task, which is known to cause people to speak more formally and more carefully than in casual conversation.

This means that…

Things claimed to be universal in AAE are not.

The PIN-PEN merger has historically been claimed to be a universal feature of AAE. That’s great, except it is absolutely not universal in the Northeast. Yes, I hear it in Harlem. I also hear vernacular AAE speakers who distinguish between PIN and PEN, in both NYC and Philadelphia. (Sharese King has already written about this in California; see below for some NYC examples.)

AAE is not supposed to exhibit the COT-CAUGHT merger, and by and large it doesn’t, even in places where everyone else has it. So, for instance, Black folks from California tend to pronounce “on” like white New Yorkers (or sometimes, white Southeasterners) and not like white Californians. Don’t believe me? Listen to how Tiffany Haddish pronounces “on,” “dog,” and “ball,” or how Snoop Dogg says “on.” Yes, they’re different from one another, but they’re also very different from the pronunciations in other California accents.

But here’s the thing. AAE speakers in parts of Florida, Georgia, and South Carolina often do have the COT-CAUGHT merger, the opposite of local white speakers. As an aside, I remember years ago explaining the COT-CAUGHT merger to a friend from Atlanta in a cafe in Harlem, so I expected this finding, but it seemed to really surprise quite a few linguists (when you read this, hi Bri-bri!).

That brings me to the next finding:

AAE has a lot of the same kinds of changes as white dialects, but they follow a completely different geographic distribution, and may have developed completely independently. So white people have the COT-CAUGHT merger in California but not in Georgia, and Black people have it in Georgia but not in California. White people say words like DOWN so it sounds like day-own in parts of the Deep South; Black people do it in New York (compare Jay-Z saying “bounce (with me),” “down,” or “uptown” to Robert De Niro saying those same words). White folks in the Southeast are slowly moving the pronunciation of words like GOOSE and GOAT further forward in the mouth, a change spreading westward toward Texas, and Black folks do it in the Mid-Atlantic (most especially DC and Baltimore) and in California.

These shared patterns include chain shifts (not just one-off changes) described in the Principles of Linguistic Change, but, again, in totally different regions. For instance, the Back Upgliding Shift, also known as the “second Southern vowel shift,” is present in AAE, but it’s not limited to the South, and it isn’t present for the Black folks in my sample in all the places where it’s present among white English speakers.

The “back upgliding” shift, or “second south” shift, from the Atlas of North American English

The Back Upgliding Shift in my data.

For reference, here’s the same shift in the Atlas of North American (white) English:


That’s because:

Black accents pattern with the Great Migration. As Black people fled racial terrorism in the South and migrated across the country, their patterns of movement were very different from those of white people. To oversimplify: Black people moved south-to-north, while white people moved east-to-west. Segregation and Jim Crow only amplified this, so Black people in Chicago tend to sound more like Black people in Mississippi than like white people in Chicago. In fact, one linguist made a convincing argument that, at a minimum, you can’t rule out “fear of a black phonology” as a main driver of the Northern Cities Vowel Shift (Van Herk, 2008). If there were already Black people in decent numbers, as in New York, there was a founder effect — newcomers learned to speak like the people who were already there. If there wasn’t already a large Black population, as in Chicago, this didn’t really happen. These things play out in complex ways that depend on which parts of an accent are really noticeable to people and which aren’t.

Even more than that, there were already differences in Black accents across the South. Regional variation in Black accents today is the product of modes of travel in the 19th and 20th centuries (rivers and railways). But the starting point was shaped by the location of shipping routes and the slave ports where abducted and enslaved Africans were first taken.

What are the patterns?

I was curious what story the data would tell without me interpreting them, so I used a few different clustering algorithms on people’s vowel spaces. I gave the computer all the vowel measurements for each vowel class for each person, but did not give it any geographical data, and I asked it to group like with like. Using Agglomerative Nesting, or AGNES, to look at hierarchical structure without geographic data, the results showed strong geographic patterns. People from The Bronx sounded like other people from The Bronx, and when you measure all of their pronunciations, they’re closer to each other than to people from anywhere else. But people from Brooklyn form the next closest grouping. And people from Philly are closer in their pronunciations to people from Brooklyn and The Bronx than people from Atlanta are. And so on.

An example sub tree from my dissertation research. (I know this is tiny; I will share more readable versions of the trees in future posts).
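To make that clustering step concrete, here is a minimal sketch of this kind of agglomerative (AGNES-style) analysis in Python. AGNES itself is usually run in R; scipy’s linkage does the same bottom-up merging. The per-speaker feature matrix here is a random stand-in, not the dissertation’s actual measurements.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.stats import zscore

# one row per speaker: normalized formant measurements for each vowel class
# (columns like FLEECE_F1, FLEECE_F2, KIT_F1, ...; no geography anywhere)
rng = np.random.default_rng(2)
speakers = [f"speaker_{i:03d}" for i in range(40)]
X = zscore(rng.normal(size=(40, 24)), axis=0)

# agglomerative nesting: start with every speaker alone, then repeatedly
# merge the two most similar groups until everything is one tree
Z = linkage(X, method="ward")

# the branching order is the "tree": similar-sounding speakers merge early,
# distant accents merge late; cutting the tree yields discrete clusters
tree = dendrogram(Z, labels=speakers, no_plot=True)
clusters = fcluster(Z, t=10, criterion="maxclust")
print(clusters)
```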

The question is then, how do you group these clusters? There are a handful of different statistical techniques to determine this from the data, and they all seemed to suggest around 10 groupings. Using knowledge about the real world, it looks like it should probably be about 12: the computer wants to group California with the DMV (D.C., Maryland, and Virginia), probably because they both pronounce the GOOSE and GOAT vowels with the body of the tongue further forward in the mouth (audio examples below); and it wants to group North Carolina and Michigan, which may be one group based on patterns of migration, or may not. 
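And here is a sketch of one common way to let the data suggest a cluster count: silhouette scores across a range of k, using scikit-learn. This is just one of the several criteria alluded to above, and the data are again placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
X = rng.normal(size=(180, 24))  # stand-in for per-speaker vowel measurements

# score k = 2..15; higher silhouette means tighter, better-separated clusters
for k in range(2, 16):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```

Other criteria (gap statistic, within-cluster sum of squares “elbows,” and so on) can be applied the same way, and as noted above, different criteria can disagree, which is where real-world knowledge about migration and geography comes in.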

In the future, I plan to build on this research, and to make more artistic maps — these were for my dissertation, which is a target audience of about 3 people. 

Mapping with 5 clusters really captures the Great Migration, but loses some of the granularity of the East Coast, and important differences up the Mississippi. It also looks almost exactly like the maps I produced of lexical variation in Twitter data in 2015.

K-means clustering of the vowel data with 5 clusters.

Mapping 10 clusters gives a better perspective on regional differences, especially in the Northeast, and shows more granularity up the Mississippi. Chicago and Jackson, Mississippi, are more similar to one another than to New York, but this higher level of granularity captures the fact that, 50 years after the Great Migration, Chicago and Minneapolis are more alike than Chicago and Jackson.

Agglomerative hierarchical clustering with vowel data (and no geographic data), with 10 clusters.

Remember that in each of these we need to add a little world knowledge: California and DC are probably not a real cluster; they just share common features, likely by chance: specifically, fronting of the vowels in GOOSE and GOAT. (I have given semi-exaggerated audio examples here.)

Hotspots for fronting and raising of /uw/ as in GOOSE and /ow/ as in GOAT on the East Coast.

GOOSE fronting in California.

GOOSE fronting.

For comparison, here’s fronting of /uw/ in the Atlas of North American (white) English, where fronted /uw/ is circled in orange:

Fronting of /uw/ in the Atlas of North American English.

How do you tell where someone is from by their accent?

In the last few years, I’ve been able to pinpoint where new people I meet are from. It’s almost a party trick at this point — I’m no Henry Higgins, but I’ve astonished and impressed quite a few people by pinpointing what state, or part of a state, they’re from. Obviously, I can’t teach everything there is to know, but there are some geographic patterns that are very salient. In future posts, I will do some deep dives into individual local accents.

Here are some of the patterns. These are generalizations and do not mean that all people from a given location have a given pronunciation; rather, they reflect where a particular sound is more common.

The “African American Vowel Shift” (AAVS) involves swapped vowel nuclei for /iy/ as in FEET and /i/ as in KIT, swapped nuclei for /ey/ as in FACE and /e/ as in DRESS, and raised /uh/ as in STRUT. AAE speakers with the AAVS are concentrated in (eastern) North Carolina and along a broad path from the Gulf states up to the Great Lakes. Note that it’s gradient, so, for instance, Snoop Dogg, from California, has the shifted nuclei of /iy/ and /i/, but in general the shift was less prominent in my participants from California than in those from the Gulf.

The “African American Vowel Shift” (AAVS)

Fronted GOOSE and GOAT vowels? That’s Washington D.C., Baltimore, and to a lesser extent, California (see above).

MARY-MARRY-MERRY merger, with centralized /r/ for MARRY (so MARY and MERRY are pronounced like “may-ree” and MARRY is pronounced like “Murray”)? Baltimore and DC only. Same goes for “fear” rhyming with “fur,” but that one isn’t universal by any means; Baltimore and DC are just the main places the shift is attested at all.

Back GOOSE and GOAT, no PIN-PEN merger, and few or none of the reversals of the AAVS? That’s the Northeast, especially New York City and Philadelphia. (That’s the dark color on the AAVS map.) The fact that many, many AAE-speaking New Yorkers do not have the PIN-PEN merger should not come as a surprise to anyone who has heard any hip hop from New York since, well, ever (like how Whodini says “friends” in 1982, or how Biggie Smalls says everything).

Strongest PIN-PEN merger in the AAE data.

PIN-PEN merger on the east coast. Strongest in Virginia Beach, weakest in NYC.

Distributions of some PIN and PEN words among New Yorkers. Notice how you can divide them up pretty well.

As an aside, it has always perplexed me how linguists can teach that the PIN-PEN merger is a core, universal feature of AAE, and then go home and listen to hip hop from NYC where entire rhyme schemes are built on not having that merger. That Whodini track is 40 years old, and both cuts I included here for educational purposes were hit songs. The counter-evidence to our textbooks is literally all around us, every day.

COT-CAUGHT Merger? Your best bet is Florida, but you could go as far afield as Georgia and parts of South Carolina. Compare the vowel spaces for Florida and New York, below (AA refers to the COT vowel and AO refers to the CAUGHT vowel).

COT and CAUGHT vowels for Florida.

COT and CAUGHT vowels for New York.

Just look at that beautiful separation in the second panel on the top row (from Brooklyn), or in the entire second row!

Raised and fronted /uh/ as in STRUT? Best bet is Kansas City or St. Louis, and parts of Oklahoma. This is why a colleague of mine from Oklahoma says he’s “country” with the same vowel I have in the word “book.”

Raised and fronted /uh/ as in STRUT in the AAE data.

Vowel in DOWN/TOWN/MOUTH sounds like /æ/ (as in “cat”) or even /e/ (as in “bed”) or /ey/ (as in “say”) at the onset? Atlanta or NYC are most likely. For instance, listen to how the conductor on the 1 train in New York says “town” and “bound” in the clip below (“one-two-five where it always stays live. This is Harlem, one hundred and twenty fifth street. This is a one-three-seven bound uptown one. The next and last stop is 137, stand clear”), or how Jay-Z says “down” in the clip below that, from an interview on the Breakfast Club.

Vowel in CAUGHT/HAWK/DAWN starts with an “oo” sound? New York and Philadelphia (and most places if it’s before an /n/).

Vowel in CAUGHT/HAWK/DAWN starts with /æ/ as in “cat”? Strongest in Mississippi and Alabama, but you’ll also find it in Tennessee, Kansas, Missouri, etc.

There are tons more patterns that I haven’t even touched on (what vowel do you have for “there”? How often do you pronounce /v/ after a vowel as in love or believe? What vowel do you have in words like thing? How often do you pronounce /r/ or /l/ after vowels? If you don’t pronounce it, do you replace it with a /w/ sound?). And these all work together as a coherent system.

Some AAE vowel systems.

Note the difference between bought and bot patterns in the Northeast and Southeast, or the patterns around where bait and bet are in North Carolina and the Gulf states, or where bat is relative to other words in these charts…these are very distinct sound systems.

So what now?

My biggest hope for the future is that researchers stop treating AAE like a set of local divergences from white dialects, and really lean into treating it as its own set of systems, instead of writing papers about a single vowel or consonant in one or two places — this is why I tend to prefer Sonja Lanehart’s (and others’) approach to AAL, where the L is for Language. There’s so much more to say about this, and about regional variation, but I’ll stop here for now. My full dissertation is available here, though it may only appeal to readers who want highly technical and detailed explanations of the statistics. I am, however, in the process of turning this material into a more digestible form for people outside of academic linguistics. Over the next few months, I will be writing posts that detail the accents of specific places and their unique features, including some that I observed but that did not make it into my dissertation (like regional patterns in how people pronounce thing). I hope that my work contributes to the growing chorus of voices in and outside of academia who are challenging the myth that there is one Black accent, and challenging the academic approach that treats African American Language as just a few extra bells and whistles on local white varieties rather than as its own rich linguistic variety, not defined by its relationship to other language varieties.

-----

©Taylor Jones 2021

Have a question or comment? Share your thoughts below!

The Problem With Twitter Maps

Twitter is trending

I'm a huge fan of dialect geography, and a huge fan of Twitter (@languagejones), especially as a means of gathering data about how people are using language. In fact, social media data has informed a significant part of my research, from the fact that "obvs" is legit, to syntactic variation in use of the n-words. In less than a month, I will be presenting a paper at the annual meeting of the American Dialect Society discussing what "Black Twitter" can tell us about regional variation in African American Vernacular English (AAVE). So yeah, I like me some Twitter. (Of course, I do do other things: I'm currently looking at phonetic and phonological variation in Mandarin and Farsi spoken corpora.)

Image of North America, entirely in Tweets, courtesy of Twitter Visual Insights: https://blog.twitter.com/2013/the-geography-of-tweets


Moreover, I'm not alone in my love of Twitter. Recently, computer scientists claimed to have found regional "super-dialects" on Twitter, and other researchers have made a splash with their maps of vocatives (like "dude") in the US.


More and more, people are using social media to investigate linguistics. However, there are a number of serious dangers inherent to spatial statistics, which are exacerbated by the use of social media data.

Spatial statistics is developing rapidly as a field, and there are a number of excellent resources on the subject I've been referring to as I dig deeper and deeper into the relationship between language and geography. Any of these books (I'm partial to Geographic Information Analysis) will tell you that people can, and do, fall prey to the ecological fallacy (assuming that some statistical relationship that obtains at one level, say, county level, holds at another level -- say, the individual). Or they ignore the Modifiable Areal Unit Problem -- which arises out of the fact that changing where you draw your boundaries can strongly affect how the data are distributed within those boundaries, even when the change is just in the size of the unit of measurement.

The statistical consideration that most fascinates me, however, and that seems most likely to be overlooked in dealing with exciting social media data, is the problem of sampling.

Spatial Statistics aren't the same as Regular Statistics.

In regular statistics, more often than not, you study a sample. You can almost never study an entire population of interest, but that's not generally a problem. Because of the Law of Large Numbers, the bigger the sample, the more likely you are to be able to confidently infer something about the population the sample came from (I'm using the day-to-day meanings of words like "confidence" and "infer"). However, in the crazy, upside-down world of spatial statistics, sampling can bias your results.

In order to draw valid conclusions about some kinds of spatial processes, it is necessary to have access to the entire population in question. This is a huge problem: if you want to use Twitter, there are a number of ways of gathering data that do not meet this requirement, and that therefore lead to invalid conclusions (for certain questions). For instance, most people use the Twitter API to query Twitter and save tweets. There are a few ways you can do this. In my work on AAVE, I used code in Python to interact with the Twitter API, and asked for tweets containing specific words -- the API returned tweets, in order, from the last week. I therefore downloaded and saved them consecutively. This means that, barring questionable behavior from the Twitter API (which is not out of the question -- they are notoriously opaque about just how representative what you get actually is), I can claim to have a corpus that can be interpreted as a population, not a sample. In my case, it's a very specific population -- for instance: all geo-tagged tweets that use the word "sholl" during the last week of April, 2014. We should be extremely careful about what and how much we generalize from this.
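For illustration, here is roughly what that kind of keyword query looks like in Python using the tweepy library. The post does not name a library, so tweepy, the placeholder credentials, and the query parameters are my assumptions; and since Twitter's API has changed substantially since 2014, treat this as a sketch of the workflow rather than something guaranteed to run against the current API.

```python
import tweepy  # pip install tweepy; assumes v1.1 standard-search access

# placeholder credentials
auth = tweepy.OAuth1UserHandler("CONSUMER_KEY", "CONSUMER_SECRET",
                                "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# pull recent tweets containing one specific word (the search index only
# reaches back about a week, which is why the corpus covers "the last week")
tweets = []
for status in tweepy.Cursor(api.search_tweets, q="sholl",
                            count=100, tweet_mode="extended").items(5000):
    tweets.append({
        "text": status.full_text,
        "created_at": status.created_at,
        "coordinates": status.coordinates,  # exact geotags; often None
    })

print(len(tweets), "tweets saved")
```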

Many other researchers use either the Twitter firehose or the gardenhose. The former is a real-time stream of all tweets. Because such a thing is massive and unmanageable, and requires special access and a supercomputer, others use the gardenhose. However, the gardenhose is a(n ostensibly random) sample of 10% of the firehose. Depending on what precisely you want to study, this can be fine, or it can be a big problem.

Why is sampling such a problem?

Put simply, random noise starts to look like important clusters when you sample spatial data. To illustrate this, I have created some random data in R.

I first created 1,000 random x and 1,000 random y values, which I combined to make points with random longitudes (x values) and latitudes (y values). For fun, I made them all with values that would fit inside a box around the US (that is, x values from -65 to -118, and y values from 25 to... Canada!). I then made a matrix combining the two values, so I had 1,000 points randomly assigned within a box slightly larger than the US. That noise looked like this:

" Sir, the possibility of successfully navigating an asteroid field is approximately 3,720 to 1. " "Never tell me the odds!"

" Sir, the possibility of successfully navigating an asteroid field is approximately 3,720 to 1. " "Never tell me the odds!"

Before we even continue, it's important to note two things. First, the above is random noise. We know this because I totally made it up. Second, before even doing anything else, it's possible to find patterns in it:

A density contour plot of random noise. Sure looks like something interesting might be happening in the upper left.

Even with completely random noise, some patterns threaten to emerge. If we want to determine whether a pattern like the one above is actually random, we can compare it to something we know is random. To get technical, it turns out that random spatial processes behave a lot like Poisson distributions, so when we take Twitter data, we can determine how far it deviates from random noise by comparing it to a Poisson distribution using a chi-squared test. For more details on this, I highly recommend the book I mentioned above. I've yet to see anyone do this explicitly (but it may be buried in mathematical appendices or footnotes I overlooked).
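Here is a rough sketch of that comparison as a quadrat test: divide the study area into equal cells, count points per cell, and run a chi-squared test of the counts against the uniform expectation of complete spatial randomness. The implementation details are mine, not the book's.

```python
import numpy as np
from scipy import stats

def quadrat_csr_test(points, bins=10):
    """Chi-squared quadrat test: do per-cell point counts deviate from what a
    homogeneous (Poisson-like) random process would produce?"""
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    observed = counts.ravel()
    expected = np.full_like(observed, observed.mean())
    # ddof=1 because the expected count is estimated from the data itself
    return stats.chisquare(observed, expected, ddof=1)

rng = np.random.default_rng(4)
random_points = rng.uniform(size=(1000, 2))
chi2, p = quadrat_csr_test(random_points)
print(f"chi2 = {chi2:.1f}, p = {p:.3f}")  # large p: no evidence of clustering
```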

This is what happens when we randomly sample 100 points. That's 10%, the same as the Twitter gardenhose:

a 100 point sample.

And this is what happens when we take a different 100 point random sample:

Another random 100 point sample from the same population.

The patterns are different. These two tell different stories about the same underlying data. Moreover, the patterns that emerge look significantly more pronounced than in the full data set.
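The original simulation was done in R; here is a rough Python equivalent of the same exercise, generating uniform noise in a box around the US and drawing two independent 10% samples.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)

# 1,000 uniformly random points in a box roughly covering the lower 48
lon = rng.uniform(-118, -65, 1000)
lat = rng.uniform(25, 49, 1000)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharex=True, sharey=True)
axes[0].hexbin(lon, lat, gridsize=15)
axes[0].set_title("full population (pure noise)")

# two independent 10% samples, like two pulls from the gardenhose
for ax, title in zip(axes[1:], ["10% sample A", "10% sample B"]):
    idx = rng.choice(1000, size=100, replace=False)
    ax.hexbin(lon[idx], lat[idx], gridsize=15)
    ax.set_title(title)

plt.tight_layout()
plt.show()  # the two samples suggest different "hotspots" despite identical data
```

Each re-run of the sampling step produces a different apparent hotspot, even though the underlying points never change.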

To give a clearer example, here is a random pattern of points I made, actually overlaying the United States, after much wailing, gnashing of teeth, and googling of error codes in R. I didn't bother to choose a coordinate projection (relevant XKCD):

And here are four intensity heat maps made from four different random samples drawn from the population of random point data pictured above:

This is bad news. Each of the maps looks like it could tell a convincing story. But contrary to map 3, Fargo, North Dakota is not the random-point capital of the world; the apparent cluster there is just an artifact of sampling noise. Worse, this is all the result of a completely random sample, before we add any other factors that could potentially bias the data (applied to Twitter: first-order effects like uneven population distribution, uneven adoption of Twitter, biases in the way the Twitter API returns data, etc.; second-order effects like the possibility that people are persuaded to join Twitter by their friends, in person, etc.).

What to do?

The first thing we, as researchers, should all do is think long and hard about what questions we want to answer, and whether we can collect data that can answer those questions. For instance, questions about frequency of use on Twitter, without mention of geography, are totally answerable, and often yield interesting results. Questions about geographic extent, without discussing intensity, are also answerable -- although not necessarily exactly. Then, we need to be honest about how we collect and clean our data, and about the limitations of our data.

For instance, I would love to compare the use of nuffin and nuttin (for "nothing") by intensity, assigning a value to each county on the East Coast, and create a map like the "dude" map above -- however, since the two are technically separate data sets based on how I collected the data, such a map would be completely statistically invalid, no matter how cool it looked. Moreover, if I used the gardenhose to collect data and just mapped all tokens of each word, it would not be statistically valid, because of the sampling problem. The only way a map like the "dude" map that's going around can be valid is if it is based on data from the firehose (which it looks like they did use, given that their data set is billions of tweets). Even then, we have to think long and hard about what the data generalize to: Twitter users are the only people we can actually say anything about with any real degree of certainty from Twitter data alone. This is why my research on AAVE focuses primarily on the geographic extent of use, and why I avoid saying anything definitive about comparisons between terms or the popularity of one over another.

Ultimately, as social media research becomes more and more common, we as researchers must be very careful about what we try to answer with our data, and about what claims we can and cannot make. Moreover, the general public should be very wary of making any sweeping generalizations or drawing any solid conclusions from such maps. Depending on the research methodology, we may be looking at nothing more than pretty patterns in random noise.

 

-----

©Taylor Jones 2014

Have a question or comment? Share your thoughts below!