I recently received word that an article I wrote was accepted for (peer-reviewed) publication in the book English in Computer Mediated Communication, edited by Lauren Squires and published by De Gruyter. I'm very excited to be included, and the book is going to be extremely interesting, with chapters on things like the social meaning of Scottish accents on YouTube, word formation in Cyberpunk discussions, marginalized voices in World of Warcraft, and stylistic variation on and off screen among Real Housewives, among other topics. You can pre-order the book here.

My chapter is called Tweets as Graffiti, and in it, I argue that we can learn a lot about how people actually speak from how they write on social media. More importantly, I argue that how people write on social media is not terribly different from how people have always written, especially when it's informal writing, and especially when the writing system doesn't well represent the sounds of the language.

The main thrust of my argument is that historical linguists have long had the tools to analyze text for clues about sound, and that we can just as easily apply those tools to tweets as to graffiti. I explicitly compare Vulgar Latin graffiti and African American English on Twitter to look at the written representations of speech, but I could have just as easily compared Attic Greek and French tweets, or Middle English and Zulu tweets (now's a good time to mention the best Middle English-y Twitter feed ever, Chaucer Doth Tweet).

I first summarize how we know what we know about Vulgar Latin pronunciation, which is interesting in its own right. In the article, I discuss the obvious things: puns, rhymes and meter, borrowings into and from other languages (and how things get spelled), but I also discuss Roman grammar snobs (great band name), misheard prophecy (great band name), dirty puns on the senate floor, medieval scribal shenanigans, and that one time Cicero accidentally said landicam ("clitoris"). Did I mention you can pre-order the book?

Then I point out that things we assume are super modern and totally the fault of "computers," "cell phones," and "kids these days" are totally...not. Rebus spelling ("C U L8r!")? Yeah, Pompeiians were doing that 2,000 years ago, with things like <krus> for the name Carus. In fact, in recent talks I gave at U Penn and at Gettysburg College, I played a game with the audience that I like to call Social Media or Pompeiian Graffiti? Think it sounds easy?

Instagram or Pompeiian Graffiti? "[April 19th] I made bread!"

Yelp or Pompeiian Graffiti? "Two friends dined here and had terrible service."

Twitter or Pompeiian Graffiti? "If you stick your dick in fire, you're gonna get burned."

Yelp or Pompeiian Graffiti? "The food here is poison."

Which of those were bad Yelp reviews and which were graffiti? Not so easy (spoiler: they were all Pompeiian graffiti. Further, different kind of spoiler: given where it was found [a bathroom stall], the "bread" one might have actually been about a turd.).

Finally, I apply the same tools to basilectal (~ "divergent") AAE tweets. Elsewhere, I've discussed Imagined Black English, and that sometimes it's hard for people who don't know AAE to fully get the nuance of the way some words are used. What I'm talking about here is not that. I'm talking about tweets like:

he fahn dinnamug
ioneem b doin nun!

If you speak or understand AAE fluently, your response to the first might be "sholl is," and if you don't, you're probably scratching your head trying to figure out what it could possibly mean (hint: dishware is not involved. The <g> in <mug> is not usually pronounced, or even always written with a <g>). I'll have an explanation at the bottom for those readers. The point is, these are very divergent from classroom spelling, but they more clearly communicate both spoken pronunciation and something about the person doing the "speaking." In the chapter, I then go through two case studies, one on what linguists call liquids (/r/ and /l/), which frequently disappear or change in AAE, and one on the glottal stop (the sound in the middle of "uh-oh"), which is used in different places in AAE. For both, I give detailed descriptions of what happens in speech, and what happens in informal writing.

The thing is, like I wrote above, the same tools are applicable to a variety of different languages. The ways historical linguists make hypotheses about pronunciation, and then rigorously check them against many different types of sources, are just as applicable to modern languages as to ancient ones. And the fact that people make spelling mistakes based on "sounding it out" or, more importantly, intentionally "misspell" things for effect means that we can use social media to do the same kinds of things as historical linguists: we can make and assess hypotheses about how people speak. The only difference is that we have the added benefit of being able to corroborate those hypotheses, since not all the speakers are dead. Take that, historical linguistics.

So, rather than rehash what's in the chapter, I'm going to demonstrate the approach I argue for, drawing on Scots English, French, and Zulu.

am so confused the noo man there has been hunners of middle ages women addin me on facebook fae blantyre a must have a name aboot maself
— Tam Kennedy (@BigFatTam) March 4, 2016

Here, aside from using words and phrases from Scottish English (e.g., fae "from", the now "recently"), the author specifically indexes his (stereotypically) Scottish pronunciation of the vowel in words like "how" -- Scottish English changes from /aʊ/ to [ɐʉ] ~ [ɜʉ] ~ [əʉ] -- by writing <the noo> (the now) and <aboot>. Similarly, he changes the diphthong in "I" and "myself", and does so by writing <am> (I'm) and <maself> (myself). And that's ignoring that he drops the <g> on "adding" and changes the spelling of "hundreds" to better reflect pronunciation. The result is that you practically hear the tweet in the right accent. Moreover, this isn't just some one-off tweet from some rando who has a great ear and too much time on his hands. Instead, tons of people converge on similar approaches to represent how they speak, which you see when you search for <doon the> or <cannae> or <the noo>.

@jaydeancramb u got daddies debit caird? If so get us a pint ae contactless al be doon the noo
— bosh (@joshowens720) June 21, 2016

And this is by no means limited to English. For instance, one of my favorite Facebook groups to follow for idiomatic French is Codes De Meufs, but you can also find plenty on Twitter. Ignoring the content of the tweet, the spelling here is A.MA.ZING.

Mdrr tllment sa m'fais rire , jcommence a avoir une crampe a la bouche
J'c mme pas si sa existe
— 073 (@eyl_ema) June 16, 2016

Mdr = LOL (literally, mort de rire 'dead from laughing').
Ça me fait rire is rendered as <sa m'fais rire>, which is exactly how it sounds when spoken fast (the e in me disappears, and the <ç> is pronounced like an S).
Same for <jcommence> to represent je commence ("I begin").
My favorite here? <J'c> is pronounced exactly the same as je sais ("I know"), and the whole thing relies on you knowing that no one ever uses ne in the spoken language, so <J'c mme pas> is equivalent to Je ne sais même pas ("I don't even know").

As a last example, word final vowels in Zulu often disappear in fast speech (usually, they precede a following word that has a word initial vowel, so they just kinda make way for the next vowel [this is not the technical term]). Similarly, some vowels in some words may also disappear. This does not happen in formal writing in Zulu (for those of you with preconceived notions about "tribal" people, yes, there is formal writing in Zulu, and yes, the Zulu are on Twitter). So in this tweet:

@kinishandu @TheBigBangRSA mina kuyangphoqa ukuth ngihlezi ngigqoke ngendlela ehloniphekile kuze abant bezokholelwa kmina
— DJ SQENGQE (@DJ_SQENGQE) March 17, 2016

The word <ukuth> is ukuthi ("that"),
<abant> is abantu ("people," whence Bantu languages -- the languages people speak),
<kmina> is kumina ("to me"), and
<kuyangphoqa> is kuyangiphoqa ("it forces me").

(The whole tweet is about how he feels compelled to dress a particular way to be taken seriously.) Lest you think it was just that he was running out of characters, go on and search for <ukuth> on Twitter.
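For the computationally inclined, here is a minimal sketch of how you might start hunting for this pattern in a pile of tweets. It is not the method from the chapter: the function names are made up for illustration, the lexicon is just the four standard forms glossed above, and the "one dropped vowel" rule is a deliberately crude assumption that ignores everything else that can happen in informal spelling.

```python
# Hypothetical helper names; the matching rule (exactly one vowel
# deleted from the standard spelling) is a simplifying assumption.
VOWELS = set("aeiou")


def is_vowel_deleted_variant(token, standard):
    """Return True if `token` is `standard` with exactly one vowel removed."""
    token, standard = token.lower(), standard.lower()
    if len(token) != len(standard) - 1:
        return False
    for i, ch in enumerate(standard):
        # Try deleting each vowel in turn and compare with the token.
        if ch in VOWELS and standard[:i] + standard[i + 1:] == token:
            return True
    return False


def find_variants(tweet, lexicon):
    """Map each whitespace-separated token to the standard form it reduces."""
    found = {}
    for token in tweet.lower().split():
        for standard in lexicon:
            if is_vowel_deleted_variant(token, standard):
                found[token] = standard
    return found


# Standard forms taken from the glosses above.
LEXICON = ["ukuthi", "abantu", "kumina", "kuyangiphoqa"]

# The tweet quoted above.
TWEET = (
    "@kinishandu @TheBigBangRSA mina kuyangphoqa ukuth ngihlezi ngigqoke "
    "ngendlela ehloniphekile kuze abant bezokholelwa kmina"
)

if __name__ == "__main__":
    for spelled, standard in find_variants(TWEET, LEXICON).items():
        print(f"<{spelled}> looks like {standard}")
```

With a real word list in place of the four-item lexicon, counting how often each reduced spelling turns up (and before which following words) would be one way to check the hypothesis about where the vowels drop.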

The point is, as I'm fond of reminding everyone, people tweet how they speak. I'm not the first person to say this (I have a pretty decent literature review in the book chapter, but I'm sure there are even more articles and presentations I'm missing), but I think it's important to bear in mind, and it opens up a lot of possibilities for linguistic research. For instance, Gabriel Doyle has demonstrated that unique grammatical patterns in some dialects show up just as well on maps of Twitter data as in the Atlas of North American English, which uses much more traditional methodology for dialect geography. Jacob Eisenstein has demonstrated that people delete the <g> in word-final -ING (<workin> vs. <working>) at about the rate you'd expect based on who lives where the tweets are from AND morphosyntactic constraints in those people's spoken language. Jack Grieve has a book mapping syntactic variation, in part drawing on the Twitter data he used to make those heatmaps of favored swearwords by region. And of course, I've been using Twitter and other social media sources to do things like demonstrate that there are patterns present in African American English -- that appear on social media -- that are related to the Great Migration.

Ultimately, Twitter is just one platform among many, and social media are just one source of data among many. Ideally, good research using social media data will corroborate it, not just with other social media, but with more traditional sociolinguistic methods (e.g., ethnographic research, interviews, etc.). However, I think when we write off social media as frivolous (which happens surprisingly often), or as "not speech" (as one anonymous reviewer for a different article helpfully pointed out multiple times), we miss that it is in fact related to speech, and is a valid source of data that can corroborate other forms of evidence, and can even help inspire new avenues of research.

Finally, as promised, for those confused by the above AAE: <fahn> is fine and <dennamug/dinnamug/dennamuh> are than a mu... which is short for "than a motherfucker." The meaning is "he is extremely attractive."
