Discussion

The preceding pages are far from exhaustive, but they show how extensive and varied the social-science applications of word embeddings can be. I close by compiling some open questions and by highlighting the applications that are potentially the most useful to develop into empirical research questions.

There’s a stable core to the concept of community: a sense of unity, of groupness, of belonging. This core meaning has been present since the time of Tönnies and other classical sociologists, and it persists in large online discursive sites like Wikipedia and Twitter today. At the same time, community is a multivalent and ambiguous concept: it spans families, neighborhoods, and religious organizations, yet also resonates with minoritized or marginalized social groups. It can extend as far as the nation, or even into the domain of business and marketing.

Going forward, I’d aim to map some of these discursive dimensions of community onto the theoretical dimensions of the concept. Dimensionality reduction seems insufficient on its own, but clustering on the high-dimensional data may uncover consistent patterns. To do that, I’d need either a principled way to determine which words truly fall into the local neighborhood around the vector for “community,” or a set of sensitivity analyses that vary the number of nearest neighbors. Alternatively, constructing theoretically informed axes or dimensions, whether or not these are binary oppositions, may be a better way to proceed.
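To make those two options a little more concrete, here is a minimal sketch of what I have in mind. It is not an implementation from the preceding pages: the tiny three-dimensional vectors, the word list, and the function names `nearest_neighbors` and `axis_projection` are all hypothetical stand-ins, and the "local" versus "global" opposition is only one illustrative choice of axis. With a real model, the dictionary would be replaced by pretrained or locally trained embeddings.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1 so dot products become cosine similarities."""
    return v / np.linalg.norm(v)

def nearest_neighbors(vectors, word, k):
    """Return the k words most cosine-similar to `word`, excluding the word itself."""
    target = unit(vectors[word])
    sims = {w: float(unit(v) @ target) for w, v in vectors.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def axis_projection(vectors, word, pos_pole, neg_pole):
    """Project `word` onto the axis running from `neg_pole` to `pos_pole`."""
    axis = unit(vectors[pos_pole] - vectors[neg_pole])
    return float(unit(vectors[word]) @ axis)

# Hypothetical toy embeddings, invented purely for illustration.
toy = {
    "community":    np.array([0.9, 0.8, 0.1]),
    "neighborhood": np.array([0.8, 0.9, 0.0]),
    "family":       np.array([0.7, 0.6, 0.3]),
    "brand":        np.array([0.2, 0.1, 0.9]),
    "local":        np.array([1.0, 0.0, 0.0]),
    "global":       np.array([0.0, 0.0, 1.0]),
}

# Sensitivity check: see which words enter the neighborhood as k grows.
for k in (2, 3, 4):
    print(k, nearest_neighbors(toy, "community", k))

# Theoretically informed axis: positive values lean "local", negative lean "global".
for w in ("community", "brand"):
    print(w, round(axis_projection(toy, w, "local", "global"), 3))
```

If the set of neighbors changes sharply between adjacent values of k, that would be a sign that any clustering built on it is fragile. The axis projection sidesteps the choice of k entirely, at the cost of committing to a particular opposition in advance.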

To make this work more empirically interesting, I’d apply these methods to specific, concrete corpora. Texts naturally produced by social actors — for instance, speeches by politicians or activists, news articles or corporate PR — strike me as more promising than interview transcripts or open-ended survey responses, though the latter could be compared to closed-ended questions about experiences of community. With large enough corpora, I might consider training models locally to see how distinct the representations for “community” are when learned from different types of texts. However I proceed, I’d expect to find systematic variation in which actors invoke which senses of community, but my expectations remain to be refined.
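One practical wrinkle with training separate models is that their vector spaces are not directly comparable, since each training run settles on its own arbitrary orientation. A rough way around this, sketched below, is to compare the neighbors of “community” in each space rather than the vectors themselves. Everything here is an assumption for illustration: the two toy "corpora", their vectors, and the names `top_k` and `neighbor_overlap` are hypothetical, and Jaccard overlap is just one simple measure among several I might try.

```python
import numpy as np

def top_k(vectors, word, k):
    """Return the set of k words most cosine-similar to `word` within one space."""
    target = vectors[word] / np.linalg.norm(vectors[word])
    sims = {
        w: float(v @ target / np.linalg.norm(v))
        for w, v in vectors.items()
        if w != word
    }
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def neighbor_overlap(space_a, space_b, word, k):
    """Jaccard overlap of the top-k neighbors of `word` in two embedding spaces.

    Only the shared vocabulary is considered, so the two spaces may differ
    in dimensionality and orientation.
    """
    shared = set(space_a) & set(space_b)
    a = {w: space_a[w] for w in shared}
    b = {w: space_b[w] for w in shared}
    na, nb = top_k(a, word, k), top_k(b, word, k)
    return len(na & nb) / len(na | nb)

# Hypothetical vectors standing in for models trained on two kinds of text.
speeches = {
    "community":  np.array([1.0, 0.2]),
    "solidarity": np.array([0.9, 0.3]),
    "neighbors":  np.array([0.8, 0.1]),
    "customers":  np.array([0.1, 1.0]),
    "brand":      np.array([0.0, 0.9]),
}
press_releases = {
    "community":  np.array([0.2, 1.0, 0.1]),
    "solidarity": np.array([1.0, 0.1, 0.0]),
    "neighbors":  np.array([0.9, 0.2, 0.1]),
    "customers":  np.array([0.1, 0.9, 0.2]),
    "brand":      np.array([0.3, 1.0, 0.0]),
}

print(neighbor_overlap(speeches, press_releases, "community", k=2))
```

An overlap near 1 would suggest the two kinds of text use “community” in similar company, while an overlap near 0 would suggest they invoke different senses of the word.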

These pages have been an early-stage exploration of what community means through a computational lens. As I begin work on my dissertation, I’m sharing these initial thoughts as a starting point for conversation. My intuition is that the best research happens in community — and I hope to find out more about what that means.