TY - JOUR
T1 - Bigram Semantic Distance as an Index of Continuous Semantic Flow in Natural Language
T2 - Theory, Tools, and Applications
AU - Reilly, Jamie
AU - Finley, Ann Marie
AU - Litovsky, Celia P.
AU - Kenett, Yoed N.
N1 - Publisher Copyright: © 2023 American Psychological Association
PY - 2023/4/20
Y1 - 2023/4/20
N2 - Much of our understanding of word meaning has been informed through studies of single words. High-dimensional semantic space models have recently proven instrumental in elucidating connections between words. Here we show how bigram semantic distance can yield novel insights into conceptual cohesion and topic flow when computed over continuous language samples. For example, “Cats drink milk” comprises an ordered vector of bigrams (cat-drink, drink-milk). Each of these bigrams has a unique semantic distance. These distances in turn may provide a metric of dispersion or the flow of concepts as language unfolds. We offer an R package (“semdistflow”) that transforms any user-specified language transcript into a vector of ordered bigrams, appending two metrics of semantic distance to each pair. We validated these distance metrics on a continuous stream of simulated verbal fluency data, assigning predicted switch markers between alternating semantic clusters (animals, musical instruments, fruit). We then generated bigram distance norms on a large sample of text and demonstrated applications of the technique to a classic work of short fiction, To Build a Fire (London, 1908). In one application, we showed that bigrams spanning sentence boundaries are punctuated by jumps in semantic distance. We discuss the promise of this technique for characterizing semantic processing in real-world narratives and for bridging findings at the single-word level with macroscale discourse analyses.
KW - language
KW - semantic distance
KW - semantic memory
UR - http://www.scopus.com/inward/record.url?scp=85158900727&partnerID=8YFLogxK
U2 - 10.1037/xge0001389
DO - 10.1037/xge0001389
M3 - Article
AN - SCOPUS:85158900727
SN - 0096-3445
VL - 152
SP - 2578
EP - 2590
JO - Journal of Experimental Psychology: General
JF - Journal of Experimental Psychology: General
IS - 9
ER -