class: title-slide
.measure.mytitle[ # How do vector space models deal with homonymy and polysemy? ## **Mariana Montes & Dirk Geeraerts**  ] --- layout: true .date-footnote[AELCO — Logroño, 29/06/2022] --- # A model of *schaal* .center[  ] --- # A model of *schaal* .pull-left[ <!-- --> ] -- .pull-right[  ] --- .pull-left[ #### Original text .gold.b[(1)] Would you like to **study** *linguistics*? .light-blue.b[(2)] They **study** this in *computational linguistics* too. .green.b[(3)] I eat *chocolate* while I **study**. ] .pull-right[ #### Token-context matrix <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> target </th> <th style="text-align:right;"> language/n </th> <th style="text-align:right;"> word/n </th> <th style="text-align:right;"> english/j </th> <th style="text-align:right;"> speak/v </th> <th style="text-align:right;"> flemish/j </th> <th style="text-align:right;"> eat/v </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;color: #E69F00 !important;"> study<sub>1</sub> </td> <td style="text-align:right;"> 4.37 </td> <td style="text-align:right;"> 0.99 </td> <td style="text-align:right;"> 3.16 </td> <td style="text-align:right;"> 0.41 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: #56B4E9 !important;"> study<sub>2</sub> </td> <td style="text-align:right;"> 5.97 </td> <td style="text-align:right;"> 1.07 </td> <td style="text-align:right;"> 3.16 </td> <td style="text-align:right;"> 0.41 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: #009E73 !important;"> study<sub>3</sub> </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 0.00 </td> <td 
style="text-align:right;"> 1.28 </td> <td style="text-align:right;"> 3.08 </td> </tr> </tbody> </table> ] -- .pull-left.dist[ #### Token-token distance matrix <table> <thead> <tr> <th style="text-align:left;"> target </th> <th style="text-align:left;"> study<sub>1</sub> </th> <th style="text-align:left;"> study<sub>2</sub> </th> <th style="text-align:left;"> study<sub>3</sub> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;color: #E69F00 !important;"> study<sub>1</sub> </td> <td style="text-align:left;"> <span style=" color: grey !important;">0</span> </td> <td style="text-align:left;"> <span style=" color: black !important;">0.01</span> </td> <td style="text-align:left;"> <span style=" color: black !important;">1</span> </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: #56B4E9 !important;"> study<sub>2</sub> </td> <td style="text-align:left;"> <span style=" color: black !important;">0.01</span> </td> <td style="text-align:left;"> <span style=" color: grey !important;">0</span> </td> <td style="text-align:left;"> <span style=" color: black !important;">1</span> </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: #009E73 !important;"> study<sub>3</sub> </td> <td style="text-align:left;"> <span style=" color: black !important;">1</span> </td> <td style="text-align:left;"> <span style=" color: black !important;">1</span> </td> <td style="text-align:left;"> <span style=" color: grey !important;">0</span> </td> </tr> </tbody> </table> ] -- .pull-right[ #### t-SNE visualization  ] .footnote[ <svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#0266a0ff;" xmlns="http://www.w3.org/2000/svg"> <path d="M448 360V24c0-13.3-10.7-24-24-24H96C43 0 0 43 0 96v320c0 53 43 96 96 96h328c13.3 0 24-10.7 24-24v-16c0-7.5-3.5-14.3-8.9-18.7-4.2-15.4-4.2-59.3 0-74.7 5.4-4.3 8.9-11.1 8.9-18.6zM128 134c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 
0-6-2.7-6-6v-20zm0 64c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 0-6-2.7-6-6v-20zm253.4 250H96c-17.7 0-32-14.3-32-32 0-17.6 14.4-32 32-32h285.4c-1.9 17.1-1.9 46.9 0 64z"></path></svg> van der Maaten & Hinton (2008) <svg viewBox="0 0 640 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#0266a0ff;" xmlns="http://www.w3.org/2000/svg"> <path d="M278.9 511.5l-61-17.7c-6.4-1.8-10-8.5-8.2-14.9L346.2 8.7c1.8-6.4 8.5-10 14.9-8.2l61 17.7c6.4 1.8 10 8.5 8.2 14.9L293.8 503.3c-1.9 6.4-8.5 10.1-14.9 8.2zm-114-112.2l43.5-46.4c4.6-4.9 4.3-12.7-.8-17.2L117 256l90.6-79.7c5.1-4.5 5.5-12.3.8-17.2l-43.5-46.4c-4.5-4.8-12.1-5.1-17-.5L3.8 247.2c-5.1 4.7-5.1 12.8 0 17.5l144.1 135.1c4.9 4.6 12.5 4.4 17-.5zm327.2.6l144.1-135.1c5.1-4.7 5.1-12.8 0-17.5L492.1 112.1c-4.8-4.5-12.4-4.3-17 .5L431.6 159c-4.6 4.9-4.3 12.7.8 17.2L523 256l-90.6 79.7c-5.1 4.5-5.5 12.3-.8 17.2l43.5 46.4c4.5 4.9 12.1 5.1 17 .6z"></path></svg> `nephosem` (QLVL 2021), `Rtsne` (Krijthe 2015)
]

---

## Model of *schaal*

.center[
<!-- -->
]

???

318 tokens

---
layout: false
class: title-slide

.myh.center[
# Manually annotated <br> senses
]

---

## Senses of *schaal*

**Schaal 1**

- Range (*the Richter scale*, *a scale from 1 to 10*)
- Ratio (*the scale of a map, 1:100*)
- Magnitude (*on a large scale*)

**Schaal 2**

- Plate, dish
- Plate of a weighing instrument

---

## (Measuring) range

.pull-left[
<!-- -->
]

.pull-right[
- 32 tokens

**Example**

In de gedaante van een aardbeving van 6.3 op de **schaal** van Richter om precies te zijn.

In the form of an earthquake measuring 6.3 on the Richter **scale**, to be precise.
]

---

## Size ratio

.pull-left[
<!-- -->
]

.pull-right[
- 10 tokens

**Example**

De gebogen hoogbouw staat erbij als een maquette **schaal** één op één.

The curved high-rise stands there like a model on a **scale** of one to one.
]

---

## Magnitude

.pull-left[
<!-- -->
]

.pull-right[
- 214 tokens

**Example**

In Israël werd de moord op grote **schaal** toegejuicht.
In Israel the murder was applauded on a large **scale**.
]

---

## Plate

.pull-left[
<!-- -->
]

.pull-right[
- 43 tokens

**Example**

Verdeel de tartaar in vier egale porties en leg een eidooier in de **schaal** op de tartaar.

Divide the tartare into four equal portions and lay one egg yolk on the **plate** over the tartare.
]

---

## Plate of weighing instrument

.pull-left[
<!-- -->
]

.pull-right[
- 19 tokens

**Example**

Als monarch kun je bij zoiets gewicht in de **schaal** leggen.

As monarch you can put weight on the **scale** (= weigh in) in such a situation.
]

---
layout: false
class: title-slide

.myh[
# Automatic clustering
]

---

## Cluster 1

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

## Cluster 2

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

## Cluster 3

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

## Cluster 4

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

## Cluster 5

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

## Cluster 6

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

## Cluster 1

.pull-left[
<!-- -->
]

.pull-right[
### Most characteristic context words

<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Context word </th> <th style="text-align:right;"> Frequency </th> <th style="text-align:right;"> Recall </th> <th style="text-align:right;"> Precision </th> <th style="text-align:right;"> Fscore </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Richter/name </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 1.000 </td> </tr> <tr> <td style="text-align:left;"> van/prep </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.324 </td> <td style="text-align:right;"> 0.489 </td> </tr> <tr> <td style="text-align:left;"> 4,8/num </td> <td style="text-align:right;">
4 </td> <td style="text-align:right;"> 0.182 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.308 </td> </tr> <tr> <td style="text-align:left;"> de/det </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.144 </td> <td style="text-align:right;"> 0.251 </td> </tr> <tr> <td style="text-align:left;"> aardbeving/noun </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.136 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.240 </td> </tr> </tbody> </table> ] --- ## Cluster 2 .pull-left[ <!-- --> ] .pull-right[ ### Most characteristic context words <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Context word </th> <th style="text-align:right;"> Frequency </th> <th style="text-align:right;"> Recall </th> <th style="text-align:right;"> Precision </th> <th style="text-align:right;"> Fscore </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Europees/adj </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.219 </td> <td style="text-align:right;"> 0.700 </td> <td style="text-align:right;"> 0.333 </td> </tr> <tr> <td style="text-align:left;"> op/prep </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.134 </td> <td style="text-align:right;"> 0.236 </td> </tr> <tr> <td style="text-align:left;"> een/det </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 0.312 </td> <td style="text-align:right;"> 0.161 </td> <td style="text-align:right;"> 0.213 </td> </tr> <tr> <td style="text-align:left;"> en/vg </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 0.250 </td> <td style="text-align:right;"> 0.136 </td> <td style="text-align:right;"> 0.176 </td> </tr> <tr> <td style="text-align:left;"> 
breed/adj </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.094 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.171 </td> </tr> </tbody> </table> ] --- ## Cluster 3 .pull-left[ <!-- --> ] .pull-right[ ### Most characteristic context words <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Context word </th> <th style="text-align:right;"> Frequency </th> <th style="text-align:right;"> Recall </th> <th style="text-align:right;"> Precision </th> <th style="text-align:right;"> Fscore </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> groot/adj </td> <td style="text-align:right;"> 137 </td> <td style="text-align:right;"> 0.986 </td> <td style="text-align:right;"> 0.945 </td> <td style="text-align:right;"> 0.965 </td> </tr> <tr> <td style="text-align:left;"> op/prep </td> <td style="text-align:right;"> 133 </td> <td style="text-align:right;"> 0.957 </td> <td style="text-align:right;"> 0.556 </td> <td style="text-align:right;"> 0.704 </td> </tr> <tr> <td style="text-align:left;"> de/det </td> <td style="text-align:right;"> 48 </td> <td style="text-align:right;"> 0.345 </td> <td style="text-align:right;"> 0.314 </td> <td style="text-align:right;"> 0.329 </td> </tr> <tr> <td style="text-align:left;"> word/verb </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 0.209 </td> <td style="text-align:right;"> 0.707 </td> <td style="text-align:right;"> 0.322 </td> </tr> <tr> <td style="text-align:left;"> en/vg </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 0.209 </td> <td style="text-align:right;"> 0.492 </td> <td style="text-align:right;"> 0.293 </td> </tr> </tbody> </table> ] --- ## Cluster 4 .pull-left[ <!-- --> ] .pull-right[ ### Most characteristic context words <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th 
style="text-align:left;"> Context word </th> <th style="text-align:right;"> Frequency </th> <th style="text-align:right;"> Recall </th> <th style="text-align:right;"> Precision </th> <th style="text-align:right;"> Fscore </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> klein/adj </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 0.600 </td> <td style="text-align:right;"> 0.857 </td> <td style="text-align:right;"> 0.706 </td> </tr> <tr> <td style="text-align:left;"> beperkt/adj </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.233 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.378 </td> </tr> <tr> <td style="text-align:left;"> bescheiden/adj </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.133 </td> <td style="text-align:right;"> 0.800 </td> <td style="text-align:right;"> 0.229 </td> </tr> <tr> <td style="text-align:left;"> gebeur/verb </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.133 </td> <td style="text-align:right;"> 0.667 </td> <td style="text-align:right;"> 0.222 </td> </tr> <tr> <td style="text-align:left;"> op/prep </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 0.967 </td> <td style="text-align:right;"> 0.121 </td> <td style="text-align:right;"> 0.216 </td> </tr> </tbody> </table> ] --- ## Cluster 5 .pull-left[ <!-- --> ] .pull-right[ ### Most characteristic context words <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Context word </th> <th style="text-align:right;"> Frequency </th> <th style="text-align:right;"> Recall </th> <th style="text-align:right;"> Precision </th> <th style="text-align:right;"> Fscore </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> met/prep </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 0.375 </td> <td 
style="text-align:right;"> 0.333 </td> <td style="text-align:right;"> 0.353 </td> </tr> <tr> <td style="text-align:left;"> een/det </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 0.542 </td> <td style="text-align:right;"> 0.210 </td> <td style="text-align:right;"> 0.302 </td> </tr> <tr> <td style="text-align:left;"> in/prep </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 0.542 </td> <td style="text-align:right;"> 0.197 </td> <td style="text-align:right;"> 0.289 </td> </tr> <tr> <td style="text-align:left;"> oven/noun </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.125 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.222 </td> </tr> <tr> <td style="text-align:left;"> plat/adj </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.125 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.222 </td> </tr> </tbody> </table> ] --- ## Cluster 6 .pull-left[ <!-- --> ] .pull-right[ ### Most characteristic context words <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Context word </th> <th style="text-align:right;"> Frequency </th> <th style="text-align:right;"> Recall </th> <th style="text-align:right;"> Precision </th> <th style="text-align:right;"> Fscore </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> gewicht/noun </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 1.000 </td> </tr> <tr> <td style="text-align:left;"> leg/verb </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 0.611 </td> <td style="text-align:right;"> 0.786 </td> <td style="text-align:right;"> 0.688 </td> </tr> <tr> <td style="text-align:left;"> in/prep </td> <td style="text-align:right;"> 18 </td> <td 
style="text-align:right;"> 1.000 </td> <td style="text-align:right;"> 0.273 </td> <td style="text-align:right;"> 0.429 </td> </tr> <tr> <td style="text-align:left;"> de/det </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 0.944 </td> <td style="text-align:right;"> 0.111 </td> <td style="text-align:right;"> 0.199 </td> </tr> <tr> <td style="text-align:left;"> zal/verb </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.167 </td> <td style="text-align:right;"> 0.231 </td> <td style="text-align:right;"> 0.194 </td> </tr> </tbody> </table> ] --- layout: false class: title-slide .myh[ # Variation ] --- ## Eight representative models .pull-left[ <!-- --> ] .pull-right[ <!-- --> ] --- ## Same tokens across models .pull-left[ <!-- --> ] .pull-right[ <!-- --> ] --- layout: false class: title-slide .myh[ # Final thoughts ] --- ## Observations - Most senses of *schaal* are quite well distinguished. + And the 'dish' homonym is close together. - *Richter*: + Very well isolated sense... because it's an idiomatic expression. + It is not close to the other senses of the homonym. ### Automatic clustering - Idiomatic expressions are identified by their collocates: *gewicht*, *Richter*... *groot*? - Different collocates in a frequent sense may stand out (*groot*, *klein*, *Europees*...) 
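---

## Sketch: scoring context words per cluster

The recall, precision and F-score columns in the cluster tables are consistent with the following reading, which is an assumption rather than a definition given in these slides: recall is the share of a cluster's tokens that co-occur with the context word, precision is the share of the context word's occurrences in the whole model that fall inside the cluster, and the F-score is their harmonic mean. The function name is hypothetical, and the total of 68 tokens for *van/prep* is back-computed from the rounded precision of 0.324, not taken from the data.

```python
def context_word_scores(in_cluster, cluster_size, in_model):
    """Return (recall, precision, F-score) for one context word and one cluster.

    in_cluster   -- tokens of the cluster that co-occur with the context word
    cluster_size -- number of tokens in the cluster
    in_model     -- tokens in the whole model that co-occur with the context word
    """
    recall = in_cluster / cluster_size
    precision = in_cluster / in_model
    if recall + precision == 0:
        return recall, precision, 0.0
    # Harmonic mean of precision and recall.
    fscore = 2 * precision * recall / (precision + recall)
    return recall, precision, fscore


# Cluster 1 of *schaal*: 22 tokens, all of which co-occur with Richter and van.
richter = context_word_scores(in_cluster=22, cluster_size=22, in_model=22)
# ASSUMPTION: 68 tokens co-occur with *van* in the whole model (inferred from 22 / 0.324).
van = context_word_scores(in_cluster=22, cluster_size=22, in_model=68)

print([round(x, 3) for x in richter])
print([round(x, 3) for x in van])
```

Under these assumptions the two calls reproduce the first two rows of the Cluster 1 table: a context word that occurs in every token of the cluster and nowhere else (*Richter*) scores 1 on all three measures, while a frequent function word (*van*) keeps perfect recall but loses precision.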
--- .pull-left[ ## More generally Token-level distributional models for polysemy studies return <s>**semantic**</s> **contextual** patterns: - We cannot rely blindly on the clustering to replace semantic annotation: + clusters can represent parts of senses or be semantically heterogeneous + distributional distinctiveness does not map to semantic distinctiveness - but they are still a rich source of insight: + relative weight of patterns + relationships between patterns + discovery of relevant facets via distributional patterns ] .pull-right[ .measure-narrow.br3.shadow-5.grow[ [](https://cloudspotting.marianamontes.me) ] ] --- layout: false class: title-slide .mythanks[ # Thank you! [mariana.montes@kuleuven.be](mailto:mariana.montes@kuleuven.be) [dirk.geeraerts@kuleuven.be](mailto:dirk.geeraerts@kuleuven.be) <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#0266a0;" xmlns="http://www.w3.org/2000/svg"> <path d="M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z"></path></svg> [https://slides.montesmariana.me/aelco](https://slides.montesmariana.me/aelco) ] --- # References — <svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#0266a0ff;" xmlns="http://www.w3.org/2000/svg"> <path d="M448 360V24c0-13.3-10.7-24-24-24H96C43 0 0 43 0 96v320c0 53 43 96 96 96h328c13.3 0 24-10.7 24-24v-16c0-7.5-3.5-14.3-8.9-18.7-4.2-15.4-4.2-59.3 0-74.7 5.4-4.3 8.9-11.1 8.9-18.6zM128 134c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 0-6-2.7-6-6v-20zm0 64c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 0-6-2.7-6-6v-20zm253.4 250H96c-17.7 0-32-14.3-32-32 0-17.6 14.4-32 
32-32h285.4c-1.9 17.1-1.9 46.9 0 64z"></path></svg>

.f6[
Campello, Ricardo J. G. B., Davoud Moulavi & Joerg Sander. 2013. Density-Based Clustering Based on Hierarchical Density Estimates. In Jian Pei, Vincent S. Tseng, Longbing Cao, Hiroshi Motoda & Guandong Xu (eds.), *Advances in Knowledge Discovery and Data Mining*, 160–172. Berlin, Heidelberg: Springer.

Church, Kenneth Ward & Patrick Hanks. 1989. Word association norms, mutual information, and lexicography. In *ACL ’89: Proceedings of the 27th annual meeting on Association for Computational Linguistics*, 76–83. Association for Computational Linguistics.

Firth, John Rupert. 1957. A synopsis of linguistic theory 1930-1955. In John Rupert Firth (ed.), *Studies in Linguistic Analysis*, 1–32. Oxford: Blackwell.

Harris, Zellig S. 1954. Distributional structure. *Word* 10(2–3). 146–162.

Heylen, Kris, Thomas Wielfaert, Dirk Speelman & Dirk Geeraerts. 2015. Monitoring polysemy: Word space models as a tool for large-scale lexical semantic analysis. *Lingua* 157. 153–172.

Kaufman, Leonard & Peter J. Rousseeuw. 1990. Partitioning Around Medoids (Program PAM). In *Finding Groups in Data: An Introduction to Cluster Analysis*, 68–125. Hoboken, NJ, USA: John Wiley & Sons, Inc.

Maaten, L.J.P. van der & G.E. Hinton. 2008. Visualizing high-dimensional data using t-SNE. *Journal of Machine Learning Research* 9. 2579–2605.

Montes, Mariana. 2021. *Cloudspotting: visual analytics for distributional semantics*. Leuven: KU Leuven PhD Dissertation.

Schütze, Hinrich. 1998. Automatic Word Sense Discrimination. *Computational Linguistics* 24(1). 97–123.
]

---

# Code <svg viewBox="0 0 640 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#0266a0ff;" xmlns="http://www.w3.org/2000/svg"> <path d="M278.9 511.5l-61-17.7c-6.4-1.8-10-8.5-8.2-14.9L346.2 8.7c1.8-6.4 8.5-10 14.9-8.2l61 17.7c6.4 1.8 10 8.5 8.2 14.9L293.8 503.3c-1.9 6.4-8.5 10.1-14.9 8.2zm-114-112.2l43.5-46.4c4.6-4.9 4.3-12.7-.8-17.2L117 256l90.6-79.7c5.1-4.5 5.5-12.3.8-17.2l-43.5-46.4c-4.5-4.8-12.1-5.1-17-.5L3.8 247.2c-5.1 4.7-5.1 12.8 0 17.5l144.1 135.1c4.9 4.6 12.5 4.4 17-.5zm327.2.6l144.1-135.1c5.1-4.7 5.1-12.8 0-17.5L492.1 112.1c-4.8-4.5-12.4-4.3-17 .5L431.6 159c-4.6 4.9-4.3 12.7.8 17.2L523 256l-90.6 79.7c-5.1 4.5-5.5 12.3-.8 17.2l43.5 46.4c4.5 4.9 12.1 5.1 17 .6z"></path></svg>

.f6[
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert & Barbara Borges. 2021. shiny: Web application framework for R. Manual. https://shiny.rstudio.com/.

Hahsler, Michael, Matthew Piekenbrock & Derek Doran. 2019. dbscan: Fast density-based clustering with R. *Journal of Statistical Software* 91(1). 1–30. https://doi.org/10.18637/jss.v091.i01.

Krijthe, Jesse. 2018. Rtsne: T-distributed stochastic neighbor embedding using a Barnes-Hut implementation. https://github.com/jkrijthe/Rtsne.

QLVL. 2021. nephosem. Zenodo. https://doi.org/10.5281/ZENODO.5710426.

Sevenants, Anthe, Mariana Montes & Thomas Wielfaert. 2022. NephoVis (1.1.0). Zenodo. https://doi.org/10.5281/zenodo.6629350.
]

-----

*If you want to apply this methodology, you can find the Python code [here](https://montesmariana.github.io/semasioFlow/tutorials/createClouds.html) and the R code [here](https://montesmariana.github.io/semcloud/articles/processClouds.html) (they are used in sequence); the GitHub repository for the [Shiny App](https://marianamontes.shinyapps.io/Level3/), which combines HDBSCAN output, is [here](https://github.com/montesmariana/Level3).
The repository for the visualization tool is [here](https://github.com/qlvl/NephoVis)*.

---

# Materials

- Corpus of Dutch and Flemish newspapers
  + 520 million words
  + 1990-2004
- From 8 different nouns
  + 240-320 random occurrences
  + Homonyms, at least one of them polysemous
- Manual annotation based on dictionary senses
- About 200 models combining different parameter settings
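---

## Sketch: from token-context matrix to token-token distances

The slides do not name the distance measure used in the *study* example. As a minimal sketch, assuming cosine distance (1 minus the cosine similarity of two rows), the values in the token-token distance matrix can be recovered from the token-context matrix shown earlier. The variable and function names below are illustrative and are not taken from `nephosem`.

```python
import math

# Rows of the token-context matrix from the *study* example
# (columns: language/n, word/n, english/j, speak/v, flemish/j, eat/v).
tokens = {
    "study_1": [4.37, 0.99, 3.16, 0.41, 0.00, 0.00],
    "study_2": [5.97, 1.07, 3.16, 0.41, 0.00, 0.00],
    "study_3": [0.00, 0.00, 0.00, 0.00, 1.28, 3.08],
}


def cosine_distance(u, v):
    """Cosine distance between two vectors (ASSUMPTION: the measure behind the slide)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp at 0 so floating-point noise never yields a negative distance.
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


# Full token-token distance matrix, keyed by pairs of token names.
distances = {
    (a, b): cosine_distance(tokens[a], tokens[b])
    for a in tokens
    for b in tokens
}

for a in tokens:
    print(a, [round(distances[(a, b)], 2) for b in tokens])
```

Tokens (1) and (2) share all their non-zero context dimensions and end up almost on top of each other, while token (3) has no dimension in common with them and sits at the maximum distance of 1. A matrix of this kind is what a dimensionality reduction technique such as t-SNE takes as input.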