<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title>Corpus Linguistics &amp; Statistics @ UoBham, 11/02/16 by John Williams</title>
      <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o</link>
      <description>Scroll down and along to see all notes</description>
      <language>en-us</language>
      <pubDate>2016-02-11 13:58:30 UTC</pubDate>
      <lastBuildDate>2025-11-30 21:24:49 UTC</lastBuildDate>
      <webMaster>hello@padlet.com</webMaster>
      <item>
         <title>Michaela Mahlberg: Intro</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94590102</link>
         <description><![CDATA[<ul><li>Language as a social phenomenon</li><li>Meaning and form are associated (lexico-grammar)</li><li>CL prioritizes lexis<ul><li>in texts and between texts</li></ul></li><li>Meaning is based on evidence of interaction; the selection of linguistically relevant patterns depends on the research question (RQ)</li></ul><div>"Scholars don't pay enough attention to what non-scholars think about the world." (Proctor 2012)</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 14:07:27 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94590102</guid>
      </item>
      <item>
         <title>Simon Preston: Corpus Analysis from Math perspective</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94595397</link>
         <description><![CDATA[<ul><li>Use existing (easier?) solutions</li><li>Corpus analysis in terms of input (a mathematical representation X of the corpus) &amp; output f(X)</li><li>Deciding on X (e.g. a 'bag of words' representation --&gt; matrix) forces us to decide what we retain and what we discard</li><li>Try to represent it as a relationship between simpler matrices ('matrix factorization')</li><li>The 'bag of words' representation discards information about the order of words</li><li>Get round that with a co-occurrence matrix --&gt; network visualization</li><li>Challenges:<ul><li>How to analyse time-structured corpora &amp; co-occurrence networks (or combinations thereof: 'time-dependent networks')</li></ul></li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 14:22:34 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94595397</guid>
      </item>
      <item>
         <title>Thompson, Murakami, Hunston: Topic Modelling</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94603737</link>
         <description><![CDATA[<ul><li>'Bottom-up' approach, no prior model of the corpus</li><li>A 'topic' is defined as a probability distribution over a fixed vocabulary</li><li>We can model a document in terms of rolling a 'die' to choose a topic, then rolling a second die to choose words within that topic. 'Topic modelling' sort of reverses that process and tries to recreate the (irregular) shape of these dice.</li><li>Tested on research papers in the environmental domain, with a roster of 60 topics ('documents' may be component parts of papers, 'text chunks' --&gt; topics may recur at characteristic points in research papers)</li><li>Topics may be prominent in particular time periods</li><li>Topics can be envisaged in terms of individual words, co-occurrence of individual words, or n-grams</li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 14:43:26 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94603737</guid>
      </item>
      <item>
         <title>Viola Wiegand: Identifying surveillance discourses</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94613171</link>
         <description><![CDATA[<ul><li>The journal 'Surveillance &amp; Society' mined for keywords in the domain</li><li>Looked for common features and patterns across the 13 volumes of the journal</li><li>Used 'key keywords' (KKs), 'lockwords' &amp; co-occurrence</li><li>'KKs' are keywords that occur across a large number of texts in the corpus</li><li>'Lockwords' are words that are stable in frequency across texts</li><li>Found 28 items that were both KKs and lockwords --&gt; looked at co-occurring pairs, which can be mapped ('co-occurrence networks')</li><li>Linguistic interpretation still necessary: "No purely statistical analysis of language can reveal meaning".</li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 15:06:00 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94613171</guid>
      </item>
      <item>
         <title>Yves van Gennip: Graphical representations of a corpus, clustering</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94629564</link>
         <description><![CDATA[<ul><li>Co-occurrence: need to make decisions about window size, directionality and weighting before making a graph</li><li>Graph = network of nodes/vertices and edges/links</li><li>Thickness of an edge can indicate its weight</li><li>Using a block matrix to indicate strength of co-occurrence</li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 15:48:06 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94629564</guid>
      </item>
      <item>
         <title>Hennessy: Time-dependency in corpora</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94638677</link>
         <description><![CDATA[<ul><li>'Binning' = grouping documents along an axis of your matrix (e.g. corresponding to time periods), then making simpler matrices based on each bin</li><li>Width of bins depends on RQ &amp; data</li><li>Bins can overlap</li><li>'Kernel': a kind of scaled bin that gives a more 'realistic' view of effect size</li><li>This is a form of statistical smoothing</li><li>Possible kernels come in a set of classic mathematical shapes</li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 16:11:33 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94638677</guid>
      </item>
      <item>
         <title>Smyth, Bull: The right to read = The right to mine</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94644894</link>
         <description><![CDATA[<div>(Librarians)<br><br></div><ul><li>Data-Asset-Method: Harnessing the Infinite Archive --&gt; a set of protocols for storing corpora uniformly, available to networks of researchers</li><li>UK copyright legislation 2014: computational analysis and transformation of data does not infringe copyright ('the right to mine', for non-commercial purposes; may not apply to some international collaborations)</li><li>A lot of publishers are making their content available digitally</li><li>CORE portal for accessing open-access articles</li><li>CROSSREF is making these articles available for data mining</li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 16:28:20 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94644894</guid>
      </item>
      <item>
         <title>Laurence Anthony: DIY corpus tool creation: pros and cons</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94652968</link>
         <description><![CDATA[<div>Design of new corpus tools tends to be dominated by computer scientists.<br>Corpus users tend to do research 'inside the box' --&gt; AntConc downloads are still going up --&gt; Why isn't the new generation developing its own tools? Why are we not all programming?<br><br>LA presents both sides of the debate.<br><br></div><div>Pros:</div><ul><li>(Biber) if you learn programming, you can do what you want, you are in the driver's seat, and it can be cheaper</li><li>(Gries) "inflexible software creates inflexible research"</li><li>(Davies) distinguishes 'corpus users' from 'corpus creators'</li></ul><div>Advice:</div><ul><li>Pick a popular language, e.g. Python, Scratch (on Raspberry Pi), Java, R (very ugly)</li><li>Read a programming book</li><li>Join Stack Overflow</li></ul><div>Against:</div><ul><li>Most corpus users can 'get by' with current tools</li><li>Researchers in many fields do not develop their own tools: tools tend to be the fruit of collaboration between researchers &amp; engineers</li><li>Programmers are a different world. DIY tools risk being less accurate and slower</li></ul><div>Advice:</div><ul><li>Decide your research question before selecting your tool/method</li><li>Learn to use a *good* text editor, e.g. Notepad++, TextWrangler</li><li>Read the user guide</li><li>Be proactive in contacting specialists</li><li>Provide motivation for getting specialists involved, treat them as part of the team</li><li>Understand the limitations of your tools and potential alternatives</li></ul><div>"Life is short"<br><br>LA comes down on the pro-programming side, but with teams integrated from the start. Even if users use only standard tools, they would benefit from a bit of programming nous.<br><br>Check out: wordwanderer.org</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-02-11 16:47:37 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94652968</guid>
      </item>
      <item>
         <title>Programme</title>
         <author>johnxwilliams1</author>
         <link>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94722307</link>
         <description><![CDATA[<div>Accuracy of my notes is not guaranteed, especially where statistics is concerned! Apologies for any misrepresentation.</div>]]></description>
         <enclosure url="https://padlet-uploads.storage.googleapis.com/65389709/13f073904b0c612e966c965e5bbbb53b9345ec6c/5b9716026897072ebdd888c6f520f179.png" />
         <pubDate>2016-02-11 20:00:16 UTC</pubDate>
         <guid>https://padlet.com/johnxwilliams1/y7v657kn6p3o/wish/94722307</guid>
      </item>
   </channel>
</rss>
