{"id":46,"date":"2014-05-28T12:50:04","date_gmt":"2014-05-28T12:50:04","guid":{"rendered":"http:\/\/www.darrenumney.com\/blog\/?p=46"},"modified":"2014-08-03T08:54:48","modified_gmt":"2014-08-03T07:54:48","slug":"datafest-2013","status":"publish","type":"post","link":"https:\/\/darrenumney.com\/wordpress\/?p=46","title":{"rendered":"Datafest-2013"},"content":{"rendered":"<p>It&#8217;s creeping towards the summer and this is festival time. I&#8217;ve only been to one music festival of note and that was Latitude in 2010. I failed to\u00a0pick up a programme in advance and so spent some time during\u00a0the weekend without sufficient data to decide which of the many stages to go to. I find if you&#8217;re not careful this can lead to smoking too many cigarettes and drinking too much beer while listening to Nick Cave being Grinderman. To avoid that here in HS2 land I&#8217;ve been putting together a programme of Hansard activity &#8211; I wanted to get a feel for how much debate had taken place about this high speed railway.<\/p>\n<p><!--more--><\/p>\n<p>It&#8217;s apparently\u00a0not very\u00a0easy to search for synonyms (e.g. HS2 and High Speed Rail) or to filter out specific types of references (e.g. news releases from debates) on the parliament.uk website:\u00a0<a href=\"http:\/\/www.parliament.uk\/site-information\/using-this-website\/searchhelp\/\" target=\"_blank\">the\u00a0&#8220;phrase and boolean functionality&#8221; has been removed<\/a>.\u00a0In\u00a0what could be read\u00a0as an abrogation of sovereignty the\u00a0webmaster\u00a0recommends the use of Google to perform advanced searches. But\u00a0when I\u00a0followed the guidelines offered I found that the search giant wasn&#8217;t really\u00a0much help. I constrained\u00a0the search to the parliament\u00a0website, I limited the url to Hansard, I used curly quotes around my search term &#8220;high speed rail&#8221; and I filtered the dates to 1989-1990\u00a0in the hope of tracing some early developments. 
<a href=\"https:\/\/www.google.co.uk\/search?num=100&amp;es_sm=91&amp;tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F1989%2Ccd_max%3A1%2F1%2F1999&amp;q=inurl%3Acmhansrd+site%3Awww.parliament.uk++%22high-speed+rail%22+&amp;oq=inurl%3Acmhansrd+site%3Awww.parliament.uk++%22high-speed+rail%22+&amp;gs_l=serp.3...150680.152494.0.152861.4.4.0.0.0.0.88.300.4.4.0....0...1c.1j2.45.serp..4.0.0.WE1BWsj3Hyw\" target=\"_blank\">Nothing<\/a>. I extended the date range progressively up to\u00a02010-2014 and switched to &#8220;HS2&#8221;. <a href=\"https:\/\/www.google.co.uk\/search?num=100&amp;es_sm=91&amp;tbs=cdr%3A1%2Ccd_min%3A01%2F01%2F2010%2Ccd_max%3A01%2F01%2F2014&amp;q=inurl%3Acmhansrd+site%3Awww.parliament.uk++%22HS2%22&amp;oq=inurl%3Acmhansrd+site%3Awww.parliament.uk++%22HS2%22&amp;gs_l=serp.3...6459.6962.0.7122.3.3.0.0.0.0.80.208.3.3.0....0...1c.1.45.serp..3.0.0.Rufjlqgi2FY\" target=\"_blank\">Still nothing<\/a>. I went back to the basic search on the parliament website and tried throwing a few of the Google tips into the search box there and after some experimentation was able to come up with what seemed like a reasonable way to proceed. A search for HS2 and &#8220;cmhansrd&#8221; (this is how the website differentiates between the Commons and the Lords Hansard records in its URLs) <a href=\"http:\/\/www.parliament.uk\/search\/results\/?q=hs2+cmhansrd\">produces 592 results<\/a>\u00a0(a screenshot is below).<\/p>\n<p><a href=\"http:\/\/www.darrenumney.com\/blog\/wp-content\/uploads\/2014\/05\/hansardresults.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-60\" src=\"http:\/\/www.darrenumney.com\/blog\/wp-content\/uploads\/2014\/05\/hansardresults.jpg\" alt=\"hansardresults\" width=\"503\" height=\"302\" \/><\/a><\/p>\n<p>A similar search for &#8220;high speed rail&#8221; gives 908. 
Duplicate these two searches\u00a0for the House of Lords (using &#8220;ldhansrd&#8221;) and the total number of results\u00a0generated is 1768.<\/p>\n<p>But four separate searches, each displaying a maximum of 100 results at a time, aren&#8217;t conducive to the\u00a0&#8220;getting a feel for the debate&#8221; that I was hoping for. Well, it feels <em>pretty big<\/em> but there is a popular notion that size isn&#8217;t everything.\u00a0The next stage then was to consider how best to collate all of the results into one place &#8211; at least it would be handy to know <em>how big<\/em> it is. This was necessary not only for the sake of a good overview of the data but also to be able to work out how to weed out the\u00a0inevitable\u00a0duplicates\u00a0that were appearing across the HS2 and high speed rail results.<\/p>\n<p>Here\u00a0is the eventual method in its entirety.<br \/>\n1: Identify relevant terms<br \/>\n2: Perform search and view the source text of the results in the web browser<br \/>\n3: Copy the displayed results section of source text into a text editor<br \/>\n4: Repeat for each page\u00a0of results and each separate search<br \/>\n5: Use regular expression matching to remove extraneous HTML code and then to add structure to the\u00a0relevant data points (HS2\/HSR, Commons\/Lords, Debate\/Text\/Written, URL, date, summary text)<br \/>\n6: Copy structured text into Excel<br \/>\n7: Use a formula to detect and remove identical URLs<\/p>\n<p>This process converts 20\u00a0visually styled (and next to useless) web pages (as reproduced in the image above) into one spreadsheet. This spreadsheet consists of\u00a01432 individual references to HS2 and High Speed Rail that have occurred in either the House of Commons or House of Lords debating chambers since 1989.<\/p>\n<p>Now what was I wanting them for?\u00a0WordPress doesn&#8217;t like that many lines of code all in one go, at least not in the visual editor, and the web browsers are also not keen. 
If you&#8217;re still reading this you probably are keen and if you want to\u00a0test out your browser <a href=\"http:\/\/www.darrenumney.com\/pages\/1432.html\" target=\"_blank\">the full list is here<\/a>. Actually this list is what I wanted &#8211; a simple way to access every instance from a single location.<\/p>\n<p>To give a very brief quantitative summary of what&#8217;s here the following is interesting, in a trainspotterly kind of way. There are no surprises. A background noise of references from 1989 onwards matches the on\/off debate around high speed rail that has been documented elsewhere &#8211; the House of Commons Library do a very good job of providing <a href=\"http:\/\/www.parliament.uk\/briefing-papers\/RP11-75\/high-speed-two-hs2-the-debate\" target=\"_blank\">background papers on this kind of thing<\/a>. The debate starts to pick up in 2003\u00a0when the Channel Tunnel Rail Link falteringly becomes\u00a0High Speed 1, leading up to\u00a0the Section 2 St Pancras link opening in 2007. The Labour government set up HS2 Ltd in 2009 but they failed to regain power in the next general election. The subsequent review of HS2 by the newly formed coalition took place\u00a0in 2010. Ongoing consultations since then have kept up the level of interest. 
The inclusion of the\u00a0<a href=\"http:\/\/services.parliament.uk\/bills\/2013-14\/highspeedrailpreparation\/stages.html\">HS2 (Preparation) Bill<\/a>\u00a0in the parliamentary business schedule\u00a0for 2013 ensured that year saw\u00a0more references than the last two years added together.<\/p>\n<p><a href=\"http:\/\/www.darrenumney.com\/blog\/wp-content\/uploads\/2014\/05\/hansardresults1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-61\" src=\"http:\/\/www.darrenumney.com\/blog\/wp-content\/uploads\/2014\/05\/hansardresults1-1024x536.jpg\" alt=\"hansardresults\" width=\"700\" height=\"366\" \/><\/a><\/p>\n<p>Although I started this piece in a field in Suffolk surrounded by music fans I don&#8217;t think the main purpose of the exercise is to create a top of the pops of parliamentary business. But the debate takes shape in a particular way when viewed like this and there are various ways that this shape reflects events across the wider discourse. This is seen in the potted history above but would also make sense when positioned against parallel developments such as road projects &#8211; the M1 is a particularly good example.<\/p>\n<p>There are also multiple perspectives from within the debates themselves. The relationship between the use of terms HS2 and HSR and the relationship between the House of Commons and the House of Lords are two structural points of possible interest.\u00a0If I want to stay out of Marlboro country I need to get back to that list of links and start putting some qualitative flesh onto these quantitative bones.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s creeping towards the summer and this is festival time. I&#8217;ve only been to one music festival of note and that was Latitude in 2010. 
I failed to\u00a0pick up a programme in advance and so spent some time during\u00a0the weekend without sufficient data to decide which of the many stages to go to. I find [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-46","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/46","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=46"}],"version-history":[{"count":19,"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/46\/revisions"}],"predecessor-version":[{"id":163,"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/46\/revisions\/163"}],"wp:attachment":[{"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=46"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=46"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/darrenumney.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=46"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}