Technical Details + Methods



Website Build

This project was built using a combination of HTML, CSS, JavaScript, and various libraries and frameworks to create an interactive and visually appealing experience. The website is designed to be responsive, ensuring that it functions well on both desktop and mobile devices. The primary technologies and tools used in the development of this project include:

GitHub + GitHub Pages: Website creation, version control, hosting the data, and open-access sharing
Bootstrap 5: Framework and template for responsive design and layout
HTML5: Structuring and presenting content
CSS3: Style sheets for visual presentation
JavaScript: Integration of the interactive elements
Leaflet.js: Interactive map and geographic visualization
GraphCommons: Embedded networks and their analysis
Python: Computational work for data processing and analysis
Jupyter Notebooks: Code and computing environment for data analysis
OpenRefine: Phase 1 data cleaning and transformation
Flourish: Geolocation and mapped data visualization


Methods

In the last 25 years, there has been a noticeable shift from historical records sitting piecemeal and fragmented in archive boxes to a sea of digitized images and texts in the form of PDFs, JPEGs, CSVs, and PNGs. A number of manuscripts are now available online. This project takes advantage of this new digital landscape to reconstruct the networks of correspondence and communication that underpinned political and military efforts during the Williamite-Jacobite conflict in Scotland from 1688 to 1692.

The initial report from the project can be found here. This project is a marriage of early modern Scottish (and indeed British imperial) history and computational data science. It brings together methods from network science, prosopography, and traditional early modern political history centered on communication. The visualizations tell a story of connectivity over time. We chose this conflict because the sources capture particularly well the difficulties of reconstructing and administering Scottish governance during such a chaotic period.

Phase one required turning qualitative data into quantitative data. We started by recording all of the pertinent data from the PDF versions of the documents, then entered it into a spreadsheet using the categories: Id, Sender, Receiver, Location from, Location to, Latitude and Longitude, Type, and Date. The data was then cleaned using OpenRefine to split the latitude and longitude and the keywords into separate columns; we also made sure there were no blank cells or duplicate entries. After creating a master spreadsheet, we set about creating different sheets for different visualizations, including people, places, keywords, nodes, and edges (relationships). The large dataset of letters allowed for exploratory data analysis and experimentation with different digital tools to identify the best representation of the relationships presented in the papers.
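To make that workflow concrete, the snippet below sketches an equivalent cleaning and sheet-building pass in Python with pandas. The file name, column names, and split logic are illustrative assumptions; the actual Phase 1 cleaning was carried out in OpenRefine rather than in code.

```python
import pandas as pd

# Hypothetical master file; the real cleaning was done in OpenRefine.
letters = pd.read_csv("master_letters.csv")

# Split a combined "Latitude and Longitude" column into two numeric columns.
latlon = letters["Latitude and Longitude"].str.split(",", expand=True)
letters["Latitude"] = pd.to_numeric(latlon[0], errors="coerce")
letters["Longitude"] = pd.to_numeric(latlon[1], errors="coerce")

# Remove rows with blank cells in key fields, and any duplicate rows.
letters = letters.dropna(subset=["Sender", "Receiver", "Date"]).drop_duplicates()

# Derive the nodes and edges sheets used for the network visualizations.
people = pd.unique(letters[["Sender", "Receiver"]].values.ravel())
nodes = pd.DataFrame({"Id": range(len(people)), "Label": people})
edges = letters.rename(columns={"Sender": "Source", "Receiver": "Target"})[
    ["Source", "Target", "Type", "Date"]
]

nodes.to_csv("nodes.csv", index=False)
edges.to_csv("edges.csv", index=False)
```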

One of the main tools we ended up using was the programming language Python, which has a large number of libraries that extend its capabilities and allow for complex visualizations of the network data. One of the most prominent was networkx, which allowed the creation of network graphs along with the application of the Girvan-Newman algorithm to detect communities within the network. The algorithm works by repeatedly removing the edges that lie on the most shortest paths in the network (those with the highest edge betweenness), progressively splitting it into communities. The nodes are then given corresponding colors to highlight their community, enabling easier identification of groups in the network. Betweenness centrality is equally important for understanding the network graph: a node with higher betweenness centrality has more control over the network, since more information passes through that node. The implementation of these libraries and the creation of visuals were carried out in Jupyter Notebooks, an open-source environment for interactive computing. In addition, we experimented with tools such as Leaflet, Flourish, and Gephi for further analysis of the letters.
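The following is a minimal sketch of that networkx workflow: building the graph from an edge list, taking the first Girvan-Newman partition, coloring nodes by community, and ranking nodes by betweenness centrality. The file and column names are assumptions; the project's Jupyter notebooks may organise these steps differently.

```python
import pandas as pd
import networkx as nx
from networkx.algorithms.community import girvan_newman
import matplotlib.pyplot as plt

# Build a graph from the edges sheet (hypothetical column names).
edges = pd.read_csv("edges.csv")
G = nx.from_pandas_edgelist(edges, source="Source", target="Target")

# Girvan-Newman: repeatedly remove the edge with the highest edge betweenness
# until the graph splits; take the first level of the resulting hierarchy.
communities = next(girvan_newman(G))

# Color each node according to the community it belongs to.
membership = {node: i for i, group in enumerate(communities) for node in group}
colors = [membership[n] for n in G.nodes()]

# Node betweenness centrality: the people through whom most information flows.
centrality = nx.betweenness_centrality(G)
print(sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:10])

nx.draw_spring(G, node_color=colors, node_size=50, cmap=plt.cm.tab10)
plt.show()
```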

Later we converted this data into different relational tables for GraphCommons, which included: Nodes (node type, node ID, description, loyalty) and Edges (from type, from name, from location, edge type, to type, to name, to location, date, weight). This allowed us to parse the network data differently and have it ready for exploration. In expanding our dataset we are now working with a network of 611 distinct nodes. The question we faced was: how can we make use of this metadata and of the qualitative material relating to around 529 people?
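As an illustration of that export step, the sketch below derives the two relational tables from the master letters sheet with pandas. The column names, the loyalty default, and the weight convention are assumptions rather than the project's exact schema.

```python
import pandas as pd

# Hypothetical reconstruction of the GraphCommons export.
letters = pd.read_csv("master_letters.csv")

# Nodes table: one row per person, with type, ID, description, and loyalty.
people = pd.unique(letters[["Sender", "Receiver"]].values.ravel())
nodes = pd.DataFrame({
    "node type": "Person",
    "node ID": people,
    "description": "",     # filled in from prosopographical notes
    "loyalty": "unknown",   # e.g. Williamite, Jacobite, or unknown
})

# Edges table: one row per letter, linking sender to receiver.
edges = letters.rename(columns={
    "Sender": "from name", "Location from": "from location",
    "Receiver": "to name", "Location to": "to location",
    "Date": "date",
})
edges["from type"] = "Person"
edges["to type"] = "Person"
edges["edge type"] = "sent letter to"
edges["weight"] = 1  # could instead count repeated correspondence per pair

columns = ["from type", "from name", "from location", "edge type",
           "to type", "to name", "to location", "date", "weight"]
edges[columns].to_csv("graphcommons_edges.csv", index=False)
nodes.to_csv("graphcommons_nodes.csv", index=False)
```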