For a recent class project I was instructed to compare different text analyzing tools and give a description of the pros and cons of each, along with my thoughts on using them. The different tools I will be comparing are Juxta Commons, Voyant, and TAPoRware.
I chose to visualize a text I worked on in my undergraduate philosophy days: Immanuel Kant’s Critique of Pure Reason. As the original text was written in German, a language in which I only know a couple of words, I have had to rely on translations when reading it. So what interests me is how various people have translated the work, and how close to or different from each other their translations are.
As the original text approaches 500 pages, I narrowed down the scope of this project by focusing on the introduction to the work. The introduction is a little over twenty pages of text. This still provides me with plenty of material to examine for the project.
Selection for Comparison:
- Critique of Pure Reason by Immanuel Kant Translated by J. M. D. Meiklejohn
- Critique of Pure Reason by Immanuel Kant Translated by Norman Kemp Smith
Before I could begin examining the works, I needed to manipulate the texts and make them as similar as I could in structure without changing anything of substance. I copied each text from its source and saved it as a .txt file. For the Meiklejohn translation I needed to rework the line length to make it more comparable with the Smith translation. I also deleted the footnotes and added spacing between the paragraphs for the Meiklejohn translation. For the Smith translation I needed to remove the page numbering and the various commentary, notes, and explanations that were in the online text.
Results of Juxta Commons:
Juxta Commons is a “tool that allows you to compare and collate versions of the same textual work” (Juxtacommons.org). Within Juxta there are multiple useful tools: Heat Maps, Side by Side Views, Histograms, Parallel Segmentation, Edition Starter, and Versioning Machine. The last two of these, at the time of this writing, are experimental. I will try to explain more about this tool as I go through it, but for a better explanation see A User Guide to Juxta Commons.
A special note before using these tools: as some of them take a while to visualize, it would be advisable to have something else to work on while you wait. Also, you will need to create an account to use their website, but don’t worry, it is free.
What we can see here is the degree of variance between the selected text and the other texts in our comparison “witness” list. “The text is color-coded to indicate the degree of variance from the base witness evident at any particular area of the text” (“User Guide to Juxta Commons”). In the two screen shots above, there are numerous differences in both of the sections shown.
Side By Side View:
The Side by Side tool takes the results of the Heat Map and sets them next to each other for easy comparison. By hovering over the shapes in the middle of the tool, the highlighted sections of the text become clearer. This allows for easy focusing on a certain passage or line of the text. What I keep seeing in the results is that one translator would focus on one part of what Kant was saying, while the other would sometimes focus on another aspect of what Kant was saying within the same sentence. (One of the jokes my undergrad philosophy professor would tell about Kant was that even Germans preferred reading him in translation because of how confusing his use of language was.)
With this tool it becomes clear from the start how differently the two translators looked at the text. Though they are both still expressing Kant’s idea, how they choose to express it differs. For instance, Meiklejohn translates the heading to this introduction as “Of the Difference between Pure and Empirical Knowledge” while Smith translates the same heading as “The Distinction between Pure and Empirical Knowledge.” Apart from the preposition at the start of Meiklejohn’s heading and the article at the start of Smith’s, the only difference between the two is the noun choice between “Distinction” and “Difference.” Do you think there is any significant difference in meaning between the two titles?
If we continue, we can see a clearer distinction in the way the second sentence is translated. Meiklejohn translates it as:
“For how is it possible that the faculty of cognition should be awakened into exercise otherwise than by means of objects which affect our senses, and partly of themselves produce representations, partly rouse our powers of understanding into activity, to compare, to connect, or to separate these, and so to convert the raw material of our sensuous impressions into a knowledge of objects, which is called experience?”
while Smith translates the same sentence as:
“For how should our faculty of knowledge be awakened into action did not objects affecting our senses partly of themselves produce representations, partly arouse the activity of our understanding to compare these representations, and, by combining or separating them, work up the raw material of the sensible impressions into that knowledge of objects which is entitled experience?”
Besides the fact that Kant is long-winded, can we pull out anything from the text through this side by side comparison? Because I read things visually, the way each translator expresses this idea forms pictures in my head (e.g. the phrases “awakened into action” vs. “awakened into exercise”, or “to convert the raw material of our sensuous impressions into a knowledge of objects” vs. “work up the raw material of the sensible impressions into that knowledge of objects”). So when I look at these two translations right next to one another, I can see the differences in the verb and adjective selections and try to discern something about what they mean.
What I like about using Juxta Commons is that if one of the tools is not understandable, taking a look at one or two of the other tools in the toolbox makes the text more understandable. It is then possible to go back and look at the tool that was incomprehensible at its first iteration.
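For readers who want to try this kind of comparison outside Juxta Commons, here is a minimal sketch using Python’s standard difflib module on the two headings quoted earlier. It only illustrates word-level collation and is not how Juxta Commons itself works; the function name word_differences is made up for this example.

```python
import difflib

meiklejohn = "Of the Difference between Pure and Empirical Knowledge"
smith = "The Distinction between Pure and Empirical Knowledge"


def word_differences(base, witness):
    """Return (operation, base_words, witness_words) for each span that differs."""
    a, b = base.split(), witness.split()
    # Compare the texts word by word; the comparison is case-sensitive.
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the places where the texts diverge
    ]


for tag, left, right in word_differences(meiklejohn, smith):
    print(tag, " ".join(left), "->", " ".join(right))
```

Run on the two headings, this reports a single differing span at the start and treats “between Pure and Empirical Knowledge” as shared. The same function could be pointed at the longer sentences quoted above.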
Now that we have looked at Juxta Commons, let’s see what Voyant has to offer.
Results of Voyant:
Unlike Juxta Commons, Voyant only allows for one text to be analyzed at a time.
One of the first things I did in Voyant, after uploading my first file, was go to the settings for Cirrus and add a list of stop words to keep them from appearing in the word cloud. The list of words that I chose not to include was: the, of, a, do, which, as, b, any, not, has, so, such, and, to, in, be, it, its, by, an, at, h, or, with, and from.
Let’s take a closer look at the Cirrus, the word cloud for this text:
Even though I pulled out most of the superfluous words from the Meiklejohn text, as described earlier, there are still some less insightful words that remained. Cirrus has a limit of 75 characters for stop words. The word frequency of the remaining words is still valuable. We can discern just from this word cloud what the major topics are within this text, or at least within this section of it. Some of the more important words that stick out of the text include: knowledge, reason, conception, a priori, and experience.
If you take a look at the summary located at the bottom left side in the first picture for this section on Voyant, you will see that according to Voyant’s analysis there are 6,776 words and 1,181 unique words. Opening up the Corpus box located at the bottom right of the screen, we are provided with a density ((number of unique words / number of words) × 1000). The density of words for this text is 174.3, which is pretty high.
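The filtering and counting behind a word cloud like this can be approximated in a few lines of Python. This is a rough sketch rather than Cirrus’s actual method: the stop list is the one given above, while the function name top_words, the simple letters-only tokenizer, and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

# Stop words listed above for Cirrus (duplicates removed).
STOP_WORDS = {
    "the", "of", "a", "do", "which", "as", "b", "any", "not", "has", "so",
    "such", "and", "to", "in", "be", "it", "its", "by", "an", "at", "h",
    "or", "with", "from",
}


def top_words(text, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent words that are not in the stop list."""
    # Lowercase the text and keep runs of letters only (an assumed tokenizer).
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


# A made-up sample; in practice, read the saved .txt file instead.
sample = "The reason of reason and the knowledge from experience"
print(top_words(sample, n=3))
```

A word cloud then simply draws each remaining word at a size proportional to its count.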
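As a quick check of the arithmetic, the reported figure is reproduced when the ratio of unique words to total words is multiplied by 1,000. The function name vocabulary_density below is hypothetical, and the scaling is an assumption that happens to match the number Voyant displayed.

```python
def vocabulary_density(unique_words, total_words):
    """Unique words per 1,000 words of text (assumed reading of the density figure)."""
    return unique_words / total_words * 1000


# Word counts taken from the Voyant summary for the Meiklejohn introduction.
print(round(vocabulary_density(1181, 6776), 1))  # 174.3
```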
In this next picture we can see two more aspects of Voyant’s toolkit. By selecting terms in the Words in Entire Corpus tool, we can obtain a graph of word trends and how they are dispersed throughout the text.
What I can extract from this tool is where the author really begins to explore various aspects of his work. If we look at the use of conception (i.e. the green line of the graph) we can see that the term is used more throughout the middle of the text we are examining, whereas the term reason (i.e. the pink line of the graph) appears more towards the end of the text we are examining.
Knowing this allows us to better ascertain where in a text we might find a particular section that is relevant to our work. For instance, if we were looking at writing a paper on Kant’s use of the term reason in the introduction to the Critique of Pure Reason we would know where in the text we could go in and find it.
Voyant’s Keywords in Context tool shows each occurrence of a term with its surrounding words, so we do not even need to go back to the original text.
In the above picture we can look at how the term reason is applied throughout the text. We can look at what words the term is associated with. If you can see it, there are two terms that jump out at me that are often situated in front of the term reason: pure and human.
Let’s go ahead and take a look at our last text analysis tool: TAPoR.
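A keyword-in-context listing is simple enough to sketch by hand. The following is an illustration under my own assumptions, not Voyant’s implementation: the function names, the three-word window, and the sample sentence are all invented for the example.

```python
import re
from collections import Counter


def keyword_in_context(text, term, window=3):
    """Return (left context, keyword, right context) for each occurrence of term."""
    words = re.findall(r"[A-Za-z]+", text)
    rows = []
    for i, word in enumerate(words):
        if word.lower() == term.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            rows.append((left, word, right))
    return rows


def words_before(text, term):
    """Count the words that appear immediately before term."""
    rows = keyword_in_context(text, term, window=1)
    return Counter(left.lower() for left, _, _ in rows if left)


# A made-up sample sentence, not a quotation from either translation.
sample = "the limits of pure reason and of human reason itself"
for left, word, right in keyword_in_context(sample, "reason"):
    print(f"{left:>20} | {word} | {right}")
print(words_before(sample, "reason"))
```

Counting the word immediately to the left of the keyword is one way to confirm an impression such as “pure” and “human” often preceding “reason.”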
Results of TAPoRware:
Before we get started with this tool, let’s remember that TAPoR is just a portal for text analysis tools. That said, let’s take a quick look at what we can find on their site. Unlike Voyant, where you only need to upload your text once, TAPoR requires that you upload your text every time you want to utilize a different tool.
The first tool I decided to use was Pattern Distribution. What I liked about this tool was that it showed me the word count of a particular word throughout a set percentile of the work. Again, by looking at this data, and seeing where the word is most likely to appear in the text, I can then focus in on a particular facet of my research.
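The idea behind this kind of distribution can be sketched as follows: split the text into equal slices and count the word in each slice. This is a simplified illustration, not TAPoRware’s code; the function name pattern_distribution, the default of ten slices, and the toy sample are assumptions.

```python
import re


def pattern_distribution(text, term, segments=10):
    """Count occurrences of term in each of `segments` equal slices of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = [0] * segments
    for position, word in enumerate(words):
        if word == term.lower():
            # Map the word's position to the slice of the text it falls in.
            counts[position * segments // len(words)] += 1
    return counts


# A toy ten-word sample; in practice, read the saved .txt file instead.
sample = "reason reason a b c d e f reason reason"
counts = pattern_distribution(sample, "reason", segments=5)
for index, count in enumerate(counts):
    print(f"{index * 100 // len(counts):3d}%  {'#' * count}")
```

Slices with longer bars show where in the text the word clusters, which is the cue for where to focus one’s reading.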
The next tool from TAPoR we will look at is Speech Tagger. Once the file is uploaded, we get to pick which words we want to focus on, or highlight, within the text.
Though this tool sounds like it would be beneficial from a grammarian viewpoint, I was unable to figure out how to use it. I tried selecting many different options and colors, but every time the results were the same. If someone else has had more success with this and can provide a better description, please leave a comment.
The last tool we’ll be looking at from TAPoR is List Words. Once a text is uploaded we are given options on how to proceed. For convenience’s sake I just selected the modified Glasgow Stop Words rather than create my own set. That said, let’s see the results:
According to the results the most common word found in the text, excluding those from the stop list, is knowledge. I would have thought that “reason” would have been higher than fifth in rank. One of the benefits I see of using this tool would be in analyzing Kant’s vocabulary, or his word choices. I could also see it being beneficial to those studying philology.
So What? or How can these tools be relevant for our library?
That is a good question. I think these tools could be of use in the library field by helping researchers understand a text. All of these tools could help to bring a text to a greater understanding, but the life force behind the text will remain inherent in the text. These tools can only assist in chiseling away at a text until its true form is found.
There is a story about the famous Renaissance artist Michelangelo in which he was asked how he was able to carve such a wonderful statue. It was as if he had carved a real person out of marble. His response to the question was “I saw the angel in the marble, and carved until I set him free.”
That is what libraries can offer: the ability to set texts free, or at least the knowledge that they contain. Since we are a service-oriented institution, we can assist users with these tools to help them understand all of the information they are receiving from texts.
- Nines, “A User Guide to Juxta Commons.” Accessed March 3, 2014. http://juxtacommons.org/guide