Master thesis done with R and tikzDevice

My master's thesis is finished and can be downloaded, along with its LaTeX source. I will elaborate on the content of the thesis on my Dutch blog and discuss the technical side of producing the document with LaTeX here. Note that the thesis is entirely in English, because the Public Administration master's program at Leiden University is taught in English, so if you want to know more about the content, just read it.

This is the first time I’ve ever done quantitative research and used statistics. I thought the statistics part was going to be challenging. While I did read a lot on the subject to understand it and learned a lot in the process, I did not need to do any calculations or work out complex formulas by hand. All of that is done by software. Leiden University uses the proprietary SPSS, but I preferred a free software solution that I could use on my own PC. That is one reason I ended up using R; the other is that I had chosen a very good thesis supervisor who knew it well. He taught me just what I needed to get started in a very short amount of time. While there are GUIs available for R, I use it from the command line, just like LaTeX.

R can be used not only for statistical analysis of data but also to draw visual representations of it, such as the histogram, scatter plots and correlation matrix in my thesis document. R can write graphics output in many formats, but for PDF documents vector graphics, which scale nicely, are preferable. PGF/TikZ is often used to produce vector graphics for LaTeX, and I learned that R can use the tikzDevice package to create TikZ figures. It took some time to figure out how to get everything done properly and to get some problems fixed, but I’m very satisfied with the result right now. The combination of R with tikzDevice rocks! The one thing I could still have improved is to use the ggplot2 package, which can handle the overplotting in some of my scatter plots better than R's standard scatter plots.

When I started working with tikzDevice I missed a basic tutorial explaining, among other things, how to specify the width and height of TikZ images drawn by R. Getting the histogram right was especially annoying to figure out, because R’s default way of drawing one didn’t make sense to me. For one scatter plot I had to find a fix to keep large numbers from appearing in scientific notation. Others who are starting out with tikzDevice should find that my R scripts, which are attached to the PDF document of my thesis, are good examples to get started with. According to the statistics of my weblog I get a lot of visitors who come for information on LaTeX, so I assume this will be helpful to many people who find this post through search engines.
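For readers who want a starting point before opening those scripts, here is a minimal sketch of the three issues mentioned above. It is not taken from the thesis scripts: the data, the file names and the axis labels are made up for illustration, and the guess that the histogram trouble was about where the bins fall is only an assumption. The `width` and `height` arguments of `tikz()` are given in inches.

```r
# Made-up data for illustration only (not the thesis data).
set.seed(1)
scores <- round(rnorm(64, mean = 5, sd = 2))
budget <- runif(64, min = 1e5, max = 5e6)

# Explicit bin edges, so that every whole-number score gets its own bar
# instead of relying on the bins hist() picks by default.
breaks <- seq(min(scores) - 0.5, max(scores) + 0.5, by = 1)

# Axis labels for large numbers, written out in full rather than as 1e+06.
ticks <- pretty(budget)
tick_labels <- format(ticks, scientific = FALSE, big.mark = ",", trim = TRUE)

# The plotting part needs the tikzDevice package and a working LaTeX install.
if (requireNamespace("tikzDevice", quietly = TRUE)) {
  # width and height are in inches
  tikzDevice::tikz("histogram.tex", width = 5, height = 3.5)
  hist(scores, breaks = breaks, main = "", xlab = "Score")
  dev.off()

  tikzDevice::tikz("scatter.tex", width = 5, height = 3.5)
  # Suppress the default x axis and draw one with the formatted labels.
  plot(budget, scores, xaxt = "n", xlab = "Budget", ylab = "Score")
  axis(1, at = ticks, labels = tick_labels)
  dev.off()
}
```

The resulting files can then be pulled into a LaTeX document that loads the tikz package, for example with `\input{histogram.tex}` inside a figure environment. Choosing a width that matches the text width of the document avoids having to scale the figure afterwards, which keeps the font size in the plot the same as in the body text.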

Regarding LaTeX itself, all the important stuff is noted in comments in the source document for the PDF. I’m satisfied that I now have the surname prefixes right in the bibliography. The biblatex package shouldn’t need an obscure fix for this, however; it should work like that out of the box. On the other hand, the line breaks in URLs and the page breaks in the bibliography are still awful, and I don’t know how to fix them. I’m not so content with the section names that appear in the header on right-hand pages either. In some cases they work because the sections are long, but in most cases they are useless because a new section starts on almost every page. All in all, though, I think I would score high marks for layout if that were graded separately for theses.

Edit 21/08/2012: I’ve uploaded the latest revision of my master's thesis. It has data on several more respondents, but this did not lead to notable changes in the conclusions. More importantly, in this version the ggplot2 package was used for the histogram and scatter plots, and they look a lot better now. The scatter plots no longer suffer from overplotting. I also had my thesis defense today: the thesis was graded with a 9 and my supervisor complimented me on its layout. But I’m a perfectionist: the legends of three scatter plots contain numbers with decimals even though the data has only rounded numbers. I’ve already asked for help with this and will upload another version when I have fixed it.
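As an illustration of both points, here is a minimal sketch of one common way to deal with overplotting in ggplot2 and to keep a count legend on whole numbers. It is an assumption that the thesis plots work this way; the data frame, column names and legend title are made up. `geom_count()` draws one point per location, sized by the number of observations there, and passing explicit `breaks` to the size scale keeps values like 2.5 out of the legend.

```r
# Made-up data for illustration only: 64 respondents on two 5-point scales.
set.seed(2)
df <- data.frame(
  x = sample(1:5, 64, replace = TRUE),
  y = sample(1:5, 64, replace = TRUE)
)

# How many respondents share each (x, y) location.
counts <- as.vector(table(df$x, df$y))
max_n <- max(counts)

# Whole-number legend breaks between 1 and the largest count.
count_breaks <- unique(round(pretty(c(1, max_n))))
count_breaks <- count_breaks[count_breaks >= 1 & count_breaks <= max_n]

# The plotting part needs the ggplot2 package.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = y)) +
    geom_count() +  # point size reflects the number of overlapping points
    scale_size_area(breaks = count_breaks, name = "Respondents")
}
```

The plot object `p` can be printed inside a `tikz()` device in the same way as a base R plot, so the figure still ends up as TikZ code in the LaTeX document.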

Edit 04/09/2012: The final revision is now uploaded, with 64 respondents and fixed scatter plots.
