Why R?
R is a framework and language for creating statistical, data mining and data visualization applications.
It is likely to feature more in the business world, given the strong drive towards advanced analytics and the fact that data science has now become “a thing”. Microsoft has also acquired Revolution Analytics, a company that focuses heavily on the R language.
R will most likely play an important role in the future of data-driven applications, especially in Microsoft’s data offerings. It can already be used within Azure ML.
For the Business Intelligence professional, it makes sense to have some literacy in this language. A developer can now supplement their toolbox with functionality that is not as easily achieved with the Microsoft SQL BI stack. For example, with a small amount of code one can bring in some Twitter content, mine the text, create a word cloud, and share the result with users.
Time sheet Overview
For my first mini-project I created a small script that does the following:
- Imports a time sheet for an incomplete month from Toggl (CSV)
- Projects the days left till the end of the month
- Removes the public holidays which are scraped from a web site
- Plots the projected hours for the current month vs the total hours of a typical month
First install the needed packages
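A minimal sketch of this step. Only ggplot2 is named later in this post, so the package list here is an assumption; add whatever else the script needs. The CRAN mirror URL is also just one common choice.

```r
# Packages the script depends on (assumed list; only ggplot2 is named in the post)
needed <- c("ggplot2")

# Install only what is not already present
missing_pkgs <- setdiff(needed, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```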
Import the timesheet data
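A rough sketch of the import, using base R only. The column names (`Start.date`, `Duration`) and the sample rows are assumptions standing in for the real export, and the helper `to_hours` is hypothetical.

```r
# Hypothetical sample standing in for the exported CSV
csv_text <- "Start.date,Duration
2015-06-01,08:30:00
2015-06-02,07:45:00
2015-06-02,00:30:00
2015-06-03,09:00:00"

timesheet <- read.csv(text = csv_text, stringsAsFactors = FALSE)
# For a real export, something like:
# timesheet <- read.csv("timesheet.csv", stringsAsFactors = FALSE)

timesheet$Start.date <- as.Date(timesheet$Start.date)

# Convert "HH:MM:SS" durations to decimal hours
to_hours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(1, 1 / 60, 1 / 3600)), numeric(1))
}
timesheet$hours <- to_hours(timesheet$Duration)

# Total hours logged per day
daily <- aggregate(hours ~ Start.date, data = timesheet, FUN = sum)
print(daily)
```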
This is what the data set looks like so far:
Add a projection column
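One way this step could look, as a sketch in base R. The dates and logged hours are made up, `as_of` stands for the day the time sheet was downloaded, and the 8 hours per remaining weekday is the assumed daily average.

```r
# Hypothetical hours already logged per day
actual <- data.frame(
  date      = as.Date(c("2015-06-01", "2015-06-02", "2015-06-03")),
  hours     = c(8.5, 8.25, 9),
  projected = FALSE
)

as_of         <- as.Date("2015-06-20")  # hypothetical download date
hours_per_day <- 8                      # assumed average for the remaining days

# First and last day of the current month
month_start <- as.Date(format(as_of, "%Y-%m-01"))
month_end   <- seq(month_start, by = "month", length.out = 2)[2] - 1

# Days left in the month, weekends excluded (wday: 0 = Sunday, 6 = Saturday)
remaining <- if (as_of < month_end) {
  seq(as_of + 1, month_end, by = "day")
} else {
  as.Date(character(0))
}
remaining <- remaining[!(as.POSIXlt(remaining)$wday %in% c(0, 6))]

projected <- data.frame(
  date      = remaining,
  hours     = rep(hours_per_day, length(remaining)),
  projected = rep(TRUE, length(remaining))
)

# Actual and projected rows together, flagged by the projection column
total.projected.time <- rbind(actual, projected)
print(total.projected.time)
```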
Here’s what the total.projected.time data set looks like:
Remove the public holidays by reading the web page’s HTML. In this case they’re South African holidays. A special character was giving me problems, hence the find/replace.
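A sketch of the idea, not the original scraping code. It parses an inline, made-up HTML fragment with base R regular expressions instead of fetching a real page, and it assumes the troublesome character was a non-breaking space. A real page would likely need a proper HTML parser and its own cleanup.

```r
# Hypothetical HTML fragment; a real page could be fetched with readLines(url)
html <- c(
  "<table>",
  "<tr><td>16\u00a0June\u00a02015</td><td>Youth Day</td></tr>",
  "<tr><td>24\u00a0September\u00a02015</td><td>Heritage Day</td></tr>",
  "</table>"
)

# Pull the first cell of each table row (the date text)
date_text <- regmatches(html, regexpr("(?<=<tr><td>)[^<]+", html, perl = TRUE))

# Find/replace the special character (assumed here to be a non-breaking space)
date_text <- gsub("\u00a0", " ", date_text, fixed = TRUE)

# Parse "16 June 2015" without relying on the locale's month names
parts <- strsplit(date_text, " ", fixed = TRUE)
holidays <- as.Date(vapply(parts, function(p) {
  sprintf("%s-%02d-%02d", p[3], match(p[2], month.name), as.integer(p[1]))
}, character(1)))

# Drop holidays from a hypothetical set of projected working days
projected_days <- as.Date(c("2015-06-15", "2015-06-16", "2015-06-17"))
working_days   <- projected_days[!(projected_days %in% holidays)]
print(working_days)
```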
Add the typical monthly hours, i.e. 8 hours per day, excluding weekends and holidays.
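The calculation can be sketched as a small function. The name `typical_month_hours` and its parameters are hypothetical, and the example month and holiday are only there for illustration.

```r
# Hours in a "typical" month: every weekday that is not a holiday, at a fixed
# number of hours per day
typical_month_hours <- function(any_day,
                                holidays = as.Date(character(0)),
                                hours_per_day = 8) {
  month_start <- as.Date(format(any_day, "%Y-%m-01"))
  month_end   <- seq(month_start, by = "month", length.out = 2)[2] - 1
  days        <- seq(month_start, month_end, by = "day")

  # wday: 0 = Sunday, 6 = Saturday
  is_weekday <- !(as.POSIXlt(days)$wday %in% c(0, 6))
  sum(is_weekday & !(days %in% holidays)) * hours_per_day
}

# June 2015 has 22 weekdays; one of them (16 June) is a public holiday
typical_hours <- typical_month_hours(as.Date("2015-06-20"),
                                     holidays = as.Date("2015-06-16"))
print(typical_hours)  # 168
```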
Below is the final data set. When I downloaded this it was the evening of the 20th, so I’m using 8 hours as the average of what I would typically work per day to end off the month. Looks like I’m lagging behind.
To plot the above figures I used the ggplot2 library:
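A minimal plotting sketch with ggplot2. The two totals are placeholder numbers rather than the figures from this time sheet, and the chart type (a simple bar chart) is an assumption about how the comparison was drawn.

```r
library(ggplot2)

# Placeholder totals; in the script these come from the earlier steps
summary_hours <- data.frame(
  measure = c("Projected this month", "Typical month"),
  hours   = c(150, 168)
)

# One bar per measure, so the gap between projected and typical is easy to see
p <- ggplot(summary_hours, aes(x = measure, y = hours, fill = measure)) +
  geom_col() +
  labs(title = "Projected vs typical monthly hours", x = NULL, y = "Hours") +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```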
Here’s the plot result:
I think I’ll try Python next. It is also a language of choice for data scientists, and even though it doesn’t have as long a data analysis history as R, it’s making quick strides with libraries like Pandas.