Posted in PowerBI, R

Getting your R script “out there” with PowerBI

PB_cur_splash

PowerBI offers a really convenient way to consume the insights that are modelled by a data scientist or other analyst that is literate in R.

The problem I faced with using R-Studio was that of making it available for consumption. I would ideally want users to pull the data rather than have to push the results of a script to them. RStudio does provide a nice looking solution to this with Shiny, but it requires some work – both to setup and to learn.

PowerBI on the other hand, takes next to no time to get started, and has a very small learning curve. Your dashboard can be deployed to the cloud and consumed without being restricted by geography.

I just did the following:

I created a new model in PowerBI.

I then used the R Script option in the Get Data menu (R must be installed on your machine):

 

get r script

I took the script I used in the previous blog post, removed the outputs (since I’m handling that from PowerBI), and pasted it into the R Script dialogue box:

pastscriptR

The Navigator box comes up, and turns each data frame into a data set.

Just select the dataset you want to work with and click “Load”r nav

The dataset is now available to be consumed in a PowerBI report/dashboard

datasetrpowbi

 

Now you’ve got all the visualization niceties you need to make a quick dashboard.

I took the opportunity to take a look at the new TimeBrush visualization in the Visuals Gallery. This allowed me to make a simple line graph interactive with regards to the period:

Here’s the full data period:

fl pi

Versus 2014-2016

fl pi

 

 

 

 

 

Advertisements
Posted in R

Oil Price Trend in South African Rands – R

Rplot.pngSouth Africa has recently being experiencing some hikes in the petrol price. We're paying more for petrol now that oil is under $50 a barrel then we were when it was over a $100 a barrel.

We haven’t been able to take advantage of the low oil prices because our currency has weakened considerably since last year.

Below is an R script for plotting the oil price in ZAR using the Quandl package for exchange rates and oil prices. The output shows that, even factoring in the weakened currency, our petrol price should look better than it did in 2014. This gives an indication of the effect of taxes and levies on the petrol price, and the risk that the country is in if the oil price goes back to 2014 levels.

First the packages

Quandl has a great repository of free and paid data sets, which can be consumed via Python, R, Excel, and even through your browser. I used this package to source historic exchange rates and oil prices.

The Lucr package accesses more recent exchange rate information. It uses the Open Exchange Rates site. You need a key. Luckily there’s a free option.

The ggplot2 package provides the graph plotting.


#install.packages("Quandl")
#install.packages("lucr")
#install.packages("ggplot2")

library(Quandl)
library(lucr)
library(ggplot2)

Get the exchange rates


lucr.key <- "GeTyOurOwNkEy"

###Get Historic exchange data
stg.daily.exchange.rates <- Quandl("FRED/DEXSFUS")

#Add latest exchange rate
stg.latest.exchange.rate <- currency_convert(1, from = "USD", to = "ZAR", key = lucr.key)

latest.exchange.rate <- data.frame(Sys.Date(),stg.latest.exchange.rate)
names(latest.exchange.rate) <- c("DATE","VALUE")

stg2.daily.exchange.rates <- rbind(stg.daily.exchange.rates, latest.exchange.rate)

daily.exchange.rates <- stg2.daily.exchange.rates[order(stg2.daily.exchange.rates$DATE),]

str(daily.exchange.rates)

remove(stg.daily.exchange.rates)
remove(stg2.daily.exchange.rates)

Get the oil price in USD

 
###Get historic oil prices
stg.oil.prices <- Quandl("CHRIS/CME_CL1") #Quandl("OPEC/ORB")
stg.oil.prices <- data.frame(stg.oil.prices$Date,stg.oil.prices$Last)
names(stg.oil.prices) <- c("Date","Value")

#Carry latest oil price to latest exchange rate

stg.oil.price.carryforward <- data.frame( latest.exchange.rate$DATE,
                                          stg.oil.prices[stg.oil.prices$Date == max(stg.oil.prices$Date),2]
                                          
)
names(stg.oil.price.carryforward) <- c("Date","Value")

oil.prices <- rbind(stg.oil.prices, stg.oil.price.carryforward)

remove(stg.oil.price.carryforward)
remove(stg.oil.prices)

Because Quandl exchange rates for South Africa are a bit behind (about a week),
I took the latest currency from lucr and applied that to the latest oil price
that I received from Quandl (the previous day).


###Oil Price in ZAR
stg.oil.prices.ZAR <- merge(x = oil.prices, y = daily.exchange.rates, by.x = "Date", by.y = "DATE")

names(stg.oil.prices.ZAR) <- c("Date", "Oil.Price","ZAR.per.Dollar")

oil.prices.ZAR <- data.frame(
  stg.oil.prices.ZAR$Date,
  stg.oil.prices.ZAR$Oil.Price * stg.oil.prices.ZAR$ZAR.per.Dollar,
  stg.oil.prices.ZAR$Oil.Price,
  stg.oil.prices.ZAR$ZAR.per.Dollar
)
remove(stg.oil.prices.ZAR)
names(oil.prices.ZAR) <- c("Date","Oil.Price.ZAR","Oil.Price.USD","Exchange.Rate")


Here’s a sample set of rows:
Capture
Then we’ll use the ggplot2 library to plot our graphs:

display.from.date <- "2005-01-01"

display.oil.prices.ZAR.long <- oil.prices.ZAR[oil.prices.ZAR$Date >= display.from.date,]

display.from.date <- "2013-01-01"

display.oil.prices.ZAR.short <- oil.prices.ZAR[oil.prices.ZAR$Date >= display.from.date,]


qplot(x = display.oil.prices.ZAR.long$Date, y = display.oil.prices.ZAR.long$Oil.Price.ZAR,
      xlab = "Date", ylab = "Oil Price per Barrel (ZAR)", main = "Oil Price Over Time",
      data = display.oil.prices.ZAR.long,
      colour = -Exchange.Rate,
      geom = "line")


qplot(x = display.oil.prices.ZAR.short$Date, y = display.oil.prices.ZAR.short$Oil.Price.ZAR,
      xlab = "Date", ylab = "Oil Price per Barrel (ZAR)", main = "Oil Price Over Time",
      data = display.oil.prices.ZAR.short,
      colour = -Exchange.Rate,
      geom = "line")

The “long” plot

Rplot

The “short” plot

Rplot01

It looks like levies and taxes are playing a large part on the petrol price. Even with the South African Rand’s weak rate, we’re in a better position than we were in 2014.

Next, I’ll take advantage of PowerBI’s R capabilites by using the R script as a data layer, and leveraging PowerBI’s visuals to communicate the information.

 

Posted in SQL, SSAS, SSIS, SSRS, Uncategorized

SQL Server Upgrade Caveats

breaking

Microsoft has some useful information regarding breaking changes, and discontinued functionality for those upgrading versions.

I’ve listed on a high level what to watch out for, plus the link to the Microsoft page if you need more info.

Remember that these issues are accumulative. So any issues in 2012 will affect a 2014/2016 upgrade if you’re still on 2008.

Also, 2016 is still in CTP, so things could still change before RTM.

 

Posted in R, SQL, SSAS, SSIS, SSRS

Reasons to wait for SQL Server 2016

download

Preamble

If you’ve got an old implementation of SQL Server that you use for BI and DWH, say 2008 R2, and you’re planning to upgrade soon, you might be tempted to upgrade to SQL Server 2014.

Many IT departments will wait for SP1 to be released before upgrading to a newer version.

2016 RTM is set to be released in the second quarter of 2016, with SP1 out of sight.

Should you just upgrade to 2014 or wait for SP1 of 2016 to be released?

You might consider going through both, but after doing the first upgrade you might lose the will to do the second upgrade – after all you’ve got a lot vying for your time and attention.

The company that I’m working at is currently having this very dilemma.

Features

Here’s a list of features that only come with 2016, that I thought make it worthwhile to wait:

  • Database
    • PolyBase
    • Stretch database – a nice option for archived, relatively inactive data
    • Column store index performance enhancements for Analytical queries
    • Query Store – a new way to troubleshoot performance issues.
    • Row level security.
  • SSIS 
    • Most changes happened in 2012. The only change in 2014 is rich XML validation in the XML Task.
    • Support for Power Query
    • Deployment enhancements
    • Debugging enhancements
    • Package management enhancements
    • Connectivity enhancements e.g. Hadoop, new Excel, SAP BW, and Azure
    • Better design experience
    • Support for R-Services
  • DQS
    • Nothing for 2014 or 2016
  • SSAS
    • Direct Query enhancements for tabular models
    • Tabular Model Scripting Language (TMSL) supported in Management Studio
    • Calculated tables in SSDT
    • Partition management for Tabular models
    • DBCC for Analysis Services
    • New DAX Functions for Tabular mode
    • Parallel processing for multiple table partitions in Tabular models
    • 2014 : “With exception to added functionality supporting Power View Reports against Multidimensional Models, SQL Server 2014 Analysis Services is unchanged from the previous release.”
  • MDS
    • Custom indexes on entities
    • Extend Business Rules with SQL  (Business Rules Extension)
    • The ability to submit changes for approval through  “Change Sets”
    • Scalability and performance enhancements on the server (compressed data etc).
    • Excel Addin loads data faster and can handle larger entities. (15x faster)
    • Transaction log maintenance
    • Enhanced tracing for debugging and troubleshooting.
    • Improved manageability
    • Entity Sync Relationship  – share entity data between models
    • Many-to-Many Relationships in Derived Hierarchies
    • 2014: No enhancements for 2014
  • SSRS
    • New SSRS Web Portal
    • Mobile Report Publisher
    • SQL Server mobile reports hosted in Reporting Services available in Power BI Mobile app
    • Pin Report Items to a Power BI Dashboard
    • PowerPoint Rendering and Export
    • New charts
  • R Services
    • Run R scripts using TSQL
    • Advanced Analytics Extensions – extend Transact-SQL with supported languages such as R, Python, and JSON.

 

Posted in R

R language first try

images

Why R ?

R is a framework and language for creating statistical, data mining and data visualization applications.

It seems like this will feature more in the business world since the strong drive towards advanced analytics; that and that now data science has now become “a thing”.  Microsoft also acquired Revolution Analytics; a company that focuses heavily on the R language.

R will most likely play an important role in the future for data driven applications, especially when it comes to the Microsoft’s data offerings. It can already be used within Azure ML.

For the Business Intelligence professional, it just makes sense to have some sort of literacy around this language. A developer can now supplement their toolbox with functionality that may not be as easily done with the Microsoft SQL BI stack. For example with a small amount of code one can easily bring in some twitter content, mine it for text, create a word cloud, and share that content to your users.

Time sheet Overview

For my first mini-project I created a small script that does the following:

  • Imports a time sheet for an incompleted month from Toggle (CSV)
  • Projects the days left till the end of the month
  • Removes the public holidays which are scraped from a web site
  • Plots the projected hours for the current month vs the total hours of a typical month

Code

First install the needed packages

ScreenClip

Import the timesheet data

ScreenClip

This is what the data set looks like so far:

ScreenClip

Add a projection column

ScreenClip

Here’s what the total.projected.time data set looks like:

ScreenClip

Removed the public holidays by reading the web page’s HTML. In this case they’re South African holidays. A special character was giving me problems, hence the find/replace.

ScreenClip

Add the typical monthly hours. i.e. 8 hours per day excluding weekends and holidays

ScreenClip

Below is the final data set. When I downloaded this it was the evening of the 20th, and so I’m using 8 as an average of what I would typical work per day to end off the month. Looks like I’m lagging behind.

ScreenClip

To plot the above figures I used the ggplot library:

ScreenClip

Here’s the plot result:

Rplot02

I think I’ll try Python next. It is also a language of choice for data scientists, and even though it doesn’t have as long a data analysis history as R, it’s making quick strides with libraries like Pandas.