OpenReliability.org

Using R Packages on Windows

Getting started with R on Windows

If you have arrived at our site with no previous experience with R and are running Windows (because that is the way our offices all seem to work) you might find R a little bit different than other applications. R is an open source project having roots in the UNIX, more specifically Linux, world of computing. So R is a bit of an alien immigrating to Windows. At times R documentation is written with a different expectation of background understanding. 

 

The exploration of R should start with a visit to the R-Project home page. Right in the middle of this page we have an obvious link to download R. Pressing on this link we are asked to select a CRAN mirror, basically a download server. The best selection in most cases will be the first one, http://cran.rstudio.com/ as this will automatically redirect to servers worldwide. Heck, we could have started right here. 

 

On the page we now see there is an obvious link for us to install R for the first time. Actually we will most likely use this for update installations as well. Now we see the link we really wanted all along: Download R 3.3.1 for Windows (70 megabytes, 32/64 bit), as of this writing. This will place a familiar Windows installer in the downloads folder. Use any of the multiple navigation means to get there and run this installer. 

 

As the installer proceeds you are asked if you want a customized startup, or to accept defaults. Here, the default is less preferred. Selecting custom startup provides an opportunity to specify SDI, separate windows, for the interface. Other defaults are most likely desirable. The rest is simple next, next … until the installation completes. 

 

If, as most likely in 2016, you are running a 64-bit Windows 7, 8 or 10 system you will now have two versions of R installed. R is a multi-platform application, and these 64-bit operating systems are actually hybrid, capable of 32-bit processes as well as 64-bit. This dual architecture can present some challenges, but most of the time it simply works. Since the release of R version 3.0.0 in April 2013 the 64-bit version of R also supports long vectors, so a single vector is no longer limited to 2^31 - 1 elements. Together with access to system memory beyond the 2 GB to 4 GB available to a 32-bit process, this gives the 64-bit version an advantage with big data. Otherwise there is little difference in use. 

 

Currently (late 2016) the 32-bit version of R is most useful if the RExcel application is to be used. RExcel requires a 32-bit version of Excel (2003 to 2010) and the 32-bit version of R. This is to be covered later. 

Installing Our Software Packages

Since some of our technical code packages are written in C++, it is best to be sure that the required C++ library packages are installed first. Contributed R packages such as these are installed from the CRAN servers. This installation is performed through a running R instance, as Administrator. So, now, choose one of the R program shortcuts, either 32-bit or 64-bit (it doesn’t matter which), right click and select "Run as administrator" to run it. 

 

This brings up a terse looking terminal-like application. But there are some menu items at the top of the window. Select the Packages menu item, and go down to the "Install package(s) …" selection. A choice of CRAN mirror is required. Here, the first selection, 0-Cloud, is most likely the best as this should identify the closest servers globally. At this point the very large list of contributed packages to R is presented. We need to install Rcpp and RcppArmadillo. It is possible to scroll down the very large alphabetized list and choose RcppArmadillo first; making this selection will also install Rcpp as a dependency. Otherwise Rcpp can be installed first, followed by RcppArmadillo. 
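
The same installation can also be typed at the R console instead of using the menu. The lines below are only a sketch: the helper name missing_packages is made up for illustration, and the mirror URL is an assumption (any CRAN mirror should do). The install step is guarded so that it only runs in an interactive session.

```r
# Hypothetical helper: report which of the named packages are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(c("Rcpp", "RcppArmadillo"))
to_install

# Install only what is missing. Asking for RcppArmadillo alone would also
# pull in Rcpp as a dependency. The repos URL is an assumed mirror choice.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

As with the menu route, this is best run from an R instance started as Administrator so that the packages land in the common library.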

 

With these required C++ libraries on the local machine it is possible to install the development packages from R-forge to test the OpenReliability packages. These locations are:

 

The recommended way to load a package into an R installation is by use of a Windows binary download. For each R-Forge package a .zip file can be found for download under a Windows logo. These are not to be unzipped as one might expect! Rather, they are to be opened through the R environment. Run an R instance - as administrator - to assure common library maintenance. The R Console menu selection for such local package installation is found at the bottom of the Packages menu in the R GUI console. A typical navigation dialog will open to permit selection of the downloaded package of interest. 

 

When installing the abrem package it is necessary to install its dependent packages pivotals, abremPivotals, debias first. Of course the RcppArmadillo package as discussed earlier is prerequisite for all. 
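
For those who prefer the console to the menu, a local binary can be installed with install.packages() by passing the path to the .zip file and repos = NULL. The sketch below assumes a download folder and plain file names that are purely hypothetical; the real downloads carry version numbers in their names and must be substituted. The dependencies are listed ahead of abrem, as described above.

```r
# Hypothetical folder and file names: replace with the .zip files actually downloaded.
zip_dir <- "C:/Users/me/Downloads"
zips <- c("pivotals.zip", "abremPivotals.zip", "debias.zip", "abrem.zip")  # dependencies first

for (z in file.path(zip_dir, zips)) {
  if (file.exists(z)) {
    # repos = NULL tells R to install from the local file rather than a server
    install.packages(z, repos = NULL, type = "win.binary")
  } else {
    message("not found, skipped: ", z)
  }
}
```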

 

Source downloads have a long extension of .tar.gz (commonly referred to in the Linux world as a "tarball"). These files CAN BE opened by a program such as WinZip or WinRAR with contents preserved in a particular folder arrangement. Building a binary package from such sources is covered below.

 

Simply installing packages does not yet make them ready for use. It is still necessary to load the package libraries into the running instance of R. This activity is required each time a new R window, or instance, is opened, if the subject package is to be used during that session. Loading a library is done by typing (or copying) a line such as:

library(abrem)

into the R session at the ‘>’ prompt. At times with example code you may see a command line such as require(abrem). The require() function does essentially the same thing as library(). One of these commands is only required once, but no harm is done if called multiple times. Example scripts will often start with a library(xxx) function call, just to be clear. 
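
The one practical difference between the two shows up when a package is not installed: library() stops with an error, while require() returns FALSE and carries on. A small sketch, using a made-up package name that is assumed not to exist on any system:

```r
fake_pkg <- "notARealPackage123"   # hypothetical name, assumed not installed

# require() returns a logical value that a script can test
ok <- suppressWarnings(require(fake_pkg, character.only = TRUE, quietly = TRUE))
ok

# library() raises an error instead, caught here so the script keeps running
res <- tryCatch(
  library(fake_pkg, character.only = TRUE),
  error = function(e) "error"
)
res
```

This is why library() is usually the safer choice at the top of a script: a missing package is reported immediately rather than several lines later.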

A Basic R Primer

If you have followed the installation steps so far you have already noticed that an instance of R really only presents a stark command line prompt. For anyone old enough to remember, this is the kind of presentation one used to expect from old DOS programs. In fact, R appears to act quite similar to the BASIC programming language that was left in the dust long ago. 

 

One easily seen likeness to BASIC is the immediate mode response. Type 1+2 at the prompt and upon pressing the Enter key the result appears. Curiously there is a square bracketed [1] appearing before the result. This is because everything in R is some kind of object. The simple result that we asked for has been returned as a vector with only one element in it. The [1] tells us that the first value seen to the right is the first (and only in this case) element of the result vector. 
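
The bracketed index is easier to appreciate with a longer vector. In this short sketch (the variable names are arbitrary) the single result is confirmed to be a vector of length one, and a 30-element vector is printed so that each wrapped line begins with the index of its first element.

```r
x <- 1 + 2
x              # prints: [1] 3
length(x)      # 1
is.vector(x)   # TRUE

# A longer vector wraps across several lines when printed; each line
# starts with the bracketed index of the first element shown on it.
long_vec <- seq(from = 1000, by = 1000, length.out = 30)
long_vec

# The same square brackets are used to pick out an element by position.
second <- long_vec[2]
second         # prints: [1] 2000
```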

 

Most of our input objects will be limited to vectors or dataframes. A vector is like a one-dimensional array in other languages, while a dataframe is two-dimensional, more like a data table in a database. All elements of a vector must be of the same type, the usual choices being numeric or text. All elements in a column of a dataframe must follow this rule for vectors. In fact, we will note that a dataframe is actually a list of vectors displayed in a table form with vectors making up the columns. Copy and paste this series of lines to the R prompt:

time<-c(149971, 70808, 133518, 145658,
175701, 50960, 126606, 82329)
event<-c(rep(1,8))
life_data<-data.frame(time,event)
life_data

After pressing Enter the dataframe is displayed in table form. This might be a likely input for an R function that will fit life data to some distribution like a Weibull, for instance. The time values would be time-to-failure, while the event values of 1 indicate complete failure. A suspension, or right-censored data point, would be indicated as a zero for this kind of input. 
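
To illustrate the zero coding for suspensions, the sketch below appends two suspended units to the same data. The suspension times are invented for illustration only. It also confirms the earlier remark that a dataframe is a list of column vectors.

```r
time  <- c(149971, 70808, 133518, 145658,
           175701, 50960, 126606, 82329)
event <- rep(1, 8)
life_data <- data.frame(time, event)

# Two hypothetical units still running when observation stopped (event = 0)
suspended <- data.frame(time = c(100000, 160000), event = 0)
life_data2 <- rbind(life_data, suspended)
life_data2

table(life_data2$event)    # counts of suspensions (0) and failures (1)

is.list(life_data2)        # TRUE: a dataframe is a list of column vectors
sapply(life_data2, class)  # each column is a vector of a single type
```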

 

There are a few things observable about R so far. The standard operator for assignment is a less-than character followed by a dash forming a sort of arrow. An equals sign character would also work for assignment, but eventually this may lead to some confusion. Usually the equals sign is reserved for assignment of argument values within a function calling statement. Separating the two uses can be helpful for reviewing and debugging code. Let's follow with the line of code:

hist(life_data$time, breaks=5)

After pressing Enter this time a graphic window pops up displaying a crude histogram of our small amount of data. Note that in order to feed the hist() function a vector argument, the time column of the life_data dataframe was extracted using a dollar sign. Since we had created a vector named time, that vector could alternatively have been used directly giving the same result. The hist() function also accepts a named argument 'breaks' which can be assigned a single numeric value. We can easily call up a help document for the hist function by entering:

?hist

This help command pulls up the documentation for this function, and it will be used often as you explore R code examples. Help documentation is also provided for any package by following the question mark with the package name. Package help documentation is usually quite brief, but it should list the functions in the package that will have individual help pages. 
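
Two of the points made above can be checked at the prompt. The sketch below first shows that the extracted column and the original vector are the same thing, and that the breaks value is taken by hist() as a suggestion (the breaks actually used are stored in the returned object). It then contrasts the two assignment forms inside a function call, using a scratch environment named sandbox (an arbitrary name) so the demonstration does not depend on what is already in the workspace.

```r
time <- c(149971, 70808, 133518, 145658,
          175701, 50960, 126606, 82329)
life_data <- data.frame(time, event = rep(1, 8))

# The dollar sign extraction returns the same vector that was put in
same_vector <- identical(life_data$time, time)
same_vector

# plot = FALSE returns the histogram object without opening a graphics window
h <- hist(life_data$time, breaks = 5, plot = FALSE)
h$breaks            # the break points hist() actually chose
h$counts            # number of observations in each bin
total <- sum(h$counts)

# '=' inside a call only names an argument; '<-' also creates a variable
sandbox <- new.env()
evalq(mean(x = c(1, 2, 3)), sandbox)
after_equals <- ls(sandbox)    # nothing was created
evalq(mean(x <- c(1, 2, 3)), sandbox)
after_arrow <- ls(sandbox)     # now holds "x"
after_equals
after_arrow
```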

 

The intent here has been to break the ice a bit for initial use of R. There are many learning resources available for free on the internet. One of the best free books is hosted by CRAN at http://cran.r-project.org/doc/contrib/usingR.pdf. R has many capabilities and the list of contributed packages is quite daunting. Maybe just learning to use our packages will kick start a further interest in this amazing and powerful resource. 

Code Editors and IDEs

Working within R involves writing out lines of commands that chain together forming scripts to perform some desired action. It is desirable to save this work, often for trial and error, and at times for cut and paste assembly of new scripts. For this activity some sort of text editor is necessary. While it is entirely possible to use a word processor or even simple WordPad for this purpose, some specific editors provide more facilitating services. 

 

One favorite, open-source, light-weight editor is Notepad++. This editor will provide color coding for several programming languages including R and C++, which is useful for our purposes. Notepad++ will maintain multiple open files on tabs even when they have the same name (but different file location, of course). This facilitates copy and paste from one file to another. 

 

A more complete integrated development environment (IDE) intended to give the R user the look and feel of Matlab™ is RStudio. Many R developers prefer this environment. 

 

Still, for code development this author has a preference for the Excel spreadsheet as an editor. (This confounds many of my developer friends.) Code lines can be built in adjacent cells and then a range can be highlighted for copy and paste to other editors, or directly into the R console. The use of Excel also offers the opportunity to copy and paste example scripts in a range just outside the location of your intended script development. Then a copy of the example lines can be edited to suit within the range of a developing script. This works well for trial and error coding, which seems to happen a lot. Unsuccessful or alternate trials can be saved nearby to help focus on what might be a next likely best trial. 

 

Since R is an interpreted language, individual lines of a script or blocks of lines can be selected from an Excel range and copied to run in the console. The output of various objects under construction can be run and copy / pasted back to the spreadsheet just outside of the range of code development to visualize the progress of the script. No other editor or IDE can match this capability. 

 

An Excel add-in called RExcel can simplify this line-by-line debugging capability further using what is called its "scratchpad mode". The RExcel add-in extends the context menu selections (on right mouse click) to include options to "Run Code" for a highlighted range. Then, a second right mouse click can be made outside the range of code development to place a copy of output using a "Get R Output" menu selection. A cautionary note, however: RExcel is not open-source, and unless used strictly for student or unpaid personal use a paid license is required, based on the honor system. 

 

Also keep in mind that currently RExcel only works with the 32-bit version of R and a 32-bit version of Excel (2003 - 2010). 

Installing RExcel

Before beginning, assure that your version of Microsoft Office, or Excel alone is 32-bit. RExcel has not yet been implemented on the 64-bit platform. 

 

There are a number of pages in the statconn web site that will download and install several components that make up a complete RExcel installation. Unfortunately, doing some of this in the wrong order can lead to failure. The best, most recent, installation instructions are found at the statconn wiki http://homepage.univie.ac.at/erich.neuwirth/php/rcomwiki/doku.php?id=wiki:how_to_install. Erich Neuwirth has worked hard on all of this material and even spent some personal time with this writer after the disrupting events of the R-3.0 roll out. 

 

It is strongly recommended that users of our packages start with an installation of the latest R version. This is because this version is required for the simplest download of the compiled binaries available from R-forge. For this reason the initial steps with a batch file on the statconn wiki page should be ignored and attention should be moved to the section titled "How to install RExcel when R is already installed".

 

Here is a repeat of the stepwise instructions for installing RExcel after an existing R installation is in place. You must refer to the statconn wiki should any problems arise with these links, which have been reproduced faithfully at time of writing: 

 

  1.
    statconnDCOMserver.latest.exe
    This .exe is a Windows installer, just like many others. Running this will install a service on the Windows platform. This software is specifically licensed, you can use the Noncommercial Home & Student version only if you qualify.
     
    Problems have occurred if another version of statconnDCOM has previously been installed on the target system. This can happen by downloading and executing things from the statconn site by trial and error before following these steps carefully. If in doubt, go to Control Panel -> Uninstall a program and uninstall any statconnDCOM entry before running the installer for latest version.
  2.
    R packages rscproxy and rcom
    These are R packages, but are not open source; rather, they are licensed just as the DCOM server above. For this reason these packages are not available from CRAN. A three command-line script is provided that must be run on an R instance that has been started as Administrator (on Windows Vista/7/8), just as instructed for installation of our packages from R-forge.

    install.packages(c("rscproxy","rcom"),
    repos="http://rcom.univie.ac.at/download",lib=.Library)
    #
    library(rcom)
    #
    comRegisterRegistry()

    It is preferable to run these one command line at a time so you can verify success at each step. The last command returns a result of zero when it is happy.

  3.
    RExcelInst.latest.exe
    This is the Excel add-in installer. It runs on Windows like any other installer. Note that you should take expert control over this installation, rather than accept all defaults. For code editing purposes the foreground server option for R is required. This choice is offered during non-default installation.
    At this point you should have a functional RExcel installation. Although a shortcut to RExcel may be on your desktop, this is not expected to have any particular use. Open any spreadsheet (workbook) that you previously saved and you should note an Add-Ins tab for a new ribbon control, if you did not have one before. Selecting the Add-Ins tab should now reveal an RExcel selection offering a drop down menu. At the top of the menu is a "Start R" selection. Normal left clicking on "Start R" should start a 32-bit R session that is now linked to Excel. (Note that if you had a previously running 32-bit R session the selection may say "Connect R".)
    You should be able to see the R session icon in the Windows task bar. If not, for some reason you may not have obtained the foreground server option. It should be possible to re-set the server from background to foreground from the RExcel dropdown menu item "Set R server". It is necessary that no current connection from Excel to R exists in order to access the "Set R server" selection. Either shut both Excel and any visible R session down and restart Excel to see this option, or possibly just "Disconnect R" from the RExcel menu.

 

At this point there is an opportunity to install the R Commander package, Rcmdr, following the statconn wiki further. The Rcmdr package provides a GUI service layer on the R system. RExcel is designed to work with R Commander, having the ability to place R Commander menus in the Add-ins ribbon control of Excel. Similar to the way R runs in a foreground server, R Commander has a separate run object that can be brought up to focus from a grouping with the R instance on the Windows task bar. It is easy to start R Commander from the Excel Add-ins -> RExcel menu (once R has been started and connected). Alternatively a standalone instance of R Commander can be started from the R console. Loading the package with library(Rcmdr) normally opens the Commander window; once the package is loaded, a closed window can be reopened with a line entry calling the function:

Commander()

 

We have interest in building an R Commander plug-in for the abrem application package, but this is a topic that someone must study and apply. Maybe someone reading this page will offer to help in this area. For now examination of Rcmdr is an optional point of interest, with respect to use of our current packages. 

Building From Source Packages on Windows

Compiling from sources is a commonplace occurrence on Linux and for nearly all open source distributions.  It is not so common for typical users on Windows, and may be frightening to those who have never done it before.  Very often it is hard to find a comprehensive list of the things you need to have and things you need to do to implement a successful package compilation.  Too many assumptions are made about previous knowledge of the person approaching a build for the first time.  Below is an attempt to take this completely step-by-step on Windows (Mac OS X users have a similar process, but it is not covered here): 

 

One of the first considerations for a build that includes compiling from C or C++ sources (Fortran is also an option with R, but not used here), is to have the right compiler.  R itself is written in C and has been compiled with the GNU compiler suite.  Packages that will link to R must use this same compiler.  A very common question that arises is "Can't I use the compilers in Microsoft Visual Studio?"  The answer is always no.  There is no workaround on this because R sources have not been written to permit an R build on this resource.  There is no build of R using MSVC for instance. 

 

The R-project maintains a handy download of the required Windows tool chain (compiler plus other necessities) in a download called Rtools (http://cran.r-project.org/bin/windows/Rtools/).  This tool chain is tweaked to match version updates of R, so it becomes a moving target with each R version update as well.  It is okay to take the latest version even though it may not be "frozen".  The RtoolsXX.exe that you download here is the installer of preference.  By default it copies contents to C:\Rtools which it will create for you.  It can also set a registry entry, but uninstall, if desired, can be simply performed by deleting this directory.  Future updates will simply overwrite what is there. 

 

To set up the Windows system to use this compiler for R packages some entries are required in the PATH environment variable.  Windows ships with only crude access to the System Environment Variables.  An open source alternative, Rapid Environment Editor (http://www.rapidee.com/en/download), is much more user friendly.  Be careful not to install unwanted stuff along with the featured installer.  There should be a "decline" choice for these unwanted things.  The PATH additions required at this point are entries for C:\Rtools\bin;C:\Rtools\gcc-#.#.#\bin;C:\Program Files\R\R-#.#.#\bin

 

These entries should be placed in front of other existing entries in the PATH environment variable.  Above the hash symbol "#" has been used to represent a holding place for whatever current version you have at time of installation.  Notice it has been assumed that R was installed in the default Program Files directory.  Rapid Environment Editor needs to be started as Administrator to make changes.  REE provides a handy button for restarting this way if you forgot.  REE also provides the ability to browse for the file directory entries, so you don't need to memorize version numbers (to replace hashes above) anyway. A reboot of the Windows system is probably required to assure changes saved by REE are indeed active. 
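
After the reboot, the PATH that R actually sees can be inspected from the R console before any compiling is attempted. This is only a quick sanity check, not part of the official setup; the tool names queried are the ones a build is assumed to need.

```r
# Split the PATH into its individual entries (";" on Windows, ":" elsewhere)
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
head(path_entries)     # the Rtools and R entries should appear near the top

# Ask where each tool is found; an empty string means it is not on the PATH
tools <- Sys.which(c("gcc", "make", "R"))
tools
```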

 

At this point a small test can be run to verify that the compiler is indeed installed and working properly.  Assuming package Rcpp has been installed, the following script will then be able to compile a small sample of C++ code (using Rcpp which is one of our dependencies) and execute it.
[Note: compiling using the sourceCpp function will not execute through RExcel.  It is necessary to copy and paste the script through the sourceCpp command line into the R console.]

 

library(Rcpp)
# C++ source held in an R character string
src<-'
#include <Rcpp.h>
// [[Rcpp::export]]
SEXP X5(SEXP arg1){
Rcpp::NumericVector input_vec(arg1);
int N = input_vec.size();
// clone so that the input vector is not modified in place
Rcpp::NumericVector output_vec(Rcpp::clone(input_vec));
for(int i=0; i<N; i++)  {output_vec[i]=output_vec[i]*5;}
return output_vec;
}
'
# compile the C++ source and make X5() available in this session
sourceCpp(code=src)
input<-4
X5result<-data.frame(input=input,output=X5(input))
X5result

 

The X5 function is quite trivial in that it simply multiplies a vector argument by 5.  This function will exist only for the life of the R session it was created in and cannot be saved for future use.  Use of this 'inline' coding method is primarily limited to function development.  Notice that src is simply an R vector holding the C++ source code as a continuous character string.  Alternatively this could have been read in from a text file. 
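
As a sketch of the text file alternative, the lines below write a small C++ source to a temporary file (the file name X5.cpp and its location are arbitrary choices for illustration) and read it back into a single character string. The C++ body here is a shortened variant of X5, not the exact code above. The compile step itself is left as comments since it needs the working compiler setup just described.

```r
# Write a small C++ source file; in practice this would be a file kept on disk
cpp_file <- file.path(tempdir(), "X5.cpp")
writeLines(c(
  "#include <Rcpp.h>",
  "// [[Rcpp::export]]",
  "Rcpp::NumericVector X5(Rcpp::NumericVector input_vec) {",
  "  return input_vec * 5.0;",
  "}"
), cpp_file)

# Read the file back as one continuous character string
src_from_file <- paste(readLines(cpp_file), collapse = "\n")
cat(src_from_file)

# With Rcpp loaded and the compiler on the PATH, either form should compile it:
#   sourceCpp(code = src_from_file)
#   sourceCpp(file = cpp_file)
```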

 

Well, that was fun.  Not only did we prove the compiler setup, we saw a simple method for C++ code development.  There is one more thing we need to do to build complete packages.  The system still needs to be able to build the help documentation (man) pages.  For this, a download of MiKTeX is required as found on the Rtools download page.  After running the MiKTeX installer it is necessary to add the miktex\bin directory into the PATH environment variable.  It can be placed after the entry for the R binary directory.
[Note: I was building packages without help pages for 2 or 3 years until I realized I needed this PATH entry.  No documentation seemed to ever clarify the point.  It must be considered common knowledge.] 

 

With a package build system so set up, it is now possible to build packages, including those that require compilation. A source package should be unpacked somewhere in the file system. [ I like to have a C:/Rpack folder to hold various package sources.] A package is contained under its root folder, which carries the name of the package. A copy of the command shell, cmd.exe, can be placed in the Rpack folder. Then a batch file, named build.bat, can be constructed with the following command line: 

 

R CMD INSTALL --build %1

 

This way, any package placed in the Rpack folder can be built by first running cmd.exe - as administrator - then entering 'build [package_name]' into the console. The result of a successful operation will be the appearance of a binary for package installation with the .zip extension in the same directory that cmd.exe was executed from.
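
The same build command can also be issued from within R using system2(). The helper below is hypothetical (the name build_binary and its dry_run argument are made up for this sketch); with dry_run = TRUE it only returns the command line it would run, which is a convenient way to check it first.

```r
# Hypothetical helper wrapping the command used in build.bat
build_binary <- function(pkg_dir, dry_run = FALSE) {
  args <- c("CMD", "INSTALL", "--build", shQuote(pkg_dir))
  if (dry_run) {
    return(paste("R", paste(args, collapse = " ")))
  }
  # Requires the R bin directory on the PATH, as set up earlier
  system2("R", args)
}

# Show the command without running it; "abrem" stands for the package folder
cmd_line <- build_binary("abrem", dry_run = TRUE)
cmd_line
```

For a real build, the working directory would first be set to the folder holding the package sources, for example with setwd("C:/Rpack").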