Inspired by Ruby's Rack project, Jeffrey Horner released his R package "Rook" [1] earlier this year. After getting several Rook applications running, I found that Rook avoids some of the disadvantages of rApache: it is much more flexible and easier to learn.
In theory, once the proper plugin is written, your app could be deployed under any web server, such as Apache, lighttpd, or nginx. Another significant advantage of Rook is that it is easy to debug. Because Rook uses Rhttpd as its default server, you can preview your app on the fly, without any complicated deployment process.
Here's a test app that implements the binary file visualization method described in the VizSec and Journal in Computer Virology papers [2] and [3]. We chose to visualize the CVE-2011-0611 samples, which were retrieved from [4]. A Rook::File application runs alongside it to serve the static (png) files.
Load the required packages:
require(Rook)
require(digest)
require(kohonen)
Write a Rook app (visbin.R):
newapp = function(env) {
req = Rook::Request$new(env)
res = Rook::Response$new()
# A single multipart form carries the file upload and all three parameters
res$write('Choose a binary file to train on:\n')
res$write('<form method="POST" enctype="multipart/form-data">\n')
res$write('<input type="file" name="data">\n')
res$write('xdim:\n')
res$write('<input type="text" name="xdim" value="12">\n')
res$write('ydim:\n')
res$write('<input type="text" name="ydim" value="25">\n')
res$write('ncolors:\n')
res$write('<input type="text" name="ncolors" value="8">\n')
res$write('<input type="submit" value="Go!">\n</form>\n<br>')
myNormalize = function (target) {
return((target - min(target))/(max(target) - min(target)))
}
if (!is.null(req$POST())) {
data = req$POST()[["data"]]
hash = digest(data$tempfile, algo = "md5", file = TRUE)
destFile = file(data$tempfile, "rb")
k = floor((file.info(data$tempfile)$size/16)) - 2
doneFile = readBin(con = destFile, what = "raw", n = 2 * 8 * k)
close(destFile)
tmpFile0 = rbind(doneFile[seq(1, (2 * 8 * k) - 1, 2)], doneFile[seq(2, (2 * 8 * k), 2)])
tmpFile1 = paste(tmpFile0[1, ], tmpFile0[2, ], sep = "")
initMat = matrix(strtoi(tmpFile1, 16L), ncol = 8, byrow = TRUE)
normMat = myNormalize(initMat)
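# Form fields arrive as strings, so convert the grid dimensions to numbers
trainedSOM = kohonen::som(normMat, grid = somgrid(xdim = as.numeric(req$POST()[["xdim"]]), ydim = as.numeric(req$POST()[["ydim"]]), topo = "hexagonal"))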
png(paste("/tmp/", hash, ".png", sep = ""))
plot(trainedSOM, type = "dist.neighbours", palette.name = rainbow, ncolors = as.numeric(req$POST()[["ncolors"]]), main = "")
dev.off()
res$write(paste("<img src='", s$full_url("pic"), "/", hash, ".png'", " />", sep = ""))
}
res$finish()
}
Initialize/Run the app:
s = Rhttpd$new()
s$add(app = newapp, name = "visbin")
s$add(app = File$new("/tmp"), name = "pic")
s$start()
s$browse("visbin")
First the app hashes the uploaded file, then trains a SOM model on it. Because the training result differs from run to run, we may train several times and keep the best one.
We use the U-Matrix to visualize the Self-Organizing Map. The U-Matrix value of a unit is the average distance between that unit and its closest neighbors, and color represents this value. The number of colors in the palette is critical: too many or too few can obscure potential cluster patterns.
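To make the U-Matrix definition concrete, here is a minimal base-R sketch. It is not how kohonen computes its "dist.neighbours" plot. It assumes a rectangular grid with up/down/left/right neighbours (the app itself uses a hexagonal grid), and the function name umatrix and its arguments are hypothetical.

```r
# U-Matrix sketch for a rectangular SOM grid (4-neighbour layout is an
# assumption; the app above trains on a hexagonal grid).
# codes: numeric matrix, one row per unit, units numbered row by row.
umatrix <- function(codes, xdim, ydim) {
  stopifnot(nrow(codes) == xdim * ydim)
  u <- matrix(0, nrow = ydim, ncol = xdim)
  for (i in seq_len(ydim)) {
    for (j in seq_len(xdim)) {
      # Candidate neighbours: up, down, left, right
      nb <- rbind(c(i - 1, j), c(i + 1, j), c(i, j - 1), c(i, j + 1))
      # Keep only the neighbours that lie inside the grid
      inside <- nb[, 1] >= 1 & nb[, 1] <= ydim &
        nb[, 2] >= 1 & nb[, 2] <= xdim
      nb <- nb[inside, , drop = FALSE]
      unit <- (i - 1) * xdim + j
      nbUnits <- (nb[, 1] - 1) * xdim + nb[, 2]
      # Euclidean distance from this unit's codebook vector to each neighbour
      diffs <- codes[nbUnits, , drop = FALSE] -
        matrix(codes[unit, ], nrow = length(nbUnits),
               ncol = ncol(codes), byrow = TRUE)
      u[i, j] <- mean(sqrt(rowSums(diffs^2)))
    }
  }
  u
}

# Toy 3 x 2 grid with one-dimensional codebook vectors: the right-hand
# column differs from the rest, so U-Matrix values rise towards it.
toyCodes <- matrix(c(0, 0, 10,
                     0, 0, 10), ncol = 1)
toyU <- umatrix(toyCodes, xdim = 3, ydim = 2)
print(toyU)
```

High values mark units that sit far from their neighbours, which is where cluster boundaries show up once the values are mapped to colors.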
There are many more methods for dimension reduction and visualization in R packages; see the R News (now the R Journal) paper [5].
The plot clearly shows a cluster pattern in the lower right corner. It is reasonable to suspect that the file was injected with data that shouldn't be there.
The paper reports poor results when visualizing macro viruses (embedded in Microsoft Office files). The CVE-2011-0611 samples are doc/xls files, but they are not macro viruses: they are hosts injected with harmful Adobe swf files. From this point of view they are just like infected executable files, so the theory still applies.
One detail: after uploading, data$tempfile has a different MD5 from the original file, because it gains an extra 0D 0A (apparently a newline) at the end. I don't quite understand how this happens. Since we drop the last two rows of the file to form a proper matrix, the training data is not identical to the binary sample anyway, so this has no effect in this case.
In summary, Rook connects the 3000+ available R packages with web application development. Just 40 lines of code were enough to achieve a not-so-simple goal, which is really amazing.
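The truncation described above can be checked in isolation. The sketch below is a base-R restatement of the app's byte handling, not a new method: it pairs consecutive bytes into 16-bit values, drops the trailing rows, and builds the 8-column matrix. The helper name bytesToMatrix and the toy input are hypothetical.

```r
# Pair consecutive bytes into 16-bit values and arrange them in rows of
# `ncol` values, dropping the last `dropRows` complete rows plus any
# leftover bytes, as the app does.
bytesToMatrix <- function(bytes, ncol = 8, dropRows = 2) {
  stopifnot(is.raw(bytes))
  bytesPerRow <- 2 * ncol
  k <- floor(length(bytes) / bytesPerRow) - dropRows
  stopifnot(k >= 1)
  used <- bytes[seq_len(bytesPerRow * k)]
  hi <- as.integer(used[seq(1, length(used), by = 2)])
  lo <- as.integer(used[seq(2, length(used), by = 2)])
  # Same value as strtoi(paste(hi, lo) in hex, 16L) in the app
  matrix(hi * 256L + lo, ncol = ncol, byrow = TRUE)
}

# Toy "file" of 64 bytes, and the same bytes with a trailing 0D 0A
original <- as.raw(0:63)
uploaded <- c(original, as.raw(c(0x0d, 0x0a)))

m1 <- bytesToMatrix(original)
m2 <- bytesToMatrix(uploaded)

# Min-max scaling to [0, 1], as myNormalize() does in the app
normMat <- (m2 - min(m2)) / (max(m2) - min(m2))

print(identical(m1, m2))
```

In this toy case the two extra bytes fall into the part that is discarded, so both inputs yield the same matrix even though their MD5 hashes would differ.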
References
[1] Rook - a web server interface for R.
[2] Visualizing Windows Executable Viruses Using Self-Organizing Maps, VizSec, 2004.
[3] Non-signature Based Virus Detection, Journal in Computer Virology, 2:163–186, 2006.
[4] Contagio Malware Dump. Apr. 8 CVE-2011-0611 Flash Player Zero day - SWF in DOC/ XLS - Disentangling Industrial Policy.
[5] Dimensional Reduction for Data Mapping, R News, Vol. 3/3, 2003.

Cool!
WOW, Rook is gorgeous! It makes it easier for people to learn your visualization on the web.
I did not realize it was so easy to use... You definitely should go to ChinaR @ Shanghai to present this package