R World News

R World News — Episode 1

May 09, 2016

The `forecast` package
([*https://cran.rstudio.com/web/packages/forecast/index.html*](https://cran.rstudio.com/web/packages/forecast/index.html))
by Rob Hyndman is now at version 7.1 and includes bug fixes along
with some improvements, such as support for multivariate linear models.
One of the more notable additions is built-in support for plotting
forecast objects using ggplot2. There are 11 autoplot() S3 methods for
various forecast objects, 8 “gg” methods for directly starting ggplot2
plot constructs, 2 fortify() methods for transforming forecast objects
into data.frames, and a new geom\_forecast() that lets you easily
combine forecasts with other geom\_s or annotations.
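To make the ggplot2 support concrete, here is a minimal sketch. It assumes forecast (7.0 or later) and ggplot2 are installed; the built-in `AirPassengers` series and the `ets()` model are illustrative choices only, not something the release notes prescribe.

```r
library(forecast)
library(ggplot2)

# Fit a simple exponential smoothing model to a built-in monthly series
fit <- ets(AirPassengers)

# Forecast 12 months ahead
fc <- forecast(fit, h = 12)

# autoplot() builds a ggplot2 plot straight from the forecast object
p_forecast <- autoplot(fc)

# geom_forecast() adds a forecast layer to a plot of the raw series
p_layered <- autoplot(AirPassengers) + geom_forecast(h = 12)
```

Either object can be extended with the usual ggplot2 layers, scales and themes.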

The ggnetwork package
([*https://briatte.github.io/ggnetwork/*](https://briatte.github.io/ggnetwork/))
by François Briatte has moved from a GitHub-only (devtools) install to CRAN and
provides support for the graphical display of virtually anything you can
build with the network or igraph packages. This is the second package
featured in today’s episode that takes advantage of the newly enhanced
object model of ggplot2, making it straightforward to add scales, Geoms,
Stats and even Coords (coordinate systems). The package authors provide
a number of Geoms and themes, including the core ones: geom\_edges() and
geom\_nodes().
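A small sketch of the two core Geoms in action, assuming the ggnetwork, ggplot2 and network packages are installed. The four-node adjacency matrix is made up purely for illustration.

```r
library(ggplot2)
library(ggnetwork)
library(network)

set.seed(1)  # node layout is randomised, so fix the seed

# A tiny undirected graph described by a hypothetical adjacency matrix
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)
net <- network(adj, directed = FALSE)

# ggnetwork() turns the network into a data.frame with node and edge coordinates
layout_df <- ggnetwork(net)

p <- ggplot(layout_df, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "grey50") +  # draw the edges first
  geom_nodes(size = 4) +          # then the nodes on top
  theme_blank()
```

Because the result is an ordinary ggplot2 object, extra scales, labels or annotations can be added in the usual way.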

rprojroot
([*http://krlmlr.github.io/rprojroot/*](http://krlmlr.github.io/rprojroot/))
is a new utility package by Kirill Müller designed to ease the pain of
referencing scripts or files in project subdirectories. Whether you’re
building a package, working in an RStudio project or just in a
git-managed directory, rprojroot has a simple interface to finding the
directory root and letting you make the subdirectory & file references
from that point. As the package author says, this solves a seemingly
trivial but annoying problem that most of us encounter at one time or
another.
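Here is a rough sketch of the idea, assuming rprojroot is installed. The throwaway project directory, the `DESCRIPTION` marker file and the `data/input.csv` path are all hypothetical, created only so the example is self-contained.

```r
library(rprojroot)

# Build a throwaway project with a nested working directory
proj <- file.path(tempdir(), "demo-project")
work_dir <- file.path(proj, "analysis", "scripts")
dir.create(work_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(proj, "DESCRIPTION"))  # marker file at the project root

# A root criterion: the root is the nearest parent directory holding this file
criterion <- has_file("DESCRIPTION")

# Search upwards from the nested directory until the criterion matches
root <- find_root(criterion, path = work_dir)

# Build file references relative to the root, wherever the script lives
data_path <- file.path(root, "data", "input.csv")
```

The package also ships ready-made criteria for R packages and RStudio projects, so in practice you rarely need to define your own.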

The next two packages work great together when you want to process a
corpus or three in R. tokenizers
([*https://cran.rstudio.com/web/packages/tokenizers/index.html*](https://cran.rstudio.com/web/packages/tokenizers/index.html)),
by Lincoln Mullen & Dmitriy Selivanov, provides a consistent interface
for breaking up a corpus into components such as n-grams, words, word
stems, lines, sentences, paragraphs and more. It uses the robust stringi
package for much of its core functionality and it returns plain R
vectors rather than custom objects, making the transformed texts easy to use and
manipulate.
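A quick sketch of that interface, assuming tokenizers is installed. The sample sentence pair is invented for illustration. Note that, at least in current releases, each function returns a list holding one plain character vector per input document.

```r
library(tokenizers)

text <- "R makes text processing pleasant. Tokenizers split text into pieces."

# Words: lower-cased with punctuation stripped by default
words <- tokenize_words(text)

# Sentences
sentences <- tokenize_sentences(text)

# Word bigrams
bigrams <- tokenize_ngrams(text, n = 2)

# Each result holds an ordinary character vector for the single input document
first_doc_words <- words[[1]]
```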

The tidytext package
([*https://cran.rstudio.com/web/packages/tidytext/index.html*](https://cran.rstudio.com/web/packages/tidytext/index.html))
by Julia Silge, David Robinson & Gabriela De Queiroz uses tokenizers
(and a few other packages) to transform a corpus into tidy data.frames
that enable the use of dplyr and dplyr-like idioms in further
processing. tidytext also provides tools for sentiment analysis and for
converting to and from document-term matrix objects.
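As a minimal sketch of the tidy workflow, assuming tidytext and dplyr are installed: the two-row `notes` data.frame below is made up for illustration, and `unnest_tokens()` does the one-token-per-row conversion.

```r
library(dplyr)
library(tidytext)

# A hypothetical two-document corpus
notes <- data.frame(
  package = c("forecast", "ggnetwork"),
  text = c("Forecast objects now plot with ggplot2.",
           "Network objects now plot with ggplot2 too."),
  stringsAsFactors = FALSE
)

# One row per word; the other columns are carried along
tidy_words <- notes %>%
  unnest_tokens(word, text)

# From here on it is plain dplyr
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```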

Finally, Kurt Hornik & Florian Schwendinger beat hrbrmstr to the next
package: pandocfilters
([*https://cran.rstudio.com/web/packages/pandocfilters/README.html*](https://cran.rstudio.com/web/packages/pandocfilters/README.html)).
This works with something called the abstract syntax tree (AST)
generated each time pandoc is called to transform one document format to
another. The AST is a JSON document with a node for each token. You can write
transformation functions for one or more node types in plain R code and
then have pandoc process the resultant, modified AST into the desired
format. One of the basic examples shown by the authors is to write a
filter to transform all text nodes to lower-case, but you can do
anything to any node type and even create ASTs from scratch, all with R
code.
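The lower-case example might look roughly like the sketch below. It assumes pandocfilters is installed and that its `Str()` constructor builds a text node; the callback name `lower_case` is my own, and the direct call at the end only demonstrates the transformation on a single node rather than a full pandoc run.

```r
library(pandocfilters)

# Callback applied to AST nodes: `key` is the node type, `value` its content.
# Returning NULL leaves the node untouched.
lower_case <- function(key, value, ...) {
  if (key == "Str") {
    return(Str(tolower(value)))  # rebuild the text node in lower case
  }
  NULL
}

# Call the callback by hand on one text node to see the effect
node <- lower_case("Str", "Hello")
```

In a real filter script the callback would be handed to the package's `filter()` function, which, as far as I understand it, reads the JSON AST that pandoc pipes in and writes the modified AST back out.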

Plus a featurette on the [`feather`](https://blog.rstudio.org/2016/03/29/feather/) package.