[r-cran-jsonlite] 01/03: debian/rules: do not install extra LICENSE file, do not install upstream NEWS file twice

Joost van Baal joostvb at moszumanska.debian.org
Mon Dec 21 15:26:31 UTC 2015


This is an automated email from the git hooks/post-receive script.

joostvb pushed a commit to branch master
in repository r-cran-jsonlite.

commit fbebd741f21a195ad9fefe18e6853524aa9959d8
Author: Joost van Baal-Ilić <joostvb at uvt.nl>
Date:   Mon Dec 21 15:16:03 2015 +0000

    debian/rules: do not install extra LICENSE file, do not install upstream NEWS file twice
---
 debian/changelog                |   6 +-
 debian/rules                    |   3 +
 vignettes/json-apis.Rmd.orig    | 184 -------------
 vignettes/json-mapping.Rnw.orig | 583 ----------------------------------------
 vignettes/json-paging.Rmd.orig  |  92 -------
 5 files changed, 7 insertions(+), 861 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index 1368972..e2a1157 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -3,8 +3,10 @@ r-cran-jsonlite (0.9.17-2) unstable; urgency=low
   * UNRELEASED
   * Initial release. Closes: #808148
   * debian/copyright: completed.
-
-FIXME: install inst/CITATION
+  * debian/rules: do not install extra LICENSE file, do not install upstream
+    NEWS file twice
+  * FIXME: fix clean target.
+  * FIXME: install inst/CITATION
 
  -- Joost van Baal-Ilić <joostvb+shiny at uvt.nl>  Mon, 21 Dec 2015 14:55:54 +0000
 
diff --git a/debian/rules b/debian/rules
index 451258e..e7dd3ec 100755
--- a/debian/rules
+++ b/debian/rules
@@ -5,3 +5,6 @@ include /usr/share/dpkg/buildflags.mk
 makeFlags="LDFLAGS=$(LDFLAGS)"
 
 include /usr/share/R/debian/r-cran.mk
+
+common-install-arch::
+	rm -vf $(debRlib)/$(cranNameOrig)/LICENSE $(debRlib)/$(cranNameOrig)/NEWS
diff --git a/vignettes/json-apis.Rmd.orig b/vignettes/json-apis.Rmd.orig
deleted file mode 100644
index da5df05..0000000
--- a/vignettes/json-apis.Rmd.orig
+++ /dev/null
@@ -1,184 +0,0 @@
----
-title: "Fetching JSON data from REST APIs"
-date: "`r Sys.Date()`"
-output:
-  html_document
-vignette: >
-  %\VignetteIndexEntry{Fetching JSON data from REST APIs}
-  %\VignetteEngine{knitr::rmarkdown}
-  \usepackage[utf8]{inputenc}
----
-
-```{r echo=FALSE}
-library(knitr)
-opts_chunk$set(comment="")
-
-#this replaces tabs by spaces because latex-verbatim doesn't like tabs
-#no longer needed with yajl
-#toJSON <- function(...){
-#  gsub("\t", "  ", jsonlite::toJSON(...), fixed=TRUE);
-#}
-```
-
-This section lists some examples of public HTTP APIs that publish data in JSON format. These are great for getting a sense of the complex structures that are encountered in real-world JSON data. All services are free, but some require registration/authentication. Each example returns lots of data; therefore not all output is printed in this document.
-
-```{r message=FALSE}
-library(jsonlite)
-```
-
-## Github
-
-Github is an online code repository and has APIs to get live data on almost all activity. Below are some examples from a well-known R package and its author:
-
-```{r}
-hadley_orgs <- fromJSON("https://api.github.com/users/hadley/orgs")
-hadley_repos <- fromJSON("https://api.github.com/users/hadley/repos")
-gg_commits <- fromJSON("https://api.github.com/repos/hadley/ggplot2/commits")
-gg_issues <- fromJSON("https://api.github.com/repos/hadley/ggplot2/issues")
-
-#latest issues
-paste(format(gg_issues$user$login), ":", gg_issues$title)
-```
-
-## CitiBike NYC
-
-A single public API that shows the location, status, and current availability of all stations in the New York City bike-sharing initiative.
-
-```{r}
-citibike <- fromJSON("http://citibikenyc.com/stations/json")
-stations <- citibike$stationBeanList
-colnames(stations)
-nrow(stations)
-```
-
-## Ergast
-
-The Ergast Developer API is an experimental web service which provides a historical record of motor racing data for non-commercial purposes.
-
-```{r}
-res <- fromJSON('http://ergast.com/api/f1/2004/1/results.json')
-drivers <- res$MRData$RaceTable$Races$Results[[1]]$Driver
-colnames(drivers)
-drivers[1:10, c("givenName", "familyName", "code", "nationality")]
-```
-
-
-## ProPublica
-
-Below is an example from the [ProPublica Nonprofit Explorer API](http://projects.propublica.org/nonprofits/api) where we retrieve the first 11 pages (0 through 10) of tax-exempt organizations in the USA, ordered by revenue. The `rbind.pages` function is used to combine the pages into a single data frame.
-
-
-```{r, message=FALSE}
-#store all pages in a list first
-baseurl <- "https://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"
-pages <- list()
-for(i in 0:10){
-  mydata <- fromJSON(paste0(baseurl, "&page=", i), flatten=TRUE)
-  message("Retrieving page ", i)
-  pages[[i+1]] <- mydata$filings
-}
-
-#combine all into one
-filings <- rbind.pages(pages)
-
-#check output
-nrow(filings)
-filings[1:10, c("organization.sub_name", "organization.city", "totrevenue")]
-```
-
-
-## New York Times
-
-The New York Times has several APIs as part of the NYT developer network. These interface to data from various departments, such as news articles, book reviews, real estate, etc. Registration is required (but free) and a key can be obtained [here](http://developer.nytimes.com/docs/reference/keys). The code below includes some example keys for illustration purposes.
-
-```{r}
-#search for articles
-article_key <- "&api-key=c2fede7bd9aea57c898f538e5ec0a1ee:6:68700045"
-url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=obamacare+socialism"
-req <- fromJSON(paste0(url, article_key))
-articles <- req$response$docs
-colnames(articles)
-
-#search for best sellers
-bestseller_key <- "&api-key=5e260a86a6301f55546c83a47d139b0d:3:68700045"
-url <- "http://api.nytimes.com/svc/books/v2/lists/overview.json?published_date=2013-01-01"
-req <- fromJSON(paste0(url, bestseller_key))
-bestsellers <- req$results$list
-category1 <- bestsellers[[1, "books"]]
-subset(category1, select = c("author", "title", "publisher"))
-
-#movie reviews
-movie_key <- "&api-key=5a3daaeee6bbc6b9df16284bc575e5ba:0:68700045"
-url <- "http://api.nytimes.com/svc/movies/v2/reviews/dvd-picks.json?order=by-date"
-req <- fromJSON(paste0(url, movie_key))
-reviews <- req$results
-colnames(reviews)
-reviews[1:5, c("display_title", "byline", "mpaa_rating")]
-
-```
-
-## CrunchBase
-
-CrunchBase is the free database of technology companies, people, and investors that anyone can edit.
-
-```{r eval=FALSE}
-key <- "f6dv6cas5vw7arn5b9d7mdm3"
-res <- fromJSON(paste0("http://api.crunchbase.com/v/1/search.js?query=R&api_key=", key))
-head(res$results)
-```
-
-## Sunlight Foundation
-
-The Sunlight Foundation is a non-profit that helps to make government transparent and accountable through data, tools, policy and journalism. Register for a free key [here](http://sunlightfoundation.com/api/accounts/register/). An example key is provided.
-
-```{r}
-key <- "&apikey=39c83d5a4acc42be993ee637e2e4ba3d"
-
-#Find bills about drones
-drone_bills <- fromJSON(paste0("http://openstates.org/api/v1/bills/?q=drone", key))
-drone_bills$title <- substring(drone_bills$title, 1, 40)
-print(drone_bills[1:5, c("title", "state", "chamber", "type")])
-
-#Congress mentioning "immigration"
-res <- fromJSON(paste0("http://capitolwords.org/api/1/dates.json?phrase=immigration", key))
-wordcount <- res$results
-wordcount$day <- as.Date(wordcount$day)
-summary(wordcount)
-
-#Local legislators
-legislators <- fromJSON(paste0("http://congress.api.sunlightfoundation.com/",
-  "legislators/locate?latitude=42.96&longitude=-108.09", key))
-subset(legislators$results, select=c("last_name", "chamber", "term_start", "twitter_id"))
-```
-
-## Twitter
-
-The Twitter API requires OAuth2 authentication. Some example code:
-
-```{r}
-#Create your own application key at https://dev.twitter.com/apps
-consumer_key = "EZRy5JzOH2QQmVAe9B4j2w";
-consumer_secret = "OIDC4MdfZJ82nbwpZfoUO4WOLTYjoRhpHRAWj6JMec";
-
-#Use basic auth
-library(httr)
-secret <- RCurl::base64(paste(consumer_key, consumer_secret, sep = ":"));
-req <- POST("https://api.twitter.com/oauth2/token",
-  add_headers(
-    "Authorization" = paste("Basic", secret),
-    "Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8"
-  ),
-  body = "grant_type=client_credentials"
-);
-
-#Extract the access token
-token <- paste("Bearer", content(req)$access_token)
-
-#Actual API call
-url <- "https://api.twitter.com/1.1/statuses/user_timeline.json?count=10&screen_name=Rbloggers"
-req <- GET(url, add_headers(Authorization = token))
-json <- content(req, as = "text")
-tweets <- fromJSON(json)
-substring(tweets$text, 1, 100)
-```
-
diff --git a/vignettes/json-mapping.Rnw.orig b/vignettes/json-mapping.Rnw.orig
deleted file mode 100644
index 3d9fd44..0000000
--- a/vignettes/json-mapping.Rnw.orig
+++ /dev/null
@@ -1,583 +0,0 @@
-%\VignetteEngine{knitr::knitr}
-%\VignetteIndexEntry{A mapping between JSON data and R objects}
-
-<<echo=FALSE>>=
-#For JSS
-#opts_chunk$set(prompt=TRUE, highlight=FALSE, background="white")
-#options(prompt = "R> ", continue = "+  ", width = 70, useFancyQuotes = FALSE)
-@
-
-%This is a template.
-%Actual text goes in sources/content.Rnw
-\documentclass{article}
-\author{Jeroen Ooms}
-
-%useful packages
-\usepackage{url}
-\usepackage{fullpage}
-\usepackage{xspace}
-\usepackage{booktabs}
-\usepackage{enumitem}
-\usepackage[hidelinks]{hyperref}
-\usepackage[round]{natbib}
-\usepackage{fancyvrb}
-\usepackage[toc,page]{appendix}
-\usepackage{breakurl}
-
-%for table positioning
-\usepackage{float}
-\restylefloat{table}
-
-%support for accents
-\usepackage[utf8]{inputenc}
-
-%support for ascii art
-\usepackage{pmboxdraw}
-
-%use vspace instead of indentation for paragraphs
-\usepackage{parskip}
-
-%extra line spacing
-\usepackage{setspace}
-\setstretch{1.25}
-
-%knitr style verbatim blocks
-\newenvironment{codeblock}{
-  \VerbatimEnvironment
-  \definecolor{shadecolor}{rgb}{0.95, 0.95, 0.95}\color{fgcolor}
-  \color{black}
-  \begin{kframe}
-  \begin{BVerbatim}
-}{
-  \end{BVerbatim}
-  \end{kframe}
-}
-
-%placeholders for JSS/RJournal
-\newcommand{\pkg}[1]{\texttt{#1}}
-\newcommand{\code}[1]{\texttt{#1}}
-\newcommand{\proglang}[1]{\texttt{#1}}
-
-%shorthands
-\newcommand{\JSON}{\texttt{JSON}\xspace}
-\newcommand{\R}{\proglang{R}\xspace}
-\newcommand{\C}{\proglang{C}\xspace}
-\newcommand{\toJSON}{\code{toJSON}\xspace}
-\newcommand{\fromJSON}{\code{fromJSON}\xspace}
-\newcommand{\XML}{\pkg{XML}\xspace}
-\newcommand{\jsonlite}{\pkg{jsonlite}\xspace}
-\newcommand{\RJSONIO}{\pkg{RJSONIO}\xspace}
-\newcommand{\API}{\texttt{API}\xspace}
-\newcommand{\JavaScript}{\proglang{JavaScript}\xspace}
-
-
-%trick for using same content file as chatper and article
-\newcommand{\maintitle}[1]{
-  \title{#1}
-  \maketitle
-}
-
-%actual document
-\begin{document}
-
-
-\maintitle{The \jsonlite Package: A Practical and Consistent Mapping Between \JSON Data and \R Objects}
-
-<<echo=FALSE, message=FALSE>>=
-library(jsonlite)
-library(knitr)
-opts_chunk$set(comment="")
-
-#this replaces tabs by spaces because latex-verbatim doesn't like tabs
-toJSON <- function(...){
-  gsub("\t", "  ", jsonlite::toJSON(...), fixed=TRUE);
-}
-@
-
-\begin{abstract}
-A naive realization of \JSON data in \R maps \JSON \emph{arrays} to an unnamed list, and \JSON \emph{objects} to a named list. However, in practice a list is an awkward, inefficient type for storing and manipulating data. Most statistical applications work with (homogeneous) vectors, matrices or data frames. Therefore \JSON packages in \R typically define certain special cases of \JSON structures which map to simpler \R types. Currently, no formal guidelines or consensus exist on how \R data [...]
-\end{abstract}
-
-
-\section{Introduction}
-
-\emph{JavaScript Object Notation} (\JSON) is a text format for the serialization of structured data \citep{crockford2006application}. It is derived from the object literals of \proglang{JavaScript}, as defined in the \proglang{ECMAScript} programming language standard \citep{ecma1999262}. The design of \JSON is simple and concise in comparison with other text-based formats, and it was originally proposed by Douglas Crockford as a ``fat-free alternative to \XML'' \citep{crockford2006json}. Th [...]
-
-The emphasis of this paper is not on discussing the \JSON format or any particular implementation for using \JSON with \R.  We refer to \cite{nolan2014xml} for a comprehensive introduction, or one of the many tutorials available on the web. Instead we take a high level view and discuss how \R data structures are most naturally represented in \JSON. This is not a trivial problem, particularly for complex or relational data as they frequently appear in statistical applications. Several \R  [...]
-
-%When relying on \JSON as the data interchange format, the mapping between \R objects and \JSON data must be consistent and unambiguous. Clients relying on \JSON to get data in and out of \R must know exactly what to expect in order to facilitate reliable communication, even if the data themselves are dynamic. Similarly, \R code using dynamic \JSON data from an external source is only reliable when the conversion from \JSON to \R is consistent. This document attempts to take away some of [...]
-
-\subsection{Parsing and type safety}
-
-The \JSON format specifies 4 primitive types (\texttt{string}, \texttt{number}, \texttt{boolean}, \texttt{null}) and two \emph{universal structures}:
-
-\begin{itemize} %[itemsep=3pt, topsep=5pt]
-  \item A \JSON \emph{object}: an unordered collection of zero or more name-value
-   pairs, where a name is a string and a value is a string, number,
-   boolean, null, object, or array.
-  \item A \JSON \emph{array}: an ordered sequence of zero or more values.
-\end{itemize}
-
-\noindent Both these structures are heterogeneous; i.e. they are allowed to contain elements of different types. Therefore, the native \R realization of these structures is a \texttt{named list} for \JSON objects, and an \texttt{unnamed list} for \JSON arrays. However, in practice a list is an awkward, inefficient type for storing and manipulating data in \R.  Most statistical applications work with (homogeneous) vectors, matrices or data frames. In order to give these data structures a \JSON re [...]
-
-<<>>=
-txt <- '[12, 3, 7]'
-x <- fromJSON(txt)
-is(x)
-print(x)
-@
-
-This seems very reasonable and it is the only practical solution to represent vectors in \JSON. However, the price we pay is that automatic simplification can compromise type safety in the context of dynamic data. For example, suppose an \R package uses \fromJSON to pull data from a \JSON \API on the web, and that for some particular combination of parameters the result includes a \texttt{null} value, e.g.: \texttt{[12, null, 7]}. This is actually quite common: many \API's use \texttt{null} [...]
-
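-A minimal sketch of the issue (the exact result depends on the parser's simplification rules):
-
-<<>>=
-#compare simplification of a plain array and one containing null
-fromJSON('[12, 3, 7]')
-fromJSON('[12, null, 7]')
-@
-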
-The lesson here is that we need to be very specific and explicit about the mapping that is implemented to convert between \JSON data and \R objects. When relying on \JSON as a data interchange format, the behavior of the parser must be consistent and unambiguous. Clients relying on \JSON to get data in and out of \R must know exactly what to expect in order to facilitate reliable communication, even if the content of the data is dynamic. Similarly, \R code using dynamic \JSON data from a [...]
-
-% \subsection{A Bidirectional Mapping}
-%
-% - bidirectional: one-to-one correspondence between JSON and \R classes with minimal coersing.
-% - relation is functional in each direction: json interface to \R objects, and \R objects can be used to manipulate a JSON structure.
-% - Results in unique coupling between json and objects that makes it natural to manipulate JSON in \R, and access \R objects from their JSON representation.
-% - Mild assumption of consistency.
-% - Supported classes: vectors of type numeric, character, logical, data frame and matrix.
-% - Natural class is implicit in the structure, rather than explicitly encode using metadata.
-% - Will show examples of why this is powerful.
-
-\subsection[Reference implementation: the jsonlite package]{Reference implementation: the \jsonlite package}
-
-The \jsonlite package provides a reference implementation of the conventions proposed in this document. It is a fork of the \RJSONIO package by Duncan Temple Lang, which builds on the \texttt{libjson} \texttt{C++} library from Jonathan Wallace. \jsonlite uses the parser from \RJSONIO, but the \R code has been rewritten from scratch. Both packages implement \toJSON and \fromJSON functions, but their output is quite different. Finally, the \jsonlite package contains a large set of unit tests t [...]
-
-<<eval=FALSE>>=
-library(testthat)
-test_package("jsonlite")
-@
-
-Note that even though \JSON allows for inserting arbitrary white space and indentation, the unit tests assume that white space is trimmed.
-
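-A small sketch of this property, using the \code{minify} utility shipped with \jsonlite:
-
-<<>>=
-#white space does not affect the parsed value
-identical(fromJSON("[ 1, 2, 3 ]"), fromJSON("[1,2,3]"))
-minify("[ 1, 2, 3 ]")
-@
-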
-\subsection{Class-based versus type-based encoding}
-\label{serializejson}
-
-The \jsonlite package actually implements two systems for translating between \R objects and \JSON. This document focuses on the \toJSON and \fromJSON functions which use \R's class-based method dispatch. For all of the common classes in \R, the \jsonlite package implements \toJSON methods as described in this document. Users in \R can extend this system by implementing additional methods for other classes. This also means that classes that do not have the \toJSON method defined are not  [...]
-
-The alternative to class-based method dispatch is to use type-based encoding, which \jsonlite implements in the functions \texttt{serializeJSON} and \code{unserializeJSON}. All data structures in \R get stored in memory using one of the internal \texttt{SEXP} storage types, and \code{serializeJSON} defines an encoding schema which captures the type, value, and attributes for each storage type. The resulting \JSON closely resembles the internal structure of the underlying \C data types, a [...]
-
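-A minimal sketch of the type-based system (the generated \JSON is considerably more verbose than the class-based output):
-
-<<>>=
-#type-based encoding captures type, value and attributes, and is fully reversible
-x <- list(foo = 1:3)
-identical(unserializeJSON(serializeJSON(x)), x)
-@
-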
-\subsection{Scope and limitations}
-
-Before continuing, we want to stress some limitations of encoding \R data structures in \JSON. Most importantly, there are limitations to the types of objects that can be represented. In general, temporary in-memory properties such as connections, file descriptors and (recursive) memory references are always difficult if not impossible to store in a sensible way, regardless of the language or serialization method. This document focuses on the common \R classes that hold \emph{data}, such [...]
-
-Then there are limitations introduced by the format. Because \JSON is a human-readable, text-based format, it does not support binary data, and numbers are stored in their decimal notation. The latter leads to loss of precision for real numbers, depending on how many digits the user decides to print. Several dialects of \JSON exist, such as \texttt{BSON} \citep{chodorow2013mongodb} or \texttt{MSGPACK} \citep{msgpack}, which extend the format with various binary types. However, these form [...]
-
-Finally, as mentioned earlier, \fromJSON is not a perfect inverse function of \toJSON, as is the case for \code{serializeJSON} and \code{unserializeJSON}. The class-based mappings are designed for concise and practical encoding of the various common data structures. Our implementation of \toJSON and \fromJSON approximates a reversible mapping between \R objects and \JSON for the standard data classes, but there are always limitations and edge cases. For example, the \JSON representati [...]
-
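-A small example of such an edge case: under the default class-based mapping, a factor does not survive the round trip, but returns as a character vector:
-
-<<>>=
-#the class-based round trip is approximate
-x <- factor(c("foo", "bar"))
-fromJSON(toJSON(x))
-@
-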
-% \subsection{Goals: Consistent and Practical}
-%
-% It can be helpful to see the problem from both sides. The \R user needs to interface external \JSON data from within \R.  This includes reading data from a public source/API, or posting a specific \JSON structure to an online service. From perspective of the \R user, \JSON data should be realized in \R using classes which are most natural in \R for a particular structure. A proper mapping is one which allows the \R user to read any incoming data or generate a specific \JSON structures  [...]
-%
-% Both sides come together in the context of an RPC service such as OpenCPU. OpenCPU exposes a HTTP API to let 3rd party clients call \R functions over HTTP. The function arguments are posted using \JSON and OpenCPU automatically converts these into \R objects to construct the \R function call. The return value of the function is then converted to \JSON and sent back to the client. To the client, the service works as a \JSON API, but it is implemented as standard \R function uses standar [...]
-%
-% \begin{itemize}
-%   \item{Recognize and comply with existing conventions of encoding common data structures in \JSON, in particular (relational) data sets.}
-%   \item{Consistently use a particular schema for a class of objects, including edge cases.}
-%   \item{Avoid R-specific peculiarities to minimize opportunities for misinterpretation.}
-%   \item{Mapping should optimally be reversible, but at least coercible for the standard classes.}
-%   \item{Robustness principle: be strict on output but tolerant on input.}
-% \end{itemize}
-
-
-\section[Converting between JSON data and R classes]{Converting between \JSON data and \R classes}
-
-This section lists examples of how the common \R classes are represented in \JSON. As explained before, the \toJSON function relies on method dispatch, which means that objects get encoded according to their \texttt{class} attribute. If an object has multiple \texttt{class} values, \R uses the first occurring class which has a \toJSON method. If none of the classes of an object has a \toJSON method, an error is raised.
-
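-As a brief illustration of this dispatch, an object of class \texttt{POSIXct} is encoded by the method for its class:
-
-<<>>=
-#encoding is selected via the class attribute
-x <- Sys.time()
-class(x)
-toJSON(x)
-@
-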
-\subsection{Atomic vectors}
-
-The most basic data type in \R is the atomic vector. Atomic vectors hold an ordered, homogeneous set of values of type \texttt{logical} (booleans), \texttt{character} (strings), \texttt{raw} (bytes), \texttt{numeric} (doubles), \texttt{complex} (complex numbers with a real and imaginary part), or \texttt{integer}. Because \R is fully vectorized, there is no user-level notion of a primitive: a scalar value is considered a vector of length 1. Atomic vectors map to \JSON arrays:
-
-<<>>=
-x <- c(1, 2, pi)
-toJSON(x)
-@
-
-The \JSON array is the only appropriate structure to encode a vector. Note that vectors in \R are homogeneous, whereas the \JSON array is heterogeneous; \JSON simply does not make this distinction.
-
-\subsubsection{Missing values}
-
-A typical domain-specific problem when working with statistical data is missing values: a concept foreign to many other languages. Besides regular values, each vector type in \R except for \texttt{raw} can hold \texttt{NA} as a value. Vectors of type \texttt{double} and \texttt{complex} define three additional types of non-finite values: \texttt{NaN}, \texttt{Inf} and \texttt{-Inf}. The \JSON format does not natively support any of these types; therefore such values n [...]
-
-<<>>=
-x <- c(TRUE, FALSE, NA)
-toJSON(x)
-@
-
-The other option is to encode missing values as strings by wrapping them in double quotes:
-
-<<>>=
-x <- c(1,2,NA,NaN,Inf,10)
-toJSON(x)
-@
-
-Both methods result in valid \JSON, but both have a limitation: the problem with the \texttt{null} type is that it is impossible to distinguish between different types of missing data, which could be a problem for numeric vectors. The values \texttt{Inf}, \texttt{-Inf}, \texttt{NA} and \texttt{NaN} carry different meanings, and these should not get lost in the encoding. The problem with encoding missing values as strings is that this method cannot be used for character vectors, because [...]
-
-\begin{itemize}
- \item Missing values in non-numeric vectors (\texttt{logical}, \texttt{character}) are encoded as \texttt{null}.
- \item Missing values in numeric vectors (\texttt{double}, \texttt{integer}, \texttt{complex}) are encoded as strings.
-\end{itemize}
-
-We expect that these conventions are most likely to result in the correct interpretation of missing values. Some examples:
-
-<<>>=
-toJSON(c(TRUE, NA, NA, FALSE))
-toJSON(c("FOO", "BAR", NA, "NA"))
-toJSON(c(3.14, NA, NaN, 21, Inf, -Inf))
-
-#Non-default behavior
-toJSON(c(3.14, NA, NaN, 21, Inf, -Inf), na="null")
-@
-
-\subsubsection{Special vector types: dates, times, factor, complex}
-
-Besides missing values, \JSON also lacks native support for some of the basic vector types in \R that frequently appear in data sets. These include vectors of class \texttt{Date}, \texttt{POSIXt} (timestamps), \texttt{factors} and \texttt{complex} vectors. By default, the \jsonlite package coerces these types to strings (using \texttt{as.character}):
-
-<<>>=
-toJSON(Sys.time() + 1:3)
-toJSON(as.Date(Sys.time()) + 1:3)
-toJSON(factor(c("foo", "bar", "foo")))
-toJSON(complex(real=runif(3), imaginary=rnorm(3)))
-@
-
-When parsing such \JSON strings, these values will appear as character vectors. In order to obtain the original types, the user needs to manually coerce them back to the desired type using the corresponding \texttt{as} function, e.g. \code{as.POSIXct}, \code{as.Date}, \code{as.factor} or \code{as.complex}. In this respect, \JSON is subject to the same limitations as text based formats such as \texttt{CSV}.
-
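-For example, a date survives the round trip only after manual coercion:
-
-<<>>=
-#dates are parsed as plain strings and must be coerced back
-x <- fromJSON(toJSON(Sys.Date()))
-is(x)
-as.Date(x)
-@
-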
-\subsubsection{Special cases: vectors of length 0 or 1}
-
-Two edge cases deserve special attention: vectors of length 0 and vectors of length 1. In \jsonlite these are encoded as an empty array and an array of length 1, respectively:
-
-<<>>=
-#vectors of length 0 and 1
-toJSON(vector())
-toJSON(pi)
-
-#vectors of length 0 and 1 in a named list
-toJSON(list(foo=vector()))
-toJSON(list(foo=pi))
-
-#vectors of length 0 and 1 in an unnamed list
-toJSON(list(vector()))
-toJSON(list(pi))
-@
-
-This might seem obvious, but these cases result in very different behavior across \JSON packages. This is probably caused by the fact that \R does not have a scalar type, and some package authors decided to treat vectors of length 1 as if they were scalars. For example, in the current implementations, both \RJSONIO and \pkg{rjson} encode a vector of length one as a \JSON primitive when it appears within a list:
-
-<<>>=
-# Other packages make different choices:
-cat(rjson::toJSON(list(n = c(1))))
-cat(rjson::toJSON(list(n = c(1, 2))))
-@
-
-When encoding a single dataset this seems harmless, but in the context of dynamic data this inconsistency is almost guaranteed to cause bugs. For example, imagine an \R web service which lets the user fit a linear model and sends back the fitted parameter estimates as a \JSON array. The client code then parses the \JSON, and iterates over the array of coefficients to display them in a \texttt{GUI}. All goes well, until the user decides to fit a model with only one predictor. If the \JSON [...]
-
-\subsection{Matrices}
-
-Arguably one of the great strengths of \R is its ability to interface with libraries for basic linear algebra subprograms \citep{lawson1979basic} such as \texttt{LAPACK} \citep{anderson1999lapack}. These libraries provide well-tuned, high-performance implementations of important linear algebra operations to calculate anything from inner products and eigenvalues to singular value decompositions, which are in turn building blocks of statistical methods such as linear regression or principal co [...]
-
-<<>>=
-x <- matrix(1:12, nrow=3, ncol=4)
-print(x)
-print(x[2,4])
-@
-
- A matrix is stored in memory as a single atomic vector with an attribute called \texttt{"dim"} defining the dimensions of the matrix. The product of the dimensions is equal to the length of the vector.
-
-<<>>=
-attributes(volcano)
-length(volcano)
-@
-
- Even though the matrix is stored as a single vector, the way it is printed and indexed makes it conceptually a two-dimensional structure. In \jsonlite a matrix maps to an array of equal-length subarrays:
-
-<<>>=
-x <- matrix(1:12, nrow=3, ncol=4)
-toJSON(x)
-@
-
-We expect this representation will be the most intuitive to interpret, even in languages that do not have a native notion of a matrix. Note that even though \R stores matrices in \emph{column-major} order, \jsonlite encodes matrices in \emph{row-major} order. This is a more conventional and intuitive way to represent matrices and is consistent with the row-based encoding of data frames discussed in the next section. When the \JSON string is properly indented (recall that white space [...]
-
-\begin{verbatim}
-[ [ 1, 4, 7, 10 ],
-  [ 2, 5, 8, 11 ],
-  [ 3, 6, 9, 12 ] ]
-\end{verbatim}
-
- Because the matrix is implemented in \R as an atomic vector, it automatically inherits the conventions mentioned earlier with respect to edge cases and missing values:
-
-<<>>=
-x <- matrix(c(1,2,4,NA), nrow=2)
-toJSON(x)
-toJSON(x, na="null")
-toJSON(matrix(pi))
-@
-
-
-\subsubsection{Matrix row and column names}
-
-Besides the \texttt{"dim"} attribute, the matrix class has an additional, optional attribute: \texttt{"dimnames"}. This attribute holds names for the rows and columns in the matrix. However, we decided not to include this information in the default \JSON mapping for matrices for several reasons. First of all, because this attribute is optional, either row or column names or both could be \texttt{NULL}. This makes it difficult to define a practical mapping that covers all cases with and w [...]
-
-When row or column names of a matrix seem to contain vital information, we might want to transform the data into a more appropriate structure. \cite{tidydata} calls this \emph{``tidying''} the data and outlines best practices on storing statistical data in its most appropriate form. He lists the issue where \emph{``column headers are values, not variable names''} as the most common source of untidy data. This often happens when the structure is optimized for presentation (e.g. printing), [...]
-
-<<>>=
-x <- matrix(c(NA,1,2,5,NA,3), nrow=3)
-row.names(x) <- c("Joe", "Jane", "Mary");
-colnames(x) <- c("Treatment A", "Treatment B")
-print(x)
-toJSON(x)
-@
-
-Wickham recommends that the data be \emph{melted} into its \emph{tidy} form. Once the data is tidy, the \JSON encoding will naturally contain the treatment values:
-
-<<>>=
-library(reshape2)
-y <- melt(x, varnames=c("Subject", "Treatment"))
-print(y)
-toJSON(y, pretty=TRUE)
-@
-
-In some other cases, the column headers actually do contain variable names, and melting is inappropriate. For data sets with records consisting of a set of named columns (fields), \R has a more natural and flexible class: the data frame. The \toJSON method for data frames (described later) is more suitable when we want to refer to rows or fields by their name. Any matrix can easily be converted to a data frame using the \code{as.data.frame} function:
-
-<<>>=
-toJSON(as.data.frame(x), pretty=TRUE)
-@
-
-In some cases this results in the desired output, but in this example melting seems more appropriate.
-
-\subsection{Lists}
-
-The \texttt{list} is the most general purpose data structure in \R.  It holds an ordered set of elements, including other lists, each of arbitrary type and size. Two types of lists are distinguished: named lists and unnamed lists. A list is considered a named list if it has an attribute called \texttt{"names"}. In practice, a named list is any list for which we can access an element by its name, whereas elements of an unnamed list can only be accessed by their index number:
-
-<<>>=
-mylist1 <- list("foo" = 123, "bar"= 456)
-print(mylist1$bar)
-mylist2 <- list(123, 456)
-print(mylist2[[2]])
-@
-
-\subsubsection{Unnamed lists}
-
-Just like vectors, an unnamed list maps to a \JSON array:
-
-<<>>=
-toJSON(list(c(1,2), "test", TRUE, list(c(1,2))))
-@
-
-Note that even though both vectors and lists are encoded using \JSON arrays, they can be distinguished by their contents: an \R vector results in a \JSON array containing only primitives, whereas a list results in a \JSON array containing only objects and arrays. This allows the \JSON parser to reconstruct the original type from encoded vectors and arrays:
-
-<<>>=
-x <- list(c(1,2,NA), "test", FALSE, list(foo="bar"))
-identical(fromJSON(toJSON(x)), x)
-@
-
- The only exception is the empty list and empty vector, which are both encoded as \texttt{[ ]} and therefore indistinguishable, but this is rarely a problem in practice.
-
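-A short sketch of this ambiguity:
-
-<<>>=
-#the empty vector and the empty list share the same encoding
-toJSON(vector())
-toJSON(list())
-@
-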
-\subsubsection{Named lists}
-
-A named list in \R maps to a \JSON \emph{object}:
-
-<<>>=
-toJSON(list(foo=c(1,2), bar="test"))
-@
-
- Because a list can contain other lists, this works recursively:
-
-<<tidy=FALSE>>=
-toJSON(list(foo=list(bar=list(baz=pi))))
-@
-
- Named lists map almost perfectly to \JSON objects with one exception: list elements can have empty names:
-
-<<>>=
-x <- list(foo=123, "test", TRUE)
-attr(x, "names")
-x$foo
-x[[2]]
-@
-
- In a \JSON object, each element must have a valid name. To ensure this property, \jsonlite uses the same solution as the \code{print} method, which is to fall back on indices for elements that do not have a proper name:
-
-<<>>=
-x <- list(foo=123, "test", TRUE)
-print(x)
-toJSON(x)
-@
-
- This behavior ensures that all generated \JSON is valid; however, named lists with empty names should be avoided where possible. When actually designing \R objects that should be interoperable, it is recommended that each list element be given a proper name.
-
-\subsection{Data frame}
-
-The \texttt{data frame} is perhaps the most central data structure in \R from the user point of view. This class holds tabular data in which each column is named and (usually) homogeneous. Conceptually it is very similar to a table in relational databases such as \texttt{MySQL}, where \emph{fields} are referred to as \emph{column names}, and \emph{records} are called \emph{rows}. Like a matrix, a data frame can be subsetted with two indices to extract certain rows and columns of the data:
-
-<<>>=
-is(iris)
-names(iris)
-print(iris[1:3, c(1,5)])
-print(iris[1:3, c("Sepal.Width", "Species")])
-@
-
- For the previously discussed classes such as vectors and matrices, the behavior of \jsonlite is quite similar to that of the other available packages that implement \toJSON and \fromJSON functions, with only minor differences for missing values and edge cases. But when it comes to data frames, \jsonlite takes a completely different approach. The behavior of \jsonlite is designed for compatibility with conventional ways of encoding table-like structures outside the \R community. The implementation [...]
-
-\subsubsection{Column based versus row based tables}
-
-Generally speaking, tabular data structures can be implemented in two different ways: in a column-based or a row-based fashion. A column-based structure consists of a named collection of equal-length, homogeneous arrays representing the table columns. In a row-based structure, on the other hand, the table is implemented as a set of heterogeneous associative arrays representing table rows, with field values for each particular record. Even though most languages provide flexible and abstracte [...]
-
-The data frame class in \R is implemented in a column-based fashion: it consists of a \texttt{named list} of equal-length vectors. Thereby the columns in the data frame naturally inherit the properties of atomic vectors discussed before, such as homogeneity, missing values, etc. Another argument for the column-based implementation is that statistical methods generally operate on columns. For example, the \code{lm} function fits a \emph{linear regression} by extracting the columns from a [...]
-
-Unfortunately \R is an exception in its preference for column-based storage: most languages, systems, databases, \API's, etc., are optimized for record-based operations. For this reason, the conventional way to store and communicate tabular data in \JSON seems to be almost exclusively row based. This discrepancy presents various complications when converting between data frames and \JSON. The remainder of this section discusses details and challenges of consistently mapping record-based \JSO [...]
-
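-The contrast can be sketched with a toy data frame (both calls use default \jsonlite settings):
-
-<<>>=
-x <- data.frame(name = c("Jay", "Mary"), age = c(30, 25))
-toJSON(x)           #row based: an array of records
-toJSON(as.list(x))  #column based: an object of equal-length arrays
-@
-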
-\subsubsection{Row based data frame encoding}
-
-The encoding of data frames is one of the major differences between \jsonlite and implementations from other currently available packages. Instead of using the column-based encoding also used for lists, \jsonlite maps data frames by default to an array of records:
-
-<<>>=
-toJSON(iris[1:2,], pretty=TRUE)
-@
-
- This output looks a bit like a list of named lists. However, there is one major difference: the individual records contain \JSON primitives, whereas lists always contain \JSON objects or arrays:
-
-<<>>=
-toJSON(list(list(Species="Foo", Width=21)), pretty=TRUE)
-@
-
- This leads to the following convention: when encoding \R objects, \JSON primitives only appear in vectors and data-frame rows. Primitives within a \JSON array indicate a vector, and primitives appearing inside a \JSON object indicate a data-frame row. A \JSON encoded \texttt{list} (named or unnamed) will never contain \JSON primitives. This is a subtle but important convention that helps to distinguish \R classes from their \JSON representation, without explicitly encoding any [...]
-
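-A minimal sketch of how a parser can exploit this convention:
-
-<<>>=
-#primitives inside an object indicate a data frame row
-fromJSON('[{"x": 1, "y": "a"}]')
-#primitives inside an array indicate a vector
-fromJSON('{"x": [1, 2], "y": ["a", "b"]}')
-@
-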
-\subsubsection{Missing values in data frames}
-
-The section on atomic vectors discussed two methods of encoding missing data appearing in a vector: either using strings or using the \JSON \texttt{null} type. When a missing value appears in a data frame, there is a third option: simply omit the field from the \JSON record:
-
-<<>>=
-x <- data.frame(foo=c(FALSE, TRUE,NA,NA), bar=c("Aladdin", NA, NA, "Mario"))
-print(x)
-toJSON(x, pretty=TRUE)
-@
-
- The default behavior of \jsonlite is to omit missing data from records in a data frame. This seems to be the most conventional method used on the web, and we expect this encoding will most likely lead to the correct interpretation of \emph{missingness}, even in languages without an explicit notion of \texttt{NA}.
-
-\subsubsection{Relational data: nested records}
-
-Nested datasets are somewhat unusual in \R, but frequently encountered in \JSON. Such structures do not really fit the vector-based paradigm, which makes them harder to manipulate in \R.  However, nested structures are too common in \JSON to ignore, and with a little work most cases still map to a data frame quite nicely. The most common scenario is a dataset in which a certain field within each record contains a \emph{subrecord} with additional fields. The \jsonlite implementation maps t [...]
-
-<<tidy=FALSE>>=
-options(stringsAsFactors=FALSE)
-x <- data.frame(driver = c("Bowser", "Peach"), occupation = c("Koopa", "Princess"))
-x$vehicle <- data.frame(model = c("Piranha Prowler", "Royal Racer"))
-x$vehicle$stats <- data.frame(speed = c(55, 34), weight = c(67, 24), drift = c(35, 32))
-str(x)
-toJSON(x, pretty=TRUE)
-myjson <- toJSON(x)
-y <- fromJSON(myjson)
-identical(x,y)
-@
-
- When encountering \JSON data containing nested records on the web, chances are that these data were generated from a \emph{relational} database. The \JSON field containing a subrecord represents a \emph{foreign key} pointing to a record in an external table. For the purpose of encoding these into a single \JSON structure, the tables were joined into a nested structure. The directly nested subrecord represents a \emph{one-to-one} or \emph{many-to-one} relation between the parent and child [...]
-
-<<>>=
-y <- fromJSON(myjson, flatten=TRUE)
-str(y)
-@
-
-\subsubsection{Relational data: nested tables}
-
-The one-to-one relation discussed above is relatively easy to store in \R, because each record contains at most one subrecord. Therefore we can use either a nested data frame, or flatten the data frame. However, things get more difficult when \JSON records contain a field with a nested array. Such a structure appears in relational data in the case of a \emph{one-to-many} relation. A standard textbook illustration is the relation between authors and titles. For example, a field can contain an [...]
-
-<<tidy=FALSE>>=
-x <- data.frame(author = c("Homer", "Virgil", "Jeroen"))
-x$poems <- list(c("Iliad", "Odyssey"), c("Eclogues", "Georgics", "Aeneid"), vector());
-names(x)
-toJSON(x, pretty = TRUE)
-@
-
- As can be seen from the example, the way to store this in a data frame is using a list of character vectors. This works, and although unconventional, we can still create and read such structures in \R relatively easily. However, in practice the one-to-many relation is often more complex. It results in fields containing a \emph{set of records}. In \R, the only way to model this is as a column containing a list of data frames, one separate data frame for each row:
-
-<<tidy=FALSE>>=
-x <- data.frame(author = c("Homer", "Virgil", "Jeroen"))
-x$poems <- list(
-  data.frame(title=c("Iliad", "Odyssey"), year=c(-1194, -800)),
-  data.frame(title=c("Eclogues", "Georgics", "Aeneid"), year=c(-44, -29, -19)),
-  data.frame()
-)
-toJSON(x, pretty=TRUE)
-@
-
- Because \R doesn't have native support for relational data, there is no natural class to store such structures. The best we can do is a column containing a list of sub-dataframes. This does the job and allows the \R user to access or generate nested \JSON structures. However, a data frame like this cannot be flattened, and the class does not guarantee that each of the individual nested data frames contains the same fields, as would be the case in an actual relational database.
-
-
-\section{Structural consistency and type safety in dynamic data}
-
-Systems that automatically exchange information over some interface, protocol or \API require a well-defined and unambiguous meaning and arrangement of data. In order to process and interpret input and output, contents must obey a consistent structure. Such structures are usually described either informally in documentation or more formally in a schema language. The previous section emphasized the importance of consistency in the mapping between \JSON data and \R classes. This section takes a [...]
-
-\subsection{Classes, types and data}
-
-Most object-oriented languages are designed with the idea that all objects of a certain class implement the same fields and methods. In strongly typed languages such as \proglang{S4} or \proglang{Java}, names and types of the fields are formally declared in a class definition. In other languages such as \proglang{S3} or \proglang{JavaScript}, the fields are not enforced by the language but rather left to the discretion of the programmer. One way or another, they assume that members of a certain [...]
-
-Some data interchange formats such as \texttt{XML} or \texttt{Protocol Buffers} take a formal approach to this matter, and have well-established \emph{schema languages} and \emph{interface description languages}. Using such a meta language it is possible to define the exact structure, properties and actions of data interchange in a formal arrangement. However, in \JSON, such formal definitions are relatively uncommon. Some initiatives for \JSON schema languages exist \citep{jsonschema}, [...]
-
-\subsection{Rule 1: Fixed keys}
-
-When using \JSON without a schema, there are no restrictions on the keys (field names) that can appear in a particular object. However, a source of data that returns a different set of keys every time it is called makes it very difficult to write software to process these data. Hence, the first rule is to limit \JSON interfaces to a finite set of keys that are known \emph{a priori} by all parties. It can be helpful to think about this in analogy with, for example, a relational database. He [...]
-
-A beautiful example of this in practice was given by Mike Dewar at the New York Open Statistical Programming Meetup on Jan. 12, 2012 \citep{jsonkeys}. In his talk he emphasizes using \JSON keys only for \emph{names}, and not for \emph{data}. He refers to this principle as the ``golden rule'', and explains how he learned his lesson the hard way. In one of his early applications, time-series data was encoded by using the epoch timestamp as the \JSON key. Therefore the keys are different ea [...]
-
-\begin{verbatim}
-[
-  { "1325344443" : 124 },
-  { "1325344456" : 131 },
-  { "1325344478" : 137 }
-]
-\end{verbatim}
-
- Even though this is valid \JSON, dynamic keys as in the example above are likely to introduce trouble. Most software will have great difficulty processing these values if we cannot specify the keys in the code. Moreover, when documenting the API, either informally or formally using a schema language, we need to describe for each property in the data what the value means and is composed of. Thereby a client or consumer can implement code that interprets and processes each element in the data [...]
-
-\begin{verbatim}
-[
-  { "time": "1325344443", "price": 124 },
-  { "time": "1325344456", "price": 131 },
-  { "time": "1325344478", "price": 137 }
-]
-\end{verbatim}
-
- This structure will play much nicer with existing software that assumes fixed keys. Moreover, the structure can easily be described in documentation, or captured in a schema. Even when we have no intention of writing documentation or a schema for a dynamic \JSON source, it is still wise to design the structure in such a way that it \emph{could} be described by a schema. When the keys are fixed, a well-chosen example can provide all the information required for the consumer to implement c [...]
-
-In the context of \R, consistency of keys is closely related to Wickham's concept of \emph{tidy data} discussed earlier. Wickham states that the most common cause of messy data is column headers containing values instead of variable names. Column headers in tabular datasets become keys when converted to \JSON. Therefore, when headers are actually values, \JSON keys in fact contain data and can become unpredictable. The cure for inconsistent keys is almost always to tidy the data accord [...]
-
-\subsection{Rule 2: Consistent types}
-
-In a strongly typed language, fields declare their class before any values are assigned. Thereby the type of a given field is identical in all objects of a particular class, and arrays only contain objects of a single type. The \proglang{S3} system in \R is weakly typed and puts no formal restrictions on the class of a given property, or the types of objects that can be combined into a collection. For example, the list below contains a character vector, a numeric vector and a list:
-
-<<>>=
-#Heterogeneous lists are bad!
-x <- list("FOO", 1:3, list("bar"=pi))
-toJSON(x)
-@
-
- However, even though it is possible to generate such \JSON, it is bad practice. Fields or collections with ambiguous object types are difficult to describe, interpret and process in the context of inter-system communication. When using \JSON to exchange dynamic data, it is important that each property and array is \emph{type consistent}. In dynamically typed languages, the programmer needs to make sure that properties are of the correct type before encoding into \JSON. For \R, this means [...]
-
- Note that consistency is somewhat subjective as it refers to the \emph{meaning} of the elements; they do not necessarily have precisely the same structure. What is important is to keep in mind that the consumer of the data can interpret and process each element identically, e.g. iterate over the elements in the collection and apply the same method to each of them. To illustrate this, let's take the example of the data frame:
-
-<<>>=
-#conceptually homogeneous array
-x <- data.frame(name=c("Jay", "Mary", NA, NA), gender=c("M", NA, NA, "F"))
-toJSON(x, pretty=TRUE)
-@
-
-The \JSON array above has 4 elements, each of which is a \JSON object. However, due to the \texttt{NA} values, some records have more fields than others. But as long as they are conceptually the same type (e.g. a person), the consumer can iterate over the elements to process each person in the set according to a predefined action. For example, each element could be used to construct a \texttt{Person} object. A collection of different object classes should be separated and organized using a n [...]
-
-<<tidy=FALSE>>=
-x <- list(
-  humans = data.frame(name = c("Jay", "Mary"), married = c(TRUE, FALSE)),
-  horses = data.frame(name = c("Star", "Dakota"), price = c(5000, 30000))
-)
-toJSON(x, pretty=TRUE)
-@
-
- This might seem obvious, but dynamic languages such as \R can make it dangerously tempting to generate data containing mixed-type collections. Such inconsistent typing makes it very difficult to consume the data and creates a likely source of nasty bugs. Using consistent field names/types and homogeneous \JSON arrays is a strong convention among public \JSON \API's, for good reasons. We recommend that \R users respect these conventions when generating \JSON data in \R.
-
-
-%references
-\bibliographystyle{plainnat}
-\bibliography{references}
-
-%end
-\end{document}
diff --git a/vignettes/json-paging.Rmd.orig b/vignettes/json-paging.Rmd.orig
deleted file mode 100644
index 25b5a8f..0000000
--- a/vignettes/json-paging.Rmd.orig
+++ /dev/null
@@ -1,92 +0,0 @@
----
-title: "Combining pages of JSON data with jsonlite"
-date: "`r Sys.Date()`"
-output:
-  html_document
-vignette: >
-  %\VignetteIndexEntry{Combining pages of JSON data with jsonlite}
-  %\VignetteEngine{knitr::rmarkdown}
-  \usepackage[utf8]{inputenc}
----
-
-
-```{r echo=FALSE}
-library(knitr)
-opts_chunk$set(comment="")
-
-#this replaces tabs by spaces because latex-verbatim doesn't like tabs
-toJSON <- function(...){
-  gsub("\t", "  ", jsonlite::toJSON(...), fixed=TRUE);
-}
-```
-
-```{r echo=FALSE, message=FALSE}
-library(jsonlite)
-```
-
-The [jsonlite](https://cran.r-project.org/package=jsonlite) package is a `JSON` parser/generator for R which is optimized for pipelines and web APIs. It is used by the OpenCPU system and many other packages to get data in and out of R using the `JSON` format.
-
-## A bidirectional mapping
-
-One of the main strengths of `jsonlite` is that it implements a bidirectional [mapping](http://arxiv.org/abs/1403.2805) between JSON and data frames. Thereby it can convert nested collections of JSON records, as they often appear on the web, immediately into the appropriate R structure. For example, to grab some data from ProPublica we can simply use:
-
-```{r eval=FALSE}
-library(jsonlite)
-mydata <- fromJSON("https://projects.propublica.org/forensics/geos.json", flatten = TRUE)
-View(mydata)
-```
-
-The `mydata` object is a data frame which can be used directly for modeling or visualization, without the need for any further complicated data manipulation.
-
-## Paging with jsonlite
-
-A question that comes up frequently is how to combine pages of data. Most web APIs limit the amount of data that can be retrieved per request. If the client needs more data than fits in a single request, it needs to break the data down into multiple requests that each retrieve a fragment (page) of data, not unlike pages in a book. In practice this is often implemented using a `page` parameter in the API. Below is an example from the [ProPublica Nonprofit Explorer API](http://projec [...]
-
-```{r}
-baseurl <- "https://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"
-mydata0 <- fromJSON(paste0(baseurl, "&page=0"), flatten = TRUE)
-mydata1 <- fromJSON(paste0(baseurl, "&page=1"), flatten = TRUE)
-mydata2 <- fromJSON(paste0(baseurl, "&page=2"), flatten = TRUE)
-
-#The actual data is in the filings element
-mydata0$filings[1:10, c("organization.sub_name", "organization.city", "totrevenue")]
-```
-
-To analyze or visualize these data, we need to combine the pages into a single dataset. We can do this with the `rbind.pages` function. Note that in this example, the actual data is contained in the `filings` field:
-
-```{r}
-#Rows per data frame
-nrow(mydata0$filings)
-
-#Combine data frames
-filings <- rbind.pages(
-  list(mydata0$filings, mydata1$filings, mydata2$filings)
-)
-
-#Total number of rows
-nrow(filings)
-```
-
-## Automatically combining many pages
-
-We can write a simple loop that automatically downloads and combines many pages. For example, to retrieve the first 21 pages (0 through 20) of non-profits from the example above:
-
-```{r, message=FALSE}
-#store all pages in a list first
-baseurl <- "https://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc"
-pages <- list()
-for(i in 0:20){
-  mydata <- fromJSON(paste0(baseurl, "&page=", i))
-  message("Retrieving page ", i)
-  pages[[i+1]] <- mydata$filings
-}
-
-#combine all into one
-filings <- rbind.pages(pages)
-
-#check output
-nrow(filings)
-colnames(filings)
-```
-
-From here, we can go straight to analyzing the filings data without any further tedious data manipulation.

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/r-cran-jsonlite.git


