[r-cran-recipes] 01/02: New upstream version 0.1.0

Andreas Tille tille at debian.org
Sun Oct 22 19:15:30 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-cran-recipes.

commit eaf55c46ed8c9c7a2bc3e53593b435e500cac155
Author: Andreas Tille <tille at debian.org>
Date:   Sun Oct 22 21:14:37 2017 +0200

    New upstream version 0.1.0
---
 DESCRIPTION                            |  34 ++
 MD5                                    | 176 ++++++++++
 NAMESPACE                              | 233 +++++++++++++
 NEWS.md                                |  29 ++
 R/BoxCox.R                             | 180 ++++++++++
 R/YeoJohnson.R                         | 209 ++++++++++++
 R/bag_imp.R                            | 205 +++++++++++
 R/bin2factor.R                         | 106 ++++++
 R/center.R                             | 112 ++++++
 R/classdist.R                          | 192 +++++++++++
 R/corr.R                               | 173 ++++++++++
 R/data.R                               |  85 +++++
 R/date.R                               | 228 +++++++++++++
 R/depth.R                              | 171 ++++++++++
 R/discretize.R                         | 291 ++++++++++++++++
 R/dummy.R                              | 167 +++++++++
 R/holiday.R                            | 163 +++++++++
 R/hyperbolic.R                         | 112 ++++++
 R/ica.R                                | 164 +++++++++
 R/interactions.R                       | 218 ++++++++++++
 R/intercept.R                          |  78 +++++
 R/invlogit.R                           |  90 +++++
 R/isomap.R                             | 171 ++++++++++
 R/knn_imp.R                            | 192 +++++++++++
 R/kpca.R                               | 179 ++++++++++
 R/lincombo.R                           | 193 +++++++++++
 R/log.R                                |  95 ++++++
 R/logit.R                              |  91 +++++
 R/meanimpute.R                         | 116 +++++++
 R/misc.R                               | 326 ++++++++++++++++++
 R/modeimpute.R                         | 110 ++++++
 R/ns.R                                 | 141 ++++++++
 R/nzv.R                                | 172 ++++++++++
 R/ordinalscore.R                       | 115 +++++++
 R/other.R                              | 169 +++++++++
 R/pca.R                                | 192 +++++++++++
 R/pkg.R                                |  33 ++
 R/poly.R                               | 145 ++++++++
 R/range.R                              | 122 +++++++
 R/ratio.R                              | 156 +++++++++
 R/recipe.R                             | 601 +++++++++++++++++++++++++++++++++
 R/regex.R                              | 146 ++++++++
 R/rm.R                                 |  98 ++++++
 R/roles.R                              |  63 ++++
 R/scale.R                              | 105 ++++++
 R/selections.R                         | 342 +++++++++++++++++++
 R/shuffle.R                            |  87 +++++
 R/spatialsign.R                        | 103 ++++++
 R/sqrt.R                               |  83 +++++
 R/window.R                             | 253 ++++++++++++++
 build/vignette.rds                     | Bin 0 -> 300 bytes
 data/biomass.RData                     | Bin 0 -> 10252 bytes
 data/covers.RData                      | Bin 0 -> 720 bytes
 data/credit_data.RData                 | Bin 0 -> 53052 bytes
 data/datalist                          |   4 +
 data/okc.RData                         | Bin 0 -> 184471 bytes
 inst/doc/Custom_Steps.R                | 154 +++++++++
 inst/doc/Custom_Steps.Rmd              | 247 ++++++++++++++
 inst/doc/Custom_Steps.html             | 315 +++++++++++++++++
 inst/doc/Ordering.Rmd                  |  28 ++
 inst/doc/Ordering.html                 |  87 +++++
 inst/doc/Selecting_Variables.R         |  33 ++
 inst/doc/Selecting_Variables.Rmd       |  73 ++++
 inst/doc/Selecting_Variables.html      | 181 ++++++++++
 inst/doc/Simple_Example.R              |  62 ++++
 inst/doc/Simple_Example.Rmd            | 134 ++++++++
 inst/doc/Simple_Example.html           | 291 ++++++++++++++++
 man/add_role.Rd                        |  48 +++
 man/add_step.Rd                        |  23 ++
 man/bake.Rd                            |  50 +++
 man/biomass.Rd                         |  25 ++
 man/covers.Rd                          |  23 ++
 man/credit_data.Rd                     |  23 ++
 man/discretize.Rd                      | 124 +++++++
 man/has_role.Rd                        |  64 ++++
 man/juice.Rd                           |  53 +++
 man/names0.Rd                          |  25 ++
 man/okc.Rd                             |  24 ++
 man/prep.Rd                            |  73 ++++
 man/print.recipe.Rd                    |  26 ++
 man/recipe.Rd                          | 165 +++++++++
 man/recipes-internal.Rd                |  18 +
 man/recipes.Rd                         |  40 +++
 man/reexports.Rd                       |  16 +
 man/selections.Rd                      |  99 ++++++
 man/step.Rd                            |  25 ++
 man/step_BoxCox.Rd                     |  80 +++++
 man/step_YeoJohnson.Rd                 |  86 +++++
 man/step_bagimpute.Rd                  | 102 ++++++
 man/step_bin2factor.Rd                 |  61 ++++
 man/step_center.Rd                     |  69 ++++
 man/step_classdist.Rd                  |  82 +++++
 man/step_corr.Rd                       |  83 +++++
 man/step_date.Rd                       |  83 +++++
 man/step_depth.Rd                      |  87 +++++
 man/step_dummy.Rd                      |  84 +++++
 man/step_holiday.Rd                    |  68 ++++
 man/step_hyperbolic.Rd                 |  62 ++++
 man/step_ica.Rd                        | 103 ++++++
 man/step_interact.Rd                   |  82 +++++
 man/step_intercept.Rd                  |  58 ++++
 man/step_invlogit.Rd                   |  65 ++++
 man/step_isomap.Rd                     | 106 ++++++
 man/step_knnimpute.Rd                  | 105 ++++++
 man/step_kpca.Rd                       | 118 +++++++
 man/step_lincomb.Rd                    |  74 ++++
 man/step_log.Rd                        |  59 ++++
 man/step_logit.Rd                      |  60 ++++
 man/step_meanimpute.Rd                 |  72 ++++
 man/step_modeimpute.Rd                 |  67 ++++
 man/step_ns.Rd                         |  70 ++++
 man/step_nzv.Rd                        |  86 +++++
 man/step_ordinalscore.Rd               |  73 ++++
 man/step_other.Rd                      |  76 +++++
 man/step_pca.Rd                        | 113 +++++++
 man/step_poly.Rd                       |  72 ++++
 man/step_range.Rd                      |  67 ++++
 man/step_ratio.Rd                      |  78 +++++
 man/step_regex.Rd                      |  66 ++++
 man/step_rm.Rd                         |  55 +++
 man/step_scale.Rd                      |  65 ++++
 man/step_shuffle.Rd                    |  48 +++
 man/step_spatialsign.Rd                |  72 ++++
 man/step_sqrt.Rd                       |  55 +++
 man/step_window.Rd                     | 113 +++++++
 man/summary.recipe.Rd                  |  40 +++
 man/terms_select.Rd                    |  40 +++
 tests/testthat.R                       |   6 +
 tests/testthat/test-basics.R           |  45 +++
 tests/testthat/test_BoxCox.R           |  58 ++++
 tests/testthat/test_YeoJohnson.R       |  61 ++++
 tests/testthat/test_bagimpute.R        |  57 ++++
 tests/testthat/test_bin2factor.R       |  38 +++
 tests/testthat/test_center_scale.R     |  68 ++++
 tests/testthat/test_classdist.R        |  47 +++
 tests/testthat/test_corr.R             |  43 +++
 tests/testthat/test_date.R             |  97 ++++++
 tests/testthat/test_depth.R            |  55 +++
 tests/testthat/test_discretized.R      |  39 +++
 tests/testthat/test_dummies.R          |  39 +++
 tests/testthat/test_holiday.R          |  57 ++++
 tests/testthat/test_hyperbolic.R       |  45 +++
 tests/testthat/test_ica.R              |  88 +++++
 tests/testthat/test_interact.R         |  76 +++++
 tests/testthat/test_intercept.R        |  61 ++++
 tests/testthat/test_invlogit.R         |  27 ++
 tests/testthat/test_isomap.R           |  43 +++
 tests/testthat/test_knnimpute.R        |  63 ++++
 tests/testthat/test_kpca.R             |  42 +++
 tests/testthat/test_lincomb.R          |  67 ++++
 tests/testthat/test_log.R              |  44 +++
 tests/testthat/test_logit.R            |  36 ++
 tests/testthat/test_meanimpute.R       |  56 +++
 tests/testthat/test_modeimpute.R       |  46 +++
 tests/testthat/test_multivariate.R     |  28 ++
 tests/testthat/test_ns.R               |  58 ++++
 tests/testthat/test_nzv.R              |  58 ++++
 tests/testthat/test_ordinalscore.R     |  72 ++++
 tests/testthat/test_other.R            | 135 ++++++++
 tests/testthat/test_pca.R              |  74 ++++
 tests/testthat/test_poly.R             |  58 ++++
 tests/testthat/test_range.R            | 105 ++++++
 tests/testthat/test_ratio.R            |  96 ++++++
 tests/testthat/test_regex.R            |  47 +++
 tests/testthat/test_retraining.R       |  27 ++
 tests/testthat/test_rm.R               |  34 ++
 tests/testthat/test_roles.R            |  36 ++
 tests/testthat/test_roll.R             |  75 ++++
 tests/testthat/test_select_terms.R     | 106 ++++++
 tests/testthat/test_shuffle.R          |  74 ++++
 tests/testthat/test_spatialsign.R      |  35 ++
 tests/testthat/test_sqrt.R             |  29 ++
 tests/testthat/test_stringsAsFactors.R |  42 +++
 vignettes/Custom_Steps.Rmd             | 247 ++++++++++++++
 vignettes/Ordering.Rmd                 |  28 ++
 vignettes/Selecting_Variables.Rmd      |  73 ++++
 vignettes/Simple_Example.Rmd           | 134 ++++++++
 177 files changed, 16748 insertions(+)

diff --git a/DESCRIPTION b/DESCRIPTION
new file mode 100644
index 0000000..3e2906b
--- /dev/null
+++ b/DESCRIPTION
@@ -0,0 +1,34 @@
+Package: recipes
+Title: Preprocessing Tools to Create Design Matrices
+Version: 0.1.0
+Authors at R: c(
+    person("Max", "Kuhn", , "max at rstudio.com", c("aut", "cre")),
+    person("Hadley", "Wickham", , "hadley at rstudio.com", "aut"),
+    person("RStudio", role = "cph"))
+Description: An extensible framework to create and preprocess 
+    design matrices. Recipes consist of one or more data manipulation 
+    and analysis "steps". Statistical parameters for the steps can 
+    be estimated from an initial data set and then applied to 
+    other data sets. The resulting design matrices can then be used 
+    as inputs into statistical or machine learning models. 
+URL: https://github.com/topepo/recipes
+BugReports: https://github.com/topepo/recipes/issues
+Depends: R (>= 3.2.3), dplyr
+Imports: tibble, stats, ipred, dimRed (>= 0.1.0), lubridate, timeDate,
+        ddalpha, purrr, rlang (>= 0.1.1), gower, RcppRoll, tidyselect
+        (>= 0.1.1), magrittr
+Suggests: testthat, rpart, kernlab, fastICA, RANN, igraph, knitr,
+        caret, ggplot2, rmarkdown
+License: GPL-2
+VignetteBuilder: knitr
+Encoding: UTF-8
+LazyData: true
+RoxygenNote: 6.0.1
+NeedsCompilation: no
+Packaged: 2017-07-27 01:40:39 UTC; max
+Author: Max Kuhn [aut, cre],
+  Hadley Wickham [aut],
+  RStudio [cph]
+Maintainer: Max Kuhn <max at rstudio.com>
+Repository: CRAN
+Date/Publication: 2017-07-27 10:46:19 UTC
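
The Description above summarizes the package's estimate-then-apply
workflow. As a quick orientation, here is a minimal sketch of that
workflow using functions exported further below (recipe(),
step_center(), prep(), bake()); the data frames and column names are
hypothetical:

    library(recipes)

    ## hypothetical training and scoring data with two numeric predictors
    train <- data.frame(y = rnorm(20), x1 = rnorm(20, 5), x2 = rnorm(20, -2))
    new   <- data.frame(y = rnorm(5),  x1 = rnorm(5, 5),  x2 = rnorm(5, -2))

    rec <- recipe(y ~ ., data = train) %>%
      step_center(all_predictors())     # declare the step; nothing estimated yet

    rec <- prep(rec, training = train)  # estimate the column means from `train`
    bake(rec, newdata = new)            # apply those same means to other data
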
diff --git a/MD5 b/MD5
new file mode 100644
index 0000000..01ed430
--- /dev/null
+++ b/MD5
@@ -0,0 +1,176 @@
+6b1fbe18a564d9e05e83081587800ca4 *DESCRIPTION
+9ed058621fa6f53ab41f8f212291b6c2 *NAMESPACE
+ee608d65a63e343e9954115594cabfd3 *NEWS.md
+7c1e34669a83a0b2a7592dd9e0d1fbb2 *R/BoxCox.R
+e65da45dac822687cb050210c7b09d68 *R/YeoJohnson.R
+a520753018c1fd42a42ff3c68a8ca689 *R/bag_imp.R
+3c0634e9debd664de368ce4fcd514da0 *R/bin2factor.R
+09b4d519ba843bcd610911871f19e364 *R/center.R
+dd04a134028014575c3683c79e2a9b9c *R/classdist.R
+cf578c88dae8b0d33df61cc18f5b35d3 *R/corr.R
+19d08e7a631aaede8a8639eaac7d82fb *R/data.R
+68e0f1094c3413b8b4a6068db1b4c621 *R/date.R
+280c26d3d7160a66290053108837c6b1 *R/depth.R
+e5fc6bd59476c2a1d6fb8758cbe41ab3 *R/discretize.R
+d6d77d7d1e1c17f67316df6440a11450 *R/dummy.R
+3aa1fc188cf19c6d96376c575a02d669 *R/holiday.R
+f27b000060b6bddd9a537c9b817d2cdb *R/hyperbolic.R
+444bfa74286ec154aec985224fe33a6d *R/ica.R
+1ef68b9cf753c60151fbb69ef1829868 *R/interactions.R
+a01c48e3c15253bd4057c4a75f104811 *R/intercept.R
+1102fd888c0ba728e4d8ef2aed48e82f *R/invlogit.R
+b70484cbced7bc39b6f1b848215be9fd *R/isomap.R
+452719f35009648bf32ba21c71e56ea8 *R/knn_imp.R
+179a405fcbd02bdc4d515f485cd3865d *R/kpca.R
+da044733b5ba8050490715eb9c461b82 *R/lincombo.R
+05abc53192652735201fff749b4585cb *R/log.R
+894f65fdecdf2550e4fb3fd4fa84bc96 *R/logit.R
+846797e57c4e2407d45187270378fc72 *R/meanimpute.R
+39157df324a60a40efaf0c4563d4b96f *R/misc.R
+ff7231ba10cc09600c4ffa59f09deee9 *R/modeimpute.R
+7f38471989ea0259c40476334917ea09 *R/ns.R
+7eb09f7654bb7be4bec0f57856ea468c *R/nzv.R
+a2a8895fe0f2fbe2360f577df99b14e8 *R/ordinalscore.R
+670e5590996e4ef421b69c3b09948db4 *R/other.R
+d3aee42ffbae201df9b46c4f4287b5af *R/pca.R
+f5b7c4f945877cb42e32cd96414854a7 *R/pkg.R
+c75cb75a9301e3627857c341979712c5 *R/poly.R
+d90750877d572f3bfa8720b976a74ce5 *R/range.R
+18507e2dfdce5f16b732e18a2ae6b061 *R/ratio.R
+382edeb3255b00aa28ffec20199101a9 *R/recipe.R
+55b444e7488d9d6f25ca1c57a7cf7ba1 *R/regex.R
+53233a2a56740b8ebf43692b0bc21ead *R/rm.R
+de64b8042ebe32be6939a5974d2507eb *R/roles.R
+bb71fa12d1fcc3bf8877c6162e4bdb68 *R/scale.R
+5d3c6dd231d9e64a6ceba4d3d81f98ab *R/selections.R
+312dee6c355000365201e871b42caf8a *R/shuffle.R
+3200d03829d576d4b1ad808ac5a13a6e *R/spatialsign.R
+57c8938eee6af867a7b5c75edd6e586d *R/sqrt.R
+57540dda916844b1a9d21427fd228f21 *R/window.R
+546a7881d38ad0eb83b945e5f8c0e39a *build/vignette.rds
+5ebd9dd64363dfd1c5cb4505f34e56d9 *data/biomass.RData
+6b9a657fc4ffd6c81ef31ecae4c15bdd *data/covers.RData
+2b13bd28fcf63623365ba65d6a460068 *data/credit_data.RData
+7f522dd3d7d6cdba4699b26247b8ad4a *data/datalist
+778f01de1e1fc099b22f40b93aee9aa3 *data/okc.RData
+b626d73ea193729345770210268faecb *inst/doc/Custom_Steps.R
+bf6a066d45d50040bed41e49fe9ca9f4 *inst/doc/Custom_Steps.Rmd
+4476d0782f9569e8923e0c1bbdddbc8d *inst/doc/Custom_Steps.html
+6f47a6cc05c76322a10966b345257daf *inst/doc/Ordering.Rmd
+77001bf96182127d91b3dd23b3fd5907 *inst/doc/Ordering.html
+78e2f4cf4a9074f5461dcaf22960e3d5 *inst/doc/Selecting_Variables.R
+6fdc74bfa5ea87db55545b13c163c019 *inst/doc/Selecting_Variables.Rmd
+d93e985c12710290808ca548f420f63d *inst/doc/Selecting_Variables.html
+68ce66004a2b1f3d3e521dec861c0c7c *inst/doc/Simple_Example.R
+e6e51ca1d9e1d605ed19253b081f8581 *inst/doc/Simple_Example.Rmd
+f30853e0b8541517930fec85fa351c24 *inst/doc/Simple_Example.html
+af5ad8334db8ee7898989d61500ed29a *man/add_role.Rd
+bd8c0aed85af98810b95bc09e0610852 *man/add_step.Rd
+20c973c53ce254ee8a049107cf21475c *man/bake.Rd
+10c18f9bc905c4fa9acf503d72a6d5ba *man/biomass.Rd
+3efeeec95688239c8366335c06cae9b4 *man/covers.Rd
+2d19524f74b563b060182be2b7f9d543 *man/credit_data.Rd
+0cd71b71946264bd370872a083729ff9 *man/discretize.Rd
+e80188e260e6530b99cb04d200cf470d *man/has_role.Rd
+e8cd4f4ea23fc1c4610595c405dbfe65 *man/juice.Rd
+cb5b923d75ea8a3e5969f8031c19fc2e *man/names0.Rd
+805bb51de405cde4c0c152a071a33de4 *man/okc.Rd
+09ab29d7bf8e8db140a1040d09ff8701 *man/prep.Rd
+cefc6713fba0f89f3ec6477a510d27c2 *man/print.recipe.Rd
+3fbe3a44178fa8eeb64f774570e493fe *man/recipe.Rd
+43f90d39972a4376d923a5ec35e925eb *man/recipes-internal.Rd
+5d3d2bd9e5d0f8dd0624d965b14c2864 *man/recipes.Rd
+19b330c57a41b9f4e9577b52ce5d7065 *man/reexports.Rd
+337080f0cfd3c74515181431c2dc4042 *man/selections.Rd
+afb81d525a924caf8467f704f5c93c2f *man/step.Rd
+da090e640bbce30f2d7dd0ba1c941ac1 *man/step_BoxCox.Rd
+559bcef8155630d18ebc45a75221b99b *man/step_YeoJohnson.Rd
+4268f48149896942889ee23f5f9da18d *man/step_bagimpute.Rd
+a938f25452863f8585a0e5950c8861c0 *man/step_bin2factor.Rd
+8a6045ffdc926a3ffc9ef9d5bacb75b3 *man/step_center.Rd
+6c366c604cb2f68e7a111f8b3bdb9c60 *man/step_classdist.Rd
+394b31ac2a20b4fcf0bf56f7fdd0b8c7 *man/step_corr.Rd
+bfd6b88ed6071b626e0aaa1d2dad5872 *man/step_date.Rd
+b40310e394715f548de4551ef40075d2 *man/step_depth.Rd
+789c218fa954d956b7fb8a6495f2839c *man/step_dummy.Rd
+c9a545a30e1c4fdc67fc658fca33486c *man/step_holiday.Rd
+88b3dee333d3baa797f53a7ab4ff0eda *man/step_hyperbolic.Rd
+9de964d81476d8d9ce28ad8df7f12e3a *man/step_ica.Rd
+6d850c7806ca891e5517b536887152b9 *man/step_interact.Rd
+ee6bccb73237180fca3031ba995f21fd *man/step_intercept.Rd
+24b908497bd43a3169feb6116c3bd697 *man/step_invlogit.Rd
+e962c87c69f13ed59bd7bdc6dfece4df *man/step_isomap.Rd
+b02cdaf76b62616235a633ee24a1aab2 *man/step_knnimpute.Rd
+a6f5fd1f51f025b71bfdbeb704b8b532 *man/step_kpca.Rd
+fb73570f9a01562b57d45108e06c556f *man/step_lincomb.Rd
+5aa5161a76953959a64af3ae85dc4cc0 *man/step_log.Rd
+ee93793a66f19a07b4e3f12e6291b4bf *man/step_logit.Rd
+37c3ea4f2b499e8ad020fda27e98b855 *man/step_meanimpute.Rd
+5e71a0dac970cf6785256c848b688a28 *man/step_modeimpute.Rd
+ca695408fbd72e960024ed0390bcdbff *man/step_ns.Rd
+c0ce7c32ae0e1d767203953a0254242f *man/step_nzv.Rd
+4edc115bd5bb168c7fdc7481daa64e2e *man/step_ordinalscore.Rd
+8cd09494e8392133c3ddb8f69b730437 *man/step_other.Rd
+05cb3c65041b9a4f0d404dd61b2cffad *man/step_pca.Rd
+83039c32718bf7d29ac9e1f4290efd08 *man/step_poly.Rd
+cfb55c81f229dc91931d893855e0cfca *man/step_range.Rd
+73e8396ae8854b1b74f6a84c60863d89 *man/step_ratio.Rd
+0bbb2666f5ff2656c54058d208197cf1 *man/step_regex.Rd
+a9c916e21237dd9d4a5472c5d59803b9 *man/step_rm.Rd
+8a0491d1989de66843f1917b0bcf0804 *man/step_scale.Rd
+c179f90b1da50731923acbf43ba74f36 *man/step_shuffle.Rd
+84c392028f5e21b61878977f79540037 *man/step_spatialsign.Rd
+25e19d3764d33d7d9a1b8f1bbfe655cb *man/step_sqrt.Rd
+831c39230ddba9a73a271af182280195 *man/step_window.Rd
+eb57bddf88550e99c38f51e9bcaa4e47 *man/summary.recipe.Rd
+e38baff20765d630bcbc7928abc09009 *man/terms_select.Rd
+8cf19fadd73508d80b84b2cc090afb19 *tests/testthat.R
+c34d4e78a1917818a04bfaf7e16e6e0e *tests/testthat/test-basics.R
+a523863f70c4dcf66e0febc27ce0516b *tests/testthat/test_BoxCox.R
+4b822ad0c4cfe447e62e31d9e8f6270a *tests/testthat/test_YeoJohnson.R
+64db95dea31805262712a6a230ef292b *tests/testthat/test_bagimpute.R
+3e4052f71b8618f5b29a725c0611dacb *tests/testthat/test_bin2factor.R
+75ccf76b9939b0182bd02e33b20ebded *tests/testthat/test_center_scale.R
+b09f1ed5201d6cb8cc341a7c0ed98b03 *tests/testthat/test_classdist.R
+66a9f3f18e51d570b3c854642d634991 *tests/testthat/test_corr.R
+4a6ea23eec48839c16cdf090523201bf *tests/testthat/test_date.R
+b136e2dd09da39a5e0dc1ac317f89c32 *tests/testthat/test_depth.R
+c0cae095aa663ed390a6d924e135654e *tests/testthat/test_discretized.R
+3f726edca82f59e2cf6a89d8da5c3b39 *tests/testthat/test_dummies.R
+242c932ec860ab8006706551ba0bd985 *tests/testthat/test_holiday.R
+34500beb845bf3801d66374f7b90a34c *tests/testthat/test_hyperbolic.R
+8fcfbfae137e6d566d0653c93d85cddb *tests/testthat/test_ica.R
+4696734eeb3d061f29b268dd49f142fa *tests/testthat/test_interact.R
+7e43040c550d5c20874ace8c1d4224b7 *tests/testthat/test_intercept.R
+3bf3ef9c55d6330f8624a1a41c6b65ca *tests/testthat/test_invlogit.R
+e785b11b03b2f3330cdd1337db7021c5 *tests/testthat/test_isomap.R
+ff427e23f4fa0087579a8294a162e24e *tests/testthat/test_knnimpute.R
+ed377cf22537cba3a8f20f9f4b54b0d3 *tests/testthat/test_kpca.R
+cd5bfaa00f8eaaf531d7315ad15881d8 *tests/testthat/test_lincomb.R
+c7b4bed868608c462b5167c29f463d85 *tests/testthat/test_log.R
+a120bb36843cb09fce6c10017d26b6ea *tests/testthat/test_logit.R
+582553ee2340e8cb02413f8d159142d6 *tests/testthat/test_meanimpute.R
+fcc0b9942eb0e78cd2b300f9fd4fd60a *tests/testthat/test_modeimpute.R
+53c252c291bebc96796f72732f941aed *tests/testthat/test_multivariate.R
+2c5c0c444583fcf49205404465f1e3cc *tests/testthat/test_ns.R
+d625575030b50291ca162343a3202914 *tests/testthat/test_nzv.R
+d6a2eb71ef48b0be9f0a764a72c12827 *tests/testthat/test_ordinalscore.R
+1855c5f4207b55bd9733e856cfc06066 *tests/testthat/test_other.R
+fb69e8f95d495e1ba27300dbeba32da5 *tests/testthat/test_pca.R
+13a4bdccfa761ea18799e6e383e2e9ad *tests/testthat/test_poly.R
+7f2a4c88b1426abee89f4895388f424b *tests/testthat/test_range.R
+171efd750e2349d990761a975da851a0 *tests/testthat/test_ratio.R
+6f850dffc53f878b508a83bd611423c0 *tests/testthat/test_regex.R
+9d3662288e41cac84805711b37d6cf0a *tests/testthat/test_retraining.R
+144b4387e63cea64279eda09505c30c5 *tests/testthat/test_rm.R
+98ac684a28c80b3dea65c4b8aae97ee3 *tests/testthat/test_roles.R
+ad9c395ba5721a10cf7f7b76d10b354e *tests/testthat/test_roll.R
+064b0cea2ade54d8561f189606195dcd *tests/testthat/test_select_terms.R
+d3a8fe61fa8c74e5fa89fdd24897d856 *tests/testthat/test_shuffle.R
+3a56b3ae68e3b5c2acdff389d8f15678 *tests/testthat/test_spatialsign.R
+35b282d4cb1776d5f85a66564800ef50 *tests/testthat/test_sqrt.R
+5c834b276a7b36cf53a397f4f62fbe39 *tests/testthat/test_stringsAsFactors.R
+bf6a066d45d50040bed41e49fe9ca9f4 *vignettes/Custom_Steps.Rmd
+6f47a6cc05c76322a10966b345257daf *vignettes/Ordering.Rmd
+6fdc74bfa5ea87db55545b13c163c019 *vignettes/Selecting_Variables.Rmd
+e6e51ca1d9e1d605ed19253b081f8581 *vignettes/Simple_Example.Rmd
diff --git a/NAMESPACE b/NAMESPACE
new file mode 100644
index 0000000..5248fd5
--- /dev/null
+++ b/NAMESPACE
@@ -0,0 +1,233 @@
+# Generated by roxygen2: do not edit by hand
+
+S3method(bake,recipe)
+S3method(bake,step_BoxCox)
+S3method(bake,step_YeoJohnson)
+S3method(bake,step_bagimpute)
+S3method(bake,step_classdist)
+S3method(bake,step_corr)
+S3method(bake,step_date)
+S3method(bake,step_depth)
+S3method(bake,step_discretize)
+S3method(bake,step_dummy)
+S3method(bake,step_holiday)
+S3method(bake,step_hyperbolic)
+S3method(bake,step_ica)
+S3method(bake,step_interact)
+S3method(bake,step_invlogit)
+S3method(bake,step_isomap)
+S3method(bake,step_knnimpute)
+S3method(bake,step_kpca)
+S3method(bake,step_lincomb)
+S3method(bake,step_log)
+S3method(bake,step_logit)
+S3method(bake,step_meanimpute)
+S3method(bake,step_modeimpute)
+S3method(bake,step_ns)
+S3method(bake,step_nzv)
+S3method(bake,step_ordinalscore)
+S3method(bake,step_other)
+S3method(bake,step_pca)
+S3method(bake,step_poly)
+S3method(bake,step_range)
+S3method(bake,step_ratio)
+S3method(bake,step_rm)
+S3method(bake,step_scale)
+S3method(bake,step_shuffle)
+S3method(bake,step_spatialsign)
+S3method(bake,step_sqrt)
+S3method(bake,step_window)
+S3method(discretize,numeric)
+S3method(predict,discretize)
+S3method(prep,recipe)
+S3method(prep,step_BoxCox)
+S3method(prep,step_YeoJohnson)
+S3method(prep,step_bagimpute)
+S3method(prep,step_bin2factor)
+S3method(prep,step_classdist)
+S3method(prep,step_corr)
+S3method(prep,step_date)
+S3method(prep,step_depth)
+S3method(prep,step_discretize)
+S3method(prep,step_dummy)
+S3method(prep,step_holiday)
+S3method(prep,step_hyperbolic)
+S3method(prep,step_ica)
+S3method(prep,step_interact)
+S3method(prep,step_invlogit)
+S3method(prep,step_isomap)
+S3method(prep,step_knnimpute)
+S3method(prep,step_kpca)
+S3method(prep,step_lincomb)
+S3method(prep,step_log)
+S3method(prep,step_logit)
+S3method(prep,step_meanimpute)
+S3method(prep,step_modeimpute)
+S3method(prep,step_ns)
+S3method(prep,step_nzv)
+S3method(prep,step_ordinalscore)
+S3method(prep,step_other)
+S3method(prep,step_pca)
+S3method(prep,step_poly)
+S3method(prep,step_range)
+S3method(prep,step_ratio)
+S3method(prep,step_regex)
+S3method(prep,step_rm)
+S3method(prep,step_scale)
+S3method(prep,step_shuffle)
+S3method(prep,step_spatialsign)
+S3method(prep,step_sqrt)
+S3method(prep,step_window)
+S3method(print,discretize)
+S3method(print,recipe)
+S3method(recipe,data.frame)
+S3method(recipe,default)
+S3method(recipe,formula)
+S3method(recipe,matrix)
+S3method(summary,recipe)
+export("%>%")
+export(add_role)
+export(add_step)
+export(all_nominal)
+export(all_numeric)
+export(all_outcomes)
+export(all_predictors)
+export(bake)
+export(current_info)
+export(denom_vars)
+export(discretize)
+export(estimate_yj)
+export(has_role)
+export(has_type)
+export(imp_vars)
+export(juice)
+export(names0)
+export(prep)
+export(prepare)
+export(recipe)
+export(step)
+export(step_BoxCox)
+export(step_YeoJohnson)
+export(step_bagimpute)
+export(step_bin2factor)
+export(step_center)
+export(step_classdist)
+export(step_corr)
+export(step_date)
+export(step_depth)
+export(step_discretize)
+export(step_dummy)
+export(step_holiday)
+export(step_hyperbolic)
+export(step_ica)
+export(step_interact)
+export(step_intercept)
+export(step_invlogit)
+export(step_isomap)
+export(step_knnimpute)
+export(step_kpca)
+export(step_lincomb)
+export(step_log)
+export(step_logit)
+export(step_meanimpute)
+export(step_modeimpute)
+export(step_ns)
+export(step_nzv)
+export(step_ordinalscore)
+export(step_other)
+export(step_pca)
+export(step_poly)
+export(step_range)
+export(step_ratio)
+export(step_regex)
+export(step_rm)
+export(step_scale)
+export(step_shuffle)
+export(step_spatialsign)
+export(step_sqrt)
+export(step_window)
+export(terms_select)
+export(yj_trans)
+import(rlang)
+import(timeDate)
+importFrom(RcppRoll,roll_max)
+importFrom(RcppRoll,roll_maxl)
+importFrom(RcppRoll,roll_maxr)
+importFrom(RcppRoll,roll_mean)
+importFrom(RcppRoll,roll_meanl)
+importFrom(RcppRoll,roll_meanr)
+importFrom(RcppRoll,roll_median)
+importFrom(RcppRoll,roll_medianl)
+importFrom(RcppRoll,roll_medianr)
+importFrom(RcppRoll,roll_min)
+importFrom(RcppRoll,roll_minl)
+importFrom(RcppRoll,roll_minr)
+importFrom(RcppRoll,roll_prod)
+importFrom(RcppRoll,roll_prodl)
+importFrom(RcppRoll,roll_prodr)
+importFrom(RcppRoll,roll_sd)
+importFrom(RcppRoll,roll_sdl)
+importFrom(RcppRoll,roll_sdr)
+importFrom(RcppRoll,roll_sum)
+importFrom(RcppRoll,roll_suml)
+importFrom(RcppRoll,roll_sumr)
+importFrom(RcppRoll,roll_var)
+importFrom(RcppRoll,roll_varl)
+importFrom(RcppRoll,roll_varr)
+importFrom(ddalpha,depth.Mahalanobis)
+importFrom(ddalpha,depth.halfspace)
+importFrom(ddalpha,depth.potential)
+importFrom(ddalpha,depth.projection)
+importFrom(ddalpha,depth.simplicial)
+importFrom(ddalpha,depth.simplicialVolume)
+importFrom(ddalpha,depth.spatial)
+importFrom(ddalpha,depth.zonoid)
+importFrom(dimRed,FastICA)
+importFrom(dimRed,dimRedData)
+importFrom(dimRed,embed)
+importFrom(dimRed,kPCA)
+importFrom(dplyr,filter)
+importFrom(dplyr,full_join)
+importFrom(dplyr,left_join)
+importFrom(gower,gower_topn)
+importFrom(ipred,ipredbagg)
+importFrom(lubridate,decimal_date)
+importFrom(lubridate,is.Date)
+importFrom(lubridate,month)
+importFrom(lubridate,quarter)
+importFrom(lubridate,semester)
+importFrom(lubridate,wday)
+importFrom(lubridate,week)
+importFrom(lubridate,yday)
+importFrom(lubridate,year)
+importFrom(magrittr,"%>%")
+importFrom(purrr,map)
+importFrom(purrr,map_chr)
+importFrom(purrr,map_if)
+importFrom(purrr,map_lgl)
+importFrom(rlang,expr)
+importFrom(rlang,f_lhs)
+importFrom(rlang,is_empty)
+importFrom(rlang,names2)
+importFrom(rlang,quos)
+importFrom(splines,ns)
+importFrom(stats,as.formula)
+importFrom(stats,binomial)
+importFrom(stats,complete.cases)
+importFrom(stats,cor)
+importFrom(stats,cov)
+importFrom(stats,mahalanobis)
+importFrom(stats,model.frame)
+importFrom(stats,model.matrix)
+importFrom(stats,optimize)
+importFrom(stats,poly)
+importFrom(stats,prcomp)
+importFrom(stats,predict)
+importFrom(stats,quantile)
+importFrom(stats,sd)
+importFrom(stats,terms)
+importFrom(stats,var)
+importFrom(tibble,add_column)
+importFrom(tibble,as_tibble)
+importFrom(tibble,is_tibble)
+importFrom(tibble,tibble)
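
The S3 registrations above follow one pattern per step class: a prep
method that estimates statistics from the training data and a bake
method that applies them. A compressed, hypothetical sketch of that
contract (the class step_myop is made up; the Custom_Steps vignette in
inst/doc covers the real procedure):

    ## estimate whatever the step needs from `training`, store it in `x`
    prep.step_myop <- function(x, training, info = NULL, ...) {
      x$trained <- TRUE
      x
    }

    ## apply the stored estimates to `newdata` and return a tibble
    bake.step_myop <- function(object, newdata, ...) {
      tibble::as_tibble(newdata)
    }
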
diff --git a/NEWS.md b/NEWS.md
new file mode 100644
index 0000000..db327c5
--- /dev/null
+++ b/NEWS.md
@@ -0,0 +1,29 @@
+# recipes 0.1.0
+
+First CRAN release. 
+
+* Changed `prepare` to `prep` per [issue #59](https://github.com/topepo/recipes/issues/59)
+
+# recipes 0.0.1.9003
+
+ * Two of the main functions [changed names](https://github.com/topepo/recipes/issues/57). `learn` has become `prepare` and `process` has become `bake`.
+
+
+# recipes 0.0.1.9002
+
+New steps:
+
+  * `step_lincomb` removes enough variables to resolve any exact linear combinations. 
+  * A step for converting binary variables to factors (`step_bin2factor`)
+  * `step_regex` applies a regular expression to a character or factor vector to create dummy variables. 
+
+Other changes: 
+
+* `step_dummy` and `step_interact` do a better job of respecting missing values in the data set. 
+
+
+# recipes 0.0.1.9001
+
+* The class system for `recipe` objects was changed so that [pipes can be used to create the recipe with a formula](https://github.com/topepo/recipes/issues/46).
+* `process.recipe` lost the `role` argument in favor of a general set of [selectors](https://topepo.github.io/recipes/articles/Selecting_Variables.html). If no selector is used, all the predictors are returned. 
+* Two steps for simple imputation using the mean or mode were added. 
diff --git a/R/BoxCox.R b/R/BoxCox.R
new file mode 100644
index 0000000..1c46111
--- /dev/null
+++ b/R/BoxCox.R
@@ -0,0 +1,180 @@
+#' Box-Cox Transformation for Non-Negative Data
+#'
+#' \code{step_BoxCox} creates a \emph{specification} of a recipe step that will
+#'    transform data using a simple Box-Cox transformation.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param lambdas A numeric vector of transformation values. This is
+#'   \code{NULL} until computed by \code{\link{prep.recipe}}.
+#' @param limits A length 2 numeric vector defining the range to compute the
+#'   transformation parameter lambda.
+#' @param nunique An integer; variables with fewer than this many unique
+#'   values will not be evaluated for a transformation.
+#' @keywords datagen
+#' @concept preprocessing transformation_methods
+#' @export
+#' @details The Box-Cox transformation, which requires a strictly positive
+#'   variable, can be used to rescale a variable to be more similar to a
+#'  normal distribution. In this package, the partial log-likelihood function
+#'  is directly optimized within a reasonable set of transformation values
+#'  (which can be changed by the user).
+#'
+#' This transformation is typically done on the outcome variable using the
+#'   residuals for a statistical model (such as ordinary least squares).
+#'   Here, a simple null model (intercept only) is used to apply the
+#'   transformation to the \emph{predictor} variables individually. This can
+#'   have the effect of making the variable distributions more symmetric.
+#'
+#' If the transformation parameters are estimated to be very close to the
+#'   bounds, or if the optimization fails, a value of \code{NA} is used and
+#'   no transformation is applied.
+#'
+#' @references Sakia, R. M. (1992). The Box-Cox transformation technique:
+#'   A review. \emph{The Statistician}, 169-178.
+#' @examples
+#'
+#' rec <- recipe(~ ., data = as.data.frame(state.x77))
+#'
+#' bc_trans <- step_BoxCox(rec, all_numeric())
+#'
+#' bc_estimates <- prep(bc_trans, training = as.data.frame(state.x77))
+#'
+#' bc_data <- bake(bc_estimates, as.data.frame(state.x77))
+#'
+#' plot(density(state.x77[, "Illiteracy"]), main = "before")
+#' plot(density(bc_data$Illiteracy), main = "after")
+#' @seealso \code{\link{step_YeoJohnson}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+step_BoxCox <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           lambdas = NULL,
+           limits = c(-5, 5),
+           nunique = 5) {
+    add_step(
+      recipe,
+      step_BoxCox_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        lambdas = lambdas,
+        limits = sort(limits)[1:2],
+        nunique = nunique
+      )
+    )
+  }
+
+step_BoxCox_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           lambdas = NULL,
+           limits = NULL,
+           nunique = NULL) {
+    step(
+      subclass = "BoxCox",
+      terms = terms,
+      role = role,
+      trained = trained,
+      lambdas = lambdas,
+      limits = limits,
+      nunique = nunique
+    )
+  }
+
+#' @export
+prep.step_BoxCox <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  values <- vapply(
+    training[, col_names],
+    estimate_bc,
+    c(lambda = 0),
+    limits = x$limits,
+    nunique = x$nunique
+  )
+  values <- values[!is.na(values)]
+  step_BoxCox_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    lambdas = values,
+    limits = x$limits,
+    nunique = x$nunique
+  )
+}
+
+#' @export
+bake.step_BoxCox <- function(object, newdata, ...) {
+  if (length(object$lambdas) == 0)
+    return(as_tibble(newdata))
+  param <- names(object$lambdas)
+  for (i in seq_along(object$lambdas))
+    newdata[, param[i]] <-
+    bc_trans(getElement(newdata, param[i]), lambda = object$lambdas[i])
+  as_tibble(newdata)
+}
+
+print.step_BoxCox <-
+  function(x, width = max(20, options()$width - 35), ...) {
+    cat("Box-Cox transformation on ", sep = "")
+    printer(names(x$lambdas), x$terms, x$trained, width = width)
+    invisible(x)
+  }
+
+## computes the new data
+bc_trans <- function(x, lambda, eps = .001) {
+  if (is.na(lambda))
+    return(x)
+  if (abs(lambda) < eps)
+    log(x)
+  else
+    (x ^ lambda - 1) / lambda
+}
+
+## helper for the log-likelihood calc
+
+#' @importFrom stats var
+ll_bc <- function(lambda, y, gm, eps = .001) {
+  n <- length(y)
+  gm0 <- gm ^ (lambda - 1)
+  z <- if (abs(lambda) <= eps)
+    log(y) / gm0
+  else
+    (y ^ lambda - 1) / (lambda * gm0)
+  var_z <- var(z) * (n - 1) / n
+  ## the last expression, the profile log-likelihood, is the return value
+  -.5 * n * log(var_z)
+}
+
+#' @importFrom stats complete.cases
+## eliminates missing data and returns the profile log-likelihood
+bc_obj <- function(lam, dat) {
+  dat <- dat[complete.cases(dat)]
+  geo_mean <- exp(mean(log(dat)))
+  ll_bc(lambda = lam, y = dat, gm = geo_mean)
+}
+
+#' @importFrom stats optimize
+## estimates the values
+estimate_bc <- function(dat,
+                        limits = c(-5, 5),
+                        nunique = 5) {
+  eps <- .001
+  if (length(unique(dat)) < nunique |
+      any(dat[complete.cases(dat)] <= 0))
+    return(NA)
+  res <- optimize(
+    bc_obj,
+    interval = limits,
+    maximum = TRUE,
+    dat = dat,
+    tol = .0001
+  )
+  lam <- res$maximum
+  if (abs(limits[1] - lam) <= eps | abs(limits[2] - lam) <= eps)
+    lam <- NA
+  lam
+}
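
As a standalone check of the arithmetic in bc_trans() above (base R
only, no package code): away from zero the transform is
(x^lambda - 1)/lambda, and near zero it falls back to log(x), the
limit of that expression as lambda approaches zero.

    bc <- function(x, lambda, eps = 0.001) {
      if (abs(lambda) < eps) log(x) else (x ^ lambda - 1) / lambda
    }

    x <- c(0.5, 1, 2, 10)
    bc(x, lambda = 0.5)                                # (sqrt(x) - 1) / 0.5
    all.equal(bc(x, 0.002), log(x), tolerance = 0.01)  # TRUE near lambda = 0
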
diff --git a/R/YeoJohnson.R b/R/YeoJohnson.R
new file mode 100644
index 0000000..d459072
--- /dev/null
+++ b/R/YeoJohnson.R
@@ -0,0 +1,209 @@
+#' Yeo-Johnson Transformation
+#'
+#' \code{step_YeoJohnson} creates a \emph{specification} of a recipe step that
+#'   will transform data using a simple Yeo-Johnson transformation.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param lambdas A numeric vector of transformation values. This is
+#'   \code{NULL} until computed by \code{\link{prep.recipe}}.
+#' @param limits A length 2 numeric vector defining the range to compute the
+#'   transformation parameter lambda.
+#' @param nunique An integer; variables with fewer than this many unique
+#'   values will not be evaluated for a transformation.
+#' @keywords datagen
+#' @concept preprocessing transformation_methods
+#' @export
+#' @details The Yeo-Johnson transformation is very similar to the Box-Cox but
+#'   does not require the input variables to be strictly positive. In the
+#'   package, the partial log-likelihood function is directly optimized within
+#'   a reasonable set of transformation values (which can be changed by the
+#'   user).
+#'
+#' This transformation is typically done on the outcome variable using the
+#'   residuals for a statistical model (such as ordinary least squares). Here,
+#'   a simple null model (intercept only) is used to apply the transformation
+#'   to the \emph{predictor} variables individually. This can have the effect
+#'   of making the variable distributions more symmetric.
+#'
+#' If the transformation parameters are estimated to be very close to the
+#'   bounds, or if the optimization fails, a value of \code{NA} is used and
+#'   no transformation is applied.
+#'
+#' @references Yeo, I. K., and Johnson, R. A. (2000). A new family of power
+#'   transformations to improve normality or symmetry. \emph{Biometrika}.
+#' @examples
+#'
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' yj_trans <- step_YeoJohnson(rec,  all_numeric())
+#'
+#' yj_estimates <- prep(yj_trans, training = biomass_tr)
+#'
+#' yj_te <- bake(yj_estimates, biomass_te)
+#'
+#' plot(density(biomass_te$sulfur), main = "before")
+#' plot(density(yj_te$sulfur), main = "after")
+#' @seealso \code{\link{step_BoxCox}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+step_YeoJohnson <-
+  function(recipe, ..., role = NA, trained = FALSE,
+           lambdas = NULL, limits = c(-5, 5), nunique = 5) {
+    add_step(
+      recipe,
+      step_YeoJohnson_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        lambdas = lambdas,
+        limits = sort(limits)[1:2],
+        nunique = nunique
+      )
+    )
+  }
+
+step_YeoJohnson_new <-
+  function(terms = NULL, role = NA, trained = FALSE,
+           lambdas = NULL, limits = NULL, nunique = NULL) {
+    step(
+      subclass = "YeoJohnson",
+      terms = terms,
+      role = role,
+      trained = trained,
+      lambdas = lambdas,
+      limits = limits,
+      nunique = nunique
+    )
+  }
+
+#' @export
+prep.step_YeoJohnson <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  values <- vapply(
+    training[, col_names],
+    estimate_yj,
+    c(lambda = 0),
+    limits = x$limits,
+    nunique = x$nunique
+  )
+  values <- values[!is.na(values)]
+  step_YeoJohnson_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    lambdas = values,
+    limits = x$limits,
+    nunique = x$nunique
+  )
+}
+
+#' @export
+bake.step_YeoJohnson <- function(object, newdata, ...) {
+  if (length(object$lambdas) == 0)
+    return(as_tibble(newdata))
+  param <- names(object$lambdas)
+  for (i in seq_along(object$lambdas))
+    newdata[, param[i]] <-
+    yj_trans(getElement(newdata, param[i]),
+             lambda = object$lambdas[param[i]])
+  as_tibble(newdata)
+}
+
+print.step_YeoJohnson <-
+  function(x, width = max(20, options()$width - 39), ...) {
+    cat("Yeo-Johnson transformation on ", sep = "")
+    printer(names(x$lambdas), x$terms, x$trained, width = width)
+    invisible(x)
+  }
+
+## computes the new data given a lambda
+#' Internal Functions
+#' 
+#' These are not to be used directly by the users.
+#' @export
+#' @keywords internal
+#' @rdname recipes-internal
+yj_trans <- function(x, lambda, eps = .001) {
+  if (is.na(lambda))
+    return(x)
+  if (!inherits(x, "tbl_df") || is.data.frame(x)) {
+    x <- unlist(x, use.names = FALSE)
+  } else {
+    if (!is.vector(x))
+      x <- as.vector(x)
+  }
+  
+  not_neg <- x >= 0
+  
+  nn_trans <- function(x, lambda)
+    if (abs(lambda) < eps)
+      log(x + 1)
+    else
+      ((x + 1) ^ lambda - 1) / lambda
+  
+  ng_trans <- function(x, lambda)
+    if (abs(lambda - 2) < eps)
+      - log(-x + 1)
+    else
+      - ((-x + 1) ^ (2 - lambda) - 1) / (2 - lambda)
+  
+  if (any(not_neg))
+    x[not_neg] <- nn_trans(x[not_neg], lambda)
+  
+  if (any(!not_neg))
+    x[!not_neg] <- ng_trans(x[!not_neg], lambda)
+  x
+}
+
+
+## Helper for the log-likelihood calc for eq 3.1 of Yeo, I. K.,
+## & Johnson, R. A. (2000). A new family of power transformations
+## to improve normality or symmetry. Biometrika. page 957
+
+#' @importFrom stats var
+ll_yj <- function(lambda, y, eps = .001) {
+  n <- length(y)
+  nonneg <- all(y > 0)
+  y_t <- yj_trans(y, lambda)
+  mu_t <- mean(y_t)
+  var_t <- var(y_t) * (n - 1) / n
+  const <- sum(sign(y) * log(abs(y) + 1))
+  res <- -.5 * n * log(var_t) + (lambda - 1) * const
+  res
+}
+
+#' @importFrom stats complete.cases
+## eliminates missing data and returns the profile log-likelihood
+yj_obj <- function(lam, dat) {
+  dat <- dat[complete.cases(dat)]
+  ll_yj(lambda = lam, y = dat)
+}
+
+## estimates the values
+#' @importFrom stats optimize
+#' @export
+#' @keywords internal
+#' @rdname recipes-internal
+estimate_yj <- function(dat, limits = c(-5, 5), nunique = 5) {
+  eps <- .001
+  if (length(unique(dat)) < nunique)
+    return(NA)
+  res <- optimize(
+    yj_obj,
+    interval = limits,
+    maximum = TRUE,
+    dat = dat,
+    tol = .0001
+  )
+  lam <- res$maximum
+  if (abs(limits[1] - lam) <= eps | abs(limits[2] - lam) <= eps)
+    lam <- NA
+  lam
+}
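
For reference, the piecewise rule implemented by yj_trans() above,
restated as a self-contained base-R function: a Box-Cox-like transform
on the non-negative part and a mirrored transform on the negative
part, so no positivity restriction is needed.

    yj <- function(x, lambda, eps = 0.001) {
      out <- x
      pos <- x >= 0
      out[pos] <- if (abs(lambda) < eps) log(x[pos] + 1) else
        ((x[pos] + 1) ^ lambda - 1) / lambda
      out[!pos] <- if (abs(lambda - 2) < eps) -log(-x[!pos] + 1) else
        -((-x[!pos] + 1) ^ (2 - lambda) - 1) / (2 - lambda)
      out
    }

    yj(c(-3, -1, 0, 1, 3), lambda = 0.5)  # defined for negative inputs too
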
diff --git a/R/bag_imp.R b/R/bag_imp.R
new file mode 100644
index 0000000..a5aa0d2
--- /dev/null
+++ b/R/bag_imp.R
@@ -0,0 +1,205 @@
+#' Imputation via Bagged Trees
+#'
+#' \code{step_bagimpute} creates a \emph{specification} of a recipe step that 
+#'   will create bagged tree models to impute missing data.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose variables. For 
+#'   \code{step_bagimpute}, this indicates the variables to be imputed. When 
+#'   used with \code{imp_vars}, the dots indicates which variables are used to 
+#'   predict the missing data in each variable. See \code{\link{selections}} 
+#'   for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param impute_with A call to \code{imp_vars} to specify which variables are 
+#'   used to impute the others; this can include specific variable names 
+#'   separated by commas or different selectors (see 
+#'   \code{\link{selections}}). If a column is included in both lists (to be 
+#'   imputed and to be an imputation predictor), it will be removed from the 
+#'   latter and not used to impute itself.
+#' @param options A list of options to \code{\link[ipred]{ipredbagg}}. Defaults 
+#'   are set for the arguments \code{nbagg} and \code{keepX} but others can be 
+#'   passed in. \bold{Note} that the arguments \code{X} and \code{y} should not 
+#'   be passed here.
+#' @param seed_val An integer used to create reproducible models. The same seed 
+#'   is used across all imputation models.
+#' @param models The \code{\link[ipred]{ipredbagg}} objects are stored here 
+#'   once the bagged trees have been trained by \code{\link{prep.recipe}}.
+#' @keywords datagen
+#' @concept preprocessing imputation
+#' @export
+#' @details For each variable requiring imputation, a bagged tree is created 
+#'   where the outcome is the variable of interest and the predictors are any 
+#'   other variables listed in the \code{impute_with} formula. One advantage of 
+#'   the bagged tree is that it can accept predictors that have missing values 
+#'   themselves. This imputation method can be used when the variable of 
+#'   interest (and predictors) are numeric or categorical. Imputed categorical 
+#'   variables will remain categorical.
+#'
+#' Note that if a variable that is to be imputed is also in \code{impute_with}, 
+#'   this variable will be ignored.
+#'
+#' It is possible that missing values will still occur after imputation if a 
+#'   large majority (or all) of the imputing variables are also missing.
+#' @references Kuhn, M. and Johnson, K. (2013). 
+#'   \emph{Applied Predictive Modeling}. Springer Verlag.
+#' @examples
+#' data("credit_data")
+#'
+#' ## missing data per column
+#' vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))
+#'
+#' set.seed(342)
+#' in_training <- sample(1:nrow(credit_data), 2000)
+#'
+#' credit_tr <- credit_data[ in_training, ]
+#' credit_te <- credit_data[-in_training, ]
+#' missing_examples <- c(14, 394, 565)
+#'
+#' rec <- recipe(Price ~ ., data = credit_tr)
+#'
+#' impute_rec <- rec %>%
+#'   step_bagimpute(Status, Home, Marital, Job, Income, Assets, Debt)
+#'
+#' imp_models <- prep(impute_rec, training = credit_tr)
+#'
+#' imputed_te <- bake(imp_models, newdata = credit_te, everything())
+#'
+#' credit_te[missing_examples,]
+#' imputed_te[missing_examples, names(credit_te)]
+
+
+step_bagimpute <- 
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           models = NULL,
+           options = list(nbagg = 25, keepX = FALSE),
+           impute_with = imp_vars(all_predictors()),
+           seed_val = sample.int(10 ^ 4, 1)) {
+    if (is.null(impute_with))
+      stop("Please list some variables in `impute_with`", call. = FALSE)
+    add_step(
+      recipe,
+      step_bagimpute_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        models = models,
+        options = options,
+        impute_with = impute_with,
+        seed_val = seed_val
+      )
+    )
+}
+
+step_bagimpute_new <- 
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           models = NULL,
+           options = NULL,
+           impute_with = NULL,
+           seed_val = NA) {
+  step(
+    subclass = "bagimpute",
+    terms = terms,
+    role = role,
+    trained = trained,
+    models = models,
+    options = options,
+    impute_with = impute_with,
+    seed_val = seed_val
+  )
+}
+
+
+#' @importFrom ipred ipredbagg
+bag_wrap <- function(vars, dat, opt, seed_val) {
+  seed_val <- seed_val[1]
+  dat <- as.data.frame(dat[, c(vars$y, vars$x)])
+  if (!is.null(seed_val) && !is.na(seed_val))
+    set.seed(seed_val)
+  
+  out <- do.call("ipredbagg",
+                 c(list(y = dat[, vars$y],
+                        X = dat[, vars$x, drop = FALSE]),
+                   opt))
+  out$..imp_vars <- vars$x
+  out
+}
+
+## This figures out which data should be used to predict each variable 
+## scheduled for imputation
+impute_var_lists <- function(to_impute, impute_using, info) {
+  to_impute <- terms_select(terms = to_impute, info = info)
+  impute_using <- terms_select(terms = impute_using, info = info)
+  var_lists <- vector(mode = "list", length = length(to_impute))
+  for (i in seq_along(var_lists)) {
+    var_lists[[i]] <- list(y = to_impute[i],
+                           x = impute_using[!(impute_using %in% to_impute[i])])
+  }
+  var_lists
+}
+
+#' @export
+prep.step_bagimpute <- function(x, training, info = NULL, ...) {
+  var_lists <-
+    impute_var_lists(
+      to_impute = x$terms,
+      impute_using = x$impute_with,
+      info = info
+    )
+  x$models <- lapply(
+    var_lists,
+    bag_wrap,
+    dat = training,
+    opt = x$options,
+    seed_val = x$seed_val
+  )
+  names(x$models) <- vapply(var_lists, function(x)
+    x$y, c(""))
+  x$trained <- TRUE
+  x
+}
+
+#' @importFrom tibble as_tibble
+#' @importFrom stats predict complete.cases
+#' @export
+bake.step_bagimpute <- function(object, newdata, ...) {
+  missing_rows <- !complete.cases(newdata)
+  if (!any(missing_rows))
+    return(newdata)
+  
+  old_data <- newdata
+  for (i in seq(along = object$models)) {
+    imp_var <- names(object$models)[i]
+    missing_rows <- !complete.cases(newdata[, imp_var])
+    if (any(missing_rows)) {
+      preds <- object$models[[i]]$..imp_vars
+      pred_data <- old_data[missing_rows, preds, drop = FALSE]
+      ## do a better job of checking this:
+      if (all(is.na(pred_data))) {
+        warning("All predictors are missing; cannot impute", call. = FALSE)
+      } else {
+        pred_vals <- predict(object$models[[i]], pred_data)
+        newdata[missing_rows, imp_var] <- pred_vals
+      }
+    }
+  }
+  ## changes character to factor!
+  as_tibble(newdata)
+}
+
+
+print.step_bagimpute <-
+  function(x, width = max(20, options()$width - 31), ...) {
+    cat("Bagged tree imputation for ", sep = "")
+    printer(names(x$models), x$terms, x$trained, width = width)
+    invisible(x)
+  }
+
+#' @export
+#' @rdname step_bagimpute
+imp_vars <- function(...) quos(...)
diff --git a/R/bin2factor.R b/R/bin2factor.R
new file mode 100644
index 0000000..0dbce4b
--- /dev/null
+++ b/R/bin2factor.R
@@ -0,0 +1,106 @@
+#' Create a Factor from a Dummy Variable
+#'
+#' \code{step_bin2factor} creates a \emph{specification} of a recipe step that
+#' will create a two-level factor from a single dummy variable.
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... Selector functions that choose which variables will be converted.
+#'   See \code{\link{selections}} for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param levels A length-2 character vector that indicates the factor levels
+#' for the ones (in the first position) and the zeros (second)
+#' @param columns A vector with the selected variable names. This is
+#' \code{NULL} until computed by \code{\link{prep.recipe}}.
+#' @details This operation may be useful for situations where a binary piece of
+#'   information may need to be represented as categorical instead of numeric.
+#'   For example, naive Bayes models would do better to have factor predictors
+#'   so that the binomial distribution is modeled instead of a Gaussian
+#'   probability density of numeric binary data.
+#' Note that the numeric data is only verified to be numeric (and does not
+#' count levels).
+#' @keywords datagen
+#' @concept preprocessing dummy_variables factors
+#' @export
+#' @examples
+#' data(covers)
+#'
+#' rec <- recipe(~ description, covers) %>%
+#'  step_regex(description, pattern = "(rock|stony)", result = "rocks") %>%
+#'  step_regex(description, pattern = "(rock|stony)", result = "more_rocks") %>%
+#'  step_bin2factor(rocks)
+#'
+#' rec <- prep(rec, training = covers)
+#' results <- bake(rec, newdata = covers)
+#'
+#' table(results$rocks, results$more_rocks)
+step_bin2factor <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           levels = c("yes", "no"),
+           columns = NULL) {
+    if (length(levels) != 2 | !is.character(levels))
+      stop("`levels` should be a two element character string", call. = FALSE)
+    add_step(
+      recipe,
+      step_bin2factor_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        levels = levels,
+        columns = columns
+      )
+    )
+  }
+
+step_bin2factor_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           levels = NULL,
+           columns = NULL) {
+    step(
+      subclass = "bin2factor",
+      terms = terms,
+      role = role,
+      trained = trained,
+      levels = levels,
+      columns = columns
+    )
+  }
+
+#' @export
+prep.step_bin2factor <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  if (length(col_names) < 1)
+    stop("The selector should only select at least one variable")
+  if (any(info$type[info$variable %in% col_names] != "numeric"))
+    stop("The variables should be numeric")
+  step_bin2factor_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    levels = x$levels,
+    columns = col_names
+  )
+}
+
+bake.step_bin2factor <- function(object, newdata, ...) {
+  for (i in seq_along(object$columns))
+    newdata[, object$columns[i]] <-
+      factor(ifelse(
+        getElement(newdata, object$columns[i]) == 1,
+        object$levels[1],
+        object$levels[2]
+      ),
+      levels = object$levels)
+  newdata
+}
+
+print.step_bin2factor <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Dummy variable to factor conversion for ", sep = "")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
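
The conversion performed per column by bake.step_bin2factor() above
reduces to a single ifelse()/factor() call; a minimal sketch:

    dummy <- c(1, 0, 0, 1, 1)
    factor(ifelse(dummy == 1, "yes", "no"), levels = c("yes", "no"))
    ## [1] yes no  no  yes yes
    ## Levels: yes no
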
diff --git a/R/center.R b/R/center.R
new file mode 100644
index 0000000..09314ff
--- /dev/null
+++ b/R/center.R
@@ -0,0 +1,112 @@
+#' Centering Numeric Data
+#'
+#' \code{step_center} creates a \emph{specification} of a recipe step that 
+#'   will normalize numeric data to have a mean of zero.
+#'
+#' @param recipe A recipe object. The step will be added to the sequence of 
+#'   operations for this recipe.
+#' @param ... One or more selector functions to choose which variables are 
+#'   affected by the step. See \code{\link{selections}} for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param trained A logical to indicate if the quantities for preprocessing 
+#'   have been estimated.
+#' @param means A named numeric vector of means. This is \code{NULL} until 
+#'   computed by \code{\link{prep.recipe}}.
+#' @param na.rm A logical value indicating whether \code{NA} values should be 
+#'   removed when averaging.
+#' @return An updated version of \code{recipe} with the
+#'   new step added to the sequence of existing steps (if any). 
+#' @keywords datagen
+#' @concept preprocessing normalization_methods
+#' @export
+#' @details Centering data means that the average of a variable is subtracted 
+#'   from the data. \code{step_center} estimates the variable means from the 
+#'   data used in the \code{training} argument of \code{prep.recipe}. 
+#'   \code{bake.recipe} then applies the centering to new data sets using 
+#'   these means.
+#'
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' center_trans <- rec %>%
+#'   step_center(carbon, contains("gen"), -hydrogen)
+#'
+#' center_obj <- prep(center_trans, training = biomass_tr)
+#'
+#' transformed_te <- bake(center_obj, biomass_te)
+#'
+#' biomass_te[1:10, names(transformed_te)]
+#' transformed_te
+#' @seealso \code{\link{recipe}} \code{\link{prep.recipe}} 
+#'   \code{\link{bake.recipe}}
+step_center <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           means = NULL,
+           na.rm = TRUE) {
+    add_step(
+      recipe,
+      step_center_new(
+        terms = check_ellipses(...),
+        trained = trained,
+        role = role,
+        means = means,
+        na.rm = na.rm
+      )
+    )
+  }
+
+## Initializes a new object
+step_center_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           means = NULL,
+           na.rm = NULL) {
+    step(
+      subclass = "center",
+      terms = terms,
+      role = role,
+      trained = trained,
+      means = means,
+      na.rm = na.rm
+    )
+  }
+
+prep.step_center <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  means <-
+    vapply(training[, col_names], mean, c(mean = 0), na.rm = x$na.rm)
+  step_center_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    means = means,
+    na.rm = x$na.rm
+  )
+}
+
+bake.step_center <- function(object, newdata, ...) {
+  res <-
+    sweep(as.matrix(newdata[, names(object$means)]), 2, object$means, "-")
+  if (is.matrix(res) && ncol(res) == 1)
+    res <- res[, 1]
+  newdata[, names(object$means)] <- res
+  as_tibble(newdata)
+}
+
+print.step_center <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Centering for ", sep = "")
+    printer(names(x$means), x$terms, x$trained, width = width)
+    invisible(x)
+  }
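
What prep.step_center() and bake.step_center() above do with the
stored means comes down to vapply() and sweep(); a minimal base-R
sketch with made-up data:

    train <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
    mns   <- vapply(train, mean, c(mean = 0))        # "prep": estimate means

    new <- data.frame(a = c(4, 5), b = c(40, 50))
    sweep(as.matrix(new[, names(mns)]), 2, mns, "-") # "bake": subtract them
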
diff --git a/R/classdist.R b/R/classdist.R
new file mode 100644
index 0000000..3844888
--- /dev/null
+++ b/R/classdist.R
@@ -0,0 +1,192 @@
+#' Distances to Class Centroids
+#'
+#' \code{step_classdist} creates a \emph{specification} of a recipe step
+#'   that will convert numeric data into Mahalanobis distance measurements to
+#'   the data centroid. This is done for each value of a categorical class
+#'   variable.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param class A single character string that specifies a single categorical
+#'   variable to be used as the class.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the resulting
+#'   distances will be used as predictors in a model.
+#' @param mean_func A function to compute the center of the distribution.
+#' @param cov_func A function that computes the covariance matrix
+#' @param pool A logical: should the covariance matrix be computed by pooling
+#'   the data for all of the classes?
+#' @param log A logical: should the distances be transformed by the natural
+#'   log function?
+#' @param objects Statistics are stored here once this step has been trained
+#'   by \code{\link{prep.recipe}}.
+#' @keywords datagen
+#' @concept preprocessing dimension_reduction
+#' @export
+#' @details The function will create a new column for every unique value of
+#'   the \code{class} variable. The resulting variables will not replace the
+#'   original values and have the prefix \code{classdist_}.
+#'
+#' Note that the default covariance function requires that each class have
+#'   at least as many rows as variables listed in the \code{terms} argument.
+#'   If \code{pool = TRUE}, there must be at least as many data points as
+#'   variables overall.
+#' @examples
+#'
+#' # in case of missing data...
+#' mean2 <- function(x) mean(x, na.rm = TRUE)
+#'
+#' rec <- recipe(Species ~ ., data = iris) %>%
+#'   step_classdist(all_predictors(), class = "Species",
+#'                  pool = FALSE, mean_func = mean2)
+#'
+#' rec_dists <- prep(rec, training = iris)
+#'
+#' dists_to_species <- bake(rec_dists, newdata = iris, everything())
+#' ## on log scale:
+#' dist_cols <- grep("classdist", names(dists_to_species), value = TRUE)
+#' dists_to_species[, c("Species", dist_cols)]
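+#'
+#' # a sketch of the pooled-covariance variant (see the `pool` argument):
+#' # rec_pooled <- recipe(Species ~ ., data = iris) %>%
+#' #   step_classdist(all_predictors(), class = "Species",
+#' #                  pool = TRUE, mean_func = mean2)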
+#' @importFrom stats cov
+step_classdist <- function(recipe,
+                           ...,
+                           class,
+                           role = "predictor",
+                           trained = FALSE,
+                           mean_func = mean,
+                           cov_func = cov,
+                           pool = FALSE,
+                           log = TRUE,
+                           objects = NULL) {
+  if (!is.character(class) || length(class) != 1)
+    stop("`class` should be a single character value.")
+  add_step(
+    recipe,
+    step_classdist_new(
+      terms = check_ellipses(...),
+      class = class,
+      role = role,
+      trained = trained,
+      mean_func = mean_func,
+      cov_func = cov_func,
+      pool = pool,
+      log = log,
+      objects = objects
+    )
+  )
+}
+
+step_classdist_new <-
+  function(terms = NULL,
+           class = NULL,
+           role = "predictor",
+           trained = FALSE,
+           mean_func = NULL,
+           cov_func = NULL,
+           pool = NULL,
+           log = NULL,
+           objects = NULL) {
+    step(
+      subclass = "classdist",
+      terms = terms,
+      class = class,
+      role = role,
+      trained = trained,
+      mean_func = mean_func,
+      cov_func = cov_func,
+      pool = pool,
+      log = log,
+      objects = objects
+    )
+  }
+
+get_center <- function(x, mfun = mean) {
+  apply(x, 2, mfun)
+}
+get_both <- function(x, mfun = mean, cfun = cov) {
+  list(center = get_center(x, mfun),
+       scale = cfun(x))
+}
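+
+## A sketch of how these helpers feed `stats::mahalanobis()` for one class
+## (`iris` is used purely for illustration):
+# params <- get_both(iris[iris$Species == "setosa", 1:4])
+# head(mahalanobis(iris[, 1:4], params$center, params$scale))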
+
+
+#' @importFrom stats as.formula model.frame
+#' @export
+prep.step_classdist <- function(x, training, info = NULL, ...) {
+  class_var <- x$class[1]
+  x_names <- terms_select(x$terms, info = info)
+  x_dat <-
+    split(training[, x_names], getElement(training, class_var))
+  if (x$pool) {
+    res <- list(
+      center = lapply(x_dat, get_center, mfun = x$mean_func),
+      scale = x$cov_func(training[, x_names])
+    )
+    
+  } else {
+    res <-
+      lapply(x_dat,
+             get_both,
+             mfun = x$mean_func,
+             cfun = x$cov_func)
+  }
+  step_classdist_new(
+    terms = x$terms,
+    class = x$class,
+    role = x$role,
+    trained = TRUE,
+    mean_func = x$mean_func,
+    cov_func = x$cov_func,
+    pool = x$pool,
+    log = x$log,
+    objects = res
+  )
+}
+
+
+#' @importFrom stats mahalanobis
+mah_by_class <- function(param, x)
+  mahalanobis(x, param$center, param$scale)
+
+mah_pooled <- function(means, x, cov_mat)
+  mahalanobis(x, means, cov_mat)
+
+
+#' @importFrom tibble as_tibble
+#' @export
+bake.step_classdist <- function(object, newdata, ...) {
+  if (object$pool) {
+    x_cols <- names(object$objects[["center"]][[1]])
+    res <- lapply(
+      object$objects$center,
+      mah_pooled,
+      x = newdata[, x_cols],
+      cov_mat = object$objects$scale
+    )
+  } else {
+    x_cols <- names(object$objects[[1]]$center)
+    res <-
+      lapply(object$objects, mah_by_class, x = newdata[, x_cols])
+  }
+  if (object$log)
+    res <- lapply(res, log)
+  res <- as_tibble(res)
+  colnames(res) <- paste0("classdist_", colnames(res))
+  res <- cbind(newdata, res)
+  if (!is_tibble(res))
+    res <- as_tibble(res)
+  res
+}
+
+print.step_classdist <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Distances to", x$class, "for ")
+    if (x$trained) {
+      x_names <- if (x$pool)
+        names(x$objects[["center"]][[1]])
+      else
+        names(x$objects[[1]]$center)
+    } else x_names <- NULL
+    printer(x_names, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/corr.R b/R/corr.R
new file mode 100644
index 0000000..25e4700
--- /dev/null
+++ b/R/corr.R
@@ -0,0 +1,173 @@
+#' High Correlation Filter
+#'
+#' \code{step_corr} creates a \emph{specification} of a recipe step that will
+#'   potentially remove variables that have large absolute correlations with
+#'   other variables.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param threshold A value for the threshold of absolute correlation values.
+#'   The step will try to remove the minimum number of columns so that all the
+#'   resulting absolute correlations are less than this value.
+#' @param use A character string for the \code{use} argument to the
+#'   \code{\link[stats]{cor}} function.
+#' @param method A character string for the \code{method} argument to the
+#'   \code{\link[stats]{cor}} function.
+#' @param removals A character string that contains the names of columns that
+#'   should be removed. These values are not determined until
+#'   \code{\link{prep.recipe}} is called.
+#' @keywords datagen
+#' @author Original R code for filtering algorithm by Dong Li, modified by
+#'   Max Kuhn. Contributions by Reynald Lescarbeau (for original in
+#'   \code{caret} package). Max Kuhn for the \code{step} function.
+#' @concept preprocessing variable_filters
+#' @export
+#'
+#' @details This step attempts to remove variables to keep the largest absolute
+#'   correlation between the variables less than \code{threshold}.
+#' @examples
+#' data(biomass)
+#'
+#' set.seed(3535)
+#' biomass$duplicate <- biomass$carbon + rnorm(nrow(biomass))
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
+#'                     sulfur + duplicate,
+#'               data = biomass_tr)
+#'
+#' corr_filter <- rec %>%
+#'   step_corr(all_predictors(), threshold = .5)
+#'
+#' filter_obj <- prep(corr_filter, training = biomass_tr)
+#'
+#' filtered_te <- bake(filter_obj, biomass_te)
+#' round(abs(cor(biomass_tr[, c(3:7, 9)])), 2)
+#' round(abs(cor(filtered_te)), 2)
+#' @seealso \code{\link{step_nzv}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+
+step_corr <- function(recipe,
+                      ...,
+                      role = NA,
+                      trained = FALSE,
+                      threshold = 0.9,
+                      use = "pairwise.complete.obs",
+                      method = "pearson",
+                      removals = NULL) {
+  add_step(
+    recipe,
+    step_corr_new(
+      terms = check_ellipses(...),
+      role = role,
+      trained = trained,
+      threshold = threshold,
+      use = use,
+      method = method,
+      removals = removals
+    )
+  )
+}
+
+step_corr_new <- 
+  function(
+    terms = NULL,
+    role = NA,
+    trained = FALSE,
+    threshold = NULL,
+    use = NULL,
+    method = NULL,
+    removals = NULL
+  ) {
+    step(
+      subclass = "corr",
+      terms = terms,
+      role = role,
+      trained = trained,
+      threshold = threshold,
+      use = use,
+      method = method,
+      removals = removals
+    )
+  }
+
+#' @export
+prep.step_corr <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  filter <- corr_filter(
+    x = training[, col_names],
+    cutoff = x$threshold,
+    use = x$use,
+    method = x$method
+  )
+  
+  step_corr_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    threshold = x$threshold,
+    use = x$use,
+    method = x$method,
+    removals = filter
+  )
+}
+
+#' @export
+bake.step_corr <- function(object, newdata, ...) {
+  if (length(object$removals) > 0)
+    newdata <- newdata[,!(colnames(newdata) %in% object$removals)]
+  as_tibble(newdata)
+}
+
+print.step_corr <-
+  function(x,  width = max(20, options()$width - 36), ...) {
+    if (x$trained) {
+      if (length(x$removals) > 0) {
+        cat("Correlation filter removed ")
+        cat(format_ch_vec(x$removals, width = width))
+      } else
+        cat("Correlation filter removed no terms")
+    } else {
+      cat("Correlation filter on ", sep = "")
+      cat(format_selectors(x$terms, wdth = width))
+    }
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
+
+
+#' @importFrom stats cor
+corr_filter <-
+  function(x,
+           cutoff = .90,
+           use = "pairwise.complete.obs",
+           method = "pearson") {
+    x <- cor(x, use = use, method = method)
+    
+    if (any(!complete.cases(x)))
+      stop("The correlation matrix has some missing values.")
+    averageCorr <- colMeans(abs(x))
+    averageCorr <- as.numeric(as.factor(averageCorr))
+    x[lower.tri(x, diag = TRUE)] <- NA
+    combsAboveCutoff <- which(abs(x) > cutoff)
+    
+    colsToCheck <- ceiling(combsAboveCutoff / nrow(x))
+    rowsToCheck <- combsAboveCutoff %% nrow(x)
+    
+    colsToDiscard <-
+      averageCorr[colsToCheck] > averageCorr[rowsToCheck]
+    rowsToDiscard <- !colsToDiscard
+    
+    deletecol <-
+      c(colsToCheck[colsToDiscard], rowsToCheck[rowsToDiscard])
+    deletecol <- unique(deletecol)
+    if (length(deletecol) > 0)
+      deletecol <- colnames(x)[deletecol]
+    deletecol
+  }
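+
+## A small sanity check of the filter logic (a sketch):
+# set.seed(42)
+# dat <- data.frame(a = rnorm(50), c = rnorm(50))
+# dat$b <- dat$a + rnorm(50, sd = 0.1)  # nearly collinear with `a`
+# corr_filter(dat, cutoff = 0.9)        # flags one of `a` or `b`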
diff --git a/R/data.R b/R/data.R
new file mode 100644
index 0000000..7be08ab
--- /dev/null
+++ b/R/data.R
@@ -0,0 +1,85 @@
+#' Biomass Data
+#'
+#' Ghugare et al (2013) contains a data set where different biomass fuels are
+#' characterized by the amount of certain molecules (carbon, hydrogen, oxygen,
+#' nitrogen, and sulfur) and the corresponding higher heating value (HHV).
+#' These data are from Table S.2 of their Supplementary Materials.
+#'
+#' @name biomass
+#' @aliases biomass
+#' @docType data
+#' @return \item{biomass}{a data frame} 
+#'
+#' @source Ghugare, S. B., Tiwary, S., Elangovan, V., and Tambe, S. S. (2013). 
+#' Prediction of Higher Heating Value of Solid Biomass Fuels Using Artificial 
+#' Intelligence Formalisms. \emph{BioEnergy Research}, 1-12.
+#'
+#' @keywords datasets
+#' @examples 
+#' data(biomass)
+#' str(biomass)
+NULL
+
+#' OkCupid Data
+#'
+#' These are a sample of columns of users of OkCupid dating website. The data
+#' are from Kim and Escobedo-Land (2015). 
+#'
+#' @name okc
+#' @aliases okc
+#' @docType data
+#' @return \item{okc}{a data frame} 
+#'
+#' @source Kim, A. Y., and A. Escobedo-Land. 2015. "OkCupid Data for 
+#'   Introductory Statistics and Data Science Courses." \emph{Journal of 
+#'   Statistics Education: An International Journal on the Teaching and 
+#'   Learning of Statistics}.
+#'
+#' @keywords datasets
+#' @examples 
+#' data(okc)
+#' str(okc)
+NULL
+
+
+#' Credit Data
+#'
+#' These data are from the website of Dr. Lluís A. Belanche Muñoz by way of a 
+#' github repository of Dr. Gaston Sanchez. One data point with a missing
+#' outcome was removed from the original data.
+#'
+#' @name credit_data
+#' @aliases credit_data
+#' @docType data
+#' @return \item{credit_data}{a data frame} 
+#'
+#' @source \url{https://github.com/gastonstat/CreditScoring}, 
+#' \url{http://bit.ly/2kkBFrk}
+#'
+#' @keywords datasets
+#' @examples 
+#' data(credit_data)
+#' str(credit_data)
+NULL
+
+
+
+#' Raw Cover Type Data
+#'
+#' These data are raw data describing different types of forest cover-types 
+#'   from the UCI Machine Learning Database (see link below). There is one 
+#'   column in the data that has a few different pieces of textual
+#'   information (of variable lengths). 
+#'
+#' @name covers
+#' @aliases covers
+#' @docType data
+#' @return \item{covers}{a data frame} 
+#'
+#' @source \url{https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.info}
+#'
+#' @keywords datasets
+#' @examples 
+#' data(covers)
+#' str(covers)
+NULL
diff --git a/R/date.R b/R/date.R
new file mode 100644
index 0000000..b361d3a
--- /dev/null
+++ b/R/date.R
@@ -0,0 +1,228 @@
+#' Date Feature Generator
+#'
+#' \code{step_date} creates a \emph{specification} of a recipe step that will
+#'   convert date data into one or more factor or numeric variables.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will
+#'   be used to create the new variables. The selected variables should
+#'   have class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for
+#'   more details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new variable
+#'   columns created by the original variables will be used as predictors in a
+#'   model.
+#' @param features A character string that includes at least one of the
+#'   following values: \code{month}, \code{dow} (day of week), \code{doy}
+#'   (day of year), \code{week}, \code{decimal} (decimal date, e.g.
+#'   2002.197), \code{quarter}, \code{semester}, \code{year}.
+#' @param label A logical. Only available for features \code{month} or
+#'   \code{dow}. \code{TRUE} will display the day of the week as an ordered
+#'   factor of character strings, such as "Sunday." \code{FALSE} will display
+#'   the day of the week as a number.
+#' @param abbr A logical. Only available for features \code{month} or
+#'   \code{dow}. \code{FALSE} will display the day of the week as an ordered
+#'   factor of character strings, such as "Sunday". \code{TRUE} will display
+#'   an abbreviated version of the label, such as "Sun". \code{abbr} is
+#'   disregarded if \code{label = FALSE}.
+#' @param ordinal A logical: should factors be ordered? Only available for
+#'   features \code{month} or \code{dow}.
+#' @param columns A character string of variables that will be used as
+#'   inputs. This field is a placeholder and will be populated once
+#'    \code{\link{prep.recipe}} is used.
+#' @keywords datagen
+#' @concept preprocessing model_specification variable_encodings dates
+#' @export
+#' @details Unlike other steps, \code{step_date} does \emph{not} remove the
+#'   original date variables. \code{\link{step_rm}} can be used for this
+#'   purpose.
+#' @examples
+#' library(lubridate)
+#'
+#' examples <- data.frame(Dan = ymd("2002-03-04") + days(1:10),
+#'                        Stefan = ymd("2006-01-13") + days(1:10))
+#' date_rec <- recipe(~ Dan + Stefan, examples) %>%
+#'    step_date(all_predictors())
+#'
+#' date_rec <- prep(date_rec, training = examples)
+#' date_values <- bake(date_rec, newdata = examples)
+#' date_values
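+#'
+#' # a sketch requesting other documented features:
+#' # date_rec2 <- recipe(~ Dan + Stefan, examples) %>%
+#' #    step_date(all_predictors(), features = c("decimal", "quarter"))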
+#' @seealso \code{\link{step_holiday}} \code{\link{step_rm}} 
+#'   \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+step_date <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           features = c("dow", "month", "year"),
+           abbr = TRUE,
+           label = TRUE,
+           ordinal = FALSE,
+           columns = NULL
+  ) {
+  feat <-
+    c("year",
+      "doy",
+      "week",
+      "decimal",
+      "semester",
+      "quarter",
+      "dow",
+      "month")
+  if (!all(features %in% feat))
+    stop("Possible values of `features` should include: ",
+         paste0("'", feat, "'", collapse = ", "))
+  add_step(
+    recipe,
+    step_date_new(
+      terms = check_ellipses(...),
+      role = role,
+      trained = trained,
+      features = features,
+      abbr = abbr,
+      label = label,
+      ordinal = ordinal,
+      columns = columns
+    )
+  )
+}
+
+step_date_new <- 
+  function(
+    terms = NULL,
+    role = "predictor",
+    trained = FALSE,
+    features = features,
+    abbr = abbr,
+    label = label,
+    ordinal = ordinal,
+    columns = columns
+  ) {
+  step(
+    subclass = "date",
+    terms = terms,
+    role = role,
+    trained = trained,
+    features = features,
+    abbr = abbr,
+    label = label,
+    ordinal = ordinal,
+    columns = columns
+  )
+}
+
+#' @importFrom stats as.formula model.frame
+#' @export
+prep.step_date <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  date_data <- info[info$variable %in% col_names, ]
+  if (any(date_data$type != "date"))
+    stop("All variables for `step_date` should be either `Date` or", 
+         "`POSIXct` classes.", call. = FALSE)
+  
+  step_date_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    features = x$features,
+    abbr = x$abbr,
+    label = x$label,
+    ordinal = x$ordinal,
+    columns = col_names
+  )
+}
+
+
+ord2fac <- function(x, what) {
+  x <- getElement(x, what)
+  factor(as.character(x), levels = levels(x), ordered = FALSE)
+}
+
+
+#' @importFrom lubridate year yday week decimal_date quarter semester wday month
+get_date_features <-
+  function(dt,
+           feats,
+           abbr = TRUE,
+           label = TRUE,
+           ord = FALSE) {
+    ## pre-allocate values
+    res <- matrix(NA, nrow = length(dt), ncol = length(feats))
+    res <- as_tibble(res)
+    colnames(res) <- feats
+    
+    if ("year" %in% feats)
+      res[, grepl("year$", names(res))] <- year(dt)
+    if ("doy" %in% feats)
+      res[, grepl("doy$", names(res))] <- yday(dt)
+    if ("week" %in% feats)
+      res[, grepl("week$", names(res))] <- week(dt)
+    if ("decimal" %in% feats)
+      res[, grepl("decimal$", names(res))] <- decimal_date(dt)
+    if ("quarter" %in% feats)
+      res[, grepl("quarter$", names(res))] <- quarter(dt)
+    if ("semester" %in% feats)
+      res[, grepl("semester$", names(res))] <- semester(dt)
+    if ("dow" %in% feats) {
+      res[, grepl("dow$", names(res))] <-
+        wday(dt, abbr = abbr, label = label)
+      if (!ord & label == TRUE)
+        res[, grepl("dow$", names(res))]  <-
+          ord2fac(res, grep("dow$", names(res), value = TRUE))
+    }
+    if ("month" %in% feats) {
+      res[, grepl("month$", names(res))] <-
+        month(dt, abbr = abbr, label = label)
+      if (!ord & label == TRUE)
+        res[, grepl("month$", names(res))]  <-
+          ord2fac(res, grep("month$", names(res), value = TRUE))
+    }
+    res
+  }
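+
+## A sketch of this helper on a short date vector (the lubridate feature
+## functions are imported above):
+# get_date_features(as.Date("2002-03-04") + 0:2, feats = c("dow", "month"))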
+
+#' @importFrom tibble as_tibble is_tibble
+#' @export
+bake.step_date <- function(object, newdata, ...) {
+  new_cols <-
+    rep(length(object$features), each = length(object$columns))
+  date_values <-
+    matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols))
+  colnames(date_values) <- rep("", sum(new_cols))
+  date_values <- as_tibble(date_values)
+  
+  strt <- 1
+  for (i in seq_along(object$columns)) {
+    cols <- (strt):(strt + new_cols[i] - 1)
+    
+    tmp <- get_date_features(
+      dt = getElement(newdata, object$columns[i]),
+      feats = object$features,
+      abbr = object$abbr,
+      label = object$label,
+      ord = object$ordinal
+    )
+    
+    date_values[, cols] <- tmp
+    
+    names(date_values)[cols] <-
+      paste(object$columns[i],
+            names(tmp),
+            sep = "_")
+    
+    strt <- max(cols) + 1
+  }
+  newdata <- cbind(newdata, date_values)
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+
+print.step_date <-
+  function(x, width = max(20, options()$width - 29), ...) {
+    cat("Date features from ")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/depth.R b/R/depth.R
new file mode 100644
index 0000000..b7d637d
--- /dev/null
+++ b/R/depth.R
@@ -0,0 +1,171 @@
+#' Data Depths
+#'
+#' \code{step_depth} creates a \emph{specification} of a recipe step that
+#'   will convert numeric data into measurement of \emph{data depth}. This is
+#'   done for each value of a categorical class variable.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will
+#'   be used to create the new features. See \code{\link{selections}} for
+#'   more details.
+#' @param class A single character string that specifies a single categorical
+#'   variable to be used as the class.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that resulting depth
+#'   estimates will be used as predictors in a model.
+#' @param metric A character string specifying the depth metric. Possible
+#'   values are "potential", "halfspace", "Mahalanobis", "simplicialVolume",
+#'   "spatial", and "zonoid".
+#' @param options A list of options to pass to the underlying depth functions.
+#'   See \code{\link[ddalpha]{depth.halfspace}},
+#'   \code{\link[ddalpha]{depth.Mahalanobis}},
+#'   \code{\link[ddalpha]{depth.potential}},
+#'   \code{\link[ddalpha]{depth.projection}},
+#'   \code{\link[ddalpha]{depth.simplicial}},
+#'   \code{\link[ddalpha]{depth.simplicialVolume}},
+#'   \code{\link[ddalpha]{depth.spatial}}, \code{\link[ddalpha]{depth.zonoid}}.
+#' @param data The training data are stored here once after
+#' \code{\link{prep.recipe}} is executed.
+#' @keywords datagen
+#' @concept preprocessing dimension_reduction
+#' @export
+#' @details Data depth metrics attempt to measure how close a data point
+#'   is to the center of its distribution. There are a number of methods for
+#'   calculating depth but a simple example is the inverse of the distance of
+#'   a data point to the centroid of the distribution. Generally, small values
+#'   indicate that a data point is not close to the centroid. \code{step_depth}
+#'   can compute a class-specific depth for a new data point based on the
+#'   proximity of the new value to the training set distribution.
+#'
+#' Note that the entire training set is saved to compute future depth values.
+#' The saved data have been trained (i.e. prepared) and baked (i.e. processed)
+#' up to the point before the location that \code{step_depth} occupies in the
+#' recipe. Also, the data
+#' requirements for the different step methods may vary. For example, using
+#' \code{metric = "Mahalanobis"} requires that each class should have at least
+#' as many rows as variables listed in the \code{terms} argument.
+#'
+#' The function will create a new column for every unique value of the
+#' \code{class} variable. The resulting variables will not replace the
+#' original values and have the prefix \code{depth_}.
+#'
+#' @examples
+#'
+#' # halfspace depth is the default
+#' rec <- recipe(Species ~ ., data = iris) %>%
+#'   step_depth(all_predictors(), class = "Species")
+#'
+#' rec_dists <- prep(rec, training = iris)
+#'
+#' dists_to_species <- bake(rec_dists, newdata = iris)
+#' dists_to_species
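+#'
+#' # a sketch using another of the documented metrics:
+#' # rec2 <- recipe(Species ~ ., data = iris) %>%
+#' #   step_depth(all_predictors(), class = "Species", metric = "Mahalanobis")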
+
+step_depth <-
+  function(recipe,
+           ...,
+           class,
+           role = "predictor",
+           trained = FALSE,
+           metric =  "halfspace",
+           options = list(),
+           data = NULL) {
+    if (!is.character(class) || length(class) != 1)
+      stop("`class` should be a single character value.")
+    add_step(
+      recipe,
+      step_depth_new(
+        terms = check_ellipses(...),
+        class = class,
+        role = role,
+        trained = trained,
+        metric = metric,
+        options = options,
+        data = data
+      )
+    )
+  }
+
+step_depth_new <-
+  function(terms = NULL,
+           class = NULL,
+           role = "predictor",
+           trained = FALSE,
+           metric = NULL,
+           options = NULL,
+           data = NULL) {
+    step(
+      subclass = "depth",
+      terms = terms,
+      class = class,
+      role = role,
+      trained = trained,
+      metric = metric,
+      options = options,
+      data = data
+    )
+  }
+
+#' @importFrom stats as.formula model.frame
+#' @export
+prep.step_depth <- function(x, training, info = NULL, ...) {
+  class_var <- x$class[1]
+  x_names <- terms_select(x$terms, info = info)
+  x_dat <-
+    split(training[, x_names], getElement(training, class_var))
+  x_dat <- lapply(x_dat, as.matrix)
+  step_depth_new(
+    terms = x$terms,
+    class = x$class,
+    role = x$role,
+    trained = TRUE,
+    metric = x$metric,
+    options = x$options,
+    data = x_dat
+  )
+}
+
+
+get_depth <- function(tr_dat, new_dat, metric, opts) {
+  if (!is.matrix(new_dat))
+    new_dat <- as.matrix(new_dat)
+  opts$data <- tr_dat
+  opts$x <- new_dat
+  do.call(paste0("depth.", metric), opts)
+}
+
+
+
+#' @importFrom tibble as_tibble
+#' @importFrom ddalpha depth.halfspace depth.Mahalanobis depth.potential
+#'   depth.projection depth.simplicial depth.simplicialVolume depth.spatial
+#'   depth.zonoid
+#' @export
+bake.step_depth <- function(object, newdata, ...) {
+  x_names <- colnames(object$data[[1]])
+  x_data <- as.matrix(newdata[, x_names])
+  res <- lapply(
+    object$data,
+    get_depth,
+    new_dat = x_data,
+    metric = object$metric,
+    opts = object$options
+  )
+  res <- as_tibble(res)
+  colnames(res) <- paste0("depth_", colnames(res))
+  res <- cbind(newdata, res)
+  if (!is_tibble(res))
+    res <- as_tibble(res)
+  res
+}
+
+print.step_depth <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Data depth by", x$class, "for ")
+    ## once trained, the selected names live on the stored training data
+    x_names <- if (x$trained)
+      colnames(x$data[[1]])
+    else
+      NULL
+    printer(x_names, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/discretize.R b/R/discretize.R
new file mode 100644
index 0000000..e56554a
--- /dev/null
+++ b/R/discretize.R
@@ -0,0 +1,291 @@
+#' Discretize Numeric Variables
+#'
+#' \code{discretize} converts a numeric vector into a factor with bins having
+#'   approximately the same number of data points (based on a training set).
+#'
+#' @export
+#' @param x A numeric vector.
+discretize <- function(x, ...)
+  UseMethod("discretize")
+
+#' @rdname discretize
+discretize.default <- function(x, ...)
+  stop("Only numeric `x` is accepted")
+
+#' @rdname discretize
+#' @param cuts An integer defining how many cuts to make of the data.
+#' @param labels A character vector defining the factor levels that will be in
+#' the new factor (from smallest to largest). This should have length
+#'  \code{cuts} and should not include a level for missing (see
+#'  \code{keep_na} below).
+#' @param prefix A single parameter value to be used as a prefix for the factor
+#'   levels (e.g. \code{bin1}, \code{bin2}, ...). If the string is not a valid
+#'   R name, it is coerced to one.
+#' @param keep_na A logical for whether a factor level should be created to
+#'   identify missing values in \code{x}.
+#' @param infs A logical indicating whether the smallest and largest cut point
+#'   should be infinite.
+#' @param min_unique An integer defining the minimum acceptable number of
+#'   unique values per bin. If (the number of unique values)\code{/(cuts+1)}
+#'   is less than \code{min_unique}, no discretization takes place.
+#' @param ... For \code{discretize}: options to pass to
+#'   \code{\link[stats]{quantile}} that should not include \code{x} or
+#'   \code{probs}. For \code{step_discretize}, the dots specify one or more
+#'   selector functions to choose which variables are affected by the step. See
+#'   \code{\link{selections}} for more details.
+#'
+#' @return \code{discretize} returns an object of class \code{discretize}.
+#'   \code{predict.discretize} returns a factor vector.
+#' @keywords datagen
+#' @concept preprocessing discretization factors
+#' @export
+#' @details \code{discretize} estimates the cut points from \code{x} using
+#'   percentiles. For example, if \code{cuts = 4}, the function estimates the
+#'  quartiles of \code{x} and uses these as the cut points. If \code{cuts = 2},
+#'  the bins are defined as being above or below the median of \code{x}.
+#'
+#' The \code{predict} method can then be used to turn numeric vectors into
+#'  factor vectors.
+#'
+#' If \code{keep_na = TRUE}, a suffix of "_missing" is used as a factor level
+#'  (see the examples below).
+#'
+#' If \code{infs = FALSE} and a new value is greater than the largest value of
+#'   \code{x}, a missing value will result.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' median(biomass_tr$carbon)
+#' discretize(biomass_tr$carbon, cuts = 2)
+#' discretize(biomass_tr$carbon, cuts = 2, infs = FALSE)
+#' discretize(biomass_tr$carbon, cuts = 2, infs = FALSE, keep_na = FALSE)
+#' discretize(biomass_tr$carbon, cuts = 2, prefix = "maybe a bad idea to bin")
+#'
+#' carbon_binned <- discretize(biomass_tr$carbon)
+#' table(predict(carbon_binned, biomass_tr$carbon))
+#'
+#' carbon_no_infs <- discretize(biomass_tr$carbon, infs = FALSE)
+#' predict(carbon_no_infs, c(50, 100))
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#' rec <- rec %>% step_discretize(carbon, hydrogen)
+#' rec <- prep(rec, biomass_tr)
+#' binned_te <- bake(rec, biomass_te)
+#' table(binned_te$carbon)
+
+#' @importFrom stats quantile
+
+discretize.numeric <-
+  function(x,
+           cuts = 4,
+           labels = NULL,
+           prefix = "bin",
+           keep_na = TRUE,
+           infs = TRUE,
+           min_unique = 10,
+           ...) {
+    unique_vals <- length(unique(x))
+    missing_lab <- "_missing"
+    
+    if (cuts < 2)
+      stop("There should be at least 2 cuts")
+    
+    if (unique_vals / (cuts + 1) >= min_unique) {
+      breaks <- quantile(x, probs = seq(0, 1, length = cuts + 1), ...)
+      num_breaks <- length(breaks)
+      breaks <- unique(breaks)
+      if (num_breaks > length(breaks))
+        warning(
+          "Not enough data for ",
+          cuts,
+          " breaks. Only ",
+          length(breaks),
+          " breaks were used."
+        )
+      if (infs) {
+        breaks[1] <- -Inf
+        breaks[length(breaks)] <- Inf
+      }
+      breaks <- unique(breaks)
+      
+      if (is.null(labels)) {
+        prefix <- prefix[1]
+        if (make.names(prefix) != prefix) {
+          warning(
+            "The prefix '",
+            prefix,
+            "' is not a valid R name. It has been changed to '",
+            make.names(prefix),
+            "'."
+          )
+          prefix <- make.names(prefix)
+        }
+        labels <- names0(length(breaks) - 1, "")
+      }
+      out <- list(
+        breaks = breaks,
+        bins = length(breaks) - 1,
+        prefix = prefix,
+        labels = if (keep_na)
+          c(missing_lab, labels)
+        else
+          labels,
+        keep_na = keep_na
+      )
+    } else {
+      out <- list(bins = 0)
+      warning("Data not binned; too few unique values per bin. ",
+              "Adjust 'min_unique' as needed", call. = FALSE)
+    }
+    class(out) <- "discretize"
+    out
+  }
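+
+## A sketch of the underlying break computation: `cuts = 4` yields the
+## quartiles as cut points (the same `quantile()` call used above):
+# quantile(rnorm(100), probs = seq(0, 1, length = 4 + 1))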
+
+#' @rdname discretize
+#' @importFrom stats predict
+#' @param object An object of class \code{discretize}.
+#' @param newdata A new numeric object to be binned.
+#' @export
+predict.discretize <- function(object, newdata, ...) {
+  if (is.matrix(newdata) ||
+      is.data.frame(newdata))
+    newdata <- newdata[, 1]
+  object$labels <- paste0(object$prefix, object$labels)
+  if (object$bins >= 1) {
+    labs <- if (object$keep_na)
+      object$labels[-1]
+    else
+      object$labels
+    out <-
+      cut(newdata,
+          object$breaks,
+          labels = labs,
+          include.lowest = TRUE)
+    if (object$keep_na) {
+      out <- as.character(out)
+      if (any(is.na(newdata)))
+        out[is.na(newdata)] <- object$labels[1]
+      out <- factor(out, levels = object$labels)
+    }
+  } else
+    out <- newdata
+  
+  out
+}
+
+#' @export
+print.discretize <-
+  function(x, digits = max(3L, getOption("digits") - 3L), ...) {
+    if (length(x$breaks) > 0) {
+      cat("Bins:", length(x$labels))
+      if (any(grepl("_missing", x$labels)))
+        cat(" (includes missing category)")
+      cat("\n")
+      
+      if (length(x$breaks) <= 6) {
+        cat("Breaks:",
+            paste(signif(x$breaks, digits = digits), collapse = ", "))
+      }
+    } else {
+      if (x$bins == 0)
+        cat("Too few unique data points. No binning.")
+      else
+        cat("Non-numeric data. No binning was used.")
+    }
+  }
+
+
+#' @rdname discretize
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param objects The \code{\link{discretize}} objects are stored here once
+#'   the recipe has been trained by \code{\link{prep.recipe}}.
+#' @param options A list of options to \code{\link{discretize}}. A default is
+#'   set for the argument \code{x}. Note that using the options
+#'   \code{prefix} and \code{labels} when more than one variable is being
+#'   transformed might be problematic as all variables inherit those values.
+#' @export
+
+step_discretize <- function(recipe,
+                            ...,
+                            role = NA,
+                            trained = FALSE,
+                            objects = NULL,
+                            options = list()) {
+  add_step(
+    recipe,
+    step_discretize_new(
+      terms = check_ellipses(...),
+      trained = trained,
+      role = role,
+      objects = objects,
+      options = options
+    )
+  )
+}
+
+step_discretize_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           objects = NULL,
+           options = NULL) {
+    step(
+      subclass = "discretize",
+      terms = terms,
+      role = role,
+      trained = trained,
+      objects = objects,
+      options = options
+    )
+  }
+
+bin_wrapper <- function(x, args) {
+  bin_call <-
+    quote(discretize(x, cuts, labels, prefix, keep_na, infs, min_unique, ...))
+  args <- sub_args(discretize.numeric, args, "x")
+  args$x <- x
+  eval(bin_call, envir = args)
+}
+
+#' @export
+prep.step_discretize <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  if (length(col_names) > 1 &
+      any(names(x$options) %in% c("prefix", "labels"))) {
+    warning("Note that the options `prefix` and `labels`",
+            "will be applied to all variables")
+  }
+  
+  obj <- lapply(training[, col_names], bin_wrapper, x$options)
+  step_discretize_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    objects = obj,
+    options = x$options
+  )
+}
+
+#' @importFrom tibble as_tibble
+#' @importFrom stats predict
+#' @export
+bake.step_discretize <- function(object, newdata, ...) {
+  for (i in names(object$objects))
+    newdata[, i] <-
+      predict(object$objects[[i]], getElement(newdata, i))
+  as_tibble(newdata)
+}
+
+print.step_discretize <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Dummy variables from ")
+    printer(names(x$objects), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/dummy.R b/R/dummy.R
new file mode 100644
index 0000000..e89c9c1
--- /dev/null
+++ b/R/dummy.R
@@ -0,0 +1,167 @@
+#' Dummy Variables Creation
+#'
+#' \code{step_dummy} creates a \emph{specification} of a recipe step that
+#'   will convert nominal data (e.g. character or factors) into one or more
+#'   numeric binary model terms for the levels of the original data.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will
+#'   be used to create the dummy variables. See \code{\link{selections}} for
+#'   more details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the binary
+#'   dummy variable columns created by the original variables will be used as
+#'   predictors in a model.
+#' @param contrast A specification for which type of contrast should be used
+#'   to make a set of full rank dummy variables. See
+#'   \code{\link[stats]{contrasts}} for more details. \bold{not currently
+#'   working}
+#' @param naming A function that defines the naming convention for new binary
+#' columns. See Details below.
+#' @param levels A list that contains the information needed to create dummy
+#'   variables for each variable contained in \code{terms}. This is
+#'   \code{NULL} until the step is trained by \code{\link{prep.recipe}}.
+#' @keywords datagen
+#' @concept preprocessing dummy_variables model_specification dummy_variables
+#'   variable_encodings
+#' @export
+#' @details \code{step_dummy} will create a set of binary dummy variables 
+#'   from a factor variable. For example, if a factor column in the data set
+#'   has levels of "red", "green", "blue", the dummy variable bake will
+#'   create two additional columns of 0/1 data for two of those three values
+#'   (and remove the original column).
+#'
+#' By default, the omitted dummy variable will correspond to the first level
+#'   of the factor being converted.
+#'
+#' The function allows for non-standard naming of the resulting variables. For
+#'   a factor named \code{x}, with levels \code{"a"} and \code{"b"}, the
+#'   default naming convention would be to create a new variable called
+#'   \code{x_b}. Note that if the factor levels are not valid variable names
+#'   (e.g. "some text with spaces"), it will be changed by
+#'   \code{\link[base]{make.names}} to be valid (see the example below). The
+#'   naming format can be changed using the \code{naming} argument.
+#' @examples
+#' data(okc)
+#' okc <- okc[complete.cases(okc),]
+#'
+#' rec <- recipe(~ diet + age + height, data = okc)
+#'
+#' dummies <- rec %>% step_dummy(diet)
+#' dummies <- prep(dummies, training = okc)
+#'
+#' dummy_data <- bake(dummies, newdata = okc)
+#'
+#' unique(okc$diet)
+#' grep("^diet", names(dummy_data), value = TRUE)
+
+
+step_dummy <- 
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           contrast = options("contrasts"),
+           naming = function(var, lvl)
+             paste(var, make.names(lvl), sep = "_"),
+           levels = NULL) {
+  add_step(
+    recipe,
+    step_dummy_new(
+      terms = check_ellipses(...),
+      role = role,
+      trained = trained,
+      contrast = contrast,
+      naming = naming,
+      levels = levels
+    )
+  )
+}
+
+step_dummy_new <-
+  function(terms = NULL,
+           role = "predictor",
+           trained = FALSE,
+           contrast = contrast,
+           naming = naming,
+           levels = levels
+    ) {
+  step(
+    subclass = "dummy",
+    terms = terms,
+    role = role,
+    trained = trained,
+    contrast = contrast,
+    naming = naming,
+    levels = levels
+  )
+}
+
+#' @importFrom stats as.formula model.frame
+#' @export
+prep.step_dummy <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  ## I hate doing this but currently we are going to have
+  ## to save the terms object from the original (= training)
+  ## data
+  levels <- vector(mode = "list", length = length(col_names))
+  names(levels) <- col_names
+  for (i in seq_along(col_names)) {
+    form <- as.formula(paste0("~", col_names[i]))
+    terms <- model.frame(form,
+                         data = training,
+                         xlev = x$levels[[i]])
+    levels[[i]] <- attr(terms, "terms")
+  }
+  
+  step_dummy_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    contrast = x$contrast,
+    naming = x$naming,
+    levels = levels
+  )
+}
+
+#' @export
+bake.step_dummy <- function(object, newdata, ...) {
+  ## Maybe do this in C?
+  col_names <- names(object$levels)
+  
+  ## `na.action` cannot be passed to `model.matrix` but we
+  ## can change it globally for a bit
+  old_opt <- options()$na.action
+  options(na.action = "na.pass")
+  on.exit(options(na.action = old_opt))
+  
+  for (i in seq_along(object$levels)) {
+    indicators <- 
+      model.matrix(
+        object = object$levels[[i]],
+        data = newdata
+      )
+
+    indicators <- indicators[, -1, drop = FALSE]
+    ## use backticks for nonstandard factor levels here
+    used_lvl <- gsub(paste0("^", col_names[i]), "", colnames(indicators))
+    colnames(indicators) <- object$naming(col_names[i], used_lvl)
+    newdata <- cbind(newdata, as_tibble(indicators))
+    newdata[, col_names[i]] <- NULL
+  }
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+print.step_dummy <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Dummy variables from ")
+    printer(x$levels, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/holiday.R b/R/holiday.R
new file mode 100644
index 0000000..c972745
--- /dev/null
+++ b/R/holiday.R
@@ -0,0 +1,163 @@
+#' Holiday Feature Generator
+#'
+#' \code{step_holiday} creates a \emph{specification} of a recipe step that
+#'   will convert date data into one or more binary indicator variables for
+#'   common holidays.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   used to create the new variables. The selected variables should have
+#'   class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for
+#'   more details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new variable
+#'   columns created by the original variables will be used as predictors in
+#'   a model.
+#' @param holidays A character string that includes at least one holiday
+#'   supported by the \code{timeDate} package. See
+#'   \code{\link[timeDate]{listHolidays}} for a complete list.
+#'
+#' @param columns A character string of variables that will be used as
+#'   inputs. This field is a placeholder and will be populated once
+#'   \code{\link{prep.recipe}} is used.
+#' @keywords datagen
+#' @concept preprocessing model_specification variable_encodings dates
+#' @export
+#' @details Unlike other steps, \code{step_holiday} does \emph{not} remove the
+#'   original date variables. \code{\link{step_rm}} can be used for
+#'   this purpose.
+#' @examples
+#' library(lubridate)
+#'
+#' examples <- data.frame(someday = ymd("2000-12-20") + days(0:40))
+#' holiday_rec <- recipe(~ someday, examples) %>%
+#'    step_holiday(all_predictors())
+#'
+#' holiday_rec <- prep(holiday_rec, training = examples)
+#' holiday_values <- bake(holiday_rec, newdata = examples)
+#' holiday_values
+#' @seealso \code{\link{step_date}} \code{\link{step_rm}}
+#'   \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}} \code{\link[timeDate]{listHolidays}}
+#' @import timeDate
+step_holiday <-
+  function(
+    recipe,
+    ...,
+    role = "predictor",
+    trained = FALSE,
+    holidays = c("LaborDay", "NewYearsDay", "ChristmasDay"),
+    columns = NULL
+  ) {
+  all_days <- listHolidays()
+  if (!all(holidays %in% all_days))
+    stop("Invalid `holidays` value. See timeDate::listHolidays", call. = FALSE)
+
+  add_step(
+    recipe,
+    step_holiday_new(
+      terms = check_ellipses(...),
+      role = role,
+      trained = trained,
+      holidays = holidays,
+      columns = columns
+    )
+  )
+}
+
+step_holiday_new <-
+  function(
+    terms = NULL,
+    role = "predictor",
+    trained = FALSE,
+    holidays = holidays,
+    columns = columns
+    ) {
+  step(
+    subclass = "holiday",
+    terms = terms,
+    role = role,
+    trained = trained,
+    holidays = holidays,
+    columns = columns
+  )
+}
+
+#' @importFrom stats as.formula model.frame
+#' @export
+prep.step_holiday <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  holiday_data <- info[info$variable %in% col_names, ]
+  if (any(holiday_data$type != "date"))
+    stop("All variables for `step_holiday` should be either `Date` ",
+         "or `POSIXct` classes.", call. = FALSE)
+  
+  step_holiday_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    holidays = x$holidays,
+    columns = col_names
+  )
+}
+
+
+is_holiday <- function(hol, dt) {
+  hdate <- holiday(year = unique(year(dt)), Holiday = hol)
+  hdate <- as.Date(hdate)
+  out <- rep(0, length(dt))
+  out[dt %in% hdate] <- 1
+  out
+}
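+
+## A sketch of the indicator helper (`holiday()` comes from timeDate, which
+## is imported above):
+# is_holiday("ChristmasDay", as.Date("2000-12-20") + 0:10)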
+
+#' @importFrom lubridate year is.Date
+get_holiday_features <- function(dt, hdays) {
+  if (!is.Date(dt))
+    dt <- as.Date(dt)
+  hdays <- as.list(hdays)
+  hfeat <- lapply(hdays, is_holiday, dt = dt)
+  hfeat <- do.call("cbind", hfeat)
+  colnames(hfeat) <- unlist(hdays)
+  as_tibble(hfeat)
+}
+
+#' @importFrom tibble as_tibble is_tibble
+#' @export
+bake.step_holiday <- function(object, newdata, ...) {
+  new_cols <-
+    rep(length(object$holidays), each = length(object$columns))
+  holiday_values <-
+    matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols))
+  colnames(holiday_values) <- rep("", sum(new_cols))
+  holiday_values <- as_tibble(holiday_values)
+  
+  strt <- 1
+  for (i in seq_along(object$columns)) {
+    cols <- (strt):(strt + new_cols[i] - 1)
+    
+    tmp <- get_holiday_features(dt = getElement(newdata, object$columns[i]),
+                                hdays = object$holidays)
+    
+    holiday_values[, cols] <- tmp
+    
+    names(holiday_values)[cols] <-
+      paste(object$columns[i],
+            names(tmp),
+            sep = "_")
+    
+    strt <- max(cols) + 1
+  }
+  newdata <- cbind(newdata, as_tibble(holiday_values))
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+print.step_holiday <-
+  function(x, width = max(20, options()$width - 29), ...) {
+    cat("Holiday features from ")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/hyperbolic.R b/R/hyperbolic.R
new file mode 100644
index 0000000..098daf7
--- /dev/null
+++ b/R/hyperbolic.R
@@ -0,0 +1,112 @@
+#' Hyperbolic Transformations
+#'
+#' \code{step_hyperbolic} creates a \emph{specification} of a recipe step that
+#'   will transform data using a hyperbolic function.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param func A character value for the function. Valid values are "sin",
+#'   "cos", or "tan".
+#' @param inverse A logical: should the inverse function be used?
+#' @param columns A character string of variable names that will be (eventually)
+#'   populated by the \code{terms} argument.
+#' @keywords datagen
+#' @concept preprocessing transformation_methods
+#' @export
+#' @examples
+#' set.seed(313)
+#' examples <- matrix(rnorm(40), ncol = 2)
+#' examples <- as.data.frame(examples)
+#'
+#' rec <- recipe(~ V1 + V2, data = examples)
+#'
+#' cos_trans <- rec  %>%
+#'   step_hyperbolic(all_predictors(),
+#'                   func = "cos", inverse = FALSE)
+#'
+#' cos_obj <- prep(cos_trans, training = examples)
+#'
+#' transformed_te <- bake(cos_obj, examples)
+#' plot(examples$V1, transformed_te$V1)
+#' @seealso \code{\link{step_logit}} \code{\link{step_invlogit}}
+#'   \code{\link{step_log}}  \code{\link{step_sqrt}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+
+step_hyperbolic <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           func = "sin",
+           inverse = TRUE,
+           columns = NULL) {
+    funcs <- c("sin", "cos", "tan")
+    if (!(func %in% funcs))
+      stop("`func` should be either `sin``, `cos`, or `tan`", call. = FALSE)
+    add_step(
+      recipe,
+      step_hyperbolic_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        func = func,
+        inverse = inverse,
+        columns = columns
+      )
+    )
+  }
+
+step_hyperbolic_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           func = NULL,
+           inverse = NULL,
+           columns = NULL) {
+    step(
+      subclass = "hyperbolic",
+      terms = terms,
+      role = role,
+      trained = trained,
+      func = func,
+      inverse = inverse,
+      columns = columns
+    )
+  }
+
+#' @export
+prep.step_hyperbolic <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_hyperbolic_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    func = x$func,
+    inverse = x$inverse,
+    columns = col_names
+  )
+}
+
+#' @export
+bake.step_hyperbolic <- function(object, newdata, ...) {
+  func <- if (object$inverse)
+    get(paste0("a", object$func))
+  else
+    get(object$func)
+  col_names <- object$columns
+  for (i in seq_along(col_names))
+    newdata[, col_names[i]] <-
+    func(getElement(newdata, col_names[i]))
+  as_tibble(newdata)
+}
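+
+## `inverse = TRUE` resolves the function by prepending an "a" (a sketch):
+# get(paste0("a", "sin"))(0.5)  # identical to asin(0.5)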
+
+print.step_hyperbolic <-
+  function(x, width = max(20, options()$width - 32), ...) {
+    ttl <- paste("Hyperbolic", x$func)
+    if (x$inverse)
+      ttl <- paste(ttl, "(inv)")
+    cat(ttl, "transformation on ")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/ica.R b/R/ica.R
new file mode 100644
index 0000000..ef56068
--- /dev/null
+++ b/R/ica.R
@@ -0,0 +1,164 @@
+#' ICA Signal Extraction
+#'
+#' \code{step_ica} creates a \emph{specification} of a recipe step that will
+#'   convert numeric data into one or more independent components.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   used to compute the components. See \code{\link{selections}} for more
+#'   details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new
+#'   independent component columns created by the original variables will be
+#'   used as predictors in a model.
+#' @param num The number of ICA components to retain as new predictors. If
+#'   \code{num} is greater than the number of columns or the number of possible
+#'   components, a smaller value will be used.
+#' @param options A list of options to \code{\link[fastICA]{fastICA}}. No
+#'   defaults are set here. \bold{Note} that the arguments \code{X} and
+#'   \code{n.comp} should not be passed here.
+#' @param res The \code{\link[fastICA]{fastICA}} object is stored here once
+#'   this preprocessing step has been trained by \code{\link{prep.recipe}}.
+#' @param prefix A character string that will be the prefix to the resulting
+#'   new variables. See notes below.
+#' @keywords datagen
+#' @concept preprocessing ica projection_methods
+#' @export
+#' @details Independent component analysis (ICA) is a transformation of a
+#'   group of variables that produces a new set of artificial features or
+#'   components. ICA assumes that the variables are mixtures of a set of
+#'   distinct, non-Gaussian signals and attempts to transform the data to
+#'   isolate these signals. Like PCA, the components are statistically
+#'   independent from one another. This means that they can be used to combat
+#'   large inter-variables correlations in a data set. Also like PCA, it is
+#'   advisable to center and scale the variables prior to running ICA.
+#'
+#' This package produces components using the "FastICA" methodology (see
+#'   reference below).
+#'
+#' The argument \code{num} controls the number of components that will be
+#'   retained (the original variables that are used to derive the components
+#'   are removed from the data). The new components will have names that begin
+#'   with \code{prefix} and a sequence of numbers. The variable names are
+#'   padded with zeros. For example, if \code{num < 10}, their names will be
+#'   \code{IC1} - \code{IC9}. If \code{num = 101}, the names would be
+#'   \code{IC001} - \code{IC101}.
+#'
+#' @references Hyvarinen, A., and Oja, E. (2000). Independent component
+#'   analysis: algorithms and applications. \emph{Neural Networks}, 13(4-5),
+#'   411-430.
+#'
+#' @examples
+#' # from fastICA::fastICA
+#' set.seed(131)
+#' S <- matrix(runif(400), 200, 2)
+#' A <- matrix(c(1, 1, -1, 3), 2, 2, byrow = TRUE)
+#' X <- as.data.frame(S %*% A)
+#'
+#' tr <- X[1:100, ]
+#' te <- X[101:200, ]
+#'
+#' rec <- recipe( ~ ., data = tr)
+#'
+#' ica_trans <- step_center(rec, V1, V2)
+#' ica_trans <- step_scale(ica_trans, V1, V2)
+#' ica_trans <- step_ica(ica_trans, V1, V2, num = 2)
+#' ica_estimates <- prep(ica_trans, training = tr)
+#' ica_data <- bake(ica_estimates, te)
+#'
+#' plot(te$V1, te$V2)
+#' plot(ica_data$IC1, ica_data$IC2)
+#' @seealso \code{\link{step_pca}} \code{\link{step_kpca}}
+#'   \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+step_ica <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           num  = 5,
+           options = list(),
+           res = NULL,
+           prefix = "IC") {
+    add_step(
+      recipe,
+      step_ica_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        num = num,
+        options = options,
+        res = res,
+        prefix = prefix
+      )
+    )
+  }
+
+step_ica_new <-
+  function(terms = NULL,
+           role = "predictor",
+           trained = FALSE,
+           num  = NULL,
+           options = NULL,
+           res = NULL,
+           prefix = "IC") {
+    step(
+      subclass = "ica",
+      terms = terms,
+      role = role,
+      trained = trained,
+      num = num,
+      options = options,
+      res = res,
+      prefix = prefix
+    )
+  }
+
+#' @importFrom dimRed FastICA dimRedData
+#' @export
+prep.step_ica <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  x$num <- min(x$num, length(col_names))
+  
+  indc <- FastICA(stdpars = x$options)
+  indc <-
+    indc@fun(dimRedData(as.data.frame(training[, col_names, drop = FALSE])),
+             list(ndim = x$num))
+  
+  step_ica_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    num = x$num,
+    options = x$options,
+    res = indc,
+    prefix = x$prefix
+  )
+}
+
+#' @export
+bake.step_ica <- function(object, newdata, ...) {
+  ica_vars <- colnames(environment(object$res@apply)$indata)
+  comps <-
+    object$res@apply(
+      dimRedData(
+        as.data.frame(newdata[, ica_vars, drop = FALSE])
+        )
+      )@data
+  comps <- comps[, 1:object$num, drop = FALSE]
+  colnames(comps) <- names0(ncol(comps), object$prefix)
+  newdata <- cbind(newdata, as_tibble(comps))
+  newdata <-
+    newdata[, !(colnames(newdata) %in% ica_vars), drop = FALSE]
+  as_tibble(newdata)
+}
+
+
+print.step_ica <-
+  function(x, width = max(20, options()$width - 29), ...) {
+    cat("ICA extraction with ")
+    printer(colnames(x$res@org.data), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/interactions.R b/R/interactions.R
new file mode 100644
index 0000000..0216a06
--- /dev/null
+++ b/R/interactions.R
@@ -0,0 +1,218 @@
+#' Create Interaction Variables
+#'
+#' \code{step_interact} creates a \emph{specification} of a recipe step that
+#'   will create new columns that are interaction terms between two or more
+#'   variables.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param terms A traditional R formula that contains interaction terms.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new columns
+#'   created from the original variables will be used as predictors in a model.
+#' @param objects A list of \code{terms} objects for each individual interaction.
+#' @param sep A character value used to delineate variables in an interaction
+#'   (e.g. \code{var1_x_var2} instead of the more traditional \code{var1:var2}).
+#' @keywords datagen
+#' @concept preprocessing model_specification
+#' @export
+#' @details \code{step_interact} can create interactions between variables. It
+#'   is primarily intended for \bold{numeric data}; categorical variables
+#'   should probably be converted to dummy variables using
+#'   \code{\link{step_dummy}} prior to being used for interactions.
+#'
+#' Unlike other step functions, the \code{terms} argument should be a
+#'   traditional R model formula but should contain no inline functions (e.g.
+#'   \code{log}). For example, for predictors \code{A}, \code{B}, and \code{C},
+#'   a formula such as \code{~A:B:C} can be used to make a three way
+#'   interaction between the variables. If the formula contains terms other
+#'   than interactions (e.g. \code{(A+B+C)^3}) only the interaction terms are
+#'   retained for the design matrix.
+#'
+#' The separator between the variables defaults to "\code{_x_}" so that the
+#'   three way interaction shown previously would generate a column named
+#'   \code{A_x_B_x_C}. This can be changed using the \code{sep} argument.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' int_mod_1 <- rec %>%
+#'   step_interact(terms = ~ carbon:hydrogen)
+#'
+#' int_mod_2 <- int_mod_1 %>%
+#'   step_interact(terms = ~ (oxygen + nitrogen + sulfur)^3)
+#'
+#' int_mod_1 <- prep(int_mod_1, training = biomass_tr)
+#' int_mod_2 <- prep(int_mod_2, training = biomass_tr)
+#'
+#' dat_1 <- bake(int_mod_1, biomass_te)
+#' dat_2 <- bake(int_mod_2, biomass_te)
+#'
+#' names(dat_1)
+#' names(dat_2)
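+#'
+#' # a sketch of a custom separator (an assumption, not part of the original
+#' # example); the new column would be named carbon.x.hydrogen:
+#' # int_mod_3 <- rec %>%
+#' #   step_interact(terms = ~ carbon:hydrogen, sep = ".x.")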
+
+step_interact <-
+  function(recipe,
+           terms,
+           role = "predictor",
+           trained = FALSE,
+           objects = NULL,
+           sep = "_x_") {
+    add_step(
+      recipe,
+      step_interact_new(
+        terms = terms,
+        trained = trained,
+        role = role,
+        objects = objects,
+        sep = sep
+      )
+    )
+  }
+
+## Initializes a new object
+step_interact_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           objects = NULL,
+           sep = NULL) {
+    step(
+      subclass = "interact",
+      terms = terms,
+      role = role,
+      trained = trained,
+      objects = objects,
+      sep = sep
+    )
+  }
+
+
+## The idea is to save a bunch of x-factor interaction terms instead of
+## one large set of collected terms.
+#' @export
+prep.step_interact <- function(x, training, info = NULL, ...) {
+  ## First, find the interaction terms based on the given formula
+  int_terms <- get_term_names(x$terms, vnames = colnames(training))
+  
+  ## Check to see if any variables are non-numeric and issue a warning
+  ## if that is the case
+  vars <-
+    unique(unlist(lapply(make_new_formula(int_terms), all.vars)))
+  var_check <- info[info$variable %in% vars, ]
+  if (any(var_check$type == "nominal"))
+    warning(
+      "Categorical variables used in `step_interact` should probably be ",
+      "avoided;  This can lead to differences in dummy variable values that ",
+      "are produced by `step_dummy`."
+    )
+  
+  ## For each interaction, create a new formula that has main effects
+  ## and only the interaction of choice (e.g. `a+b+c+a:b:c`)
+  int_forms <- make_new_formula(int_terms)
+  
+  ## Generate a standard R `terms` object from these short formulas and
+  ## save to make future interactions
+  int_terms <- make_small_terms(int_forms, training)
+  
+  step_interact_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    objects = int_terms,
+    sep = x$sep
+  )
+}
+
+
+#' @export
+bake.step_interact <- function(object, newdata, ...) {
+  ## `na.action` cannot be passed to `model.matrix` but we
+  ## can change it globally for a bit
+  
+  old_opt <- options()$na.action
+  options(na.action = "na.pass")
+  on.exit(options(na.action = old_opt))
+  
+  ## Create low level model matrices then remove the non-interaction terms.
+  res <- lapply(object$objects, model.matrix, data = newdata)
+  options(na.action = old_opt)
+  on.exit(expr = NULL)
+  
+  res <-
+    lapply(res, function(x)
+      x[, grepl(":", colnames(x)), drop = FALSE])
+  ncols <- vapply(res, ncol, c(int = 1L))
+  out <- matrix(NA, nrow = nrow(newdata), ncol = sum(ncols))
+  strt <- 1
+  for (i in seq_along(ncols)) {
+    cols <- (strt):(strt + ncols[i] - 1)
+    out[, cols] <- res[[i]]
+    strt <- max(cols) + 1
+  }
+  colnames(out) <-
+    gsub(":", object$sep, unlist(lapply(res, colnames)))
+  newdata <- cbind(newdata, as_tibble(out))
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+## This uses the highest level of interactions
+x_fac_int <- function(x)
+  as.formula(
+    paste0("~",
+           paste0(x, collapse = "+"),
+           "+",
+           paste0(x, collapse = ":")
+    )
+  )
+
+make_new_formula <- function(x) {
+  splitup <- strsplit(x, ":")
+  lapply(splitup, x_fac_int)
+}
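+
+## A quick sketch of what the helpers above produce (hypothetical input,
+## shown as a comment so nothing runs at load time):
+##   make_new_formula("a:b:c")
+##   #> [[1]]
+##   #> ~a + b + c + a:b:c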
+
+
+#' @importFrom stats model.matrix
+
+## Given a standard model formula and some data, get the
+## term expansion (without `.`s). This returns the factor
+## names and would not expand dummy variables.
+get_term_names <- function(form, vnames) {
+  ## We are going to cheat and make a small fake data set to
+  ## efficiently get the full formula expansion from
+  ## model.matrix (devoid of factor levels) and then
+  ## pick off the interactions
+  dat <- matrix(1, nrow = 5, ncol = length(vnames))
+  colnames(dat) <- vnames
+  nms <- colnames(model.matrix(form, data = as.data.frame(dat)))
+  nms <- nms[nms != "(Intercept)"]
+  nms <- grep(":", nms, value = TRUE)
+  nms
+}
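+
+## e.g. (a sketch; not run) get_term_names(~ (a + b + c)^2, letters[1:3])
+## would return only the interaction labels "a:b", "a:c", and "b:c".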
+
+#' @importFrom stats terms
+
+## For a given data set and a list of formulas, generate the
+## standard R `terms` objects
+make_small_terms <- function(forms, dat) {
+  lapply(forms, terms, data = dat)
+}
+
+
+print.step_interact <-
+  function(x, width = max(20, options()$width - 27), ...) {
+    cat("Interactions with ", sep = "")
+    cat(as.character(x$terms)[-1])
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
diff --git a/R/intercept.R b/R/intercept.R
new file mode 100644
index 0000000..7db4730
--- /dev/null
+++ b/R/intercept.R
@@ -0,0 +1,78 @@
+#' Add intercept (or constant) column
+#'
+#' \code{step_intercept} creates a \emph{specification} of a recipe step that
+#'   will add an intercept or constant term in the first column of a data
+#'   matrix. \code{step_intercept} defaults to the \emph{predictor} role so
+#'   that the new column is included by default when the recipe is baked. Be
+#'   careful to avoid unintentional transformations when calling steps with
+#'   \code{all_predictors}.
+#'
+#' @param recipe A recipe object. The step will be added to the sequence of
+#'   operations for this recipe.
+#' @param ... Argument ignored; included for consistency with other step
+#'   specification functions.
+#' @param role Defaults to "predictor".
+#' @param trained A logical to indicate if the quantities for preprocessing
+#'   have been estimated. Again included for consistency.
+#' @param name Character name for the newly added column.
+#' @param value A numeric constant to fill the intercept column. Defaults to 1.
+#'
+#' @return An updated version of \code{recipe} with the
+#'   new step added to the sequence of existing steps (if any).
+#' @export
+#'
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#' rec_trans <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)]) %>%
+#'   step_intercept(value = 2)
+#'
+#' rec_obj <- prep(rec_trans, training = biomass_tr)
+#'
+#' with_intercept <- bake(rec_obj, biomass_te)
+#' with_intercept
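+#'
+#' # because the new column takes the "predictor" role, a later step such as
+#' # step_center(all_predictors()) would also center the constant column.
+#' # A sketch of that pitfall (not run):
+#' # prep(rec_trans %>% step_center(all_predictors()), training = biomass_tr)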
+#'
+#' @seealso \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+step_intercept <- function(recipe, ..., role = "predictor",
+                           trained = FALSE, name = "intercept", value = 1) {
+  if (length(list(...)) > 0)
+    warning("Selectors are not used for this step.", call. = FALSE)
+  if (!is.numeric(value))
+    stop("Intercept value must be numeric.", call. = FALSE)
+  if (!is.character(name) || length(name) != 1)
+    stop("Intercept/constant column name must be a character value.", call. = FALSE)
+  add_step(
+    recipe,
+    step_intercept_new(
+      role = role,
+      trained = trained,
+      name = name,
+      value = value))
+}
+
+step_intercept_new <- function(role = "predictor", trained = FALSE,
+                               name = "intercept", value = 1) {
+  step(
+    subclass = "intercept",
+    role = role,
+    trained = trained,
+    name = name,
+    value = value
+  )
+}
+
+#' @export
+prep.step_intercept <- function(x, training, info = NULL, ...) {
+  x$trained <- TRUE
+  x
+}
+
+#' @importFrom tibble add_column
+#' @export
+bake.step_intercept <- function(object, newdata, ...) {
+  tibble::add_column(newdata, !!object$name := object$value, .before = TRUE)
+}
diff --git a/R/invlogit.R b/R/invlogit.R
new file mode 100644
index 0000000..eba36b0
--- /dev/null
+++ b/R/invlogit.R
@@ -0,0 +1,90 @@
+#' Inverse Logit Transformation
+#'
+#' \code{step_invlogit} creates a \emph{specification} of a recipe step that
+#'   will transform the data from real values to be between zero and one.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param columns A character vector of variable names that will be (eventually)
+#'   populated by the \code{terms} argument.
+#' @keywords datagen
+#' @concept preprocessing transformation_methods
+#' @export
+#' @details The inverse logit transformation takes values on the real line and
+#'   translates them to be between zero and one using the function
+#'   \code{f(x) = 1/(1+exp(-x))}.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' ilogit_trans <- rec  %>%
+#'   step_center(carbon, hydrogen) %>%
+#'   step_scale(carbon, hydrogen) %>%
+#'   step_invlogit(carbon, hydrogen)
+#'
+#' ilogit_obj <- prep(ilogit_trans, training = biomass_tr)
+#'
+#' transformed_te <- bake(ilogit_obj, biomass_te)
+#' plot(biomass_te$carbon, transformed_te$carbon)
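+#'
+#' # the transformation itself is just f(x) = 1/(1 + exp(-x)); as a sketch:
+#' # curve(1 / (1 + exp(-x)), from = -5, to = 5)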
+#' @seealso \code{\link{step_logit}} \code{\link{step_log}}
+#'   \code{\link{step_sqrt}}  \code{\link{step_hyperbolic}}
+#'   \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+
+step_invlogit <-
+  function(recipe, ...,  role = NA, trained = FALSE, columns = NULL) {
+    add_step(recipe,
+             step_invlogit_new(
+               terms = check_ellipses(...),
+               role = role,
+               trained = trained,
+               columns = columns
+             ))
+  }
+
+step_invlogit_new <-
+  function(terms = NULL, role = NA, trained = FALSE, columns = NULL) {
+    step(
+      subclass = "invlogit",
+      terms = terms,
+      role = role,
+      trained = trained,
+      columns = columns
+    )
+  }
+
+#' @export
+prep.step_invlogit <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_invlogit_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    columns = col_names
+  )
+}
+
+#' @importFrom tibble as_tibble
+#' @importFrom stats binomial
+#' @export
+bake.step_invlogit <- function(object, newdata, ...) {
+  for (i in seq_along(object$columns))
+    newdata[, object$columns[i]] <-
+      binomial()$linkinv(unlist(getElement(newdata, object$columns[i]),
+                                use.names = FALSE))
+  as_tibble(newdata)
+}
+
+
+print.step_invlogit <-
+  function(x, width = max(20, options()$width - 26), ...) {
+    cat("Inverse logit on ", sep = "")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/isomap.R b/R/isomap.R
new file mode 100644
index 0000000..4734941
--- /dev/null
+++ b/R/isomap.R
@@ -0,0 +1,171 @@
+#' Isomap Embedding
+#'
+#' \code{step_isomap} creates a \emph{specification} of a recipe step that will
+#'   convert numeric data into one or more new dimensions.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   used to compute the dimensions. See \code{\link{selections}} for more
+#'   details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new
+#'   dimension columns created by the original variables will be used as
+#'   predictors in a model.
+#' @param num The number of isomap dimensions to retain as new predictors. If
+#'   \code{num} is greater than the number of columns or the number of
+#'   possible dimensions, a smaller value will be used.
+#' @param options A list of options to \code{\link[dimRed]{Isomap}}.
+#' @param res The \code{\link[dimRed]{Isomap}} object is stored here once this
+#'   preprocessing step has been trained by \code{\link{prep.recipe}}.
+#' @param prefix A character string that will be the prefix to the resulting
+#'   new variables. See notes below.
+#' @keywords datagen
+#' @concept preprocessing isomap projection_methods
+#' @export
+#' @details Isomap is a form of multidimensional scaling (MDS). MDS methods
+#'   try to find a reduced set of dimensions such that the geometric distances
+#'   between the original data points are preserved. This version of MDS uses
+#'   nearest neighbors in the data as a method for increasing the fidelity of
+#'   the new dimensions to the original data values.
+#'
+#' It is advisable to center and scale the variables prior to running Isomap
+#'   (\code{step_center} and \code{step_scale} can be used for this purpose).
+#'
+#' The argument \code{num} controls the number of components that will be
+#'   retained (the original variables that are used to derive the components
+#'   are removed from the data). The new components will have names that begin
+#'   with \code{prefix} and a sequence of numbers. The variable names are
+#'   padded with zeros. For example, if \code{num < 10}, their names will be
+#'   \code{Isomap1} - \code{Isomap9}. If \code{num = 101}, the names would be
+#'   \code{Isomap001} - \code{Isomap101}.
+#' @references De Silva, V., and Tenenbaum, J. B. (2003). Global versus local
+#'   methods in nonlinear dimensionality reduction. \emph{Advances in Neural
+#'   Information Processing Systems}. 721-728.
+#'
+#' \pkg{dimRed}, a framework for dimensionality reduction,
+#'   \url{https://github.com/gdkrmr}
+#'
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' im_trans <- rec %>%
+#'   step_YeoJohnson(all_predictors()) %>%
+#'   step_center(all_predictors()) %>%
+#'   step_scale(all_predictors()) %>%
+#'   step_isomap(all_predictors(),
+#'               options = list(knn = 100),
+#'               num = 2)
+#'
+#' im_estimates <- prep(im_trans, training = biomass_tr)
+#'
+#' im_te <- bake(im_estimates, biomass_te)
+#'
+#' rng <- extendrange(c(im_te$Isomap1, im_te$Isomap2))
+#' plot(im_te$Isomap1, im_te$Isomap2,
+#'      xlim = rng, ylim = rng)
+#' @seealso \code{\link{step_pca}} \code{\link{step_kpca}}
+#'   \code{\link{step_ica}} \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+
+step_isomap <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           num  = 5,
+           options = list(knn = 50, .mute = c("message", "output")),
+           res = NULL,
+           prefix = "Isomap") {
+    add_step(
+      recipe,
+      step_isomap_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        num = num,
+        options = options,
+        res = res,
+        prefix = prefix
+      )
+    )
+  }
+
+step_isomap_new <-
+  function(terms = NULL,
+           role = "predictor",
+           trained = FALSE,
+           num  = NULL,
+           options = NULL,
+           res = NULL,
+           prefix = "isomap") {
+    step(
+      subclass = "isomap",
+      terms = terms,
+      role = role,
+      trained = trained,
+      num = num,
+      options = options,
+      res = res,
+      prefix = prefix
+    )
+  }
+
+#' @importFrom dimRed embed dimRedData
+#' @export
+prep.step_isomap <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  x$num <- min(x$num, ncol(training))
+  x$options$knn <- min(x$options$knn, nrow(training))
+  
+  imap <-
+    embed(
+      dimRedData(as.data.frame(training[, col_names, drop = FALSE])),
+      "Isomap",
+      knn = x$options$knn,
+      ndim = x$num,
+      .mute = x$options$.mute
+    )
+  
+  step_isomap_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    num = x$num,
+    options = x$options,
+    res = imap,
+    prefix = x$prefix
+  )
+}
+
+#' @export
+bake.step_isomap <- function(object, newdata, ...) {
+  isomap_vars <- colnames(environment(object$res@apply)$indata)
+  comps <-
+    object$res@apply(
+      dimRedData(as.data.frame(newdata[, isomap_vars, drop = FALSE]))
+      )@data
+  comps <- comps[, 1:object$num, drop = FALSE]
+  colnames(comps) <- names0(ncol(comps), object$prefix)
+  newdata <- cbind(newdata, as_tibble(comps))
+  newdata <-
+    newdata[, !(colnames(newdata) %in% isomap_vars), drop = FALSE]
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+
+print.step_isomap <-
+  function(x, width = max(20, options()$width - 35), ...) {
+    cat("Isomap approximation with ")
+    printer(colnames(x$res@org.data), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/knn_imp.R b/R/knn_imp.R
new file mode 100644
index 0000000..1b4711e
--- /dev/null
+++ b/R/knn_imp.R
@@ -0,0 +1,192 @@
+#' Imputation via K-Nearest Neighbors
+#'
+#' \code{step_knnimpute} creates a \emph{specification} of a recipe step that
+#'   will impute missing data using nearest neighbors.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose variables. For
+#'   \code{step_knnimpute}, this indicates the variables to be imputed. When
+#'   used with \code{imp_vars}, the dots indicates which variables are used to
+#'   predict the missing data in each variable. See \code{\link{selections}}
+#'   for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param impute_with A call to \code{imp_vars} to specify which variables are
+#'   used to impute the variables that can include specific variable names
+#'   separated by commas or different selectors (see
+#'   \code{\link{selections}}).  If a column is included in both lists to be
+#'   imputed and to be an imputation predictor, it will be removed from the
+#'   latter and not used to impute itself.
+#' @param K The number of neighbors.
+#' @param ref_data A tibble of data that will reflect the data preprocessing
+#'   done up to the point of this imputation step. This is
+#'   \code{NULL} until the step is trained by \code{\link{prep.recipe}}.
+#' @param columns The column names that will be imputed and used for
+#'   imputation. This is  \code{NULL} until the step is trained by
+#'   \code{\link{prep.recipe}}.
+#' @keywords datagen
+#' @concept preprocessing imputation
+#' @export
+#' @details The step uses the training set to impute any other data sets. The
+#'   only distance function available is Gower's distance which can be used for
+#'   mixtures of nominal and numeric data.
+#'
+#' Once the nearest neighbors are determined, the mode is used to predict
+#'   nominal variables and the mean is used for numeric data.
+#'
+#' Note that if a variable that is to be imputed is also in \code{impute_with},
+#'   this variable will be ignored.
+#'
+#' It is possible that missing values will still occur after imputation if a
+#'   large majority (or all) of the imputing variables are also missing.
+#' @references Gower, J. C. (1971) "A general coefficient of similarity and
+#'   some of its properties," \emph{Biometrics}, 857-871.
+#' @examples
+#' library(recipes)
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training", ]
+#' biomass_te <- biomass[biomass$dataset == "Testing", ]
+#' biomass_te_whole <- biomass_te
+#'
+#' # induce some missing data at random
+#' set.seed(9039)
+#' carb_missing <- sample(1:nrow(biomass_te), 3)
+#' nitro_missing <- sample(1:nrow(biomass_te), 3)
+#'
+#' biomass_te$carbon[carb_missing] <- NA
+#' biomass_te$nitrogen[nitro_missing] <- NA
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' ratio_recipe <- rec %>%
+#'   step_knnimpute(all_predictors(), K = 3)
+#' ratio_recipe2 <- prep(ratio_recipe, training = biomass_tr)
+#' imputed <- bake(ratio_recipe2, biomass_te)
+#'
+#' # how well did it work?
+#' summary(biomass_te_whole$carbon)
+#' cbind(before = biomass_te_whole$carbon[carb_missing],
+#'       after = imputed$carbon[carb_missing])
+#'
+#' summary(biomass_te_whole$nitrogen)
+#' cbind(before = biomass_te_whole$nitrogen[nitro_missing],
+#'       after = imputed$nitrogen[nitro_missing])
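+#'
+#' # a sketch of restricting the imputation predictors with imp_vars
+#' # (hydrogen and oxygen chosen only for illustration):
+#' # rec %>%
+#' #   step_knnimpute(carbon, impute_with = imp_vars(hydrogen, oxygen), K = 3)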
+
+step_knnimpute <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           K = 5,
+           impute_with = imp_vars(all_predictors()),
+           ref_data = NULL,
+           columns = NULL) {
+    if (is.null(impute_with))
+      stop("Please list some variables in `impute_with`", call. = FALSE)
+    add_step(
+      recipe,
+      step_knnimpute_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        K = K,
+        impute_with = impute_with,
+        ref_data = ref_data,
+        columns = columns
+      )
+    )
+  }
+
+step_knnimpute_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           K = NULL,
+           impute_with = NULL,
+           ref_data = NULL,
+           columns = NA) {
+    step(
+      subclass = "knnimpute",
+      terms = terms,
+      role = role,
+      trained = trained,
+      K = K,
+      impute_with = impute_with,
+      ref_data = ref_data,
+      columns = columns
+    )
+  }
+
+#' @export
+prep.step_knnimpute <- function(x, training, info = NULL, ...) {
+  var_lists <-
+    impute_var_lists(
+      to_impute = x$terms,
+      impute_using = x$impute_with,
+      info = info
+    )
+  all_x_vars <- lapply(var_lists, function(x) c(x$x, x$y))
+  all_x_vars <- unique(unlist(all_x_vars))
+  
+  x$columns <- var_lists
+  x$ref_data <- training[, all_x_vars]
+  x$trained <- TRUE
+  x
+}
+
+#' @importFrom gower gower_topn
+nn_index <- function(.new, .old, vars, K) {
+  gower_topn(.old[, vars], .new[, vars], n = K, nthread = 1)$index
+}
+
+nn_pred <- function(index, dat) {
+  dat <- dat[index, ]
+  dat <- getElement(dat, names(dat))
+  dat <- dat[!is.na(dat)]
+  est <- if (is.factor(dat) | is.character(dat))
+    mode_est(dat)
+  else
+    mean(dat)
+  est
+}
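+
+## e.g. with a neighbor index vector c(2, 5, 9) and a single-column `dat`,
+## nn_pred() returns the mean of the non-missing values in rows 2, 5, and 9
+## (or, via mode_est(), the mode for factor/character columns)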
+
+#' @importFrom tibble as_tibble
+#' @importFrom stats predict complete.cases
+#' @export
+bake.step_knnimpute <- function(object, newdata, ...) {
+  missing_rows <- !complete.cases(newdata)
+  if (!any(missing_rows))
+    return(newdata)
+  
+  old_data <- newdata
+  for (i in seq_along(object$columns)) {
+    imp_var <- object$columns[[i]]$y
+    missing_rows <- !complete.cases(newdata[, imp_var])
+    if (any(missing_rows)) {
+      preds <- object$columns[[i]]$x
+      new_data <- old_data[missing_rows, preds, drop = FALSE]
+      ## do a better job of checking this:
+      if (all(is.na(new_data))) {
+        warning("All predictors are missing; cannot impute", call. = FALSE)
+      } else {
+        nn_ind <- nn_index(object$ref_data, new_data, preds, object$K)
+        pred_vals <-
+          apply(nn_ind, 2, nn_pred, dat = object$ref_data[, imp_var])
+        newdata[missing_rows, imp_var] <- pred_vals
+      }
+    }
+  }
+  newdata
+}
+
+
+print.step_knnimpute <-
+  function(x, width = max(20, options()$width - 31), ...) {
+    all_x_vars <- lapply(x$columns, function(x) x$x)
+    all_x_vars <- unique(unlist(all_x_vars))
+    cat(x$K, "-nearest neighbor imputation for ", sep = "")
+    printer(all_x_vars, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/kpca.R b/R/kpca.R
new file mode 100644
index 0000000..b824f8e
--- /dev/null
+++ b/R/kpca.R
@@ -0,0 +1,179 @@
+#' Kernel PCA Signal Extraction
+#'
+#' \code{step_kpca} creates a \emph{specification} of a recipe step that will
+#'   convert numeric data into one or more principal components using a kernel
+#'   basis expansion.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   used to compute the components. See \code{\link{selections}} for more
+#'   details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new principal
+#'   component columns created by the original variables will be used as
+#'   predictors in a model.
+#' @param num The number of PCA components to retain as new predictors. If
+#'   \code{num} is greater than the number of columns or the number of possible
+#'   components, a smaller value will be used.
+#' @param options A list of options to \code{\link[kernlab]{kpca}}. Defaults
+#'   are set for the arguments \code{kernel} and \code{kpar} but others can be
+#'   passed in. \bold{Note} that the arguments \code{x} and \code{features}
+#'   should not be passed here (or at all).
+#' @param res An S4 \code{\link[kernlab]{kpca}} object is stored here once this
+#'   preprocessing step has been trained by \code{\link{prep.recipe}}.
+#' @param prefix A character string that will be the prefix to the resulting
+#'   new variables. See notes below.
+#' @keywords datagen
+#' @concept preprocessing pca projection_methods kernel_methods
+#' @export
+#' @details Kernel principal component analysis (kPCA) is an extension of PCA
+#'   that conducts the calculations in a broader dimensionality defined by a
+#'   kernel function. For example, if a quadratic kernel function were used,
+#'   each variable would be represented by its original values as well as its
+#'   square. This nonlinear mapping is used during the PCA
+#'   analysis and can potentially help find better representations of the
+#'   original data.
+#'
+#' As with ordinary PCA, it is important to standardize the variables prior
+#'   to running PCA (\code{step_center} and \code{step_scale} can be used for
+#'   this purpose).
+#'
+#' When performing kPCA, the kernel function (and any important kernel
+#'   parameters) must be chosen. The \pkg{kernlab} package is used and the
+#'   reference below discusses the types of kernels available and their
+#'   parameter(s). These specifications can be made in the \code{kernel} and
+#'   \code{kpar} slots of the \code{options} argument to \code{step_kpca}.
+#'
+#' The argument \code{num} controls the number of components that will be
+#'   retained (the original variables that are used to derive the components
+#'   are removed from the data). The new components will have names that begin
+#'   with \code{prefix} and a sequence of numbers. The variable names are
+#'   padded with zeros. For example, if \code{num < 10}, their names will be
+#'   \code{kPC1} - \code{kPC9}. If \code{num = 101}, the names would be
+#'   \code{kPC001} - \code{kPC101}.
+#'
+#' @references Scholkopf, B., Smola, A., and Muller, K. (1997). Kernel
+#'   principal component analysis. \emph{Lecture Notes in Computer Science},
+#'   1327, 583-588.
+#'
+#' Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004). kernlab -
+#'   An S4 package for kernel methods in R. \emph{Journal of Statistical
+#'   Software}, 11(1), 1-20.
+#'
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' kpca_trans <- rec %>%
+#'   step_YeoJohnson(all_predictors()) %>%
+#'   step_center(all_predictors()) %>%
+#'   step_scale(all_predictors()) %>%
+#'   step_kpca(all_predictors())
+#'
+#' kpca_estimates <- prep(kpca_trans, training = biomass_tr)
+#'
+#' kpca_te <- bake(kpca_estimates, biomass_te)
+#'
+#' rng <- extendrange(c(kpca_te$kPC1, kpca_te$kPC2))
+#' plot(kpca_te$kPC1, kpca_te$kPC2,
+#'      xlim = rng, ylim = rng)
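+#'
+#' # a sketch of choosing a different kernlab kernel via `options`:
+#' # step_kpca(all_predictors(),
+#' #           options = list(kernel = "polydot",
+#' #                          kpar = list(degree = 2)))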
+#' @seealso \code{\link{step_pca}} \code{\link{step_ica}}
+#'   \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+#'
+step_kpca <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           num  = 5,
+           res = NULL,
+           options = list(kernel = "rbfdot",
+                          kpar = list(sigma = 0.2)),
+           prefix = "kPC") {
+  add_step(
+    recipe,
+    step_kpca_new(
+      terms = check_ellipses(...),
+      role = role,
+      trained = trained,
+      num = num,
+      res = res,
+      options = options,
+      prefix = prefix
+    )
+  )
+}
+
+step_kpca_new <-
+  function(terms = NULL,
+           role = "predictor",
+           trained = FALSE,
+           num  = NULL,
+           res = NULL,
+           options = NULL,
+           prefix = "kPC") {
+  step(
+    subclass = "kpca",
+    terms = terms,
+    role = role,
+    trained = trained,
+    num = num,
+    res = res,
+    options = options,
+    prefix = prefix
+  )
+}
+
+#' @importFrom dimRed kPCA dimRedData
+#' @export
+prep.step_kpca <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+
+  kprc <- kPCA(stdpars = c(list(ndim = x$num), x$options))
+  kprc <- kprc@fun(
+    dimRedData(as.data.frame(training[, col_names, drop = FALSE])),
+    kprc@stdpars
+  )
+
+  step_kpca_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    num = x$num,
+    options = x$options,
+    res = kprc,
+    prefix = x$prefix
+  )
+}
+
+#' @export
+bake.step_kpca <- function(object, newdata, ...) {
+  pca_vars <- colnames(environment(object$res@apply)$indata)
+  comps <- object$res@apply(
+    dimRedData(as.data.frame(newdata[, pca_vars, drop = FALSE]))
+    )@data
+  comps <- comps[, 1:object$num, drop = FALSE]
+  colnames(comps) <- names0(ncol(comps), object$prefix)
+  newdata <- cbind(newdata, as_tibble(comps))
+  newdata <- newdata[, !(colnames(newdata) %in% pca_vars), drop = FALSE]
+  as_tibble(newdata)
+}
+
+print.step_kpca <- function(x, width = max(20, options()$width - 40), ...) {
+  if (x$trained) {
+    cat("Kernel PCA (", x$res@pars$kernel, ") extraction with ", sep = "")
+    cat(format_ch_vec(colnames(x$res@org.data), width = width))
+  } else {
+    cat("Kernel PCA extraction with ", sep = "")
+    cat(format_selectors(x$terms, wdth = width))
+  }
+  if (x$trained) cat(" [trained]\n") else cat("\n")
+  invisible(x)
+}
diff --git a/R/lincombo.R b/R/lincombo.R
new file mode 100644
index 0000000..c65b008
--- /dev/null
+++ b/R/lincombo.R
@@ -0,0 +1,193 @@
+#' Linear Combination Filter
+#'
+#' \code{step_lincomb} creates a \emph{specification} of a recipe step that
+#'   will potentially remove numeric variables that have linear combinations
+#'   between them.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param max_steps The maximum number of times that the algorithm for
+#'   detecting and removing linear combinations is applied (see Details).
+#' @param removals A character string that contains the names of columns that
+#'   should be removed. These values are not determined until
+#'   \code{\link{prep.recipe}} is called.
+#' @keywords datagen
+#' @concept preprocessing variable_filters
+#' @author Max Kuhn, Kirk Mettler, and Jed Wing
+#' @export
+#'
+#' @details This step finds exact linear combinations between two or more
+#'   variables and recommends which column(s) should be removed to resolve the
+#'   issue. This algorithm may need to be applied multiple times (as defined
+#'   by \code{max_steps}).
+#' @examples
+#' data(biomass)
+#' 
+#' biomass$new_1 <- with(biomass,
+#'                       .1*carbon - .2*hydrogen + .6*sulfur)
+#' biomass$new_2 <- with(biomass,
+#'                       .5*carbon - .2*oxygen + .6*nitrogen)
+#' 
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#' 
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
+#'                 sulfur + new_1 + new_2,
+#'               data = biomass_tr)
+#' 
+#' lincomb_filter <- rec %>%
+#'   step_lincomb(all_predictors())
+#'   
+#' prep(lincomb_filter, training = biomass_tr)
+#' @seealso \code{\link{step_nzv}} \code{\link{step_corr}}
+#'   \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+
+step_lincomb <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           max_steps = 5,
+           removals = NULL) {
+    add_step(
+      recipe,
+      step_lincomb_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        max_steps = max_steps,
+        removals = removals
+      )
+    )
+  }
+
+step_lincomb_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           max_steps = NULL,
+           removals = NULL) {
+    step(
+      subclass = "lincomb",
+      terms = terms,
+      role = role,
+      trained = trained,
+      max_steps = max_steps,
+      removals = removals
+    )
+  }
+
+#' @export
+prep.step_lincomb <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  if (any(info$type[info$variable %in% col_names] != "numeric"))
+    stop("All variables for mean imputation should be numeric")
+  
+  filter <- iter_lc_rm(x = training[, col_names],
+                       max_steps = x$max_steps)
+  
+  step_lincomb_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    max_steps = x$max_steps,
+    removals = filter
+  )
+}
+
+#' @export
+bake.step_lincomb <- function(object, newdata, ...) {
+  if (length(object$removals) > 0)
+    newdata <- newdata[, !(colnames(newdata) %in% object$removals)]
+  as_tibble(newdata)
+}
+
+print.step_lincomb <-
+  function(x,  width = max(20, options()$width - 36), ...) {
+    if (x$trained) {
+      if (length(x$removals) > 0) {
+        cat("Linear combination filter removed ")
+        cat(format_ch_vec(x$removals, width = width))
+      } else
+        cat("Linear combination filter removed no terms")
+    } else {
+      cat("Linear combination filter on ", sep = "")
+      cat(format_selectors(x$terms, wdth = width))
+    }
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
+
+
+recommend_rm <- function(x, eps = 1e-6, ...) {
+  if (!is.matrix(x))
+    x <- as.matrix(x)
+  if (is.null(colnames(x)))
+    stop("`x` should have column names", call. = FALSE)
+  
+  qr_decomp <- qr(x)
+  qr_decomp_R <- qr.R(qr_decomp)           # extract R matrix
+  num_cols <- ncol(qr_decomp_R)            # number of columns in R
+  rank <- qr_decomp$rank                   # number of independent columns
+  pivot <- qr_decomp$pivot                 # get the pivot vector
+  
+  if (is.null(num_cols) || rank == num_cols) {
+    rm_list <- character(0)                 # there are no linear combinations
+  } else {
+    p1 <- 1:rank
+    X <- qr_decomp_R[p1, p1]                # extract the independent columns
+    Y <- qr_decomp_R[p1, -p1, drop = FALSE] # extract the dependent columns
+    b <- qr(X)                              # factor the independent columns
+    b <- qr.coef(b, Y)                      # get regression coefficients of
+                                            # the dependent columns
+    b[abs(b) < eps] <- 0                    # zap small values
+    
+    # generate a list with one element for each dependent column
+    combos <- lapply(1:ncol(Y),
+                     function(i)
+                       c(pivot[rank + i], pivot[which(b[, i] != 0)]))
+    rm_list <- unlist(lapply(combos, function(x)
+      x[1]))
+    rm_list <- colnames(x)[rm_list]
+  }
+  rm_list
+}
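+
+## A quick sanity check of recommend_rm() (hypothetical data, shown as a
+## comment so nothing runs at load time):
+##   dat <- data.frame(a = 1:5, b = rnorm(5))
+##   dat$c <- dat$a + dat$b
+##   recommend_rm(dat)  # flags a column involved in the exact dependency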
+
+iter_lc_rm <- function(x,
+                       max_steps = 10,
+                       verbose = FALSE) {
+  if (is.null(colnames(x)))
+    stop("`x` should have column names", call. = FALSE)
+  
+  orig_names <- colnames(x)
+  if (!is.matrix(x))
+    x <- as.matrix(x)
+  
+  # converting to matrix may alter column names
+  name_df <- data.frame(orig = orig_names,
+                        current = colnames(x),
+                        stringsAsFactors = FALSE)
+  
+  for (i in 1:max_steps) {
+    if (verbose)
+      cat(i)
+    if (i == max_steps)
+      break
+    lcs <- recommend_rm(x)
+    if (length(lcs) == 0)
+      break
+    else {
+      if (verbose)
+        cat(" removing", length(lcs), "\n")
+      x <- x[, !(colnames(x) %in% lcs)]
+    }
+  }
+  if (verbose)
+    cat("\n")
+  name_df <- name_df[!(name_df$current %in% colnames(x)), ]
+  name_df$orig
+}
diff --git a/R/log.R b/R/log.R
new file mode 100644
index 0000000..6587faa
--- /dev/null
+++ b/R/log.R
@@ -0,0 +1,95 @@
+#' Logarithmic Transformation
+#'
+#' \code{step_log} creates a \emph{specification} of a recipe step that will
+#'   log transform data.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param base A numeric value for the base.
+#' @param columns A character vector of variable names that will be (eventually)
+#'   populated by the \code{terms} argument.
+#' @keywords datagen
+#' @concept preprocessing transformation_methods
+#' @export
+#' @examples
+#' set.seed(313)
+#' examples <- matrix(exp(rnorm(40)), ncol = 2)
+#' examples <- as.data.frame(examples)
+#'
+#' rec <- recipe(~ V1 + V2, data = examples)
+#'
+#' log_trans <- rec  %>%
+#'   step_log(all_predictors())
+#'
+#' log_obj <- prep(log_trans, training = examples)
+#'
+#' transformed_te <- bake(log_obj, examples)
+#' plot(examples$V1, transformed_te$V1)
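+#'
+#' # a base-10 sketch of the same step:
+#' # log10_trans <- rec %>% step_log(all_predictors(), base = 10)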
+#' @seealso \code{\link{step_logit}} \code{\link{step_invlogit}}
+#'   \code{\link{step_hyperbolic}}  \code{\link{step_sqrt}}
+#'   \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+
+step_log <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           base = exp(1),
+           columns = NULL) {
+    add_step(
+      recipe,
+      step_log_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        base = base,
+        columns = columns
+      )
+    )
+  }
+
+step_log_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           base = NULL,
+           columns = NULL) {
+    step(
+      subclass = "log",
+      terms = terms,
+      role = role,
+      trained = trained,
+      base = base,
+      columns = columns
+    )
+  }
+
+#' @export
+prep.step_log <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_log_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    base = x$base,
+    columns = col_names
+  )
+}
+
+#' @export
+bake.step_log <- function(object, newdata, ...) {
+  col_names <- object$columns
+  for (i in seq_along(col_names))
+    newdata[, col_names[i]] <-
+      log(getElement(newdata, col_names[i]), base = object$base)
+  as_tibble(newdata)
+}
+
+print.step_log <-
+  function(x, width = max(20, options()$width - 31), ...) {
+    cat("Log transformation on ", sep = "")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/logit.R b/R/logit.R
new file mode 100644
index 0000000..c2b9e97
--- /dev/null
+++ b/R/logit.R
@@ -0,0 +1,91 @@
+#' Logit Transformation
+#'
+#' \code{step_logit} creates a \emph{specification} of a recipe step that will
+#'   logit transform the data.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param columns A character vector of variable names that will be (eventually)
+#'   populated by the \code{terms} argument.
+#' @keywords datagen
+#' @concept preprocessing transformation_methods
+#' @export
+#' @details The logit transformation takes values between zero and one
+#'   and translates them to be on the real line using the function
+#'   \code{f(p) = log(p/(1-p))}.
+#' @examples
+#' set.seed(313)
+#' examples <- matrix(runif(40), ncol = 2)
+#' examples <- data.frame(examples)
+#'
+#' rec <- recipe(~ X1 + X2, data = examples)
+#'
+#' logit_trans <- rec  %>%
+#'   step_logit(all_predictors())
+#'
+#' logit_obj <- prep(logit_trans, training = examples)
+#'
+#' transformed_te <- bake(logit_obj, examples)
+#' plot(examples$X1, transformed_te$X1)
+#' @seealso \code{\link{step_invlogit}} \code{\link{step_log}}
+#' \code{\link{step_sqrt}}  \code{\link{step_hyperbolic}} \code{\link{recipe}}
+#' \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+
+step_logit <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           columns = NULL) {
+    add_step(recipe,
+             step_logit_new(
+               terms = check_ellipses(...),
+               role = role,
+               trained = trained,
+               columns = columns
+             ))
+  }
+
+step_logit_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           columns = NULL) {
+    step(
+      subclass = "logit",
+      terms = terms,
+      role = role,
+      trained = trained,
+      columns = columns
+    )
+  }
+
+#' @export
+prep.step_logit <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_logit_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    columns = col_names
+  )
+}
+
+#' @importFrom tibble as_tibble
+#' @importFrom stats binomial
+#' @export
+bake.step_logit <- function(object, newdata, ...) {
+  for (i in seq_along(object$columns))
+    newdata[, object$columns[i]] <-
+      binomial()$linkfun(getElement(newdata, object$columns[i]))
+  as_tibble(newdata)
+}
+
+
+print.step_logit <-
+  function(x, width = max(20, options()$width - 33), ...) {
+    cat("Logit transformation on ", sep = "")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/meanimpute.R b/R/meanimpute.R
new file mode 100644
index 0000000..3e9b618
--- /dev/null
+++ b/R/meanimpute.R
@@ -0,0 +1,116 @@
+#' Impute Numeric Data Using the Mean
+#'
+#' \code{step_meanimpute} creates a \emph{specification} of a recipe step that
+#'   will substitute missing values of numeric variables by the training set
+#'   mean of those variables.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param means A named numeric vector of means. This is \code{NULL} until
+#'   computed by \code{\link{prep.recipe}}.
+#' @param trim The fraction (0 to 0.5) of observations to be trimmed from each
+#'   end of the variables before the mean is computed. Values of trim outside
+#'   that range are taken as the nearest endpoint.
+#' @keywords datagen
+#' @concept preprocessing imputation
+#' @export
+#' @details \code{step_meanimpute} estimates the variable means from the data
+#'   used in the \code{training} argument of \code{prep.recipe}.
+#'   \code{bake.recipe} then applies the new values to new data sets using
+#'   these averages.
+#' @examples
+#' data("credit_data")
+#'
+#' ## missing data per column
+#' vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))
+#'
+#' set.seed(342)
+#' in_training <- sample(1:nrow(credit_data), 2000)
+#'
+#' credit_tr <- credit_data[ in_training, ]
+#' credit_te <- credit_data[-in_training, ]
+#' missing_examples <- c(14, 394, 565)
+#'
+#' rec <- recipe(Price ~ ., data = credit_tr)
+#'
+#' impute_rec <- rec %>%
+#'   step_meanimpute(Income, Assets, Debt)
+#'
+#' imp_models <- prep(impute_rec, training = credit_tr)
+#'
+#' imputed_te <- bake(imp_models, newdata = credit_te, everything())
+#'
+#' credit_te[missing_examples,]
+#' imputed_te[missing_examples, names(credit_te)]
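+#'
+#' # a sketch using a 10% trimmed mean (trim is passed through to mean()):
+#' # impute_rec <- rec %>% step_meanimpute(Income, Assets, Debt, trim = 0.1)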
+
+step_meanimpute <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           means = NULL,
+           trim = 0) {
+    add_step(
+      recipe,
+      step_meanimpute_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        means = means,
+        trim = trim
+      )
+    )
+  }
+
+step_meanimpute_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           means = NULL,
+           trim = NULL) {
+    step(
+      subclass = "meanimpute",
+      terms = terms,
+      role = role,
+      trained = trained,
+      means = means,
+      trim = trim
+    )
+  }
+
+#' @export
+prep.step_meanimpute <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  if (any(info$type[info$variable %in% col_names] != "numeric"))
+    stop("All variables for mean imputation should be numeric")
+  means <-
+    vapply(training[, col_names],
+           mean,
+           c(mean = 0),
+           trim = x$trim,
+           na.rm = TRUE)
+  step_meanimpute_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    means = means,
+    trim = x$trim
+  )
+}
+
+#' @export
+bake.step_meanimpute <- function(object, newdata, ...) {
+  for (i in names(object$means)) {
+    if (any(is.na(newdata[, i])))
+      newdata[is.na(newdata[, i]), i] <- object$means[i]
+  }
+  as_tibble(newdata)
+}
+
+print.step_meanimpute <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Mean Imputation for ", sep = "")
+    printer(names(x$means), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/misc.R b/R/misc.R
new file mode 100644
index 0000000..98349ba
--- /dev/null
+++ b/R/misc.R
@@ -0,0 +1,326 @@
+filter_terms <- function(x, ...)
+  UseMethod("filter_terms")
+
+## Buckets variables into discrete, mutually exclusive types
+#' @importFrom tibble tibble
+get_types <- function(x) {
+  var_types <-
+    c(
+      character = "nominal",
+      factor = "nominal",
+      ordered = "nominal",
+      integer = "numeric",
+      numeric = "numeric",
+      double = "numeric",
+      Surv = "censored",
+      logical = "logical",
+      Date = "date",
+      POSIXct = "date"
+    )
+  
+  classes <- lapply(x, class)
+  res <- lapply(classes,
+                function(x, types) {
+                  in_types <- x %in% names(types)
+                  if (sum(in_types) > 0) {
+                    # not sure what to do with multiple matches; right now
+                    ## pick the first match which favors "factor" over "ordered"
+                    out <-
+                      unname(types[min(which(names(types) %in% x))])
+                  } else
+                    out <- "other"
+                  out
+                },
+                types = var_types)
+  res <- unlist(res)
+  tibble(variable = names(res), type = unname(res))
+}
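+
+## e.g. get_types(data.frame(x = 1:3, y = letters[1:3])) returns a tibble
+## pairing "x" with "numeric" and "y" with "nominal"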
+
+type_by_var <- function(classes, dat) {
+  res <- sapply(dat, is_one_of, what = classes)
+  names(res)[res]
+}
+
+is_one_of <- function(x, what) {
+  res <- sapply(as.list(what),
+                function(class, obj)
+                  inherits(obj, what = class),
+                obj = x)
+  any(res)
+}
+
+## general error trapping functions
+
+check_all_outcomes_same_type <- function(x)
+  x
+
+## get variables from formulas
+is_formula <- function(x)
+  isTRUE(inherits(x, "formula"))
+
+#' @importFrom rlang f_lhs
+get_lhs_vars <- function(formula, data) {
+  if (!is_formula(formula))
+    formula <- as.formula(formula)
+  ## Want to make sure that multiple outcomes can be expressed as
+  ## additions with no cbind business and that `.` works too (maybe)
+  formula <- as.formula(paste("~", deparse(f_lhs(formula))))
+  get_rhs_vars(formula, data)
+}
+
+#' @importFrom stats model.frame
+get_rhs_vars <- function(formula, data) {
+  if (!is_formula(formula))
+    formula <- as.formula(formula)
+  ## This will need a lot of work to account for cases with `.`
+  ## or embedded functions like `Sepal.Length + poly(Sepal.Width)`.
+  ## or should it? what about Y ~ log(x)?
+  data_info <- attr(model.frame(formula, data), "terms")
+  response_info <- attr(data_info, "response")
+  predictor_names <- names(attr(data_info, "dataClasses"))
+  if (length(response_info) > 0 && all(response_info > 0))
+    predictor_names <- predictor_names[-response_info]
+  predictor_names
+}
+
+get_lhs_terms <- function(x) x
+get_rhs_terms <- function(x) x
+
+## ancillary step functions
+
+#' Add a New Step to Current Recipe
+#'
+#' \code{add_step} adds a step to the last location in the recipe.
+#'
+#' @param rec A \code{\link{recipe}}.
+#' @param object A step object.
+#' @keywords datagen
+#' @concept preprocessing
+#' @return An updated \code{\link{recipe}} with the new step in the last slot.
+#' @export
+add_step <- function(rec, object) {
+  rec$steps[[length(rec$steps) + 1]] <- object
+  rec
+}
+
+
+var_by_role <-
+  function(rec,
+           role = "predictor",
+           returnform = TRUE) {
+    res <- rec$var_info$variable[rec$var_info$role == role]
+    if (returnform)
+      res <- as.formula(paste("~",
+                              paste(res, collapse = "+")))
+    res
+  }
+
+## Overall wrapper to make new step_X objects
+#' A General Step Wrapper
+#'
+#' \code{step} sets the class of the step.
+#'
+#' @param subclass A character string for the resulting class. For example,
+#'   if \code{subclass = "blah"} the step object that is returned has class
+#'   \code{step_blah}.
+#' @param ... All arguments to the step that should be returned.
+#' @keywords datagen
+#' @concept preprocessing
+#' @return An updated step with the new class.
+#' @export
+step <- function(subclass, ...) {
+  structure(list(...),
+            class = c(paste0("step_", subclass), "step"))
+}
+
+## the 9 is to keep space for " [trained]"
+format_ch_vec <-
+  function(x,
+           sep = ", ",
+           width = options()$width - 9) {
+    widths <- nchar(x)
+    sep_wd <- nchar(sep)
+    adj_wd <- widths + sep_wd
+    if (sum(adj_wd) >= width) {
+      keepers <- max(which(cumsum(adj_wd) < width)) - 1
+      if (length(keepers) == 0 || keepers < 1) {
+        x <- paste(length(x), "items")
+      } else {
+        x <- c(x[1:keepers], "...")
+      }
+    }
+    paste0(x, collapse = sep)
+  }
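+
+## a sketch of the truncation: format_ch_vec(letters[1:10], width = 12)
+## yields something like "a, b, ..."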
+
+format_selectors <- function(x, wdth = options()$width - 9, ...) {
+  ## convert to character without the leading ~
+  x_items <- lapply(x, function(x)
+    as.character(x[-1]))
+  x_items <- unlist(x_items)
+  format_ch_vec(x_items, width = wdth, sep = ", ")
+}
+
+terms.recipe <- function(x, ...)
+  x$term_info
+
+filter_terms.formula <- function(formula, data, ...)
+  get_rhs_vars(formula, data)
+
+
+## This function takes the default arguments of `func` and
+## replaces them with the matching ones in `options` and
+## remove any in `removals`
+sub_args <- function(func, options, removals = NULL) {
+  args <- formals(func)
+  for (i in seq_along(options))
+    args[[names(options)[i]]] <- options[[i]]
+  if (!is.null(removals))
+    args[removals] <- NULL
+  args
+}
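+
+## e.g. sub_args(mean.default, list(na.rm = TRUE)) starts from
+## formals(mean.default) and returns that argument list with na.rm set to TRUE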
+## Same as above but starts with a call object
+mod_call_args <- function(cl, args, removals = NULL) {
+  if (!is.null(removals))
+    for (i in removals)
+      cl[[i]] <- NULL
+    arg_names <- names(args)
+    for (i in arg_names)
+      cl[[i]] <- args[[i]]
+    cl
+}
+
+#' Sequences of Names with Padded Zeros
+#'
+#' This function creates a series of \code{num} names with a common prefix.
+#'   The names are numbered with leading zeros (e.g.
+#'   \code{prefix01}-\code{prefix10} instead of \code{prefix1}-\code{prefix10}).
+#'
+#' @param num A single integer for how many elements are created.
+#' @param prefix A character string that will start each name.
+#' @return A character vector of length \code{num}.
+#' @keywords datagen
+#' @concept string_functions naming_functions
+#' @export
+
+
+names0 <- function(num, prefix = "x") {
+  if (num < 1)
+    stop("`num` should be > 0", call. = FALSE)
+  ind <- format(1:num)
+  ind <- gsub(" ", "0", ind)
+  paste0(prefix, ind)
+}
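+
+## e.g. names0(3, "IC") gives "IC1" "IC2" "IC3", while names0(101, "IC")
+## zero-pads to "IC001" ... "IC101"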
+
+
+
+## As suggested by HW, brought in from the `pryr` package
+## https://github.com/hadley/pryr
+fun_calls <- function(f) {
+  if (is.function(f)) {
+    fun_calls(body(f))
+  } else if (is.call(f)) {
+    fname <- as.character(f[[1]])
+    # Calls inside .Internal are special and shouldn't be included
+    if (identical(fname, ".Internal"))
+      return(fname)
+    unique(c(fname, unlist(lapply(f[-1], fun_calls), use.names = FALSE)))
+  }
+}
+
+
+get_levels <- function(x) {
+  if (!is.factor(x) & !is.character(x))
+    return(list(values = NA, ordered = NA))
+  out <- if (is.factor(x))
+    list(values = levels(x), ordered = is.ordered(x))
+  else
+    list(values = sort(unique(x)), ordered = FALSE)
+  out
+}
+
+has_lvls <- function(info)
+  !vapply(info, function(x) all(is.na(x$values)), c(logic = TRUE))
+
+strings2factors <- function(x, info) {
+  check_lvls <- has_lvls(info)
+  if (!any(check_lvls))
+    return(x)
+  info <- info[check_lvls]
+  for (i in seq_along(info)) {
+    lcol <- names(info)[i]
+    x[, lcol] <- factor(as.character(getElement(x, lcol)), 
+                        levels = info[[i]]$values, 
+                        ordered = info[[i]]$ordered)
+  }
+  x
+}
+
+## short summary of training set
+train_info <- function(x) {
+  data.frame(nrows = nrow(x),
+             ncomplete = sum(complete.cases(x)))
+}
+
+# Per LH and HW, brought in from the `dplyr` package
+is_negated <- function(x) {
+  is_lang(x, "-", n = 1)
+}
+
+## `merge_term_info` takes the information on the current variable
+## list and the information on the new set of variables (after each step)
+## and merges them. Special attention is paid to cases where the
+## _type_ of data is changed for a common column in the data.
+
+#' @importFrom dplyr left_join
+merge_term_info <- function(.new, .old) {
+  # Look for conflicts where the new variable type is different from
+  # the original value
+  tmp_new <- .new
+  names(tmp_new)[names(tmp_new) == "type"] <- "new_type"
+  tmp <- left_join(tmp_new[, c("variable", "new_type")],
+                   .old[, c("variable", "type")],
+                   by = "variable")
+  tmp <- tmp[!(is.na(tmp$new_type) | is.na(tmp$type)), ]
+  diff_type <- !(tmp$new_type == tmp$type)
+  if (any(diff_type)) {
+    ## Override old type to facilitate merge
+    .old$type[which(diff_type)] <- .new$type[which(diff_type)]
+  }
+  left_join(.new, .old, by = c("variable", "type"))
+}
+
+#' @importFrom rlang quos is_empty
+check_ellipses <- function(...) {
+  terms <- quos(...)
+  if (is_empty(terms))
+    stop("Please supply at least one variable specification.",
+         "See ?selections.",
+         call. = FALSE)
+  terms
+}
+
+#' @importFrom magrittr %>%
+#' @export
+magrittr::`%>%`
+
+printer <- function(tr_obj = NULL, 
+                    untr_obj = NULL, 
+                    trained = FALSE,
+                    width = max(20, options()$width - 30)) {
+  if (trained) {
+    cat(format_ch_vec(tr_obj, width = width))
+  } else
+    cat(format_selectors(untr_obj, wdth = width))
+  if (trained)
+    cat(" [trained]\n")
+  else
+    cat("\n")
+}
+
+
+#' @export
+#' @keywords internal
+#' @rdname recipes-internal
+prepare <- function(x, ...) 
+  stop("As of version 0.0.1.9006, use `prep` ",
+       "instead of `prepare`", call. = FALSE)
diff --git a/R/modeimpute.R b/R/modeimpute.R
new file mode 100644
index 0000000..9e7deb4
--- /dev/null
+++ b/R/modeimpute.R
@@ -0,0 +1,110 @@
+#' Impute Nominal Data Using the Most Common Value
+#'
+#' \code{step_modeimpute} creates a \emph{specification} of a recipe step that
+#'   will substitute missing values of nominal variables by the training set
+#'   mode of those variables.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param modes A named character vector of modes. This is \code{NULL} until
+#'   computed by \code{\link{prep.recipe}}.
+#' @keywords datagen
+#' @concept preprocessing imputation
+#' @export
+#' @details \code{step_modeimpute} estimates the variable modes from the data
+#'   used in the \code{training} argument of \code{prep.recipe}.
+#'   \code{bake.recipe} then fills in the missing values in new data sets
+#'   using these modes. If the training set data has more than one mode, one
+#'   is selected at random.
+#' @examples
+#' data("credit_data")
+#'
+#' ## missing data per column
+#' vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))
+#'
+#' set.seed(342)
+#' in_training <- sample(1:nrow(credit_data), 2000)
+#'
+#' credit_tr <- credit_data[ in_training, ]
+#' credit_te <- credit_data[-in_training, ]
+#' missing_examples <- c(14, 394, 565)
+#'
+#' rec <- recipe(Price ~ ., data = credit_tr)
+#'
+#' impute_rec <- rec %>%
+#'   step_modeimpute(Status, Home, Marital)
+#'
+#' imp_models <- prep(impute_rec, training = credit_tr)
+#'
+#' imputed_te <- bake(imp_models, newdata = credit_te, everything())
+#'
+#' table(credit_te$Home, imputed_te$Home, useNA = "always")
+
+step_modeimpute <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           modes = NULL) {
+    add_step(
+      recipe,
+      step_modeimpute_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        modes = modes
+      )
+    )
+  }
+
+step_modeimpute_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           modes = NULL) {
+    step(
+      subclass = "modeimpute",
+      terms = terms,
+      role = role,
+      trained = trained,
+      modes = modes
+    )
+  }
+
+#' @export
+prep.step_modeimpute <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  modes <- vapply(training[, col_names], mode_est, c(mode = ""))
+  step_modeimpute_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    modes = modes
+  )
+}
+
+#' @export
+bake.step_modeimpute <- function(object, newdata, ...) {
+  for (i in names(object$modes)) {
+    if (any(is.na(newdata[, i])))
+      newdata[is.na(newdata[, i]), i] <- object$modes[i]
+  }
+  as_tibble(newdata)
+}
+
+print.step_modeimpute <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Mode Imputation for ", sep = "")
+    printer(names(x$modes), x$terms, x$trained, width = width)
+    invisible(x)
+  }
+
+mode_est <- function(x) {
+  if (!is.character(x) & !is.factor(x))
+    stop("The data should be character or factor to compute the mode.",
+         call. = FALSE)
+  tab <- table(x)
+  modes <- names(tab)[tab == max(tab)]
+  sample(modes, size = 1)
+}
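+
+## Usage sketch (illustrative only): the mode is the most frequent value;
+## ties are broken at random, as noted in the documentation above.
+if (FALSE) {
+  mode_est(c("a", "a", "b"))  # "a"
+  mode_est(c("a", "b"))       # "a" or "b", chosen at random
+}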
diff --git a/R/ns.R b/R/ns.R
new file mode 100644
index 0000000..f6cd0bc
--- /dev/null
+++ b/R/ns.R
@@ -0,0 +1,141 @@
+#' Natural Spline Basis Functions
+#'
+#' \code{step_ns} creates a \emph{specification} of a recipe step that will
+#'   create new columns that are basis expansions of variables using natural
+#'   splines.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new columns
+#'   created from the original variables will be used as predictors in a model.
+#' @param objects A list of \code{\link[splines]{ns}} objects created once the
+#'   step has been trained.
+#' @param options A list of options for \code{\link[splines]{ns}} which should
+#'   not include \code{x}.
+#' @keywords datagen
+#' @concept preprocessing basis_expansion
+#' @export
+#' @details \code{step_ns} can create new features from a single variable that
+#'   enable fitting routines to model this variable in a nonlinear manner. The
+#'   extent of the possible nonlinearity is determined by the \code{df} or
+#'   \code{knots} arguments of \code{\link[splines]{ns}}. The original variables are
+#'   removed from the data and new columns are added. The naming convention
+#'   for the new variables is \code{varname_ns_1} and so on.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' with_splines <- rec %>%
+#'   step_ns(carbon, hydrogen)
+#' with_splines <- prep(with_splines, training = biomass_tr)
+#'
+#' expanded <- bake(with_splines, biomass_te)
+#' expanded
+#' @seealso \code{\link{step_poly}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+
+step_ns <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           objects = NULL,
+           options = list(df = 2)) {
+    add_step(
+      recipe,
+      step_ns_new(
+        terms = check_ellipses(...),
+        trained = trained,
+        role = role,
+        objects = objects,
+        options = options
+      )
+    )
+  }
+
+step_ns_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           objects = NULL,
+           options = NULL) {
+    step(
+      subclass = "ns",
+      terms = terms,
+      role = role,
+      trained = trained,
+      objects = objects,
+      options = options
+    )
+  }
+
+#' @importFrom splines ns
+ns_wrapper <- function(x, args) {
+  if (!("Boundary.knots" %in% names(args)))
+    args$Boundary.knots <- range(x)
+  args$x <- x
+  ns_obj <- do.call("ns", args)
+  ## don't need to save the original data so keep 1 row
+  out <- matrix(NA, ncol = ncol(ns_obj), nrow = 1)
+  class(out) <- c("ns", "basis", "matrix")
+  attr(out, "knots") <- attr(ns_obj, "knots")[]
+  attr(out, "Boundary.knots") <- attr(ns_obj, "Boundary.knots")
+  attr(out, "intercept") <- attr(ns_obj, "intercept")
+  out
+}
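+
+## Editor's sketch (not upstream code): the 1-row stand-in returned by
+## `ns_wrapper` keeps the knot attributes, so `predict` can still generate
+## the basis for new data without storing the training values.
+if (FALSE) {
+  library(splines)
+  basis <- ns_wrapper(rnorm(100), list(df = 2))
+  dim(predict(basis, rnorm(5)))  # 5 rows, 2 basis columns
+}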
+
+#' @export
+prep.step_ns <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  obj <- lapply(training[, col_names], ns_wrapper, x$options)
+  for (i in seq(along = col_names))
+    attr(obj[[i]], "var") <- col_names[i]
+  step_ns_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    objects = obj,
+    options = x$options
+  )
+}
+
+#' @importFrom tibble as_tibble is_tibble
+#' @importFrom stats predict
+#' @export
+bake.step_ns <- function(object, newdata, ...) {
+  ## pre-allocate a matrix for the basis functions.
+  new_cols <- vapply(object$objects, ncol, c(int = 1L))
+  ns_values <-
+    matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols))
+  colnames(ns_values) <- rep("", sum(new_cols))
+  strt <- 1
+  for (i in names(object$objects)) {
+    cols <- (strt):(strt + new_cols[i] - 1)
+    orig_var <- attr(object$objects[[i]], "var")
+    ns_values[, cols] <-
+      predict(object$objects[[i]], getElement(newdata, i))
+    new_names <-
+      paste(orig_var, "ns", names0(new_cols[i], ""), sep = "_")
+    colnames(ns_values)[cols] <- new_names
+    strt <- max(cols) + 1
+    newdata[, orig_var] <- NULL
+  }
+  newdata <- cbind(newdata, as_tibble(ns_values))
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+
+print.step_ns <-
+  function(x, width = max(20, options()$width - 28), ...) {
+    cat("Natural Splines on ")
+    printer(names(x$objects), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/nzv.R b/R/nzv.R
new file mode 100644
index 0000000..1204f71
--- /dev/null
+++ b/R/nzv.R
@@ -0,0 +1,172 @@
+#' Near-Zero Variance Filter
+#'
+#' \code{step_nzv} creates a \emph{specification} of a recipe step that will
+#'   potentially remove variables that are highly sparse and unbalanced.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will
+#'   be evaluated by the filtering step. See \code{\link{selections}} for
+#'   more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param options A list of options for the filter (see Details below).
+#' @param removals A character string that contains the names of columns that
+#'   should be removed. These values are not determined until
+#'   \code{\link{prep.recipe}} is called.
+#' @keywords datagen
+#' @concept preprocessing variable_filters
+#' @export
+#'
+#' @details This step diagnoses predictors that have one unique value (i.e.
+#'   are zero variance predictors) or predictors that have both of the
+#'   following characteristics:
+#' \enumerate{
+#'   \item they have very few unique values relative to the number of samples
+#'     and
+#'   \item the ratio of the frequency of the most common value to the
+#'     frequency of the second most common value is large.
+#' }
+#'
+#' For example, a near-zero variance predictor is one that, for
+#'   1000 samples, has two distinct values and 999 of them are a single value.
+#'
+#' To be flagged, first the frequency of the most prevalent value over the
+#'   second most frequent value (called the "frequency ratio") must be above
+#'   \code{freq_cut}. Secondly, the "percent of unique values," the number of
+#'   unique values divided by the total number of samples (times 100), must
+#'   also be below \code{unique_cut}.
+#'
+#' In the above example, the frequency ratio is 999 and the unique value
+#'   percentage is 0.2.
+#' @examples
+#' data(biomass)
+#'
+#' biomass$sparse <- c(1, rep(0, nrow(biomass) - 1))
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur + sparse,
+#'               data = biomass_tr)
+#'
+#' nzv_filter <- rec %>%
+#'   step_nzv(all_predictors())
+#'
+#' filter_obj <- prep(nzv_filter, training = biomass_tr)
+#'
+#' filtered_te <- bake(filter_obj, biomass_te)
+#' any(names(filtered_te) == "sparse")
+#' @seealso \code{\link{step_corr}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+
+step_nzv <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           options = list(freq_cut = 95 / 5, unique_cut = 10),
+           removals = NULL) {
+    add_step(
+      recipe,
+      step_nzv_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        options = options,
+        removals = removals
+      )
+    )
+  }
+
+step_nzv_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           options = NULL,
+           removals = NULL) {
+    step(
+      subclass = "nzv",
+      terms = terms,
+      role = role,
+      trained = trained,
+      options = options,
+      removals = removals
+    )
+  }
+
+#' @export
+prep.step_nzv <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  filter <- nzv(
+    x = training[, col_names],
+    freq_cut = x$options$freq_cut,
+    unique_cut = x$options$unique_cut
+  )
+  
+  step_nzv_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    options = x$options,
+    removals = filter
+  )
+}
+
+#' @export
+bake.step_nzv <- function(object, newdata, ...) {
+  if (length(object$removals) > 0)
+    newdata <- newdata[, !(colnames(newdata) %in% object$removals)]
+  as_tibble(newdata)
+}
+
+print.step_nzv <-
+  function(x, width = max(20, options()$width - 38), ...) {
+    if (x$trained) {
+      if (length(x$removals) > 0) {
+        cat("Sparse, unbalanced variable filter removed ")
+        cat(format_ch_vec(x$removals, width = width))
+      } else
+        cat("Sparse, unbalanced variable filter removed no terms")
+    } else {
+      cat("Correlation filter on ", sep = "")
+      cat(format_selectors(x$terms, wdth = width))
+    }
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
+
+nzv <- function(x,
+                freq_cut = 95 / 5,
+                unique_cut = 10) {
+  if (is.null(dim(x)))
+    x <- matrix(x, ncol = 1)
+  
+  fr_foo <- function(data) {
+    t <- table(data[!is.na(data)])
+    if (length(t) <= 1) {
+      return(0)
+    }
+    w <- which.max(t)
+    
+    return(max(t, na.rm = TRUE) / max(t[-w], na.rm = TRUE))
+  }
+  
+  freq_ratio <- vapply(x, fr_foo, c(ratio = 0))
+  uni_foo <- function(data)
+    length(unique(data[!is.na(data)]))
+  lunique <- vapply(x, uni_foo, c(num = 0))
+  pct_unique <- 100 * lunique / vapply(x, length, c(num = 0))
+  
+  zero_func <- function(data)
+    all(is.na(data))
+  zero_var <- (lunique == 1) | vapply(x, zero_func, c(zv = TRUE))
+  
+  out <-
+    which( (freq_ratio > freq_cut &
+             pct_unique <= unique_cut) | zero_var)
+  names(out) <- NULL
+  colnames(x)[out]
+}
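+
+## Usage sketch (illustrative only): a column with 999 zeros and a single
+## one has a frequency ratio of 999/1 and a unique value percentage of
+## 100 * 2 / 1000 = 0.2, so it is flagged under the default cutoffs.
+if (FALSE) {
+  dat <- data.frame(sparse = c(1, rep(0, 999)), noisy = rnorm(1000))
+  nzv(dat)  # "sparse"
+}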
diff --git a/R/ordinalscore.R b/R/ordinalscore.R
new file mode 100644
index 0000000..56432e0
--- /dev/null
+++ b/R/ordinalscore.R
@@ -0,0 +1,115 @@
+#' Convert Ordinal Factors to Numeric Scores
+#'
+#' \code{step_ordinalscore} creates a \emph{specification} of a recipe step that
+#'   will convert ordinal factor variables into numeric scores.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param columns A character vector of the variables that will be converted.
+#'   This is \code{NULL} until computed by \code{\link{prep.recipe}}.
+#' @param convert A function that takes an ordinal factor vector as an input
+#'   and outputs a single numeric variable.
+#' @keywords datagen
+#' @concept preprocessing ordinal_data
+#' @export
+#' @details Dummy variables from ordered factors with \code{C} levels will
+#'   create polynomial basis functions with \code{C-1} terms. As an
+#'   alternative, this step can be used to translate the ordered levels into a
+#'   single numeric vector of values that represent (subjective) scores. By
+#'   default, the translation uses a linear scale (1, 2, 3, ... \code{C}) but
+#'   custom score functions can also be used (see the example below).
+#' @examples
+#' fail_lvls <- c("meh", "annoying", "really_bad")
+#' 
+#' ord_data <- 
+#'   data.frame(item = c("paperclip", "twitter", "airbag"),
+#'              fail_severity = factor(fail_lvls,
+#'                                     levels = fail_lvls,
+#'                                     ordered = TRUE))
+#' 
+#' model.matrix(~fail_severity, data = ord_data)
+#' 
+#' linear_values <- recipe(~ item + fail_severity, data = ord_data) %>%
+#'   step_dummy(item) %>%
+#'   step_ordinalscore(fail_severity)
+#' 
+#' linear_values <- prep(linear_values, training = ord_data, retain = TRUE)
+#' 
+#' juice(linear_values, everything())
+#' 
+#' custom <- function(x) {
+#'   new_values <- c(1, 3, 7)
+#'   new_values[as.numeric(x)]
+#' }
+#' 
+#' nonlin_scores <- recipe(~ item + fail_severity, data = ord_data) %>%
+#'   step_dummy(item) %>%
+#'   step_ordinalscore(fail_severity, convert = custom)
+#' 
+#' nonlin_scores <- prep(nonlin_scores, training = ord_data, retain = TRUE)
+#' 
+#' juice(nonlin_scores, everything())
+
+step_ordinalscore <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           columns = NULL,
+           convert = as.numeric) {
+    add_step(
+      recipe,
+      step_ordinalscore_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        columns = columns,
+        convert = convert
+      )
+    )
+  }
+
+step_ordinalscore_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           columns = NULL,
+           convert = NULL) {
+    step(
+      subclass = "ordinalscore",
+      terms = terms,
+      role = role,
+      trained = trained,
+      columns = columns,
+      convert = convert
+    )
+  }
+
+#' @export
+prep.step_ordinalscore <-
+  function(x, training, info = NULL, ...) {
+    col_names <- terms_select(x$terms, info = info)
+    ord_check <-
+      vapply(training[, col_names], is.ordered, c(logic = TRUE))
+    if (!all(ord_check))
+      stop("Ordinal factor variables should be selected as ",
+           "inputs into this step.",
+           call. = FALSE)
+    step_ordinalscore_new(
+      terms = x$terms,
+      role = x$role,
+      trained = TRUE,
+      columns = col_names,
+      convert = x$convert
+    )
+  }
+
+#' @export
+bake.step_ordinalscore <- function(object, newdata, ...) {
+  scores <- lapply(newdata[, object$columns], object$convert)
+  for (i in object$columns)
+    newdata[, i] <- scores[[i]]
+  as_tibble(newdata)
+}
+
+print.step_ordinalscore <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Scoring for ", sep = "")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/other.R b/R/other.R
new file mode 100644
index 0000000..cd312e4
--- /dev/null
+++ b/R/other.R
@@ -0,0 +1,169 @@
+#' Collapse Some Categorical Levels
+#'
+#' \code{step_other} creates a \emph{specification} of a recipe step that will
+#'    potentially pool infrequently occurring values into an "other" category.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables that
+#'   will potentially be reduced. See \code{\link{selections}} for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param threshold A single numeric value in (0, 1) for pooling.
+#' @param other A single character value for the "other" category.
+#' @param objects A list of objects that contain the information to pool
+#'   infrequent levels that is determined by \code{\link{prep.recipe}}.
+#' @keywords datagen
+#' @concept preprocessing factors
+#' @export
+#' @details The overall proportion of the categories are computed. The "other"
+#'   category is used in place of any categorical levels whose individual
+#'   proportion in the training set is less than \code{threshold}.
+#'
+#' If no pooling is done the data are unmodified (although character data may
+#'   be changed to factors based on the value of \code{stringsAsFactors} in
+#'   \code{\link{prep.recipe}}). Otherwise, a factor is always returned with
+#'   different factor levels.
+#'
+#' If \code{threshold} is greater than the largest category proportion, all levels
+#'   except for the most frequent are collapsed to the \code{other} level.
+#'
+#' If the retained categories include the value of \code{other}, an error is
+#'   thrown. If \code{other} is in the list of discarded levels, no error
+#'   occurs.
+#' @examples
+#' data(okc)
+#'
+#' set.seed(19)
+#' in_train <- sample(1:nrow(okc), size = 30000)
+#'
+#' okc_tr <- okc[ in_train,]
+#' okc_te <- okc[-in_train,]
+#'
+#' rec <- recipe(~ diet + location, data = okc_tr)
+#'
+#'
+#' rec <- rec %>%
+#'   step_other(diet, location, threshold = .1, other = "other values")
+#' rec <- prep(rec, training = okc_tr)
+#'
+#' collapsed <- bake(rec, okc_te)
+#' table(okc_te$diet, collapsed$diet, useNA = "always")
+
+step_other <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           threshold = .05,
+           other = "other",
+           objects = NULL) {
+    if (threshold <= 0)
+      stop("`threshold` should be greater than zero", call. = FALSE)
+    if (threshold >= 1)
+      stop("`threshold` should be less than one", call. = FALSE)
+    add_step(
+      recipe,
+      step_other_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        threshold = threshold,
+        other = other,
+        objects = objects
+      )
+    )
+  }
+
+step_other_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           threshold = NULL,
+           other = NULL,
+           objects = NULL) {
+    step(
+      subclass = "other",
+      terms = terms,
+      role = role,
+      trained = trained,
+      threshold = threshold,
+      other = other,
+      objects = objects
+    )
+  }
+
+#' @importFrom stats sd
+#' @export
+prep.step_other <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  objects <- lapply(training[, col_names],
+                    keep_levels,
+                    prop = x$threshold,
+                    other = x$other)
+  step_other_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    threshold = x$threshold,
+    other = x$other,
+    objects = objects
+  )
+}
+
+#' @importFrom tibble as_tibble is_tibble
+#' @export
+bake.step_other <- function(object, newdata, ...) {
+  for (i in names(object$objects)) {
+    if (object$objects[[i]]$collapse) {
+      tmp <- if (!is.character(newdata[, i]))
+        as.character(getElement(newdata, i))
+      else
+        getElement(newdata, i)
+      
+      tmp <- ifelse(tmp %in% object$objects[[i]]$keep,
+                    tmp,
+                    object$objects[[i]]$other)
+      tmp <- factor(tmp,
+                    levels = c(object$objects[[i]]$keep,
+                               object$objects[[i]]$other))
+      tmp[is.na(getElement(newdata, i))] <- NA
+      newdata[, i] <- tmp
+    }
+  }
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+print.step_other <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Collapsing factor levels for ", sep = "")
+    printer(names(x$objects), x$terms, x$trained, width = width)
+    invisible(x)
+  }
+
+keep_levels <- function(x, prop = .1, other = "other") {
+  if (!is.factor(x))
+    x <- factor(x)
+  xtab <-
+    sort(table(x, useNA = "no"), decreasing = TRUE) / sum(!is.na(x))
+  dropped <- which(xtab < prop)
+  orig <- levels(x)
+  collapse <- length(dropped) > 0
+  if (collapse) {
+    keepers <- names(xtab[-dropped])
+    if (length(keepers) == 0)
+      keepers <- names(xtab)[which.max(xtab)]
+    if (other %in% keepers)
+      stop(
+        "The level ",
+        other,
+        " is already a factor level that will be retained. ",
+        "Please choose a different value.", call. = FALSE
+      )
+  } else
+    keepers <- orig
+  list(keep = orig[orig %in% keepers],
+       collapse = collapse,
+       other = other)
+}
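+
+## Editor's sketch (not upstream code): levels below `prop` are marked for
+## collapsing; here "b" and "c" each have proportion 0.1 < 0.15.
+if (FALSE) {
+  keep_levels(c(rep("a", 8), "b", "c"), prop = 0.15)
+  # $keep "a"; $collapse TRUE; $other "other"
+}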
diff --git a/R/pca.R b/R/pca.R
new file mode 100644
index 0000000..84bc86c
--- /dev/null
+++ b/R/pca.R
@@ -0,0 +1,192 @@
+#' PCA Signal Extraction
+#'
+#' \code{step_pca} creates a \emph{specification} of a recipe step that will
+#'   convert numeric data into one or more principal components.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   used to compute the components. See \code{\link{selections}} for more
+#'   details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new principal
+#'   component columns created from the original variables will be used as
+#'   predictors in a model.
+#' @param num The number of PCA components to retain as new predictors. If
+#'   \code{num} is greater than the number of columns or the number of
+#'   possible components, a smaller value will be used.
+#' @param threshold A fraction of the total variance that should be covered
+#'   by the components. For example, \code{threshold = .75} means that
+#'   \code{step_pca} should generate enough components to capture 75\% of the
+#'   variability in the variables. Note: using this argument will override and
+#'   reset any value given to \code{num}.
+#' @param options A list of options to the default method for
+#'   \code{\link[stats]{prcomp}}. Argument defaults are set to
+#'   \code{retx = FALSE}, \code{center = FALSE}, \code{scale. = FALSE}, and
+#'   \code{tol = NULL}. \bold{Note} that the argument \code{x} should not be
+#'   passed here (or at all).
+#' @param res The \code{\link[stats]{prcomp.default}} object is stored here
+#'   once this preprocessing step has been trained by \code{\link{prep.recipe}}.
+#' @param prefix A character string that will be the prefix to the resulting
+#'   new variables. See notes below.
+#' @keywords datagen
+#' @concept preprocessing pca projection_methods
+#' @export
+#' @details
+#' Principal component analysis (PCA) is a transformation of a group of
+#'   variables that produces a new set of artificial features or components.
+#'   These components are designed to capture the maximum amount of information
+#'   (i.e. variance) in the original variables. Also, the components are
+#'   statistically uncorrelated with one another. This means that they can be
+#'   used to combat large inter-variable correlations in a data set.
+#'
+#' It is advisable to standardize the variables prior to running PCA. Here,
+#'   the default options do \emph{not} center or scale the data prior to the
+#'   PCA calculation. This can be changed using the \code{options} argument or
+#'   by using \code{\link{step_center}} and \code{\link{step_scale}} beforehand.
+#'
+#' The argument \code{num} controls the number of components that will be
+#'   retained (the original variables that are used to derive the components
+#'   are removed from the data). The new components will have names that begin
+#'   with \code{prefix} and a sequence of numbers. The variable names are
+#'   padded with zeros. For example, if \code{num < 10}, their names will be
+#'   \code{PC1} - \code{PC9}. If \code{num = 101}, the names would be
+#'   \code{PC001} - \code{PC101}.
+#'
+#' Alternatively, \code{threshold} can be used to determine the number of
+#'   components that are required to capture a specified fraction of the total
+#'   variance in the variables.
+#'
+#' @references Jolliffe, I. T. (2010). \emph{Principal Component Analysis}.
+#'   Springer.
+#'
+#' @examples
+#' rec <- recipe( ~ ., data = USArrests)
+#' pca_trans <- rec %>%
+#'   step_center(all_numeric()) %>%
+#'   step_scale(all_numeric()) %>%
+#'   step_pca(all_numeric(), num = 3)
+#' pca_estimates <- prep(pca_trans, training = USArrests)
+#' pca_data <- bake(pca_estimates, USArrests)
+#'
+#' rng <- extendrange(c(pca_data$PC1, pca_data$PC2))
+#' plot(pca_data$PC1, pca_data$PC2,
+#'      xlim = rng, ylim = rng)
+#'
+#' with_thresh <- rec %>%
+#'   step_center(all_numeric()) %>%
+#'   step_scale(all_numeric()) %>%
+#'   step_pca(all_numeric(), threshold = .99)
+#' with_thresh <- prep(with_thresh, training = USArrests)
+#' bake(with_thresh, USArrests)
+#' @seealso \code{\link{step_ica}} \code{\link{step_kpca}}
+#'   \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}}
+#'   \code{\link{bake.recipe}}
+step_pca <- function(recipe,
+                     ...,
+                     role = "predictor",
+                     trained = FALSE,
+                     num  = 5,
+                     threshold = NA,
+                     options = list(),
+                     res = NULL,
+                     prefix = "PC") {
+  if (!is.na(threshold) && (threshold > 1 | threshold <= 0))
+    stop("`threshold` should be on (0, 1].", call. = FALSE)
+  add_step(
+    recipe,
+    step_pca_new(
+      terms = check_ellipses(...),
+      role = role,
+      trained = trained,
+      num = num,
+      threshold = threshold,
+      options = options,
+      res = res,
+      prefix = prefix
+    )
+  )
+}
+
+step_pca_new <- function(terms = NULL,
+                         role = "predictor",
+                         trained = FALSE,
+                         num  = NULL,
+                         threshold = NULL,
+                         options = NULL,
+                         res = NULL,
+                         prefix = "PC") {
+  step(
+    subclass = "pca",
+    terms = terms,
+    role = role,
+    trained = trained,
+    num = num,
+    threshold = threshold,
+    options = options,
+    res = res,
+    prefix = prefix
+  )
+}
+
+#' @importFrom stats prcomp
+#' @importFrom rlang expr
+#' @export
+prep.step_pca <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  prc_call <-
+    expr(prcomp(
+      retx = FALSE,
+      center = FALSE,
+      scale. = FALSE,
+      tol = NULL
+    ))
+  if (length(x$options) > 0)
+    prc_call <- mod_call_args(prc_call, args = x$options)
+  prc_call$x <- expr(training[, col_names, drop = FALSE])
+  prc_obj <- eval(prc_call)
+  
+  x$num <- min(x$num, length(col_names))
+  if (!is.na(x$threshold)) {
+    total_var <- sum(prc_obj$sdev ^ 2)
+    num_comp <-
+      which.max(cumsum(prc_obj$sdev ^ 2 / total_var) >= x$threshold)
+    if (length(num_comp) == 0)
+      num_comp <- length(prc_obj$sdev)
+    x$num <- num_comp
+  }
+  ## decide on removing prc elements that aren't used in new projections
+  ## e.g. `sdev` etc.
+  
+  step_pca_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    num = x$num,
+    threshold = x$threshold,
+    options = x$options,
+    res = prc_obj,
+    prefix = x$prefix
+  )
+}
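+
+## A worked sketch of the `threshold` logic above (illustrative only): the
+## number of components is the first position where the cumulative
+## proportion of variance crosses the threshold.
+if (FALSE) {
+  sdev <- c(2, 1, 0.5, 0.1)                 # hypothetical component sds
+  var_prop <- cumsum(sdev^2) / sum(sdev^2)  # roughly 0.76, 0.95, 1.00, 1.00
+  which.max(var_prop >= 0.9)                # 2 components retained
+}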
+
+#' @importFrom tibble as_tibble
+#' @export
+bake.step_pca <- function(object, newdata, ...) {
+  pca_vars <- rownames(object$res$rotation)
+  comps <- predict(object$res, newdata = newdata[, pca_vars])
+  comps <- comps[, 1:object$num, drop = FALSE]
+  colnames(comps) <- names0(ncol(comps), object$prefix)
+  newdata <- cbind(newdata, as_tibble(comps))
+  newdata <-
+    newdata[, !(colnames(newdata) %in% pca_vars), drop = FALSE]
+  as_tibble(newdata)
+}
+
+print.step_pca <-
+  function(x, width = max(20, options()$width - 29), ...) {
+    cat("PCA extraction with ")
+    printer(rownames(x$res$rotation), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/pkg.R b/R/pkg.R
new file mode 100644
index 0000000..8580965
--- /dev/null
+++ b/R/pkg.R
@@ -0,0 +1,33 @@
+#' recipes: A package for computing and preprocessing design matrices.
+#'
+#'The \code{recipes} package can be used to create design matrices for modeling
+#'   and to conduct preprocessing of variables. It is meant to be a more
+#'   extensive framework than R's formula method. Some differences between
+#'   simple formula methods and recipes are that
+#'\enumerate{
+#'\item Variables can have arbitrary roles in the analysis beyond predictors
+#'  and outcomes.
+#'\item A recipe consists of one or more steps that define actions on the
+#'  variables.
+#'\item Recipes can be defined sequentially using pipes as well as being
+#'  modifiable and extensible.
+#'}
+#'
+#'
+#' @section Basic Functions:
+#' The three main functions are \code{\link{recipe}}, \code{\link{prep}},
+#'   and \code{\link{bake}}.
+#'
+#' \code{\link{recipe}} defines the operations on the data and the associated
+#'   roles. Once the preprocessing steps are defined, any parameters are
+#'   estimated using \code{\link{prep}}. Once the data are ready for
+#'   transformation, the \code{\link{bake}} function applies the operations.
+#'
+#' @section Step Functions:
+#' These functions are used to add new actions to the recipe and have the
+#'   naming convention \code{"step_action"}. For example,
+#'   \code{\link{step_center}} centers the data to have a zero mean and
+#'   \code{\link{step_dummy}} is used to create dummy variables.
+#' @docType package
+#' @name recipes
+NULL
diff --git a/R/poly.R b/R/poly.R
new file mode 100644
index 0000000..18735b6
--- /dev/null
+++ b/R/poly.R
@@ -0,0 +1,145 @@
+#' Orthogonal Polynomial Basis Functions
+#'
+#' \code{step_poly} creates a \emph{specification} of a recipe step that will
+#'   create new columns that are basis expansions of variables using orthogonal
+#'   polynomials.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the new columns
+#'   created from the original variables will be used as predictors in a model.
+#' @param objects A list of \code{\link[stats]{poly}} objects created once the
+#'   step has been trained.
+#' @param options A list of options for \code{\link[stats]{poly}} which should
+#'   not include \code{x} or \code{simple}. Note that the option
+#'   \code{raw = TRUE} will produce the regular polynomial values (not
+#'   orthogonalized).
+#' @keywords datagen
+#' @concept preprocessing basis_expansion
+#' @export
+#' @details \code{step_poly} can create new features from a single variable that
+#'   enable fitting routines to model this variable in a nonlinear manner. The
+#'   extent of the possible nonlinearity is determined by the \code{degree}
+#'   argument of  \code{\link[stats]{poly}}. The original variables are
+#'   removed from the data and new columns are added. The naming convention
+#'   for the new variables is \code{varname_poly_1} and so on.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' quadratic <- rec %>%
+#'   step_poly(carbon, hydrogen)
+#' quadratic <- prep(quadratic, training = biomass_tr)
+#'
+#' expanded <- bake(quadratic, biomass_te)
+#' expanded
+#' @seealso \code{\link{step_ns}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+
+
+step_poly <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           objects = NULL,
+           options = list(degree = 2)) {
+    add_step(
+      recipe,
+      step_poly_new(
+        terms = check_ellipses(...),
+        trained = trained,
+        role = role,
+        objects = objects,
+        options = options
+      )
+    )
+  }
+
+step_poly_new <- function(terms = NULL,
+                          role = NA,
+                          trained = FALSE,
+                          objects = NULL,
+                          options = NULL) {
+  step(
+    subclass = "poly",
+    terms = terms,
+    role = role,
+    trained = trained,
+    objects = objects,
+    options = options
+  )
+}
+
+
+poly_wrapper <- function(x, args) {
+  args$x <- x
+  args$simple <- FALSE
+  poly_obj <- do.call("poly", args)
+  
+  ## don't need to save the original data so keep 1 row
+  out <- matrix(NA, ncol = ncol(poly_obj), nrow = 1)
+  class(out) <- c("poly", "basis", "matrix")
+  attr(out, "degree") <- attr(poly_obj, "degree")
+  attr(out, "coefs") <- attr(poly_obj, "coefs")
+  out
+}
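+
+## Editor's sketch (not upstream code): as with `ns_wrapper` above, only the
+## `degree` and `coefs` attributes are kept, which is enough for `predict`
+## to rebuild the orthogonal polynomial columns for new data.
+if (FALSE) {
+  basis <- poly_wrapper(rnorm(50), list(degree = 2))
+  dim(predict(basis, rnorm(5)))  # 5 rows, 2 polynomial columns
+}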
+
+#' @importFrom stats poly
+#' @export
+prep.step_poly <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  obj <- lapply(training[, col_names], poly_wrapper, x$options)
+  for (i in seq(along = col_names))
+    attr(obj[[i]], "var") <- col_names[i]
+  
+  step_poly_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    objects = obj,
+    options = x$options
+  )
+}
+
+#' @importFrom tibble as_tibble is_tibble
+#' @importFrom stats predict
+#' @export
+bake.step_poly <- function(object, newdata, ...) {
+  ## pre-allocate a matrix for the basis functions.
+  new_cols <- vapply(object$objects, ncol, c(int = 1L))
+  poly_values <-
+    matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols))
+  colnames(poly_values) <- rep("", sum(new_cols))
+  strt <- 1
+  for (i in names(object$objects)) {
+    cols <- (strt):(strt + new_cols[i] - 1)
+    orig_var <- attr(object$objects[[i]], "var")
+    poly_values[, cols] <-
+      predict(object$objects[[i]], getElement(newdata, i))
+    new_names <-
+      paste(orig_var, "poly", names0(new_cols[i], ""), sep = "_")
+    colnames(poly_values)[cols] <- new_names
+    strt <- max(cols) + 1
+    newdata[, orig_var] <- NULL
+  }
+  newdata <- cbind(newdata, as_tibble(poly_values))
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+
+print.step_poly <-
+  function(x, width = max(20, options()$width - 35), ...) {
+    cat("Orthogonal polynomials on ")
+    printer(names(x$objects), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/range.R b/R/range.R
new file mode 100644
index 0000000..ac7f4d7
--- /dev/null
+++ b/R/range.R
@@ -0,0 +1,122 @@
+#' Scaling Numeric Data to a Specific Range
+#'
+#' \code{step_range} creates a \emph{specification} of a recipe step that will
+#'   scale numeric data to lie within a pre-defined range of values.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   scaled. See \code{\link{selections}} for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param min A single numeric value for the smallest value in the new range.
+#' @param max A single numeric value for the largest value in the new range.
+#' @param ranges A two-row numeric matrix of the smallest and largest training
+#'   set values of the selected variables. Note that this is ignored until the
+#'   values are determined by \code{\link{prep.recipe}}; setting it manually
+#'   has no effect.
+#' @keywords datagen
+#' @concept preprocessing normalization_methods
+#' @export
+#' @details \code{step_range} rescales each variable so that the training set
+#'   values lie between \code{min} and \code{max}. The variable minimums and
+#'   maximums are estimated from the data used in the \code{training} argument
+#'   of \code{prep.recipe}. \code{bake.recipe} then applies the same rescaling
+#'   to new data sets; new values outside the training set range are truncated
+#'   at \code{min} or \code{max}.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' ranged_trans <- rec %>%
+#'   step_range(carbon, hydrogen)
+#'
+#' ranged_obj <- prep(ranged_trans, training = biomass_tr)
+#'
+#' transformed_te <- bake(ranged_obj, biomass_te)
+#'
+#' biomass_te[1:10, names(transformed_te)]
+#' transformed_te
+
+step_range <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           min = 0,
+           max = 1,
+           ranges = NULL) {
+    add_step(
+      recipe,
+      step_range_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        min = min,
+        max = max,
+        ranges = ranges
+      )
+    )
+  }
+
+step_range_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           min = 0,
+           max = 1,
+           ranges = NULL) {
+    step(
+      subclass = "range",
+      terms = terms,
+      role = role,
+      trained = trained,
+      min = min,
+      max = max,
+      ranges = ranges
+    )
+  }
+
+#' @importFrom stats sd
+#' @export
+prep.step_range <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  mins <-
+    vapply(training[, col_names], min, c(min = 0), na.rm = TRUE)
+  maxs <-
+    vapply(training[, col_names], max, c(max = 0), na.rm = TRUE)
+  step_range_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    min = x$min,
+    max = x$max,
+    ranges = rbind(mins, maxs)
+  )
+}
+
+#' @export
+bake.step_range <- function(object, newdata, ...) {
+  tmp <- as.matrix(newdata[, colnames(object$ranges)])
+  tmp <- sweep(tmp, 2, object$ranges[1, ], "-")
+  tmp <- tmp * (object$max - object$min)
+  tmp <- sweep(tmp, 2, object$ranges[2, ] - object$ranges[1, ], "/")
+  tmp <- tmp + object$min
+  
+  tmp[tmp < object$min] <- object$min
+  tmp[tmp > object$max] <- object$max
+  
+  if (is.matrix(tmp) && ncol(tmp) == 1)
+    tmp <- tmp[, 1]
+  newdata[, colnames(object$ranges)] <- tmp
+  as_tibble(newdata)
+}
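+
+## A worked sketch of the rescaling above (illustrative only): a training
+## column spanning [2, 10] mapped to the default range [0, 1], with new
+## values truncated at the limits.
+if (FALSE) {
+  x <- c(2, 6, 10, 12)
+  pmin(pmax((x - 2) * (1 - 0) / (10 - 2) + 0, 0), 1)  # 0.0 0.5 1.0 1.0
+}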
+
+print.step_range <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Range scaling to [", x$min, ",", x$max, "] for ", sep = "")
+    printer(names(x$ranges), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/ratio.R b/R/ratio.R
new file mode 100644
index 0000000..38af34c
--- /dev/null
+++ b/R/ratio.R
@@ -0,0 +1,156 @@
+#' Ratio Variable Creation
+#'
+#' \code{step_ratio} creates a \emph{specification} of a recipe step that
+#'   will create one or more ratios out of numeric variables.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will
+#'   be used in the \emph{numerator} of the ratio. When used with
+#'   \code{denom_vars}, the dots indicate which variables are used in the
+#'   \emph{denominator}. See \code{\link{selections}} for more details.
+#' @param role For terms created by this step, what analysis role should
+#'   they be assigned? By default, the function assumes that the newly created
+#'   ratios made from the original variables will be used as
+#'   predictors in a model.
+#' @param denom A call to \code{denom_vars} to specify which variables are
+#'   used in the denominator that can include specific variable names
+#'   separated by commas or different selectors (see
+#'   \code{\link{selections}}).  If a column is included in both lists to be
+#'   numerator and denominator, it will be removed from the listing.
+#' @param naming A function that defines the naming convention for new ratio
+#'   columns.
+#' @param columns The column names used in the ratios. This argument is
+#'   not populated until \code{\link{prep.recipe}} is executed.
+#' @keywords datagen
+#' @concept preprocessing
+#' @export
+#' @examples 
+#' library(recipes)
+#' data(biomass)
+#' 
+#' biomass$total <- apply(biomass[, 3:7], 1, sum)
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#' 
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + 
+#'                     sulfur + total,
+#'               data = biomass_tr)
+#' 
+#' ratio_recipe <- rec %>%
+#'   # all predictors over total
+#'   step_ratio(all_predictors(), denom = denom_vars(total)) %>%
+#'   # get rid of the original predictors 
+#'   step_rm(all_predictors(), -matches("_o_"))
+#'   
+#' 
+#' ratio_recipe <- prep(ratio_recipe, training = biomass_tr)
+#' 
+#' ratio_data <- bake(ratio_recipe, biomass_te)
+#' ratio_data
+
+step_ratio <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           denom = denom_vars(),
+           naming = function(numer, denom)
+             make.names(paste(numer, denom, sep = "_o_")),
+           columns = NULL) {
+    if (is_empty(denom))
+      stop("Please supply at least one denominator variable specification. ",
+           "See ?selections.", call. = FALSE)
+    add_step(
+      recipe,
+      step_ratio_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        denom = denom,
+        naming = naming,
+        columns = columns
+      )
+    )
+  }
+
+step_ratio_new <-
+  function(terms = NULL,
+           role = "predictor",
+           trained = FALSE,
+           denom = NULL,
+           naming = NULL,
+           columns = NULL
+  ) {
+    step(
+      subclass = "ratio",
+      terms = terms,
+      role = role,
+      trained = trained,
+      denom = denom,
+      naming = naming,
+      columns = columns
+    )
+  }
+
+
+#' @export
+prep.step_ratio <- function(x, training, info = NULL, ...) {
+  col_names <- expand.grid(
+    top = terms_select(x$terms, info = info),
+    bottom = terms_select(x$denom, info = info),
+    stringsAsFactors = FALSE
+  )
+  col_names <- col_names[!(col_names$top == col_names$bottom), ]
+  
+  if (nrow(col_names) == 0)
+    stop("No variables were selected for making ratios", call. = FALSE)
+  if (any(info$type[info$variable %in% col_names$top] != "numeric"))
+    stop("The ratio variables should be numeric")
+  if (any(info$type[info$variable %in% col_names$bottom] != "numeric"))
+    stop("The ratio variables should be numeric")
+  
+  step_ratio_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    denom = x$denom,
+    naming = x$naming,
+    columns = col_names
+  )
+}
+
+#' @export
+bake.step_ratio <- function(object, newdata, ...) {
+  res <- newdata[, object$columns$top] /
+    newdata[, object$columns$bottom]
+  colnames(res) <-
+    apply(object$columns, 1, function(x)
+      object$naming(x[1], x[2]))
+  if (!is_tibble(res))
+    res <- as_tibble(res)
+  
+  newdata <- cbind(newdata, res)
+  if (!is_tibble(newdata))
+    newdata <- as_tibble(newdata)
+  newdata
+}
+
+print.step_ratio <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Ratios from ")
+    if (x$trained) {
+      vars <- c(unique(x$columns$top), unique(x$columns$bottom))
+      cat(format_ch_vec(vars, width = width))
+    } else
+      cat(format_selectors(c(x$terms, x$denom), wdth = width))
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
+
+#' @export
+#' @rdname step_ratio
+denom_vars <- function(...) quos(...)
diff --git a/R/recipe.R b/R/recipe.R
new file mode 100644
index 0000000..e3c7a00
--- /dev/null
+++ b/R/recipe.R
@@ -0,0 +1,601 @@
+#' Create a Recipe for Preprocessing Data
+#'
+#' A recipe is a description of what steps should be applied to a data set in
+#'   order to get it ready for data analysis.
+#'
+#' @aliases recipe recipe.default recipe.formula
+#' @author Max Kuhn
+#' @keywords datagen
+#' @concept preprocessing model_specification
+#' @export
+recipe <- function(x, ...)
+  UseMethod("recipe")
+
+#' @rdname recipe
+#' @export
+recipe.default <- function(x, ...)
+  stop("`x` should be a data frame, matrix, or tibble", call. = FALSE)
+
+#' @rdname recipe
+#' @param vars A character vector of column names corresponding to variables
+#'   that will be used in any context (see below)
+#' @param roles A character vector (the same length as \code{vars}) that
+#'   describes a single role that the variable will take. This value could be
+#'   anything but common roles are \code{"outcome"}, \code{"predictor"},
+#'   \code{"case_weight"}, or \code{"ID"}
+#' @param ... Further arguments passed to or from other methods (not currently
+#'   used).
+#' @param formula A model formula. No in-line functions should be used here
+#'   (e.g. \code{log(x)}, \code{x:y}, etc.). These types of transformations
+#'   should be enacted using \code{step} functions in this package. Dots are
+#'   allowed as are simple multivariate outcome terms (i.e. no need for
+#'   \code{cbind}; see Examples).
+#' @param x,data A data frame or tibble of the \emph{template} data set
+#'   (see below).
+#' @return An object of class \code{recipe} with sub-objects:
+#'   \item{var_info}{A tibble containing information about the original data
+#'   set columns}
+#'   \item{term_info}{A tibble that contains the current set of terms in the
+#'   data set. This initially defaults to the same data contained in
+#'   \code{var_info}.}
+#'   \item{steps}{A list of \code{step} objects that define the sequence of
+#'   preprocessing steps that will be applied to data. The default value is
+#'   \code{NULL}}
+#'   \item{template}{A tibble of the data. This is initialized to be the same
+#'   as the data given in the \code{data} argument but can be different after
+#'   the recipe is trained.}
+#'
+#' @details Recipes are alternative methods for creating design matrices and
+#'   for preprocessing data.
+#'
+#' Variables in recipes can have any type of \emph{role} in subsequent analyses
+#'   such as: outcome, predictor, case weights, stratification variables, etc.
+#'
+#' \code{recipe} objects can be created in several ways. If the analysis only
+#'   contains outcomes and predictors, the simplest way to create one is to use
+#'   a simple formula (e.g. \code{y ~ x1 + x2}) that does not contain inline
+#'   functions such as \code{log(x3)}. An example is given below.
+#'
+#' Alternatively, a \code{recipe} object can be created by first specifying
+#'   which variables in a data set should be used and then sequentially
+#'   defining their roles (see the last example).
+#'
+#' Steps to the recipe can be added sequentially. Steps can include common
+#'   operations like logging a variable, creating dummy variables or
+#'   interactions and so on. More computationally complex actions such as
+#'   dimension reduction or imputation can also be specified.
+#'
+#' Once a recipe has been defined, the \code{\link{prep}} function can be
+#'   used to estimate the quantities required by the steps from a data set (a.k.a. the
+#'   training data). \code{\link{prep}} returns another recipe.
+#'
+#' To apply the recipe to a data set, the \code{\link{bake}} function is
+#'   used in the same manner as \code{predict} would be for models. This
+#'   applies the steps to any data set.
+#'
+#' Note that the data passed to \code{recipe} need not be the complete data
+#'   that will be used to train the steps (by \code{\link{prep}}). The recipe
+#'   only needs to know the names and types of data that will be used. For
+#'   large data sets, \code{head} could be used to pass the recipe a smaller
+#'   data set to save time and memory.
+#'
+#' @export
+#' @importFrom tibble as_tibble is_tibble tibble
+#' @importFrom dplyr full_join
+#' @importFrom stats predict
+#' @examples
+#'
+#' ###############################################
+#' # simple example:
+#' data(biomass)
+#'
+#' # split data
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' # When only predictors and outcomes, a simplified formula can be used.
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' # Now add preprocessing steps to the recipe.
+#'
+#' sp_signed <- rec %>%
+#'   step_center(all_predictors()) %>%
+#'   step_scale(all_predictors()) %>%
+#'   step_spatialsign(all_predictors())
+#' sp_signed
+#'
+#' # now estimate required parameters
+#' sp_signed_trained <- prep(sp_signed, training = biomass_tr)
+#' sp_signed_trained
+#'
+#' # apply the preprocessing to a data set
+#' test_set_values <- bake(sp_signed_trained, newdata = biomass_te)
+#'
+#' # or use pipes for the entire workflow:
+#' rec <- biomass_tr %>%
+#'   recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur) %>%
+#'   step_center(all_predictors()) %>%
+#'   step_scale(all_predictors()) %>%
+#'   step_spatialsign(all_predictors())
+#'
+#' ###############################################
+#' # multivariate example
+#'
+#' # no need for `cbind(carbon, hydrogen)` for left-hand side
+#' multi_y <- recipe(carbon + hydrogen ~ oxygen + nitrogen + sulfur,
+#'                   data = biomass_tr)
+#' multi_y <- multi_y %>%
+#'   step_center(all_outcomes()) %>%
+#'   step_scale(all_predictors())
+#'
+#' multi_y_trained <- prep(multi_y, training = biomass_tr)
+#'
+#' results <- bake(multi_y_trained, biomass_te)
+#'
+#' ###############################################
+#' # Creating a recipe manually with different roles
+#'
+#' rec <- recipe(biomass_tr) %>%
+#'   add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
+#'            new_role = "predictor") %>%
+#'   add_role(HHV, new_role = "outcome") %>%
+#'   add_role(sample, new_role = "id variable") %>%
+#'   add_role(dataset, new_role = "splitting indicator")
+#' rec
+recipe.data.frame <-
+  function(x,
+           formula = NULL,
+           ...,
+           vars = NULL,
+           roles = NULL) {
+    
+    if (!is.null(formula)) {
+      if (!is.null(vars))
+        stop("This `vars` specification will be ignored when a formula is ",
+             "used", call. = FALSE)
+      if (!is.null(roles))
+        stop("This `roles` specification will be ignored when a formula is ",
+             "used", call. = FALSE)
+      
+      obj <- recipe.formula(formula, x, ...)
+      return(obj)
+    }
+    
+    if (!is_tibble(x))
+      x <- as_tibble(x)
+    if (is.null(vars))
+      vars <- colnames(x)
+    if (any(table(vars) > 1))
+      stop("`vars` should have unique members", call. = FALSE)
+    if (any(!(vars %in% colnames(x))))
+      stop("1+ elements of `vars` are not in `x`", call. = FALSE)
+    
+    x <- x[, vars]
+    
+    var_info <- tibble(variable = vars)
+    
+    ## Check and add roles when available
+    if (!is.null(roles)) {
+      if (length(roles) != length(vars))
+        stop("The number of roles should be the same as the number of ",
+             "variables", call. = FALSE)
+      var_info$role <- roles
+    } else
+      var_info$role <- NA
+    
+    ## Add types
+    var_info <- full_join(get_types(x), var_info, by = "variable")
+    var_info$source <- "original"
+    
+    ## Return final object of class `recipe`
+    out <- list(
+      var_info = var_info,
+      term_info = var_info,
+      steps = NULL,
+      template = x,
+      levels = NULL,
+      retained = NA
+    )
+    class(out) <- "recipe"
+    out
+  }
+
+#' @rdname recipe
+#' @export
+recipe.formula <- function(formula, data, ...) {
+  args <- form2args(formula, data, ...)
+  recipe.data.frame(
+    x = args$x,
+    formula = NULL,
+    ...,
+    vars = args$vars,
+    roles = args$roles
+  )
+}
+
+#' @rdname recipe
+#' @export
+recipe.matrix <- function(x, ...)
+  recipe.data.frame(x, ...)
+
+
+#' @importFrom stats as.formula
+#' @importFrom tibble as_tibble is_tibble
+
+form2args <- function(formula, data, ...) {
+  if (!is_formula(formula))
+    formula <- as.formula(formula)
+  ## check for in-line formulas
+  check_elements(formula, allowed = NULL)
+  
+  if (!is_tibble(data))
+    data <- as_tibble(data)
+  
+  ## use rlang to get both sides of the formula
+  outcomes <- get_lhs_vars(formula, data)
+  predictors <- get_rhs_vars(formula, data)
+  
+  ## get `vars` from lhs and rhs of formula
+  
+  vars <- c(predictors, outcomes)
+  
+  ## subset data columns
+  data <- data[, vars]
+  
+  ## derive roles
+  roles <- rep("predictor", length(predictors))
+  if (length(outcomes) > 0)
+    roles <- c(roles, rep("outcome", length(outcomes)))
+  
+  ## pass to recipe.default with vars and roles
+  
+  list(x = data, vars = vars, roles = roles)
+}
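+
+## Usage sketch (illustrative only; assumes the selection helpers defined
+## elsewhere in the package are available): the right-hand side variables
+## come first and are given the "predictor" role.
+if (FALSE) {
+  args <- form2args(Sepal.Length ~ Species, data = iris)
+  args$vars   # "Species" "Sepal.Length"
+  args$roles  # "predictor" "outcome"
+}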
+
+
+#' @aliases prep prep.recipe
+#' @param x an object
+#' @param ... further arguments passed to or from other methods (not currently
+#'   used).
+#' @author Max Kuhn
+#' @keywords datagen
+#' @concept preprocessing model_specification
+#' @export
+prep   <- function(x, ...)
+  UseMethod("prep")
+
+#' Train a Data Recipe
+#'
+#' For a recipe with at least one preprocessing step, estimate the required
+#'   parameters from a training set so that they can later be applied to
+#'   other data sets.
+#' @param training A data frame or tibble that will be used to estimate
+#'   parameters for preprocessing.
+#' @param fresh A logical indicating whether already trained steps should be
+#'   re-trained. If \code{TRUE}, you should pass in a data set to the argument
+#'   \code{training}.
+#' @param verbose A logical that controls whether progress is reported as steps
+#'   are executed.
+#' @param retain A logical: should the \emph{preprocessed} training set be saved
+#'   into the \code{template} slot of the recipe after training? This is a good
+#'     idea if you want to add more steps later but want to avoid re-training
+#'     the existing steps.
+#' @param stringsAsFactors A logical: should character columns be converted to
+#'   factors? This affects the preprocessed training set (when
+#'   \code{retain = TRUE}) as well as the results of \code{bake.recipe}.
+#' @return A recipe whose step objects have been updated with the required
+#'   quantities (e.g. parameter estimates, model objects, etc). Also, the
+#'   \code{term_info} object is likely to be modified as the steps are
+#'   executed.
+#' @details Given a data set, this function estimates the quantities and
+#'   statistics required by any of the steps.
+#'
+#' \code{\link{prep}} returns an updated recipe with the estimates.
+#'
+#' Note that missing data is handled within the steps; there is no global
+#'   \code{na.rm} option at the recipe level or in \code{\link{prep}}.
+#'
+#' Also, if a recipe has been trained using \code{\link{prep}} and then steps
+#'   are added, \code{\link{prep}} will only update the new steps. If
+#'   \code{fresh = TRUE}, all of the steps will be (re)estimated.
+#'
+#' As the steps are executed, the \code{training} set is updated. For example,
+#'   if the first step is to center the data and the second is to scale the
+#'   data, the step for scaling is given the centered data.
+#'
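+#' @examples
+#' # A minimal sketch of typical usage; `biomass`, `step_center`, and
+#' # `step_scale` are defined elsewhere in this package:
+#' data(biomass)
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' rec <- recipe(HHV ~ carbon + hydrogen, data = biomass_tr) %>%
+#'   step_center(all_predictors()) %>%
+#'   step_scale(all_predictors())
+#' rec_trained <- prep(rec, training = biomass_tr, retain = TRUE)
+#' rec_trained
+#'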
+#' @rdname prep
+#' @importFrom tibble as_tibble is_tibble tibble
+#' @export
+prep.recipe <-
+  function(x,
+           training = NULL,
+           fresh = FALSE,
+           verbose = TRUE,
+           retain = FALSE,
+           stringsAsFactors = TRUE,
+           ...) {
+    if (is.null(training)) {
+      if (fresh)
+        stop("A training set must be supplied to the `training` argument ",
+             "when `fresh = TRUE`", call. = FALSE)
+      training <- x$template
+    } else {
+      training <- if (!is_tibble(training))
+        as_tibble(training[, x$var_info$variable, drop = FALSE])
+      else
+        training[, x$var_info$variable]
+    }
+    tr_data <- train_info(training)
+    if (stringsAsFactors) {
+      lvls <- lapply(training, get_levels)
+      training <- strings2factors(training, lvls)
+    } else
+      lvls <- NULL
+    
+    for (i in seq(along = x$steps)) {
+      note <- paste("step", i, gsub("^step_", "", class(x$steps[[i]])[1]))
+      if (!x$steps[[i]]$trained | fresh) {
+        if (verbose)
+          cat(note, "training", "\n")
+        
+        # Compute anything needed for the preprocessing steps
+        # then apply it to the current training set
+        
+        x$steps[[i]] <-
+          prep(x$steps[[i]],
+                  training = training,
+                  info = x$term_info)
+        training <- bake(x$steps[[i]], newdata = training)
+        x$term_info <-
+          merge_term_info(get_types(training), x$term_info)
+        
+        ## Update the roles and the term source
+        ## These next two steps need to be smarter to find diffs
+        if (!is.na(x$steps[[i]]$role))
+          x$term_info$role[is.na(x$term_info$role)] <-
+          x$steps[[i]]$role
+        
+        x$term_info$source[is.na(x$term_info$source)] <- "derived"
+      } else {
+        if (verbose)
+          cat(note, "[pre-trained]\n")
+      }
+    }
+    
+    ## The steps may have changed the data so reassess the levels
+    if (stringsAsFactors) {
+      lvls <- lapply(training, get_levels)
+      check_lvls <- has_lvls(lvls)
+      if (!any(check_lvls)) lvls <- NULL
+    } else lvls <- NULL
+    
+    if (retain)
+      x$template <- training
+    
+    x$tr_info <- tr_data
+    x$levels <- lvls
+    x$retained <- retain
+    x
+  }
+
+#' @rdname bake
+#' @aliases bake bake.recipe
+#' @author Max Kuhn
+#' @keywords datagen
+#' @concept preprocessing model_specification
+#' @export
+bake <- function(object, ...)
+  UseMethod("bake")
+
+#' Apply a Trained Data Recipe
+#'
+#' For a recipe with at least one preprocessing step that has been trained by
+#'   \code{\link{prep.recipe}}, apply the computations to new data.
+#' @param object A trained object such as a \code{\link{recipe}} with at least
+#'   one preprocessing step.
+#' @param newdata A data frame or tibble to which the preprocessing will be
+#'   applied.
+#' @param ... One or more selector functions to choose which variables will be
+#'   returned by the function. See \code{\link{selections}} for more details.
+#'   If no selectors are given, the default is to use
+#'   \code{\link{all_predictors}}.
+#' @return A tibble that may have different columns than the original columns
+#'   in \code{newdata}.
+#' @details \code{\link{bake}} takes a trained recipe and applies the
+#'   operations to a data set to create a design matrix.
+#'
+#' If the original data used to train the recipe are to be processed, time can be
+#'   saved by using the \code{retain = TRUE} option of \code{\link{prep}} to
+#'   avoid duplicating the same operations.
+#'
+#' A tibble is always returned but can be easily converted to a data frame or
+#'   matrix as needed.
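+#' @examples
+#' # A minimal sketch of applying a trained recipe to new data; `biomass`,
+#' # `step_center`, and `step_scale` are defined elsewhere in this package:
+#' data(biomass)
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#' rec <- recipe(HHV ~ carbon + hydrogen, data = biomass_tr) %>%
+#'   step_center(all_predictors()) %>%
+#'   step_scale(all_predictors())
+#' rec <- prep(rec, training = biomass_tr)
+#' bake(rec, newdata = biomass_te)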
+#' @rdname bake
+#' @importFrom tibble as_tibble
+#' @importFrom dplyr filter
+#' @export
+
+bake.recipe <- function(object, newdata = object$template, ...) {
+  if (!is_tibble(newdata)) newdata <- as_tibble(newdata)
+  
+  terms <- quos(...)
+  if (is_empty(terms))
+    terms <- quos(all_predictors())
+  
+  ## determine return variables
+  keepers <- terms_select(terms = terms, info = object$term_info)
+
+  for (i in seq(along = object$steps)) {
+    newdata <- bake(object$steps[[i]], newdata = newdata)
+    if (!is_tibble(newdata)) newdata <- as_tibble(newdata)
+  }
+  
+  newdata <- newdata[, names(newdata) %in% keepers]
+  
+  ## `levels` will be NULL when no nominal data are present or
+  ## when stringsAsFactors = FALSE in `prep`
+  if (!is.null(object$levels)) {
+    var_levels <- object$levels
+    var_levels <- var_levels[keepers]
+    check_values <-
+      vapply(var_levels, function(x)
+        (!all(is.na(x))), c(all = TRUE))
+    var_levels <- var_levels[check_values]
+    if (length(var_levels) > 0)
+      newdata <- strings2factors(newdata, var_levels)
+  }
+  
+  newdata
+}
+
+#' Print a Recipe
+#'
+#' @aliases print.recipe
+#' @param x A \code{recipe} object
+#' @param form_width The number of characters used to print the variables or
+#'   terms in a formula
+#' @param ... further arguments passed to or from other methods (not currently
+#'   used).
+#' @return The original object (invisibly)
+#'
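+#' @examples
+#' # A sketch of the printed output (USArrests ships with base R):
+#' rec <- recipe( ~ ., data = USArrests)
+#' print(rec)
+#'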
+#' @author Max Kuhn
+#' @export
+print.recipe <- function(x, form_width = 30, ...) {
+  cat("Data Recipe\n\n")
+  cat("Inputs:\n\n")
+  no_role <- is.na(x$var_info$role)
+  if (any(!no_role)) {
+    tab <- as.data.frame(table(x$var_info$role))
+    colnames(tab) <- c("role", "#variables")
+    print(tab, row.names = FALSE)
+    if (any(no_role)) {
+      cat("\n ", sum(no_role), "variables without declared roles\n")
+    }
+  } else {
+    cat(" ", nrow(x$var_info), "variables (no declared roles)\n")
+  }
+  if ("tr_info" %in% names(x)) {
+    nmiss <- x$tr_info$nrows - x$tr_info$ncomplete
+    cat("\nTraining data contained ",
+        x$tr_info$nrows,
+        " data points and ",
+        sep = "")
+    if (x$tr_info$nrows == x$tr_info$ncomplete)
+      cat("no missing data.\n")
+    else
+      cat(nmiss,
+          "incomplete",
+          ifelse(nmiss > 1, "rows.", "row."),
+          "\n")
+  }
+  if (!is.null(x$steps)) {
+    cat("\nSteps:\n\n")
+    for (i in seq_along(x$steps))
+      print(x$steps[[i]], form_width = form_width)
+  }
+  invisible(x)
+}
+
+#' Summarize a Recipe
+#'
+#' This function summarizes the current set of variables/features and some of
+#'   their characteristics.
+#' @aliases summary.recipe
+#' @param object A \code{recipe} object
+#' @param original A logical: show the current set of variables or the original
+#'   set when the recipe was defined.
+#' @param ... further arguments passed to or from other methods (not currently
+#'   used).
+#' @return A tibble with columns \code{variable}, \code{type}, \code{role},
+#'   and \code{source}.
+#' @details Note that, until the recipe has been trained, the current and
+#'   original variables are the same.
+#' @examples
+#' rec <- recipe( ~ ., data = USArrests)
+#' summary(rec)
+#' rec <- step_pca(rec, all_numeric(), num = 3)
+#' summary(rec) # still the same since not yet trained
+#' rec <- prep(rec, training = USArrests)
+#' summary(rec)
+#' @export
+#' @seealso \code{\link{recipe}} \code{\link{prep.recipe}}
+summary.recipe <- function(object, original = FALSE, ...) {
+  if (original)
+    object$var_info
+  else
+    object$term_info
+}
+
+
+#' Extract Finalized Training Set
+#'
+#' As steps are estimated by \code{prep}, these operations are
+#'  applied to the training set. Rather than running \code{bake} 
+#'  to duplicate this processing, this function will return
+#'  variables from the processed training set. 
+#' @param object A \code{recipe} object that has been prepared 
+#'   with the option \code{retain = TRUE}. 
+#' @param ... One or more selector functions to choose which variables will be
+#'   returned by the function. See \code{\link{selections}} for more details.
+#'   If no selectors are given, the default is to use
+#'   \code{\link{all_predictors}}.
+#' @return A tibble.
+#' @details When preparing a recipe, if the training data set is retained using \code{retain = TRUE}, there is no need to \code{bake} the recipe to get the preprocessed training set. 
+#' @examples
+#' data(biomass)
+#' 
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#' 
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#' 
+#' sp_signed <- rec %>%
+#'   step_center(all_predictors()) %>%
+#'   step_scale(all_predictors()) %>%
+#'   step_spatialsign(all_predictors())
+#' 
+#' sp_signed_trained <- prep(sp_signed, training = biomass_tr, retain = TRUE)
+#' 
+#' tr_values <- bake(sp_signed_trained, newdata = biomass_tr, all_predictors())
+#' og_values <- juice(sp_signed_trained, all_predictors())
+#' 
+#' all.equal(tr_values, og_values)
+#' @export
+#' @seealso \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+juice <- function(object, ...) {
+  if(!isTRUE(object$retained))
+    stop("Use `retain = TRUE` in `prep` to be able to extract the training set",
+         call. = FALSE)
+  tr_steps <- vapply(object$steps, function(x) x$trained, c(logic = TRUE))
+  if(!all(tr_steps))
+    stop("At least one step has not be prepared; cannot extract.", 
+         call. = FALSE)
+  terms <- quos(...)
+  if (is_empty(terms))
+    terms <- quos(all_predictors())
+  keepers <- terms_select(terms = terms, info = object$term_info)
+  
+  newdata <- object$template[, names(object$template) %in% keepers]
+  
+  ## Since most models require factors, do the conversion from character
+  if (!is.null(object$levels)) {
+    var_levels <- object$levels
+    var_levels <- var_levels[keepers]
+    check_values <-
+      vapply(var_levels, function(x)
+        (!all(is.na(x))), c(all = TRUE))
+    var_levels <- var_levels[check_values]
+    if (length(var_levels) > 0)
+      newdata <- strings2factors(newdata, var_levels)
+  }
+  newdata
+}
+
+
+
diff --git a/R/regex.R b/R/regex.R
new file mode 100644
index 0000000..75bd5dd
--- /dev/null
+++ b/R/regex.R
@@ -0,0 +1,146 @@
+#' Create Dummy Variables using Regular Expressions
+#'
+#' \code{step_regex} creates a \emph{specification} of a recipe step that will
+#'   create a new dummy variable based on a regular expression.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... A single selector function to choose which variable will be
+#'   searched for the pattern. The selector should resolve to a single
+#'   variable. See \code{\link{selections}} for more details.
+#' @param role For the variable created by this step, what analysis role should
+#'   it be assigned? By default, the function assumes that the new dummy
+#'   variable column created from the original variable will be used as a
+#'   predictor in a model.
+#' @param pattern A character string containing a regular expression (or
+#'   character string for \code{fixed = TRUE}) to be matched in the given
+#'   character vector. Coerced by \code{as.character} to a character string
+#'   if possible.
+#' @param options A list of options to \code{\link{grepl}} that should not
+#'   include \code{x} or \code{pattern}.
+#' @param result A single character value for the name of the new variable. It
+#'   should be a valid column name.
+#' @param input A single character value for the name of the variable being
+#'   searched. This is \code{NULL} until computed by
+#'   \code{\link{prep.recipe}}.
+#' @keywords datagen
+#' @concept preprocessing dummy_variables regular_expressions
+#' @export
+#' @examples
+#' data(covers)
+#'
+#' rec <- recipe(~ description, covers) %>%
+#'   step_regex(description, pattern = "(rock|stony)", result = "rocks") %>%
+#'   step_regex(description, pattern = "ratake families")
+#'
+#' rec2 <- prep(rec, training = covers)
+#' rec2
+#'
+#' with_dummies <- bake(rec2, newdata = covers)
+#' with_dummies
+step_regex <- function(recipe,
+                       ...,
+                       role = "predictor",
+                       trained = FALSE,
+                       pattern = ".",
+                       options = list(),
+                       result = make.names(pattern),
+                       input = NULL) {
+  if (!is.character(pattern))
+    stop("`pattern` should be a character string", call. = FALSE)
+  if (length(pattern) != 1)
+    stop("`pattern` should be a single pattern", call. = FALSE)
+  valid_args <- names(formals(grepl))[- (1:2)]
+  if (any(!(names(options) %in% valid_args)))
+    stop("Valid options are: ",
+         paste0(valid_args, collapse = ", "),
+         call. = FALSE)
+  
+  terms <- check_ellipses(...)
+  if (length(terms) > 1)
+    stop("For this step, only a single selector can be used.", call. = FALSE)
+  
+  add_step(
+    recipe,
+    step_regex_new(
+      terms = terms,
+      role = role,
+      trained = trained,
+      pattern = pattern,
+      options = options,
+      result = result,
+      input = input
+    )
+  )
+}
+
+step_regex_new <- function(terms = NULL,
+                           role = NA,
+                           trained = FALSE,
+                           pattern = NULL,
+                           options = NULL,
+                           result = NULL,
+                           input = NULL) {
+  step(
+    subclass = "regex",
+    terms = terms,
+    role = role,
+    trained = trained,
+    pattern = pattern,
+    options = options,
+    result = result,
+    input = input
+  )
+}
+
+#' @export
+prep.step_regex <- function(x, training, info = NULL, ...) {
+  col_name <- terms_select(x$terms, info = info)
+  if (length(col_name) != 1)
+    stop("The selector should only select a single variable")
+  if (any(info$type[info$variable %in% col_name] != "nominal"))
+    stop("The regular expression input should be character or factor")
+  
+  step_regex_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    pattern = x$pattern,
+    options = x$options,
+    input = col_name,
+    result = x$result
+  )
+}
+
+#' @importFrom rlang expr
+#' @export
+bake.step_regex <- function(object, newdata, ...) {
+  ## sub in options
+  regex <- expr(
+    grepl(
+      x = getElement(newdata, object$input),
+      pattern = object$pattern,
+      ignore.case = FALSE,
+      perl = FALSE,
+      fixed = FALSE,
+      useBytes = FALSE
+    )
+  )
+  if (length(object$options) > 0)
+    regex <- mod_call_args(regex, args = object$options)
+  
+  newdata[, object$result] <- ifelse(eval(regex), 1, 0)
+  newdata
+}
+
+print.step_regex <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Regular expression dummy variable using `",
+        x$pattern,
+        "`",
+        sep = "")
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
diff --git a/R/rm.R b/R/rm.R
new file mode 100644
index 0000000..ce886bd
--- /dev/null
+++ b/R/rm.R
@@ -0,0 +1,98 @@
+#' General Variable Filter
+#'
+#' \code{step_rm} creates a \emph{specification} of a recipe step that will
+#'   remove variables based on their name, type, or role.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will
+#'   be evaluated by the filtering process. See \code{\link{selections}} for
+#'   more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param removals A character string that contains the names of columns that
+#'   should be removed. These values are not determined until
+#'   \code{\link{prep.recipe}} is called.
+#' @keywords datagen
+#' @concept preprocessing variable_filters
+#' @export
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' library(dplyr)
+#' smaller_set <- rec %>%
+#'   step_rm(contains("gen"))
+#'
+#' smaller_set <- prep(smaller_set, training = biomass_tr)
+#'
+#' filtered_te <- bake(smaller_set, biomass_te)
+#' filtered_te
+
+step_rm <- function(recipe,
+                    ...,
+                    role = NA,
+                    trained = FALSE,
+                    removals = NULL) {
+  add_step(recipe,
+           step_rm_new(
+             terms = check_ellipses(...),
+             role = role,
+             trained = trained,
+             removals = removals
+           ))
+}
+
+step_rm_new <- function(terms = NULL,
+                        role = NA,
+                        trained = FALSE,
+                        removals = NULL) {
+  step(
+    subclass = "rm",
+    terms = terms,
+    role = role,
+    trained = trained,
+    removals = removals
+  )
+}
+
+#' @export
+prep.step_rm <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_rm_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    removals = col_names
+  )
+}
+
+#' @export
+bake.step_rm <- function(object, newdata, ...) {
+  if (length(object$removals) > 0)
+    newdata <- newdata[, !(colnames(newdata) %in% object$removals)]
+  as_tibble(newdata)
+}
+
+print.step_rm <-
+  function(x, width = max(20, options()$width - 22), ...) {
+    if (x$trained) {
+      if (length(x$removals) > 0) {
+        cat("Variables removed ")
+        cat(format_ch_vec(x$removals, width = width))
+      } else
+        cat("No variables were removed")
+    } else {
+      cat("Delete terms ", sep = "")
+      cat(format_selectors(x$terms, wdth = width))
+    }
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
diff --git a/R/roles.R b/R/roles.R
new file mode 100644
index 0000000..433f895
--- /dev/null
+++ b/R/roles.R
@@ -0,0 +1,63 @@
+#' Manually Add Roles
+#'
+#' \code{add_role} can add a role definition to an existing variable in the
+#'   recipe.
+#'
+#' @param recipe An existing \code{\link{recipe}}.
+#' @param ... One or more selector functions to choose which variables are
+#'   being assigned a role. See \code{\link{selections}} for more details.
+#' @param new_role A character string for a single role.
+#' @return An updated recipe object.
+#' @details If a variable is selected that currently has a role, the role is
+#'   changed and a warning is issued.
+#' @keywords datagen
+#' @concept preprocessing model_specification
+#' @export
+#' @examples
+#'
+#' data(biomass)
+#'
+#' # Create the recipe manually
+#' rec <- recipe(x = biomass)
+#' rec
+#' summary(rec)
+#'
+#' rec <- rec %>%
+#'   add_role(carbon, contains("gen"), sulfur, new_role = "predictor") %>%
+#'   add_role(sample, new_role = "id variable") %>%
+#'   add_role(dataset, new_role = "splitting variable") %>%
+#'   add_role(HHV, new_role = "outcome")
+#' rec
+#'
+#'@importFrom rlang quos
+add_role <- function(recipe, ..., new_role = "predictor") {
+  if (length(new_role) > 1)
+    stop("A single role is required", call. = FALSE)
+  terms <- quos(...)
+  if (is_empty(terms))
+    warning("No selectors were found", call. = FALSE)
+  vars <- terms_select(terms = terms, info = summary(recipe))
+  ## check if there are newly defined variables in the list
+  existing_var <- vars %in% recipe$var_info$variable
+  if (any(!existing_var)) {
+    ## Add new variable with role
+    new_vars <-
+      tibble(variable = vars[!existing_var],
+             role = rep(new_role, sum(!existing_var)))
+    recipe$var_info <- rbind(recipe$var_info, new_vars)
+  } else {
+    ##   check for current roles that are missing
+    vars2 <- vars[existing_var]
+    has_role <-
+      !is.na(recipe$var_info$role[recipe$var_info$variable %in% vars2])
+    if (any(has_role)) {
+      warning("Changing role(s) for ",
+              paste0(vars2[has_role], collapse = ", "),
+              call. = FALSE)
+    }
+    recipe$var_info$role[recipe$var_info$variable %in% vars2] <-
+      new_role
+  }
+  recipe$term_info <- recipe$var_info
+  recipe
+}
diff --git a/R/scale.R b/R/scale.R
new file mode 100644
index 0000000..cf3fbf0
--- /dev/null
+++ b/R/scale.R
@@ -0,0 +1,105 @@
+#' Scaling Numeric Data
+#'
+#' \code{step_scale} creates a \emph{specification} of a recipe step that
+#'   will normalize numeric data to have a standard deviation of one.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role Not used by this step since no new variables are created.
+#' @param sds A named numeric vector of standard deviations. This is \code{NULL}
+#'   until computed by \code{\link{prep.recipe}}.
+#' @param na.rm A logical value indicating whether \code{NA} values should be
+#'   removed when computing the standard deviation.
+#' @keywords datagen
+#' @concept preprocessing normalization_methods
+#' @export
+#' @details Scaling data means that the standard deviation of a variable is
+#'   divided out of the data. \code{step_scale} estimates the variable
+#'   standard deviations from the data used in the \code{training} argument of
+#'   \code{prep.recipe}. \code{bake.recipe} then applies the scaling to
+#'   new data sets using these standard deviations.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' scaled_trans <- rec %>%
+#'   step_scale(carbon, hydrogen)
+#'
+#' scaled_obj <- prep(scaled_trans, training = biomass_tr)
+#'
+#' transformed_te <- bake(scaled_obj, biomass_te)
+#'
+#' biomass_te[1:10, names(transformed_te)]
+#' transformed_te
+
+step_scale <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           sds = NULL,
+           na.rm = TRUE) {
+    add_step(
+      recipe,
+      step_scale_new(
+        terms = check_ellipses(...),
+        role = role,
+        trained = trained,
+        sds = sds,
+        na.rm = na.rm
+      )
+    )
+  }
+
+step_scale_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           sds = NULL,
+           na.rm = NULL) {
+    step(
+      subclass = "scale",
+      terms = terms,
+      role = role,
+      trained = trained,
+      sds = sds,
+      na.rm = na.rm
+    )
+  }
+
+#' @importFrom stats sd
+#' @export
+prep.step_scale <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  sds <-
+    vapply(training[, col_names], sd, c(sd = 0), na.rm = x$na.rm)
+  step_scale_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    sds = sds,
+    na.rm = x$na.rm
+  )
+}
+
+#' @export
+bake.step_scale <- function(object, newdata, ...) {
+  res <-
+    sweep(as.matrix(newdata[, names(object$sds)]), 2, object$sds, "/")
+  if (is.matrix(res) && ncol(res) == 1)
+    res <- res[, 1]
+  newdata[, names(object$sds)] <- res
+  as_tibble(newdata)
+}
+
+print.step_scale <-
+  function(x, width = max(20, options()$width - 30), ...) {
+    cat("Scaling for ", sep = "")
+    printer(names(x$sds), x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/selections.R b/R/selections.R
new file mode 100644
index 0000000..d603c4b
--- /dev/null
+++ b/R/selections.R
@@ -0,0 +1,342 @@
+
+
+#' @name selections
+#' @aliases selections
+#' @aliases selection
+#' @title Methods for Selecting Variables in Step Functions
+#' @description When selecting variables or model terms in \code{step}
+#'   functions, \code{dplyr}-like tools are used. The \emph{selector}
+#'   functions can choose variables based on their name, current role, data
+#'   type, or any combination of these. The selectors are passed as any other
+#'   argument to the step. If the variables are explicitly stated in the step
+#'   function, this might be similar to:
+#'
+#' \preformatted{
+#'   recipe( ~ ., data = USArrests) \%>\%
+#'     step_pca(Murder, Assault, UrbanPop, Rape, num = 3)
+#' }
+#'
+#' The first four arguments indicate which variables should be used in the
+#'   PCA while the last argument is a specific argument to
+#'   \code{\link{step_pca}}.
+#'
+#' Note that:
+#'
+#'   \enumerate{
+#'     \item The selector arguments should not contain functions beyond those
+#'       supported (see below).
+#'     \item These arguments are not evaluated until the \code{prep} function
+#'       for the step is executed.
+#'     \item The \code{dplyr}-like syntax allows for negative signs to exclude
+#'       variables (e.g. \code{-Murder}) and the set of selectors will be
+#'       processed in order.
+#'     \item A leading exclusion in these arguments (e.g. \code{-Murder}) has
+#'       the effect of adding all variables to the list except the excluded
+#'       variable(s).
+#'   }
+#'
+#' Select helpers from the \code{dplyr} package can also be used:
+#'   \code{\link[dplyr]{starts_with}}, \code{\link[dplyr]{ends_with}},
+#'   \code{\link[dplyr]{contains}}, \code{\link[dplyr]{matches}},
+#'   \code{\link[dplyr]{num_range}}, and \code{\link[dplyr]{everything}}.
+#'   For example:
+#'
+#' \preformatted{
+#'   recipe(Species ~ ., data = iris) \%>\%
+#'     step_center(starts_with("Sepal"), -contains("Width"))
+#' }
+#'
+#' would only select \code{Sepal.Length}.
+#'
+#' \bold{Inline} functions that specify computations, such as \code{log(x)},
+#'   should not be used in selectors and will produce an error. A list of
+#'   allowed selector functions is below.
+#'
+#' Columns of the design matrix that may not exist when the step is coded can
+#'   also be selected. For example, when using \code{step_pca}, the number of
+#'   columns created by feature extraction may not be known when subsequent
+#'   steps are defined. In this case, using \code{matches("^PC")} will select
+#'   all of the columns whose names start with "PC" \emph{once those columns
+#'   are created}.
+#'
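+#' For example, a sketch of this pattern (assuming the default \code{PC}
+#'   naming used by \code{step_pca}):
+#'
+#' \preformatted{
+#'   recipe( ~ ., data = USArrests) \%>\%
+#'     step_pca(all_numeric(), num = 3) \%>\%
+#'     step_center(matches("^PC"))
+#' }
+#'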
+#' There are sets of functions that can be used to select variables based on
+#'   their role or type: \code{\link{has_role}} and \code{\link{has_type}}.
+#'   For convenience, there are also functions that are more specific:
+#'   \code{\link{all_numeric}}, \code{\link{all_nominal}},
+#'   \code{\link{all_predictors}}, and \code{\link{all_outcomes}}. These can
+#'   be used in conjunction with the previous functions described for
+#'   selecting variables using their names:
+#'
+#' \preformatted{
+#'   data(biomass)
+#'   recipe(HHV ~ ., data = biomass) \%>\%
+#'     step_center(all_numeric(), -all_outcomes())
+#' }
+#'
+#' This results in all the numeric predictors: carbon, hydrogen, oxygen,
+#'   nitrogen, and sulfur.
+#'
+#' If a role for a variable has not been defined, it will never be selected
+#'   using role-specific selectors.
+#'
+#' All steps use these techniques to define variables \emph{except one}:
+#'   \code{\link{step_interact}} requires traditional model
+#'   formula representations of the interactions and takes a single formula
+#'   as the argument to select the variables.
+#'
+#' The complete list of allowable functions in steps:
+#'
+#'   \itemize{
+#'     \item \bold{By name}: \code{\link[dplyr]{starts_with}},
+#'       \code{\link[dplyr]{ends_with}}, \code{\link[dplyr]{contains}},
+#'       \code{\link[dplyr]{matches}}, \code{\link[dplyr]{num_range}}, and
+#'       \code{\link[dplyr]{everything}}
+#'     \item \bold{By role}: \code{\link{has_role}},
+#'       \code{\link{all_predictors}}, and \code{\link{all_outcomes}}
+#'     \item \bold{By type}: \code{\link{has_type}}, \code{\link{all_numeric}},
+#'       and \code{\link{all_nominal}}
+#'   }
+NULL
+
+## These are the allowable functions for formulas in the `terms` arguments
+## to the steps or to `recipe.formula`.
+name_selectors <- c("starts_with",
+                    "ends_with",
+                    "contains",
+                    "matches",
+                    "num_range",
+                    "everything",
+                    "_F")
+
+role_selectors <-
+  c("has_role", "all_predictors", "all_outcomes", "_F")
+
+type_selectors <- c("has_type", "all_numeric", "all_nominal", "_F")
+
+selectors <-
+  unique(c(name_selectors, role_selectors, type_selectors))
+
+## Get the components of the formula split by +/-. The
+## function also returns the sign
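+## For example, `~ a + b - c` yields the variable calls for a, b, and c
+## along with the signs "+", "+", and "-", respectively.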
+f_elements <- function(x) {
+  trms_obj <- terms(x)
+  ## Their order will change here (minus at the end)
+  clls <- attr(trms_obj, "variables")
+  ## Any formula element with a minus prefix will not
+  ## have a colname in the `factors` attribute of the
+  ## terms object. We will check these against the
+  ## list of calls
+  tmp <- colnames(attr(trms_obj, "factors"))
+  kept <- vector(mode = "list", length = length(tmp))
+  for (j in seq_along(tmp))
+    kept[[j]] <- as.name(tmp[j])
+  
+  term_signs <- rep("", length(clls) - 1)
+  for (i in seq_along(term_signs)) {
+    ## Check to see if the elements are in the `factors`
+    ## part of `terms` and these will have a + sign
+    retained <- any(unlist(lapply(kept,
+                                  function(x, y)
+                                    any(y == x),
+                                  y = clls[[i + 1]])))
+    term_signs[i] <- if (retained)
+      "+"
+    else
+      "-"
+  }
+  list(terms  = clls, signs = term_signs)
+}
+
+## This adds the appropriate argument based on whether the call is for
+## a variable name, role, or data type.
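+## For example, `contains("gen")` becomes `contains("gen", vars = var_vals)`
+## so the selector is evaluated against the current variable names.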
+add_arg <- function(cl) {
+  func <- fun_calls(cl)
+  if (func %in% name_selectors) {
+    cl$vars <- quote(var_vals)
+  } else {
+    if (func %in% role_selectors) {
+      cl$roles <- quote(role_vals)
+    } else
+      cl$types <- quote(type_vals)
+  }
+  cl
+}
+
+## This flags formulas that are not allowed. When called from `recipe.formula`
+## `allowed` is NULL.
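+## For example, an in-line function such as `log(x)` in a selector would be
+## flagged here since `log` is not in the selector whitelist.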
+check_elements <- function(x, allowed = selectors) {
+  funs <- fun_calls(x)
+  funs <- funs[!(funs %in% c("~", "+", "-"))]
+  if (!is.null(allowed)) {
+    # when called from a step
+    not_good <- funs[!(funs %in% allowed)]
+    if (length(not_good) > 0)
+      stop(
+        "Not all functions are allowed in step function selectors (e.g. ",
+        paste0("`", not_good, "`", collapse = ", "),
+        "). See ?selections.",
+        call. = FALSE
+      )
+  } else {
+    # when called from formula.recipe
+    if (length(funs) > 0)
+      stop(
+        "No in-line functions should be used here; use steps to define ",
+        "baking actions", call. = FALSE
+      )
+  }
+  invisible(NULL)
+}
+
+has_selector <- function(x, allowed = selectors) {
+  res <- rep(NA, length(x) - 1)
+  for (i in 2:length(x))
+    res[[i - 1]] <- isTRUE(fun_calls(x[[i]]) %in% allowed)
+  res
+}
+
+#' Select Terms in a Step Function.
+#'
+#' This function processes the step function selectors and might be useful
+#'   when creating custom steps.
+#'
+#' @param info A tibble with columns \code{variable}, \code{type}, \code{role},
+#'   and \code{source} that represent the current state of the data. The
+#'   function \code{\link{summary.recipe}} can be used to get this information
+#'   from a recipe.
+#' @param terms A list of formulas whose right-hand side contains quoted
+#'   expressions. See \code{\link[rlang]{quos}} for examples.
+#' @keywords datagen
+#' @concept preprocessing
+#' @return A character string of column names or an error if there are no
+#'   selectors or if no variables are selected.
+#' @seealso \code{\link{recipe}} \code{\link{summary.recipe}}
+#'   \code{\link{prep.recipe}}
+#' @importFrom purrr map_lgl map_if map_chr map
+#' @importFrom rlang names2
+#' @export
+#' @examples
+#' library(rlang)
+#' data(okc)
+#' rec <- recipe(~ ., data = okc)
+#' info <- summary(rec)
+#' terms_select(info = info, quos(all_predictors()))
+terms_select <- function(terms, info) {
+  vars <- info$variable
+  roles <- info$role
+  types <- info$type
+
+  if (is_empty(terms)) {
+    stop("At least one selector should be used", call. = FALSE)
+  }
+
+  ## check arguments against whitelist
+  lapply(terms, check_elements)
+
+  # Set current_info so available to helpers
+  old_info <- set_current_info(info)
+  on.exit(set_current_info(old_info), add = TRUE)
+
+  sel <- with_handlers(tidyselect::vars_select(vars, !!! terms),
+    tidyselect_empty = abort_selection
+  )
+
+  unname(sel)
+}
+
+abort_selection <- exiting(function(cnd) {
+  abort("No variables or terms were selected.")
+})
+
+#' Role Selection
+#'
+#' \code{has_role}, \code{all_predictors}, and \code{all_outcomes} can be used
+#'   to select variables in a formula that have certain roles. Similarly,
+#'   \code{has_type}, \code{all_numeric}, and \code{all_nominal} are used to
+#'   select columns based on their data type. See \code{\link{selections}} for
+#'   more details. \code{current_info} is an internal function that is
+#'   unlikely to help users while the others have limited utility outside of
+#'   step function arguments.
+#'
+#' @param match A single character string for the query. Exact matching is
+#'   used (i.e. regular expressions won't work).
+#' @param roles A character string of roles for the current set of terms.
+#' @param types A character string of data types for the current set of terms.
+#' @return Selector functions return an integer vector while
+#'   \code{current_info} returns an environment with vectors \code{vars},
+#'   \code{roles}, and \code{types}.
+#' @keywords datagen
+#' @examples
+#' data(biomass)
+#'
+#' rec <- recipe(biomass) %>%
+#'   add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
+#'            new_role = "predictor") %>%
+#'   add_role(HHV, new_role = "outcome") %>%
+#'   add_role(sample, new_role = "id variable") %>%
+#'   add_role(dataset, new_role = "splitting indicator")
+#' recipe_info <- summary(rec)
+#' recipe_info
+#'
+#' has_role("id variable", roles = recipe_info$role)
+#' all_outcomes(roles = recipe_info$role)
+#' @export
+
+has_role <-
+  function(match = "predictor",
+           roles = current_info()$roles)
+    which(roles %in% match)
+
+#' @export
+#' @rdname has_role
+#' @inheritParams has_role
+all_predictors <- function(roles = current_info()$roles)
+  has_role("predictor", roles = roles)
+
+#' @export
+#' @rdname has_role
+#' @inheritParams has_role
+all_outcomes <- function(roles = current_info()$roles)
+  has_role("outcome", roles = roles)
+
+#' @export
+#' @rdname has_role
+#' @inheritParams has_role
+has_type <-
+  function(match = "numeric",
+           types = current_info()$types)
+    which(types %in% match)
+
+#' @export
+#' @rdname has_role
+#' @inheritParams has_role
+all_numeric <- function(types = current_info()$types)
+  has_type("numeric", types = types)
+
+#' @export
+#' @rdname has_role
+#' @inheritParams has_role
+all_nominal <- function(types = current_info()$types)
+  has_type("nominal", types = types)
+
+## functions to get current variable info for selectors modeled after
+## dplyr versions
+
+#' @import rlang
+cur_info_env <- child_env(env_parent(env))
+
+set_current_info <- function(x) {
+  # stopifnot(!is.environment(x))
+  old <- cur_info_env
+  cur_info_env$vars <- x$variable
+  cur_info_env$roles <- x$role
+  cur_info_env$types <- x$type
+  
+  invisible(old)
+}
+
+#' @export
+#' @rdname has_role
+current_info <- function() {
+  cur_info_env %||% stop("Variable context not set", call. = FALSE)
+}
diff --git a/R/shuffle.R b/R/shuffle.R
new file mode 100644
index 0000000..03fbd09
--- /dev/null
+++ b/R/shuffle.R
@@ -0,0 +1,87 @@
+#' Shuffle Variables
+#'
+#' \code{step_shuffle} creates a \emph{specification} of a recipe step that will
+#'   randomly change the order of rows for selected variables.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'    permuted. See \code{\link{selections}} for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param columns A character string that contains the names of columns that
+#'   should be shuffled. These values are not determined until
+#'   \code{\link{prep.recipe}} is called.
+#' @keywords datagen
+#' @concept preprocessing randomization permutation
+#' @export
+#' @examples
+#' integers <- data.frame(A = 1:12, B = 13:24, C = 25:36)
+#'
+#' library(dplyr)
+#' rec <- recipe(~ A + B + C, data = integers) %>%
+#'   step_shuffle(A, B)
+#'
+#' rand_set <- prep(rec, training = integers)
+#'
+#' set.seed(5377)
+#' bake(rand_set, integers)
+
+step_shuffle <- function(recipe,
+                         ...,
+                         role = NA,
+                         trained = FALSE,
+                         columns = NULL) {
+  add_step(recipe,
+           step_shuffle_new(
+             terms = check_ellipses(...),
+             role = role,
+             trained = trained,
+             columns = columns
+           ))
+}
+
+step_shuffle_new <- function(terms = NULL,
+                             role = NA,
+                             trained = FALSE,
+                             columns = NULL) {
+  step(
+    subclass = "shuffle",
+    terms = terms,
+    role = role,
+    trained = trained,
+    columns = columns
+  )
+}
+
+#' @export
+prep.step_shuffle <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_shuffle_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    columns = col_names
+  )
+}
+
+#' @export
+bake.step_shuffle <- function(object, newdata, ...) {
+  if (nrow(newdata) == 1) {
+    warning("`newdata` contains a single row; unable to shuffle",
+            call. = FALSE)
+    return(newdata)
+  }
+  
+  if (length(object$columns) > 0) {
+    for (i in seq_along(object$columns))
+      newdata[, object$columns[i]] <-
+        sample(getElement(newdata, object$columns[i]))
+  }
+  as_tibble(newdata)
+}
+
+print.step_shuffle <-
+  function(x, width = max(20, options()$width - 22), ...) {
+    cat("Shuffled ")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/spatialsign.R b/R/spatialsign.R
new file mode 100644
index 0000000..17c5012
--- /dev/null
+++ b/R/spatialsign.R
@@ -0,0 +1,103 @@
+#' Spatial Sign Preprocessing
+#'
+#' \code{step_spatialsign} is a \emph{specification} of a recipe step that
+#'   will convert numeric data into a projection on to a unit sphere.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   used for the normalization. See \code{\link{selections}} for more details.
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned?
+#' @param columns A character string of variable names that will be (eventually)
+#'   populated by the \code{terms} argument.
+#' @keywords datagen
+#' @concept preprocessing projection_methods
+#' @export
+#' @details The spatial sign transformation projects the variables onto a unit
+#'   sphere and is related to global contrast normalization. The spatial sign
+#'   of a vector \code{w} is \code{w/norm(w)}.
+#'
+#' The variables should be centered and scaled prior to the computations.
+#' @references Serneels, S., De Nolf, E., and Van Espen, P. (2006). Spatial
+#'   sign preprocessing: a simple way to impart moderate robustness to
+#'   multivariate estimators. \emph{Journal of Chemical Information and
+#'   Modeling}, 46(3), 1402-1409.
+#' @examples
+#' data(biomass)
+#'
+#' biomass_tr <- biomass[biomass$dataset == "Training",]
+#' biomass_te <- biomass[biomass$dataset == "Testing",]
+#'
+#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+#'               data = biomass_tr)
+#'
+#' ss_trans <- rec %>%
+#'   step_center(carbon, hydrogen) %>%
+#'   step_scale(carbon, hydrogen) %>%
+#'   step_spatialsign(carbon, hydrogen)
+#'
+#' ss_obj <- prep(ss_trans, training = biomass_tr)
+#'
+#' transformed_te <- bake(ss_obj, biomass_te)
+#'
+#' plot(biomass_te$carbon, biomass_te$hydrogen)
+#'
+#' plot(transformed_te$carbon, transformed_te$hydrogen)
+
+step_spatialsign <-
+  function(recipe,
+           ...,
+           role = "predictor",
+           trained = FALSE,
+           columns = NULL) {
+    add_step(recipe,
+             step_spatialsign_new(
+               terms = check_ellipses(...),
+               role = role,
+               trained = trained,
+               columns = columns
+             ))
+  }
+
+step_spatialsign_new <-
+  function(terms = NULL,
+           role = "predictor",
+           trained = FALSE,
+           columns = NULL) {
+    step(
+      subclass = "spatialsign",
+      terms = terms,
+      role = role,
+      trained = trained,
+      columns = columns
+    )
+  }
+
+#' @export
+prep.step_spatialsign <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_spatialsign_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    columns = col_names
+  )
+}
+
+#' @export
+bake.step_spatialsign <- function(object, newdata, ...) {
+  col_names <- object$columns
+  ss <- function(x)
+    x / sqrt(sum(x ^ 2))
+  newdata[, col_names] <-
+    t(apply(as.matrix(newdata[, col_names]), 1, ss))
+  as_tibble(newdata)
+}
+
+print.step_spatialsign <-
+  function(x, width = max(20, options()$width - 26), ...) {
+    cat("Spatial sign on  ", sep = "")
+    printer(x$columns, x$terms, x$trained, width = width)
+    invisible(x)
+  }
diff --git a/R/sqrt.R b/R/sqrt.R
new file mode 100644
index 0000000..80b340c
--- /dev/null
+++ b/R/sqrt.R
@@ -0,0 +1,83 @@
+#' Square Root Transformation
+#'
+#' \code{step_sqrt} creates a \emph{specification} of a recipe step that will
+#'   square root transform the data.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param ... One or more selector functions to choose which variables will be
+#'   transformed. See \code{\link{selections}} for more details.
+#' @param role Not used by this step since no new variables are created.
+#' @param columns A character string of variable names that will be (eventually)
+#'   populated by the \code{terms} argument.
+#' @keywords datagen
+#' @concept preprocessing transformation_methods
+#' @export
+#' @examples
+#' set.seed(313)
+#' examples <- matrix(rnorm(40)^2, ncol = 2)
+#' examples <- as.data.frame(examples)
+#'
+#' rec <- recipe(~ V1 + V2, data = examples)
+#'
+#' sqrt_trans <- rec  %>%
+#'   step_sqrt(all_predictors())
+#'
+#' sqrt_obj <- prep(sqrt_trans, training = examples)
+#'
+#' transformed_te <- bake(sqrt_obj, examples)
+#' plot(examples$V1, transformed_te$V1)
+#' @seealso \code{\link{step_logit}} \code{\link{step_invlogit}}
+#'   \code{\link{step_log}}  \code{\link{step_hyperbolic}} \code{\link{recipe}}
+#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+
+step_sqrt <- function(recipe, ..., role = NA, trained = FALSE, columns = NULL) {
+  add_step(
+    recipe,
+    step_sqrt_new(
+      terms = check_ellipses(...),
+      role = role,
+      trained = trained,
+      columns = columns
+    )
+  )
+}
+
+step_sqrt_new <-
+  function(terms = NULL, role = NA, trained = FALSE, columns = NULL) {
+    step(
+      subclass = "sqrt",
+      terms = terms,
+      role = role,
+      trained = trained,
+      columns = columns
+    )
+  }
+
+
+#' @export
+prep.step_sqrt <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  step_sqrt_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    columns = col_names
+  )
+}
+
+#' @export
+bake.step_sqrt <- function(object, newdata, ...) {
+  col_names <- object$columns
+  for (i in seq_along(col_names))
+    newdata[, col_names[i]] <-
+      sqrt(getElement(newdata, col_names[i]))
+  as_tibble(newdata)
+}
+
+print.step_sqrt <- function(x, width = max(20, options()$width - 29), ...) {
+  cat("Square root transformation on ", sep = "")
+  printer(x$columns, x$terms, x$trained, width = width)
+  invisible(x)
+}
+
diff --git a/R/window.R b/R/window.R
new file mode 100644
index 0000000..a028b44
--- /dev/null
+++ b/R/window.R
@@ -0,0 +1,253 @@
+#' Moving Window Functions
+#'
+#' \code{step_window} creates a \emph{specification} of a recipe step that will
+#'   create new columns that are the results of functions that compute
+#'   statistics across moving windows.
+#'
+#' @inheritParams step_center
+#' @inherit step_center return
+#' @param role For model terms created by this step, what analysis role should
+#'   they be assigned? If \code{names} is left as \code{NULL}, the rolling
+#'   statistics replace the original columns and the roles are left unchanged.
+#'   If \code{names} is set, those new columns will have a role of \code{NULL}
+#'   unless this argument has a value.
+#' @param size An odd integer \code{>= 3} for the window size.
+#' @param na.rm A logical for whether missing values should be removed from the
+#'   calculations within each window.
+#' @param statistic A character string for the type of statistic that should
+#'   be calculated for each moving window. Possible values are: \code{'max'},
+#'   \code{'mean'}, \code{'median'}, \code{'min'}, \code{'prod'}, \code{'sd'},
+#'   \code{'sum'}, \code{'var'}
+#' @param columns A character string that contains the names of columns that
+#'   should be processed. These values are not determined until
+#'   \code{\link{prep.recipe}} is called.
+#' @param names An optional character vector that is the same length as the
+#'   number of terms selected by \code{terms}. If you are not sure what columns
+#'   will be selected, use the \code{summary} function (see the example below).
+#'   These will be the names of the new columns created by the step. 
+#' @keywords datagen
+#' @concept preprocessing moving_windows
+#' @export
+#' @details The calculations use a somewhat atypical method for handling the
+#'   beginning and end parts of the rolling statistics. The process starts
+#'   with the center justified window calculations and the beginning and
+#'   ending parts of the rolling values are determined using the first and
+#'   last rolling values, respectively. For example, if a column \code{x} with
+#'   12 values is smoothed with a 5-point moving median, the first three
+#'   smoothed values are estimated by \code{median(x[1:5])} and the fourth
+#'   uses \code{median(x[2:6])}.
+#' @examples
+#' library(recipes)
+#' library(dplyr)
+#' library(rlang)
+#' library(ggplot2, quietly = TRUE)
+#'
+#' set.seed(5522)
+#' sim_dat <- data.frame(x1 = (20:100) / 10)
+#' n <- nrow(sim_dat)
+#' sim_dat$y1 <- sin(sim_dat$x1) + rnorm(n, sd = 0.1)
+#' sim_dat$y2 <- cos(sim_dat$x1) + rnorm(n, sd = 0.1)
+#' sim_dat$x2 <- runif(n)
+#' sim_dat$x3 <- rnorm(n)
+#'
+#' rec <- recipe(y1 + y2 ~ x1 + x2 + x3, data = sim_dat) %>%
+#'   step_window(starts_with("y"), size = 7, statistic = "median",
+#'               names = paste0("med_7pt_", 1:2),
+#'               role = "outcome") %>%
+#'   step_window(starts_with("y"),
+#'               names = paste0("mean_3pt_", 1:2),
+#'               role = "outcome")
+#' rec <- prep(rec, training = sim_dat)
+#'
+#' # If you aren't sure how to set the names, see which variables are selected
+#' # and the order that they are selected:
+#' terms_select(info = summary(rec), terms = quos(starts_with("y")))
+#'
+#' smoothed_dat <- bake(rec, sim_dat, everything())
+#'
+#' ggplot(data = sim_dat, aes(x = x1, y = y1)) +
+#'   geom_point() +
+#'   geom_line(data = smoothed_dat, aes(y = med_7pt_1)) +
+#'   geom_line(data = smoothed_dat, aes(y = mean_3pt_1), col = "red") +
+#'   theme_bw()
+#'
+#' # If you want to replace the selected variables with the rolling statistic
+#' # don't set `names`
+#' sim_dat$original <- sim_dat$y1
+#' rec <- recipe(y1 + y2 + original ~ x1 + x2 + x3, data = sim_dat) %>%
+#'   step_window(starts_with("y"))
+#' rec <- prep(rec, training = sim_dat)
+#' smoothed_dat <- bake(rec, sim_dat, everything())
+#' ggplot(smoothed_dat, aes(x = original, y = y1)) + 
+#'   geom_point() + 
+#'   theme_bw()
+
+step_window <-
+  function(recipe,
+           ...,
+           role = NA,
+           trained = FALSE,
+           size = 3,
+           na.rm = TRUE,
+           statistic = "mean",
+           columns = NULL,
+           names = NULL) {
+    if (length(statistic) != 1 || !(statistic %in% roll_funs))
+      stop("`statistic` should be one of: ",
+           paste0("'", roll_funs, "'", collapse = ", "),
+           call. = FALSE)
+    
+    ## ensure size is odd, integer, and not too small
+    if (is.null(size) || is.na(size))
+      stop("`size` needs a value.", call. = FALSE)
+    
+    if (!is.integer(size)) {
+      tmp <- size
+      size <- as.integer(size)
+      if (!isTRUE(all.equal(tmp, size)))
+        warning("`size` was not an integer (", tmp, ") and was ",
+                "converted to ", size, ".", sep = "", 
+                call. = FALSE)
+    }
+    if (size %% 2 == 0)
+      stop("`size` should be odd.", call. = FALSE)
+    if (size < 3)
+      stop("`size` should be at least 3.", call. = FALSE)
+
+    add_step(
+      recipe,
+      step_window_new(
+        terms = check_ellipses(...),
+        trained = trained,
+        role = role,
+        size = size,
+        na.rm = na.rm,
+        statistic = statistic,
+        columns = columns,
+        names = names
+      )
+    )
+  }
+
+roll_funs <- c("mean", "median", "sd", "var", "sum", "prod", "min", "max")
+
+step_window_new <-
+  function(terms = NULL,
+           role = NA,
+           trained = FALSE,
+           size = NULL,
+           na.rm = NULL,
+           statistic = NULL,
+           columns = NULL,
+           names = NULL) {
+    step(
+      subclass = "window",
+      terms = terms,
+      role = role,
+      trained = trained,
+      size = size,
+      na.rm = na.rm,
+      statistic = statistic,
+      columns = columns,
+      names = names
+    )
+  }
+
+#' @export
+prep.step_window <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(x$terms, info = info)
+  
+  if (any(info$type[info$variable %in% col_names] != "numeric"))
+    stop("The selected variables should be numeric")
+  
+  if(!is.null(x$names)) {
+    if(length(x$names) != length(col_names))
+      stop("There were ", length(col_names), " term(s) selected but ",
+           length(x$names), " values for the new features ",
+           "were passed to `names`.", call. = FALSE)
+  }
+  
+  step_window_new(
+    terms = x$terms,
+    role = x$role,
+    trained = TRUE,
+    size = x$size,
+    na.rm = x$na.rm,
+    statistic = x$statistic,
+    columns = col_names,
+    names = x$names
+  )
+}
+
+#' @importFrom RcppRoll roll_max roll_maxl roll_maxr
+#' @importFrom RcppRoll roll_mean roll_meanl roll_meanr
+#' @importFrom RcppRoll roll_median roll_medianl roll_medianr
+#' @importFrom RcppRoll roll_min roll_minl roll_minr
+#' @importFrom RcppRoll roll_prod roll_prodl roll_prodr
+#' @importFrom RcppRoll roll_sd roll_sdl roll_sdr
+#' @importFrom RcppRoll roll_sum roll_suml roll_sumr
+#' @importFrom RcppRoll roll_var roll_varl roll_varr
+roller <- function(x, stat = "mean", window = 3L, na.rm = TRUE) {
+
+  m <- length(x)
+  
+  gap <- floor(window / 2)
+  if(m - window <= 2)
+    stop("The window is too large.", call. = FALSE)
+  
+  ## stats for centered window
+  roll_cl <- quote(
+    roll_mean(
+      x = x, n = window, weights = NULL, by = 1L,
+      fill = NA, partial = FALSE,
+      normalize = TRUE, na.rm = na.rm
+    )
+  )
+  
+  roll_cl[[1]] <- as.name(paste0("roll_", stat))
+  x2 <- eval(roll_cl)
+  
+  ## Fill in the left-hand points. Add enough data so that the
+  ## missing values at the start can be estimated and filled in
+  x2[1:gap] <- x2[gap + 1]
+  
+  ## Right-hand points
+  x2[(m - gap + 1):m] <- x2[m - gap]
+  x2
+}
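+## For example, roller(x = 1:10, stat = "mean", window = 3) returns the
+## centered 3-point means with the first and last values padded from the
+## nearest complete window: c(2, 2, 3, 4, 5, 6, 7, 8, 9, 9).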
+
+#' @importFrom tibble as_tibble is_tibble
+#' @export
+bake.step_window <- function(object, newdata, ...) {
+  for (i in seq(along = object$columns)) {
+    if (!is.null(object$names)) {
+      newdata[, object$names[i]] <-
+        roller(x = getElement(newdata, object$columns[i]),
+               stat = object$statistic,
+               na.rm = object$na.rm,
+               window = object$size)
+    } else {
+      newdata[, object$columns[i]] <-
+        roller(x = getElement(newdata, object$columns[i]),
+               stat = object$statistic,
+               na.rm = object$na.rm,
+               window = object$size)
+    }
+  }
+  newdata
+}
+
+
+print.step_window <-
+  function(x, width = max(20, options()$width - 28), ...) {
+    cat("Moving ", x$size, "-point ", x$statistic, " on ", sep = "")
+    if (x$trained) {
+      cat(format_ch_vec(x$columns, width = width))
+    } else
+      cat(format_selectors(x$terms, width = width))
+    if (x$trained)
+      cat(" [trained]\n")
+    else
+      cat("\n")
+    invisible(x)
+  }
diff --git a/build/vignette.rds b/build/vignette.rds
new file mode 100644
index 0000000..fef271a
Binary files /dev/null and b/build/vignette.rds differ
diff --git a/data/biomass.RData b/data/biomass.RData
new file mode 100644
index 0000000..fe3a25a
Binary files /dev/null and b/data/biomass.RData differ
diff --git a/data/covers.RData b/data/covers.RData
new file mode 100644
index 0000000..0b791f1
Binary files /dev/null and b/data/covers.RData differ
diff --git a/data/credit_data.RData b/data/credit_data.RData
new file mode 100644
index 0000000..b9ad9c3
Binary files /dev/null and b/data/credit_data.RData differ
diff --git a/data/datalist b/data/datalist
new file mode 100644
index 0000000..b5c6b75
--- /dev/null
+++ b/data/datalist
@@ -0,0 +1,4 @@
+biomass: biomass
+okc: okc
+credit_data: credit_data
+covers: covers
\ No newline at end of file
diff --git a/data/okc.RData b/data/okc.RData
new file mode 100644
index 0000000..c7e4497
Binary files /dev/null and b/data/okc.RData differ
diff --git a/inst/doc/Custom_Steps.R b/inst/doc/Custom_Steps.R
new file mode 100644
index 0000000..e3d3b56
--- /dev/null
+++ b/inst/doc/Custom_Steps.R
@@ -0,0 +1,154 @@
+## ----ex_setup, include=FALSE---------------------------------------------
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+
+## ----step_list-----------------------------------------------------------
+library(recipes)
+steps <- apropos("^step_")
+steps[!grepl("new$", steps)]
+
+## ----initial-------------------------------------------------------------
+data(biomass)
+str(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+## ----carbon_dist---------------------------------------------------------
+library(ggplot2)
+theme_set(theme_bw())
+ggplot(biomass_tr, aes(x = carbon)) + 
+  geom_histogram(binwidth = 5, col = "blue", fill = "blue", alpha = .5) + 
+  geom_vline(xintercept = biomass_te$carbon[1], lty = 2)
+
+## ----initial_def---------------------------------------------------------
+step_percentile <- function(recipe, ..., role = NA, 
+                            trained = FALSE, ref_dist = NULL,
+                            approx = FALSE, 
+                            options = list(probs = (0:100)/100, names = TRUE)) {
+## capture but do not evaluate the variable selectors with
+## the `quos` function in `rlang`
+  terms <- rlang::quos(...) 
+  if(length(terms) == 0)
+    stop("Please supply at least one variable specification. See ?selections.")
+  add_step(
+    recipe, 
+    step_percentile_new(
+      terms = terms, 
+      trained = trained,
+      role = role, 
+      ref_dist = ref_dist,
+      approx = approx,
+      options = options))
+}
+
+## ----initialize----------------------------------------------------------
+step_percentile_new <- function(terms = NULL, role = NA, trained = FALSE, 
+                                ref_dist = NULL, approx = NULL, options = NULL) {
+  step(
+    subclass = "percentile", 
+    terms = terms,
+    role = role,
+    trained = trained,
+    ref_dist = ref_dist,
+    approx = approx,
+    options = options
+  )
+}
+
+## ----prep_1, eval = FALSE------------------------------------------------
+#  prep.step_percentile <- function(x, training, info = NULL, ...) {
+#    col_names <- terms_select(terms = x$terms, info = info)
+#  }
+
+## ----prep_2--------------------------------------------------------------
+get_pctl <- function(x, args) {
+  args$x <- x
+  do.call("quantile", args)
+}
+
+prep.step_percentile <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(terms = x$terms, info = info) 
+  ## You can add error trapping for non-numeric data here and so on.
+  ## We'll use the names later, so require that `names = TRUE`
+  if(x$options$names == FALSE)
+    stop("`names` should be set to TRUE")
+  
+  if(!x$approx) {
+    x$ref_dist <- training[, col_names]
+  } else {
+    pctl <- lapply(
+      training[, col_names],  
+      get_pctl, 
+      args = x$options
+    )
+    x$ref_dist <- pctl
+  }
+  ## Always return the updated step
+  x
+}
+
+## ----bake----------------------------------------------------------------
+## Two helper functions
+pctl_by_mean <- function(x, ref) mean(ref <= x)
+
+pctl_by_approx <- function(x, ref) {
+  ## go from 1 column tibble to vector
+  x <- getElement(x, names(x))
+  ## get the percentile values from the names (e.g. "10%")
+  p_grid <- as.numeric(gsub("%$", "", names(ref))) 
+  approx(x = ref, y = p_grid, xout = x)$y/100
+}
+
+bake.step_percentile <- function(object, newdata, ...) {
+  require(tibble)
+  ## For illustration (and not speed), we will loop through the affected variables
+  ## and do the computations
+  vars <- names(object$ref_dist)
+  
+  for(i in vars) {
+    if(!object$approx) {
+      ## We can use `apply` since tibbles do not drop dimensions:
+      newdata[, i] <- apply(newdata[, i], 1, pctl_by_mean, 
+                            ref = object$ref_dist[, i])
+    } else 
+      newdata[, i] <- pctl_by_approx(newdata[, i], object$ref_dist[[i]])
+  }
+  ## Always convert to tibbles on the way out
+  as_tibble(newdata)
+}
+
+## ----example-------------------------------------------------------------
+rec_obj <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)])
+rec_obj <- rec_obj %>%
+  step_percentile(all_predictors(), approx = TRUE) 
+
+rec_obj <- prep(rec_obj, training = biomass_tr)
+
+percentiles <- bake(rec_obj, biomass_te)
+percentiles
+
+## ----cdf_plot, echo = FALSE----------------------------------------------
+grid_pct <- rec_obj$steps[[1]]$options$probs
+plot_data <- data.frame(
+  carbon = c(
+    quantile(biomass_tr$carbon, probs = grid_pct), 
+    biomass_te$carbon
+  ),
+  percentile = c(grid_pct, percentiles$carbon),
+  dataset = rep(
+    c("Training", "Testing"), 
+    c(length(grid_pct), nrow(percentiles))
+  )
+)
+
+ggplot(plot_data, 
+       aes(x = carbon, y = percentile, col = dataset)) + 
+  geom_point(alpha = .4, cex = 2) + 
+  theme(legend.position = "top")
+
diff --git a/inst/doc/Custom_Steps.Rmd b/inst/doc/Custom_Steps.Rmd
new file mode 100644
index 0000000..a1cc3e1
--- /dev/null
+++ b/inst/doc/Custom_Steps.Rmd
@@ -0,0 +1,247 @@
+---
+title: "Creating Custom Step Functions"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Custom Steps}
+  %\VignetteEncoding{UTF-8}  
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+```{r ex_setup, include=FALSE}
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+```
+
+`recipes` contains a number of different steps included in the package:
+
+```{r step_list}
+library(recipes)
+steps <- apropos("^step_")
+steps[!grepl("new$", steps)]
+```
+
+You might want to make your own and this page describes how to do that. If you are looking for good examples of existing steps, I would suggest looking at the code for [centering](https://github.com/topepo/recipes/blob/master/R/center.R) or [PCA](https://github.com/topepo/recipes/blob/master/R/pca.R) to start. 
+
+
+# A new step definition
+
+As an example, let's create a step that replaces the value of a variable with its percentile from the training set. The data that I'll use are from the `recipes` package:
+
+```{r initial}
+data(biomass)
+str(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+```
+
+To illustrate the transformation with the `carbon` variable, the training set distribution of that variable is shown below with a vertical line for the first value of the test set. 
+
+```{r carbon_dist}
+library(ggplot2)
+theme_set(theme_bw())
+ggplot(biomass_tr, aes(x = carbon)) + 
+  geom_histogram(binwidth = 5, col = "blue", fill = "blue", alpha = .5) + 
+  geom_vline(xintercept = biomass_te$carbon[1], lty = 2)
+```
+
+Based on the training set, `r round(mean(biomass_tr$carbon <= biomass_te$carbon[1])*100, 1)`% of the data are less than a value of `r biomass_te$carbon[1]`. There are some applications where it might be advantageous to represent the predictor values as percentiles rather than their original values. 
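+
+For instance, that percentage can be computed directly as the proportion of training set values at or below the first test set value (a quick sketch, separate from the step we are about to write):
+
+```r
+## fraction of training set carbon values at or below the first test set value
+mean(biomass_tr$carbon <= biomass_te$carbon[1])
+```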
+
+Our new step will do this computation for any numeric variables of interest. We will call this `step_percentile`. The code below is designed for illustration and not speed or best practices. I've left out a lot of error trapping that we would want in a real implementation.  
+
+# Create the initial function. 
+
+The user-exposed function `step_percentile` is just a simple wrapper around an internal function called `add_step`. This function takes the same arguments as your function and simply adds the new step to the recipe. The `...` signifies the variable selectors that can be used.
+
+```{r initial_def}
+step_percentile <- function(recipe, ..., role = NA, 
+                            trained = FALSE, ref_dist = NULL,
+                            approx = FALSE, 
+                            options = list(probs = (0:100)/100, names = TRUE)) {
+## capture but do not evaluate the variable selectors with
+## the `quos` function in `rlang`
+  terms <- rlang::quos(...) 
+  if(length(terms) == 0)
+    stop("Please supply at least one variable specification. See ?selections.")
+  add_step(
+    recipe, 
+    step_percentile_new(
+      terms = terms, 
+      trained = trained,
+      role = role, 
+      ref_dist = ref_dist,
+      approx = approx,
+      options = options))
+}
+```
+
+You should always keep the first four arguments (`recipe` through `trained`) the same as listed above. Some notes:
+
+ * the `role` argument is used when you either 1) create new variables and want their role to be pre-set or 2) replace the existing variables with new values. The latter is what we will be doing and using `role = NA` will leave the existing role intact. 
+ * `trained` is set by the package when the estimation step has been run. You should default your function definition's argument to `FALSE`.  
+
+I've added extra arguments specific to this step. In order to calculate the percentile, the training data for the relevant columns will need to be saved. This data will be saved in the `ref_dist` object. 
+However, this might be problematic if the data set is large. `approx` would be used when you want to save a grid of pre-computed percentiles from the training set and use these to estimate the percentile for a new data point. If `approx = TRUE`, the argument `ref_dist` will contain the grid for each variable. 
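+
+To make the interpolation idea concrete, here is a hedged sketch using the training data from above (this mirrors what the step will do internally; `ref` and `p_grid` are names invented for the example):
+
+```r
+## a named grid of training set percentiles, e.g. "0%", "1%", ...
+ref <- quantile(biomass_tr$carbon, probs = (0:100)/100, names = TRUE)
+p_grid <- as.numeric(gsub("%$", "", names(ref)))
+## interpolate the percentile of a new value from the grid
+approx(x = ref, y = p_grid, xout = biomass_te$carbon[1])$y / 100
+```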
+
+We will use `stats::quantile` to compute the grid. However, we might also want to have control over the granularity of this grid, so the `options` argument will be used to define how those calculations are done. We could just use the ellipses (aka `...`) so that any options passed to `step_percentile` that are not one of its arguments will then be passed to `stats::quantile`. We recommend making a separate list object with the options and using it inside the function. 
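+
+As a small illustration of this pattern (the values are arbitrary and for the sketch only), a list of options can be spliced into a call to `quantile` via `do.call`:
+
+```r
+opts <- list(probs = (0:4)/4, names = TRUE)
+opts$x <- biomass_tr$carbon
+## equivalent to quantile(biomass_tr$carbon, probs = (0:4)/4, names = TRUE)
+do.call("quantile", opts)
+```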
+
+
+# Initialization of new objects
+
+Next, you can utilize the internal function `step` that sets the class of new objects. Using `subclass = "percentile"` will set the class of new objects to `"step_percentile"`. 
+
+```{r initialize}
+step_percentile_new <- function(terms = NULL, role = NA, trained = FALSE, 
+                                ref_dist = NULL, approx = NULL, options = NULL) {
+  step(
+    subclass = "percentile", 
+    terms = terms,
+    role = role,
+    trained = trained,
+    ref_dist = ref_dist,
+    approx = approx,
+    options = options
+  )
+}
+```
+
+# Define the estimation procedure
+
+You will need to create a new `prep` method for your step's class. At a minimum, the method should have these three arguments:
+
+```r
+function(x, training, info = NULL)
+```
+
+where
+
+ * `x` will be the `step_percentile` object
+ * `training` will be a _tibble_ that has the training set data
+ * `info` will also be a tibble that has information on the current set of data available. This information is updated as each step is evaluated by its specific `prep` method, so it may not have the variables from the original data. The columns in this tibble are `variable` (the variable name), `type` (currently either "numeric" or "nominal"), `role` (defining the variable's role), and `source` (either "original" or "derived" depending on where it originated). A sketch of such a tibble is shown below.
+
+You can define other options. 
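+
+For orientation, an `info` tibble might look something like this (illustrative values only, built from the columns described above):
+
+```r
+tibble::tibble(
+  variable = c("carbon", "hydrogen", "HHV"),
+  type     = c("numeric", "numeric", "numeric"),
+  role     = c("predictor", "predictor", "outcome"),
+  source   = "original"
+)
+```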
+
+The first thing that you might want to do in the `prep` function is to translate the specification listed in the `terms` argument to column names in the current data. There is an internal function called `terms_select` that can be used to obtain this. 
+
+```{r prep_1, eval = FALSE}
+prep.step_percentile <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(terms = x$terms, info = info) 
+}
+```
+
+Once we have this, we can either save the original data columns or estimate the approximation grid. For the grid, we will use a helper function that enables us to run `do.call` on a list of arguments that include the `options` list.  
+
+```{r prep_2}
+get_pctl <- function(x, args) {
+  args$x <- x
+  do.call("quantile", args)
+}
+
+prep.step_percentile <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(terms = x$terms, info = info) 
+  ## You can add error trapping for non-numeric data here and so on.
+  ## We'll use the names later, so require that `names = TRUE`
+  if(x$options$names == FALSE)
+    stop("`names` should be set to TRUE")
+  
+  if(!x$approx) {
+    x$ref_dist <- training[, col_names]
+  } else {
+    pctl <- lapply(
+      training[, col_names],  
+      get_pctl, 
+      args = x$options
+    )
+    x$ref_dist <- pctl
+  }
+  ## Always return the updated step
+  x
+}
+```
+
+# Create the `bake` method
+
+Remember that the `prep` function does not _apply_ the step to the data; it only estimates any required values such as `ref_dist`. We will need to create a new method for our `step_percentile` class. The minimum arguments for this are
+
+```r
+function(object, newdata, ...)
+```
+
+where `object` is the updated step function that has been through the corresponding `prep` code and `newdata` is a tibble of data to be preprocessed. 
+
+Here is the code to convert the new data to percentiles. Two initial helper functions handle the two cases (approximation or not). We always return a tibble as the output. 
+
+```{r bake}
+## Two helper functions
+pctl_by_mean <- function(x, ref) mean(ref <= x)
+
+pctl_by_approx <- function(x, ref) {
+  ## go from 1 column tibble to vector
+  x <- getElement(x, names(x))
+  ## get the percentile values from the names (e.g. "10%")
+  p_grid <- as.numeric(gsub("%$", "", names(ref))) 
+  approx(x = ref, y = p_grid, xout = x)$y/100
+}
+
+bake.step_percentile <- function(object, newdata, ...) {
+  require(tibble)
+  ## For illustration (and not speed), we will loop through the affected variables
+  ## and do the computations
+  vars <- names(object$ref_dist)
+  
+  for(i in vars) {
+    if(!object$approx) {
+      ## We can use `apply` since tibbles do not drop dimensions:
+      newdata[, i] <- apply(newdata[, i], 1, pctl_by_mean, 
+                            ref = object$ref_dist[, i])
+    } else 
+      newdata[, i] <- pctl_by_approx(newdata[, i], object$ref_dist[[i]])
+  }
+  ## Always convert to tibbles on the way out
+  as_tibble(newdata)
+}
+```
+
+# Running the example
+
+Let's use the example data to make sure that it works: 
+
+```{r example}
+rec_obj <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)])
+rec_obj <- rec_obj %>%
+  step_percentile(all_predictors(), approx = TRUE) 
+
+rec_obj <- prep(rec_obj, training = biomass_tr)
+
+percentiles <- bake(rec_obj, biomass_te)
+percentiles
+```
+
+The plot below shows how the original data line up with the percentiles for each split of the data for one of the predictors:
+
+```{r cdf_plot, echo = FALSE}
+grid_pct <- rec_obj$steps[[1]]$options$probs
+plot_data <- data.frame(
+  carbon = c(
+    quantile(biomass_tr$carbon, probs = grid_pct), 
+    biomass_te$carbon
+  ),
+  percentile = c(grid_pct, percentiles$carbon),
+  dataset = rep(
+    c("Training", "Testing"), 
+    c(length(grid_pct), nrow(percentiles))
+  )
+)
+
+ggplot(plot_data, 
+       aes(x = carbon, y = percentile, col = dataset)) + 
+  geom_point(alpha = .4, cex = 2) + 
+  theme(legend.position = "top")
+```
diff --git a/inst/doc/Custom_Steps.html b/inst/doc/Custom_Steps.html
new file mode 100644
index 0000000..856fd24
--- /dev/null
+++ b/inst/doc/Custom_Steps.html
@@ -0,0 +1,315 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8" />
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+
+
+<title>Creating Custom Step Functions</title>
+
+
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+div.sourceCode { overflow-x: auto; }
+table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
+  margin: 0; padding: 0; vertical-align: baseline; border: none; }
+table.sourceCode { width: 100%; line-height: 100%; }
+td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
+td.sourceCode { padding-left: 5px; }
+code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code > span.dt { color: #902000; } /* DataType */
+code > span.dv { color: #40a070; } /* DecVal */
+code > span.bn { color: #40a070; } /* BaseN */
+code > span.fl { color: #40a070; } /* Float */
+code > span.ch { color: #4070a0; } /* Char */
+code > span.st { color: #4070a0; } /* String */
+code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code > span.ot { color: #007020; } /* Other */
+code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code > span.fu { color: #06287e; } /* Function */
+code > span.er { color: #ff0000; font-weight: bold; } /* Error */
+code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+code > span.cn { color: #880000; } /* Constant */
+code > span.sc { color: #4070a0; } /* SpecialChar */
+code > span.vs { color: #4070a0; } /* VerbatimString */
+code > span.ss { color: #bb6688; } /* SpecialString */
+code > span.im { } /* Import */
+code > span.va { color: #19177c; } /* Variable */
+code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code > span.op { color: #666666; } /* Operator */
+code > span.bu { } /* BuiltIn */
+code > span.ex { } /* Extension */
+code > span.pp { color: #bc7a00; } /* Preprocessor */
+code > span.at { color: #7d9029; } /* Attribute */
+code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+</style>
+
+
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20800px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%2020px%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20both%3B%0Amargin%3A%200%200% [...]
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Creating Custom Step Functions</h1>
+
+
+<div id="TOC">
+<ul>
+<li><a href="#a-new-step-definition">A new step definition</a></li>
+<li><a href="#create-the-initial-function.">Create the initial function.</a></li>
+<li><a href="#initialization-of-new-objects">Initialization of new objects</a></li>
+<li><a href="#define-the-estimation-procedure">Define the estimation procedure</a></li>
+<li><a href="#create-the-bake-method">Create the <code>bake</code> method</a></li>
+<li><a href="#running-the-example">Running the example</a></li>
+</ul>
+</div>
+
+<p><code>recipes</code> contains a number of different steps included in the package:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(recipes)
+steps <-<span class="st"> </span><span class="kw">apropos</span>(<span class="st">"^step_"</span>)
+steps[<span class="op">!</span><span class="kw">grepl</span>(<span class="st">"new$"</span>, steps)]
+<span class="co">#>  [1] "step_BoxCox"       "step_YeoJohnson"   "step_bagimpute"   </span>
+<span class="co">#>  [4] "step_bin2factor"   "step_center"       "step_classdist"   </span>
+<span class="co">#>  [7] "step_corr"         "step_date"         "step_depth"       </span>
+<span class="co">#> [10] "step_discretize"   "step_dummy"        "step_holiday"     </span>
+<span class="co">#> [13] "step_hyperbolic"   "step_ica"          "step_interact"    </span>
+<span class="co">#> [16] "step_intercept"    "step_invlogit"     "step_isomap"      </span>
+<span class="co">#> [19] "step_knnimpute"    "step_kpca"         "step_lincomb"     </span>
+<span class="co">#> [22] "step_log"          "step_logit"        "step_meanimpute"  </span>
+<span class="co">#> [25] "step_modeimpute"   "step_ns"           "step_nzv"         </span>
+<span class="co">#> [28] "step_ordinalscore" "step_other"        "step_pca"         </span>
+<span class="co">#> [31] "step_poly"         "step_range"        "step_ratio"       </span>
+<span class="co">#> [34] "step_regex"        "step_rm"           "step_scale"       </span>
+<span class="co">#> [37] "step_shuffle"      "step_spatialsign"  "step_sqrt"        </span>
+<span class="co">#> [40] "step_window"</span></code></pre></div>
+<p>You might want to make your own and this page describes how to do that. If you are looking for good examples of existing steps, I would suggest looking at the code for <a href="https://github.com/topepo/recipes/blob/master/R/center.R">centering</a> or <a href="https://github.com/topepo/recipes/blob/master/R/pca.R">PCA</a> to start.</p>
+<div id="a-new-step-definition" class="section level1">
+<h1>A new step definition</h1>
+<p>As an example, let’s create a step that replaces the value of a variable with its percentile from the training set. The data that I’ll use are from the <code>recipes</code> package:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">data</span>(biomass)
+<span class="kw">str</span>(biomass)
+<span class="co">#> 'data.frame':    536 obs. of  8 variables:</span>
+<span class="co">#>  $ sample  : chr  "Akhrot Shell" "Alabama Oak Wood Waste" "Alder" "Alfalfa" ...</span>
+<span class="co">#>  $ dataset : chr  "Training" "Training" "Training" "Training" ...</span>
+<span class="co">#>  $ carbon  : num  49.8 49.5 47.8 45.1 46.8 ...</span>
+<span class="co">#>  $ hydrogen: num  5.64 5.7 5.8 4.97 5.4 5.75 5.99 5.7 5.5 5.9 ...</span>
+<span class="co">#>  $ oxygen  : num  42.9 41.3 46.2 35.6 40.7 ...</span>
+<span class="co">#>  $ nitrogen: num  0.41 0.2 0.11 3.3 1 2.04 2.68 1.7 0.8 1.2 ...</span>
+<span class="co">#>  $ sulfur  : num  0 0 0.02 0.16 0.02 0.1 0.2 0.2 0 0.1 ...</span>
+<span class="co">#>  $ HHV     : num  20 19.2 18.3 18.2 18.4 ...</span>
+
+biomass_tr <-<span class="st"> </span>biomass[biomass<span class="op">$</span>dataset <span class="op">==</span><span class="st"> "Training"</span>,]
+biomass_te <-<span class="st"> </span>biomass[biomass<span class="op">$</span>dataset <span class="op">==</span><span class="st"> "Testing"</span>,]</code></pre></div>
+<p>To illustrate the transformation with the <code>carbon</code> variable, the training set distribution of that variable is shown below with a vertical line for the first value of the test set.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
+<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
+<span class="kw">ggplot</span>(biomass_tr, <span class="kw">aes</span>(<span class="dt">x =</span> carbon)) <span class="op">+</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">geom_histogram</span>(<span class="dt">binwidth =</span> <span class="dv">5</span>, <span class="dt">col =</span> <span class="st">"blue"</span>, <span class="dt">fill =</span> <span class="st">"blue"</span>, <span class="dt">alpha =</span> .<span class="dv">5</span>) <span class="op">+</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">geom_vline</span>(<span class="dt">xintercept =</span> biomass_te<span class="op">$</span>carbon[<span class="dv">1</span>], <span class="dt">lty =</span> <span class="dv">2</span>)</code></pre></div>
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABUAAAAPACAYAAAD0ZtPZAAAEDWlDQ1BJQ0MgUHJvZmlsZQAAOI2NVV1oHFUUPrtzZyMkzlNsNIV0qD8NJQ2TVjShtLp/3d02bpZJNtoi6GT27s6Yyc44M7v9oU9FUHwx6psUxL+3gCAo9Q/bPrQvlQol2tQgKD60+INQ6Ium65k7M5lpurHeZe58853vnnvuuWfvBei5qliWkRQBFpquLRcy4nOHj4g9K5CEh6AXBqFXUR0rXalMAjZPC3e1W99Dwntf2dXd/p+tt0YdFSBxH2Kz5qgLiI8B8KdVy3YBevqRHz/qWh72Yui3MUDEL3q44WPXw3M+fo1pZuQs4tOIBVVTaoiXEI/MxfhGDPsxsNZfoE1q66ro5aJim3XdoLFw72H+n23BaIXzbcOnz5mfPoTvYVz7KzUl5+FRxEuqkp9G/Ajia [...]
+<p>Based on the training set, 42.1% of the data are less than a value of 46.35. There are some applications where it might be advantageous to represent the predictor values as percentiles rather than their original values.</p>
+<p>Our new step will do this computation for any numeric variables of interest. We will call this <code>step_percentile</code>. The code below is designed for illustration and not speed or best practices. I’ve left out a lot of error trapping that we would want in a real implementation.</p>
+</div>
+<div id="create-the-initial-function." class="section level1">
+<h1>Create the initial function.</h1>
+<p>The user-exposed function <code>step_percentile</code> is just a simple wrapper around an internal function called <code>add_step</code>. This function takes the same arguments as your function and simply adds the new step to the recipe. The <code>...</code> signifies the variable selectors that can be used.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">step_percentile <-<span class="st"> </span><span class="cf">function</span>(recipe, ..., <span class="dt">role =</span> <span class="ot">NA</span>, 
+                            <span class="dt">trained =</span> <span class="ot">FALSE</span>, <span class="dt">ref_dist =</span> <span class="ot">NULL</span>,
+                            <span class="dt">approx =</span> <span class="ot">FALSE</span>, 
+                            <span class="dt">options =</span> <span class="kw">list</span>(<span class="dt">probs =</span> (<span class="dv">0</span><span class="op">:</span><span class="dv">100</span>)<span class="op">/</span><span class="dv">100</span>, <span class="dt">names =</span> <span class="ot">TRUE</span>)) {
+## capture but do not evaluate the variable selectors with
+## the `quos` function in `rlang`
+  terms <-<span class="st"> </span>rlang<span class="op">::</span><span class="kw">quos</span>(...) 
+  <span class="cf">if</span>(<span class="kw">length</span>(terms) <span class="op">==</span><span class="st"> </span><span class="dv">0</span>)
+    <span class="kw">stop</span>(<span class="st">"Please supply at least one variable specification. See ?selections."</span>)
+  <span class="kw">add_step</span>(
+    recipe, 
+    <span class="kw">step_percentile_new</span>(
+      <span class="dt">terms =</span> terms, 
+      <span class="dt">trained =</span> trained,
+      <span class="dt">role =</span> role, 
+      <span class="dt">ref_dist =</span> ref_dist,
+      <span class="dt">approx =</span> approx,
+      <span class="dt">options =</span> options))
+}</code></pre></div>
+<p>You should always keep the first four arguments (<code>recipe</code> through <code>trained</code>) the same as listed above. Some notes:</p>
+<ul>
+<li>the <code>role</code> argument is used when you either 1) create new variables and want their role to be pre-set or 2) replace the existing variables with new values. The latter is what we will be doing and using <code>role = NA</code> will leave the existing role intact.</li>
+<li><code>trained</code> is set by the package when the estimation step has been run. You should default your function definition’s argument to <code>FALSE</code>.</li>
+</ul>
+<p>I’ve added extra arguments specific to this step. In order to calculate the percentile, the training data for the relevant columns will need to be saved. This data will be saved in the <code>ref_dist</code> object. However, this might be problematic if the data set is large. <code>approx</code> would be used when you want to save a grid of pre-computed percentiles from the training set and use these to estimate the percentile for a new data point. If <code>approx = TRUE</code>, the ar [...]
+<p>We will use <code>stats::quantile</code> to compute the grid. However, we might also want to have control over the granularity of this grid, so the <code>options</code> argument will be used to define how those calculations are done. We could just use the ellipses (aka <code>...</code>) so that any options passed to <code>step_percentile</code> that are not one of its arguments will then be passed to <code>stats::quantile</code>. We recommend making a separate list object with the o [...]
+</div>
+<div id="initialization-of-new-objects" class="section level1">
+<h1>Initialization of new objects</h1>
+<p>Next, you can utilize the internal function <code>step</code> that sets the class of new objects. Using <code>subclass = "percentile"</code> will set the class of new objects to “step_percentile”.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">step_percentile_new <-<span class="st"> </span><span class="cf">function</span>(<span class="dt">terms =</span> <span class="ot">NULL</span>, <span class="dt">role =</span> <span class="ot">NA</span>, <span class="dt">trained =</span> <span class="ot">FALSE</span>, 
+                                <span class="dt">ref_dist =</span> <span class="ot">NULL</span>, <span class="dt">approx =</span> <span class="ot">NULL</span>, <span class="dt">options =</span> <span class="ot">NULL</span>) {
+  <span class="kw">step</span>(
+    <span class="dt">subclass =</span> <span class="st">"percentile"</span>, 
+    <span class="dt">terms =</span> terms,
+    <span class="dt">role =</span> role,
+    <span class="dt">trained =</span> trained,
+    <span class="dt">ref_dist =</span> ref_dist,
+    <span class="dt">approx =</span> approx,
+    <span class="dt">options =</span> options
+  )
+}</code></pre></div>
+</div>
+<div id="define-the-estimation-procedure" class="section level1">
+<h1>Define the estimation procedure</h1>
+<p>You will need to create a new <code>prep</code> method for your step’s class. At a minimum, the method should have these three arguments:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="cf">function</span>(x, training, <span class="dt">info =</span> <span class="ot">NULL</span>)</code></pre></div>
+<p>where</p>
+<ul>
+<li><code>x</code> will be the <code>step_percentile</code> object</li>
+<li><code>training</code> will be a <em>tibble</em> that has the training set data</li>
+<li><code>info</code> will also be a tibble that has information on the current set of data available. This information is updated as each step is evaluated by its specific <code>prep</code> method so it may not have the variables from the original data. The columns in this tibble are <code>variable</code> (the variable name), <code>type</code> (currently either “numeric” or “nominal”), <code>role</code> (defining the variable’s role), and <code>source</code> (either “original” or “deriv [...]
+</ul>
+<p>You can define other options.</p>
+<p>The first thing that you might want to do in the <code>prep</code> function is to translate the specification listed in the <code>terms</code> argument to column names in the current data. There is an internal function called <code>terms_select</code> that can be used to obtain this.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">prep.step_percentile <-<span class="st"> </span><span class="cf">function</span>(x, training, <span class="dt">info =</span> <span class="ot">NULL</span>, ...) {
+  col_names <-<span class="st"> </span><span class="kw">terms_select</span>(<span class="dt">terms =</span> x<span class="op">$</span>terms, <span class="dt">info =</span> info) 
+}</code></pre></div>
+<p>Once we have this, we can either save the original data columns or estimate the approximation grid. For the grid, we will use a helper function that enables us to run <code>do.call</code> on a list of arguments that include the <code>options</code> list.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">get_pctl <-<span class="st"> </span><span class="cf">function</span>(x, args) {
+  args<span class="op">$</span>x <-<span class="st"> </span>x
+  <span class="kw">do.call</span>(<span class="st">"quantile"</span>, args)
+}
+
+prep.step_percentile <-<span class="st"> </span><span class="cf">function</span>(x, training, <span class="dt">info =</span> <span class="ot">NULL</span>, ...) {
+  col_names <-<span class="st"> </span><span class="kw">terms_select</span>(<span class="dt">terms =</span> x<span class="op">$</span>terms, <span class="dt">info =</span> info) 
+  ## You can add error trapping for non-numeric data here and so on.
+  ## We'll use the names later, so require that `names = TRUE`
+  <span class="cf">if</span>(x<span class="op">$</span>options<span class="op">$</span>names <span class="op">==</span><span class="st"> </span><span class="ot">FALSE</span>)
+    <span class="kw">stop</span>(<span class="st">"`names` should be set to TRUE"</span>)
+  
+  <span class="cf">if</span>(<span class="op">!</span>x<span class="op">$</span>approx) {
+    x<span class="op">$</span>ref_dist <-<span class="st"> </span>training[, col_names]
+  } <span class="cf">else</span> {
+    pctl <-<span class="st"> </span><span class="kw">lapply</span>(
+      training[, col_names],  
+      get_pctl, 
+      <span class="dt">args =</span> x<span class="op">$</span>options
+    )
+    x<span class="op">$</span>ref_dist <-<span class="st"> </span>pctl
+  }
+  ## Always return the updated step
+  x
+}</code></pre></div>
+</div>
+<div id="create-the-bake-method" class="section level1">
+<h1>Create the <code>bake</code> method</h1>
+<p>Remember that the <code>prep</code> function does not <em>apply</em> the step to the data; it only estimates any required values such as <code>ref_dist</code>. We will need to create a new method for our <code>step_percentile</code> class. The minimum arguments for this are</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="cf">function</span>(object, newdata, ...)</code></pre></div>
+<p>where <code>object</code> is the updated step function that has been through the corresponding <code>prep</code> code and <code>newdata</code> is a tibble of data to be preprocessed.</p>
+<p>Here is the code to convert the new data to percentiles. Two initial helper functions handle the two cases (approximation or not). We always return a tibble as the output.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Two helper functions
+pctl_by_mean <-<span class="st"> </span><span class="cf">function</span>(x, ref) <span class="kw">mean</span>(ref <span class="op"><=</span><span class="st"> </span>x)
+
+pctl_by_approx <-<span class="st"> </span><span class="cf">function</span>(x, ref) {
+  ## go from 1 column tibble to vector
+  x <-<span class="st"> </span><span class="kw">getElement</span>(x, <span class="kw">names</span>(x))
+  ## get the percentile values from the names (e.g. "10%")
+  p_grid <-<span class="st"> </span><span class="kw">as.numeric</span>(<span class="kw">gsub</span>(<span class="st">"%$"</span>, <span class="st">""</span>, <span class="kw">names</span>(ref))) 
+  <span class="kw">approx</span>(<span class="dt">x =</span> ref, <span class="dt">y =</span> p_grid, <span class="dt">xout =</span> x)<span class="op">$</span>y<span class="op">/</span><span class="dv">100</span>
+}
+
+bake.step_percentile <-<span class="st"> </span><span class="cf">function</span>(object, newdata, ...) {
+  <span class="kw">require</span>(tibble)
+  ## For illustration (and not speed), we will loop through the affected variables
+  ## and do the computations
+  vars <-<span class="st"> </span><span class="kw">names</span>(object<span class="op">$</span>ref_dist)
+  
+  <span class="cf">for</span>(i <span class="cf">in</span> vars) {
+    <span class="cf">if</span>(<span class="op">!</span>object<span class="op">$</span>approx) {
+      ## We can use `apply` since tibbles do not drop dimensions:
+      newdata[, i] <-<span class="st"> </span><span class="kw">apply</span>(newdata[, i], <span class="dv">1</span>, pctl_by_mean, 
+                            <span class="dt">ref =</span> object<span class="op">$</span>ref_dist[, i])
+    } <span class="cf">else</span> 
+      newdata[, i] <-<span class="st"> </span><span class="kw">pctl_by_approx</span>(newdata[, i], object<span class="op">$</span>ref_dist[[i]])
+  }
+  ## Always convert to tibbles on the way out
+  <span class="kw">as_tibble</span>(newdata)
+}</code></pre></div>
+</div>
+<div id="running-the-example" class="section level1">
+<h1>Running the example</h1>
+<p>Let’s use the example data to make sure that it works:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">rec_obj <-<span class="st"> </span><span class="kw">recipe</span>(HHV <span class="op">~</span><span class="st"> </span>., <span class="dt">data =</span> biomass_tr[, <span class="op">-</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">2</span>)])
+rec_obj <-<span class="st"> </span>rec_obj <span class="op">%>%</span>
+<span class="st">  </span><span class="kw">step_percentile</span>(<span class="kw">all_predictors</span>(), <span class="dt">approx =</span> <span class="ot">TRUE</span>) 
+
+rec_obj <-<span class="st"> </span><span class="kw">prep</span>(rec_obj, <span class="dt">training =</span> biomass_tr)
+<span class="co">#> step 1 percentile training</span>
+
+percentiles <-<span class="st"> </span><span class="kw">bake</span>(rec_obj, biomass_te)
+percentiles
+<span class="co">#> # A tibble: 80 x 5</span>
+<span class="co">#>    carbon hydrogen oxygen nitrogen sulfur</span>
+<span class="co">#>     <dbl>    <dbl>  <dbl>    <dbl>  <dbl></span>
+<span class="co">#>  1 0.4209   0.4500 0.9026    0.215  0.735</span>
+<span class="co">#>  2 0.1800   0.3850 0.9217    0.928  0.839</span>
+<span class="co">#>  3 0.1561   0.3850 0.9447    0.900  0.805</span>
+<span class="co">#>  4 0.4233   0.7750 0.2800    0.845  0.902</span>
+<span class="co">#>  5 0.6662   0.8667 0.6314    0.155  0.090</span>
+<span class="co">#>  6 0.2175   0.3850 0.5363    0.495  0.700</span>
+<span class="co">#>  7 0.0803   0.2713 0.9859    0.695  0.903</span>
+<span class="co">#>  8 0.1395   0.1260 0.1604    0.606  0.700</span>
+<span class="co">#>  9 0.0226   0.1035 0.1312    0.126  0.996</span>
+<span class="co">#> 10 0.0178   0.0821 0.0987    0.972  0.974</span>
+<span class="co">#> # ... with 70 more rows</span></code></pre></div>
+<p>The plot below shows how the original data line up with the percentiles for each split of the data for one of the predictors:</p>
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABUAAAAPACAYAAAD0ZtPZAAAEDWlDQ1BJQ0MgUHJvZmlsZQAAOI2NVV1oHFUUPrtzZyMkzlNsNIV0qD8NJQ2TVjShtLp/3d02bpZJNtoi6GT27s6Yyc44M7v9oU9FUHwx6psUxL+3gCAo9Q/bPrQvlQol2tQgKD60+INQ6Ium65k7M5lpurHeZe58853vnnvuuWfvBei5qliWkRQBFpquLRcy4nOHj4g9K5CEh6AXBqFXUR0rXalMAjZPC3e1W99Dwntf2dXd/p+tt0YdFSBxH2Kz5qgLiI8B8KdVy3YBevqRHz/qWh72Yui3MUDEL3q44WPXw3M+fo1pZuQs4tOIBVVTaoiXEI/MxfhGDPsxsNZfoE1q66ro5aJim3XdoLFw72H+n23BaIXzbcOnz5mfPoTvYVz7KzUl5+FRxEuqkp9G/Ajia [...]
+</div>
+
+<script type="text/javascript">
+window.onload = function() {
+  var i, fig = 1, caps = document.getElementsByClassName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.className !== 'figure' || cap.nodeName !== 'P')
+      continue;
+    cap.innerHTML = '<span>Figure ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+  fig = 1;
+  caps = document.getElementsByTagName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.nodeName !== 'TABLE') continue;
+    cap.innerHTML = '<span>Table ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+}
+</script>
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/doc/Ordering.Rmd b/inst/doc/Ordering.Rmd
new file mode 100644
index 0000000..1a1c513
--- /dev/null
+++ b/inst/doc/Ordering.Rmd
@@ -0,0 +1,28 @@
+---
+title: "Ordering of Steps"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Ordering of Steps}
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+In recipes, there are no constraints related to the order in which steps are added to the recipe. However, there are some general suggestions that you should consider:
+
+* If using a Box-Cox transformation, don't center the data first or do any operations that might make the data non-positive. Alternatively, use the Yeo-Johnson transformation so you don't have to worry about this. 
+* Recipes do not automatically create dummy variables (unlike _most_ formula methods). If you want to center, scale, or do any other operations on _all_ of the predictors, run `step_dummy` first so that numeric columns are in the data set instead of factors. 
+* As noted in the help file for `step_interact`, you should make dummy variables _before_ creating the interactions.
+* If you are lumping infrequently occurring categories together with `step_other`, call `step_other` before `step_dummy`.
+
+While your project's needs may vary, here is a suggested order of _potential_ steps that should work for most problems:
+
+1. Impute
+1. Individual transformations for skewness and other issues
+1. Discretize (if needed and if you have no other choice) 
+1. Create dummy variables
+1. Create interactions
+1. Normalization steps (center, scale, range, etc) 
+1. Multivariate transformation (e.g. PCA, spatial sign, etc) 
+
+Again, your mileage may vary for your particular problem. A sketch of a recipe that follows this ordering is shown below.
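+
+The sketch below is hedged and hypothetical: the columns `outcome`, `x1`, and `x2` and the data frame `train_data` are invented, discretization is omitted, and you should substitute steps suited to your own data.
+
+```r
+library(recipes)
+rec <- recipe(outcome ~ ., data = train_data) %>%
+  step_meanimpute(all_numeric()) %>%              # 1. impute
+  step_YeoJohnson(all_numeric()) %>%              # 2. transformations for skewness
+  step_dummy(all_nominal(), -all_outcomes()) %>%  # 4. dummy variables
+  step_interact(~ x1:x2) %>%                      # 5. interactions
+  step_center(all_predictors()) %>%               # 6. normalization
+  step_scale(all_predictors()) %>%
+  step_pca(all_predictors())                      # 7. multivariate transformation
+```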
diff --git a/inst/doc/Ordering.html b/inst/doc/Ordering.html
new file mode 100644
index 0000000..74dc178
--- /dev/null
+++ b/inst/doc/Ordering.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8" />
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+
+
+<title>Ordering of Steps</title>
+
+
+
+
+
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20800px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%2020px%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20both%3B%0Amargin%3A%200%200% [...]
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Ordering of Steps</h1>
+
+
+
+<p>In recipes, there are no constraints related to the order in which steps are added to the recipe. However, there are some general suggestions that you should consider:</p>
+<ul>
+<li>If using a Box-Cox transformation, don’t center the data first or do any operations that might make the data non-positive. Alternatively, use the Yeo-Johnson transformation so you don’t have to worry about this.</li>
+<li>Recipes do not automatically create dummy variables (unlike <em>most</em> formula methods). If you want to center, scale, or do any other operations on <em>all</em> of the predictors, run <code>step_dummy</code> first so that numeric columns are in the data set instead of factors.</li>
+<li>As noted in the help file for <code>step_interact</code>, you should make dummy variables <em>before</em> creating the interactions.</li>
+<li>If you are lumping infrequently occurring categories together with <code>step_other</code>, call <code>step_other</code> before <code>step_dummy</code>.</li>
+</ul>
+<p>While your project’s needs may vary, here is a suggested order of <em>potential</em> steps that should work for most problems:</p>
+<ol style="list-style-type: decimal">
+<li>Impute</li>
+<li>Individual transformations for skewness and other issues</li>
+<li>Discretize (if needed and if you have no other choice)</li>
+<li>Create dummy variables</li>
+<li>Create interactions</li>
+<li>Normalization steps (center, scale, range, etc)</li>
+<li>Multivariate transformation (e.g. PCA, spatial sign, etc)</li>
+</ol>
+<p>Again, your mileage may vary for your particular problem.</p>
+
+<script type="text/javascript">
+window.onload = function() {
+  var i, fig = 1, caps = document.getElementsByClassName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.className !== 'figure' || cap.nodeName !== 'P')
+      continue;
+    cap.innerHTML = '<span>Figure ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+  fig = 1;
+  caps = document.getElementsByTagName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.nodeName !== 'TABLE') continue;
+    cap.innerHTML = '<span>Table ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+}
+</script>
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/doc/Selecting_Variables.R b/inst/doc/Selecting_Variables.R
new file mode 100644
index 0000000..dad1ba7
--- /dev/null
+++ b/inst/doc/Selecting_Variables.R
@@ -0,0 +1,33 @@
+## ----ex_setup, include=FALSE---------------------------------------------
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+
+## ----credit--------------------------------------------------------------
+library(recipes)
+data("credit_data")
+str(credit_data)
+
+rec <- recipe(Status ~ Seniority + Time + Age + Records, data = credit_data)
+rec
+
+## ----var_info_orig-------------------------------------------------------
+summary(rec, original = TRUE)
+
+## ----dummy_1-------------------------------------------------------------
+dummied <- rec %>% step_dummy(all_nominal())
+
+## ----dummy_2-------------------------------------------------------------
+dummied <- rec %>% step_dummy(Records) # or
+dummied <- rec %>% step_dummy(all_nominal(), - Status) # or
+dummied <- rec %>% step_dummy(all_nominal(), - all_outcomes()) 
+
+## ----dummy_3-------------------------------------------------------------
+dummied <- prep(dummied, training = credit_data)
+with_dummy <- bake(dummied, newdata = credit_data)
+with_dummy
+
diff --git a/inst/doc/Selecting_Variables.Rmd b/inst/doc/Selecting_Variables.Rmd
new file mode 100644
index 0000000..9f7b6ea
--- /dev/null
+++ b/inst/doc/Selecting_Variables.Rmd
@@ -0,0 +1,73 @@
+---
+title: "Selecting Variables"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Selecting Variables}
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+```{r ex_setup, include=FALSE}
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+```
+
+When recipe steps are used, there are different approaches that can be taken to select which variables or features each step should use. 
+
+There are three main characteristics of variables that can be queried: 
+
+ * the name of the variable
+ * the data type (e.g. numeric or nominal)
+ * the role that was declared by the recipe
+ 
+The manual pages for `?selections` and `?has_role` have details about the available selection methods; a short sketch of each approach appears after the example recipe below. 
+ 
+To illustrate this, the credit data will be used: 
+
+```{r credit}
+library(recipes)
+data("credit_data")
+str(credit_data)
+
+rec <- recipe(Status ~ Seniority + Time + Age + Records, data = credit_data)
+rec
+```
+
+Before any steps are used the information on the original variables is:
+
+```{r var_info_orig}
+summary(rec, original = TRUE)
+```
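+
+Each of the three characteristics above can drive a selection. For example (a sketch; `step_center` is only a placeholder and the recipes are not prepped here):
+
+```r
+## selections are not evaluated until prep(); these only record which columns to use
+rec %>% step_center(Age)                    # by name
+rec %>% step_center(all_numeric())          # by type
+rec %>% step_center(has_role("predictor"))  # by role
+## note: the last one selects `Records` too, which is nominal and cannot be centered
+```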
+
+We can add a step to compute dummy variables on the non-numeric data after we impute any missing data:
+
+```{r dummy_1}
+dummied <- rec %>% step_dummy(all_nominal())
+```
+
+This will capture _any_ variables that are either character strings or factors: `Status` and `Records`. However, since `Status` is our outcome, we might want to keep it as a factor so we can _subtract_ that variable out either by name or by role:
+
+```{r dummy_2}
+dummied <- rec %>% step_dummy(Records) # or
+dummied <- rec %>% step_dummy(all_nominal(), - Status) # or
+dummied <- rec %>% step_dummy(all_nominal(), - all_outcomes()) 
+```
+
+Using the last definition: 
+
+```{r dummy_3}
+dummied <- prep(dummied, training = credit_data)
+with_dummy <- bake(dummied, newdata = credit_data)
+with_dummy
+```
+
+`Status` is unaffected. 
+
+One important aspect about selecting variables in steps is that the variable names and types may change as steps are being executed. In the above example, `Records` is a factor variable before the step is executed. Afterwards, `Records` is gone and the binary variable `Records_yes` is in its place. One reason to have general selection routines like `all_predictors` or `contains` is to be able to select variables that have not been created yet. 
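+
+For example (a sketch), a later step can refer to the new dummy variable by a name pattern even though that column does not exist when the recipe is specified:
+
+```r
+rec %>%
+  step_dummy(all_nominal(), -all_outcomes()) %>%
+  step_center(contains("Records"))  # matches Records_yes once it exists
+```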
+
diff --git a/inst/doc/Selecting_Variables.html b/inst/doc/Selecting_Variables.html
new file mode 100644
index 0000000..dcc12f7
--- /dev/null
+++ b/inst/doc/Selecting_Variables.html
@@ -0,0 +1,181 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8" />
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+
+
+<title>Selecting Variables</title>
+
+
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+div.sourceCode { overflow-x: auto; }
+table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
+  margin: 0; padding: 0; vertical-align: baseline; border: none; }
+table.sourceCode { width: 100%; line-height: 100%; }
+td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
+td.sourceCode { padding-left: 5px; }
+code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code > span.dt { color: #902000; } /* DataType */
+code > span.dv { color: #40a070; } /* DecVal */
+code > span.bn { color: #40a070; } /* BaseN */
+code > span.fl { color: #40a070; } /* Float */
+code > span.ch { color: #4070a0; } /* Char */
+code > span.st { color: #4070a0; } /* String */
+code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code > span.ot { color: #007020; } /* Other */
+code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code > span.fu { color: #06287e; } /* Function */
+code > span.er { color: #ff0000; font-weight: bold; } /* Error */
+code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+code > span.cn { color: #880000; } /* Constant */
+code > span.sc { color: #4070a0; } /* SpecialChar */
+code > span.vs { color: #4070a0; } /* VerbatimString */
+code > span.ss { color: #bb6688; } /* SpecialString */
+code > span.im { } /* Import */
+code > span.va { color: #19177c; } /* Variable */
+code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code > span.op { color: #666666; } /* Operator */
+code > span.bu { } /* BuiltIn */
+code > span.ex { } /* Extension */
+code > span.pp { color: #bc7a00; } /* Preprocessor */
+code > span.at { color: #7d9029; } /* Attribute */
+code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+</style>
+
+
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20800px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%2020px%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20both%3B%0Amargin%3A%200%200% [...]
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Selecting Variables</h1>
+
+
+
+<p>When recipe steps are used, there are different approaches that can be taken to select which variables or features each step should use.</p>
+<p>There are three main characteristics of variables that can be queried:</p>
+<ul>
+<li>the name of the variable</li>
+<li>the data type (e.g. numeric or nominal)</li>
+<li>the role that was declared by the recipe</li>
+</ul>
+<p>The manual pages for <code>?selections</code> and <code>?has_role</code> have details about the available selection methods.</p>
+<p>To illustrate this, the credit data will be used:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(recipes)
+<span class="kw">data</span>(<span class="st">"credit_data"</span>)
+<span class="kw">str</span>(credit_data)
+<span class="co">#> 'data.frame':    4454 obs. of  14 variables:</span>
+<span class="co">#>  $ Status   : Factor w/ 2 levels "bad","good": 2 2 1 2 2 2 2 2 2 1 ...</span>
+<span class="co">#>  $ Seniority: int  9 17 10 0 0 1 29 9 0 0 ...</span>
+<span class="co">#>  $ Home     : Factor w/ 6 levels "ignore","other",..: 6 6 3 6 6 3 3 4 3 4 ...</span>
+<span class="co">#>  $ Time     : int  60 60 36 60 36 60 60 12 60 48 ...</span>
+<span class="co">#>  $ Age      : int  30 58 46 24 26 36 44 27 32 41 ...</span>
+<span class="co">#>  $ Marital  : Factor w/ 5 levels "divorced","married",..: 2 5 2 4 4 2 2 4 2 2 ...</span>
+<span class="co">#>  $ Records  : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 1 1 1 1 ...</span>
+<span class="co">#>  $ Job      : Factor w/ 4 levels "fixed","freelance",..: 2 1 2 1 1 1 1 1 2 4 ...</span>
+<span class="co">#>  $ Expenses : int  73 48 90 63 46 75 75 35 90 90 ...</span>
+<span class="co">#>  $ Income   : int  129 131 200 182 107 214 125 80 107 80 ...</span>
+<span class="co">#>  $ Assets   : int  0 0 3000 2500 0 3500 10000 0 15000 0 ...</span>
+<span class="co">#>  $ Debt     : int  0 0 0 0 0 0 0 0 0 0 ...</span>
+<span class="co">#>  $ Amount   : int  800 1000 2000 900 310 650 1600 200 1200 1200 ...</span>
+<span class="co">#>  $ Price    : int  846 1658 2985 1325 910 1645 1800 1093 1957 1468 ...</span>
+
+rec <-<span class="st"> </span><span class="kw">recipe</span>(Status <span class="op">~</span><span class="st"> </span>Seniority <span class="op">+</span><span class="st"> </span>Time <span class="op">+</span><span class="st"> </span>Age <span class="op">+</span><span class="st"> </span>Records, <span class="dt">data =</span> credit_data)
+rec
+<span class="co">#> Data Recipe</span>
+<span class="co">#> </span>
+<span class="co">#> Inputs:</span>
+<span class="co">#> </span>
+<span class="co">#>       role #variables</span>
+<span class="co">#>    outcome          1</span>
+<span class="co">#>  predictor          4</span></code></pre></div>
+<p>Before any steps are used the information on the original variables is:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">summary</span>(rec, <span class="dt">original =</span> <span class="ot">TRUE</span>)
+<span class="co">#> # A tibble: 5 x 4</span>
+<span class="co">#>    variable    type      role   source</span>
+<span class="co">#>       <chr>   <chr>     <chr>    <chr></span>
+<span class="co">#> 1 Seniority numeric predictor original</span>
+<span class="co">#> 2      Time numeric predictor original</span>
+<span class="co">#> 3       Age numeric predictor original</span>
+<span class="co">#> 4   Records nominal predictor original</span>
+<span class="co">#> 5    Status nominal   outcome original</span></code></pre></div>
+<p>We can add a step to compute dummy variables on the non-numeric data after we impute any missing data:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">dummied <-<span class="st"> </span>rec <span class="op">%>%</span><span class="st"> </span><span class="kw">step_dummy</span>(<span class="kw">all_nominal</span>())</code></pre></div>
+<p>This will capture <em>any</em> variables that are either character strings or factors: <code>Status</code> and <code>Records</code>. However, since <code>Status</code> is our outcome, we might want to keep it as a factor so we can <em>subtract</em> that variable out either by name or by role:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">dummied <-<span class="st"> </span>rec <span class="op">%>%</span><span class="st"> </span><span class="kw">step_dummy</span>(Records) <span class="co"># or</span>
+dummied <-<span class="st"> </span>rec <span class="op">%>%</span><span class="st"> </span><span class="kw">step_dummy</span>(<span class="kw">all_nominal</span>(), <span class="op">-</span><span class="st"> </span>Status) <span class="co"># or</span>
+dummied <-<span class="st"> </span>rec <span class="op">%>%</span><span class="st"> </span><span class="kw">step_dummy</span>(<span class="kw">all_nominal</span>(), <span class="op">-</span><span class="st"> </span><span class="kw">all_outcomes</span>()) </code></pre></div>
+<p>Using the last definition:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">dummied <-<span class="st"> </span><span class="kw">prep</span>(dummied, <span class="dt">training =</span> credit_data)
+<span class="co">#> step 1 dummy training</span>
+with_dummy <-<span class="st"> </span><span class="kw">bake</span>(dummied, <span class="dt">newdata =</span> credit_data)
+with_dummy
+<span class="co">#> # A tibble: 4,454 x 4</span>
+<span class="co">#>    Seniority  Time   Age Records_yes</span>
+<span class="co">#>        <int> <int> <int>       <dbl></span>
+<span class="co">#>  1         9    60    30           0</span>
+<span class="co">#>  2        17    60    58           0</span>
+<span class="co">#>  3        10    36    46           1</span>
+<span class="co">#>  4         0    60    24           0</span>
+<span class="co">#>  5         0    36    26           0</span>
+<span class="co">#>  6         1    60    36           0</span>
+<span class="co">#>  7        29    60    44           0</span>
+<span class="co">#>  8         9    12    27           0</span>
+<span class="co">#>  9         0    60    32           0</span>
+<span class="co">#> 10         0    48    41           0</span>
+<span class="co">#> # ... with 4,444 more rows</span></code></pre></div>
+<p><code>Status</code> is unaffected.</p>
+<p>One important aspect of selecting variables in steps is that the variable names and types may change as the steps are executed. In the above example, <code>Records</code> is a factor variable before the step is executed. Afterwards, <code>Records</code> is gone and the binary variable <code>Records_yes</code> is in its place. One reason to have general selection routines like <code>all_predictors</code> or <code>contains</code> is to be able to select variables that have not been created yet.</p>
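+<p>For example (a sketch, not part of the rendered vignette output), a later step could refer to the not-yet-created dummy column through such a general selector:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># hypothetical follow-up step: Records_yes only exists after step_dummy
+dummied <- dummied %&gt;% step_center(contains("Records"))</code></pre></div>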
+
+<script type="text/javascript">
+window.onload = function() {
+  var i, fig = 1, caps = document.getElementsByClassName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.className !== 'figure' || cap.nodeName !== 'P')
+      continue;
+    cap.innerHTML = '<span>Figure ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+  fig = 1;
+  caps = document.getElementsByTagName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.nodeName !== 'TABLE') continue;
+    cap.innerHTML = '<span>Table ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+}
+</script>
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/doc/Simple_Example.R b/inst/doc/Simple_Example.R
new file mode 100644
index 0000000..f7c138d
--- /dev/null
+++ b/inst/doc/Simple_Example.R
@@ -0,0 +1,62 @@
+## ----ex_setup, include=FALSE---------------------------------------------
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+
+## ----data----------------------------------------------------------------
+library(recipes)
+library(caret)
+data(segmentationData)
+
+seg_train <- segmentationData %>% 
+  filter(Case == "Train") %>% 
+  select(-Case, -Cell)
+seg_test  <- segmentationData %>% 
+  filter(Case == "Test")  %>% 
+  select(-Case, -Cell)
+
+## ----first_rec-----------------------------------------------------------
+rec_obj <- recipe(Class ~ ., data = seg_train)
+rec_obj
+
+## ----step_code, eval = FALSE---------------------------------------------
+#  rec_obj <- step_name(rec_obj, arguments)    ## or
+#  rec_obj <- rec_obj %>% step_name(arguments)
+
+## ----center_scale--------------------------------------------------------
+standardized <- rec_obj %>%
+  step_center(all_predictors()) %>%
+  step_scale(all_predictors()) 
+standardized
+
+## ----trained-------------------------------------------------------------
+trained_rec <- prep(standardized, training = seg_train)
+
+## ----apply---------------------------------------------------------------
+train_data <- bake(trained_rec, newdata = seg_train)
+test_data  <- bake(trained_rec, newdata = seg_test)
+
+## ----tibbles-------------------------------------------------------------
+class(test_data)
+test_data
+
+## ----pca-----------------------------------------------------------------
+trained_rec <- trained_rec %>%
+  step_pca(ends_with("Ch1"), contains("area"), num = 5)
+trained_rec
+
+## ----pca_training--------------------------------------------------------
+trained_rec <- prep(trained_rec, training = seg_train)
+
+## ----pca_bake------------------------------------------------------------
+test_data  <- bake(trained_rec, newdata = seg_test)
+names(test_data)
+
+## ----step_list-----------------------------------------------------------
+steps <- apropos("^step_")
+steps[!grepl("new$", steps)]
+
diff --git a/inst/doc/Simple_Example.Rmd b/inst/doc/Simple_Example.Rmd
new file mode 100644
index 0000000..1828330
--- /dev/null
+++ b/inst/doc/Simple_Example.Rmd
@@ -0,0 +1,134 @@
+---
+title: "Basic Recipes"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Basic Recipes}
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+```{r ex_setup, include=FALSE}
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+```
+
+This document demonstrates some basic uses of recipes. First, some definitions are required: 
+
+ * __variables__ are the original (raw) data columns in a data frame or tibble. For example, in a traditional formula `Y ~ A + B + A:B`, the variables are `A`, `B`, and `Y`. 
+ * __roles__ define how variables will be used in the model. Examples are: `predictor` (independent variables), `response`, and `case weight`. This is meant to be open-ended and extensible. 
+ * __terms__ are columns in a design matrix such as `A`, `B`, and `A:B`. These can be other derived entities that are grouped, such as a set of principal components or a set of columns that define a basis function for a variable. These are synonymous with features in machine learning. Variables that have `predictor` roles would automatically be main effect terms.
+
+## An Example
+
+The cell segmentation data will be used. It has 58 predictor columns, a factor variable `Class` (the outcome), and two extra labelling columns. Each of the predictors has a suffix for the optical channel (`"Ch1"`-`"Ch4"`). We will first separate the data into a training and test set, then remove the unimportant variables:
+
+```{r data}
+library(recipes)
+library(caret)
+data(segmentationData)
+
+seg_train <- segmentationData %>% 
+  filter(Case == "Train") %>% 
+  select(-Case, -Cell)
+seg_test  <- segmentationData %>% 
+  filter(Case == "Test")  %>% 
+  select(-Case, -Cell)
+```
+
+The idea is that the preprocessing operations will all be created using the training set and then these steps will be applied to both the training and test set. 
+
+## An Initial Recipe
+
+For a first recipe, let's plan on centering and scaling the predictors. First, we will create a recipe from the original data and then specify the processing steps. 
+
+Recipes can be created manually by sequentially adding roles to variables in a data set. 
+
+If the analysis only requires **outcomes** and **predictors**, the easiest way to create the initial recipe is to use the standard formula method:
+
+```{r first_rec}
+rec_obj <- recipe(Class ~ ., data = seg_train)
+rec_obj
+```
+
+The data contained in the `data` argument need not be the training set; this data is only used to catalog the names of the variables and their types (e.g. numeric, etc.).  
+
+(Note that the formula method here is used to declare the variables and their roles and nothing else. If you use inline functions (e.g. `log`) it will complain. These types of operations can be added later.)
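+
+Since only the names and types are cataloged, a sketch (not evaluated here) is to declare the same recipe from just a few rows, which can save time and memory on very large data sets:
+
+```{r small_data, eval = FALSE}
+# hypothetical chunk: `head(seg_train)` carries the same columns and types
+rec_obj <- recipe(Class ~ ., data = head(seg_train))
+```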
+
+## Preprocessing Steps
+
+From here, preprocessing steps can be added sequentially in one of two ways:
+```{r step_code, eval = FALSE}
+rec_obj <- step_name(rec_obj, arguments)    ## or
+rec_obj <- rec_obj %>% step_name(arguments)
+```
+`step_center` and the other functions will always return updated recipes. 
+
+One other important facet of the code is the method for specifying which variables should be used in different steps. The manual page `?selections` has more details, but [`dplyr`](https://cran.r-project.org/package=dplyr)-like selector functions can be used: 
+
+ * use basic variable names (e.g. `x1, x2`),
+ *  [`dplyr`](https://cran.r-project.org/package=dplyr) functions for selecting variables: `contains`, `ends_with`, `everything`, `matches`, `num_range`, and `starts_with`,
+ * functions that subset on the role of the variables that have been specified so far: `all_outcomes`, `all_predictors`, `has_role`, or 
+ * similar functions for the type of data: `all_nominal`, `all_numeric`, and `has_type`. 
+
+Note that the functions listed above are the only ones that can be used to select variables inside the steps. Also, minus signs can be used to deselect variables. 
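+
+For example (a sketch, not evaluated here), the selectors can be mixed, with minus signs removing variables from a selection:
+
+```{r selector_sketch, eval = FALSE}
+# hypothetical: center every predictor except those ending in "Ch4"
+rec_obj %>% step_center(all_predictors(), -ends_with("Ch4"))
+```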
+
+For our data, we can add the two operations for all of the predictors:
+```{r center_scale}
+standardized <- rec_obj %>%
+  step_center(all_predictors()) %>%
+  step_scale(all_predictors()) 
+standardized
+```
+
+It is important to realize that the _specific_ variables have not been declared yet (in this example). In some preprocessing steps, variables will be added or removed from the current list of possible variables. 
+
+If these are the only preprocessing steps for the predictors, we can now estimate the means and standard deviations from the training set. The `prep` function is used with a recipe and a data set:
+```{r trained}
+trained_rec <- prep(standardized, training = seg_train)
+```
+Now that the statistics have been estimated, the preprocessing can be applied to the training and test set:
+```{r apply}
+train_data <- bake(trained_rec, newdata = seg_train)
+test_data  <- bake(trained_rec, newdata = seg_test)
+```
+`bake` returns a tibble: 
+```{r tibbles}
+class(test_data)
+test_data
+```
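+
+If a downstream function needs a plain data frame or matrix instead, the tibble converts directly (a sketch, not evaluated here):
+
+```{r convert, eval = FALSE}
+# all baked columns are numeric predictors here, so a matrix also works
+as.data.frame(test_data)
+as.matrix(test_data)
+```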
+
+
+## Adding Steps
+
+After exploring the data, more preprocessing might be required. Steps can be added to the trained recipe. Suppose that we need to create PCA components but only from the predictors from channel 1 and any predictors that are areas: 
+```{r pca}
+trained_rec <- trained_rec %>%
+  step_pca(ends_with("Ch1"), contains("area"), num = 5)
+trained_rec
+```
+Note that only the last step has been estimated; the first two were previously trained and these activities are not duplicated. We can add the PCA estimates using `prep` again:
+```{r pca_training}
+trained_rec <- prep(trained_rec, training = seg_train)
+```
+`bake` can be reapplied to get the principal components in addition to the other variables:
+
+```{r pca_bake}
+test_data  <- bake(trained_rec, newdata = seg_test)
+names(test_data)
+```
+
+Note that the PCA components have replaced the original variables that were from channel 1 or measured an area aspect of the cells. 
+
+
+There are a number of different steps included in the package:
+
+```{r step_list}
+steps <- apropos("^step_")
+steps[!grepl("new$", steps)]
+```
diff --git a/inst/doc/Simple_Example.html b/inst/doc/Simple_Example.html
new file mode 100644
index 0000000..1b6a2d3
--- /dev/null
+++ b/inst/doc/Simple_Example.html
@@ -0,0 +1,291 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8" />
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+
+
+<title>Basic Recipes</title>
+
+
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+div.sourceCode { overflow-x: auto; }
+table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
+  margin: 0; padding: 0; vertical-align: baseline; border: none; }
+table.sourceCode { width: 100%; line-height: 100%; }
+td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
+td.sourceCode { padding-left: 5px; }
+code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code > span.dt { color: #902000; } /* DataType */
+code > span.dv { color: #40a070; } /* DecVal */
+code > span.bn { color: #40a070; } /* BaseN */
+code > span.fl { color: #40a070; } /* Float */
+code > span.ch { color: #4070a0; } /* Char */
+code > span.st { color: #4070a0; } /* String */
+code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code > span.ot { color: #007020; } /* Other */
+code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code > span.fu { color: #06287e; } /* Function */
+code > span.er { color: #ff0000; font-weight: bold; } /* Error */
+code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+code > span.cn { color: #880000; } /* Constant */
+code > span.sc { color: #4070a0; } /* SpecialChar */
+code > span.vs { color: #4070a0; } /* VerbatimString */
+code > span.ss { color: #bb6688; } /* SpecialString */
+code > span.im { } /* Import */
+code > span.va { color: #19177c; } /* Variable */
+code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code > span.op { color: #666666; } /* Operator */
+code > span.bu { } /* BuiltIn */
+code > span.ex { } /* Extension */
+code > span.pp { color: #bc7a00; } /* Preprocessor */
+code > span.at { color: #7d9029; } /* Attribute */
+code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+</style>
+
+
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20800px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%2020px%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20both%3B%0Amargin%3A%200%200% [...]
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Basic Recipes</h1>
+
+
+<div id="TOC">
+<ul>
+<li><a href="#an-example">An Example</a></li>
+<li><a href="#an-initial-recipe">An Initial Recipe</a></li>
+<li><a href="#preprocessing-steps">Preprocessing Steps</a></li>
+<li><a href="#adding-steps">Adding Steps</a></li>
+</ul>
+</div>
+
+<p>This document demonstrates some basic uses of recipes. First, some definitions are required:</p>
+<ul>
+<li><strong>variables</strong> are the original (raw) data columns in a data frame or tibble. For example, in a traditional formula <code>Y ~ A + B + A:B</code>, the variables are <code>A</code>, <code>B</code>, and <code>Y</code>.</li>
+<li><strong>roles</strong> define how variables will be used in the model. Examples are: <code>predictor</code> (independent variables), <code>response</code>, and <code>case weight</code>. This is meant to be open-ended and extensible.</li>
+<li><strong>terms</strong> are columns in a design matrix such as <code>A</code>, <code>B</code>, and <code>A:B</code>. These can be other derived entities that are grouped, such as a set of principal components or a set of columns that define a basis function for a variable. These are synonymous with features in machine learning. Variables that have <code>predictor</code> roles would automatically be main effect terms.</li>
+</ul>
+<div id="an-example" class="section level2">
+<h2>An Example</h2>
+<p>The cell segmentation data will be used. It has 58 predictor columns, a factor variable <code>Class</code> (the outcome), and two extra labelling columns. Each of the predictors has a suffix for the optical channel (<code>"Ch1"</code>-<code>"Ch4"</code>). We will first separate the data into a training and test set, then remove the unimportant variables:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(recipes)
+<span class="kw">library</span>(caret)
+<span class="kw">data</span>(segmentationData)
+
+seg_train <-<span class="st"> </span>segmentationData <span class="op">%>%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">filter</span>(Case <span class="op">==</span><span class="st"> "Train"</span>) <span class="op">%>%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">select</span>(<span class="op">-</span>Case, <span class="op">-</span>Cell)
+seg_test  <-<span class="st"> </span>segmentationData <span class="op">%>%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">filter</span>(Case <span class="op">==</span><span class="st"> "Test"</span>)  <span class="op">%>%</span><span class="st"> </span>
+<span class="st">  </span><span class="kw">select</span>(<span class="op">-</span>Case, <span class="op">-</span>Cell)</code></pre></div>
+<p>The idea is that the preprocessing operations will all be created using the training set and then these steps will be applied to both the training and test set.</p>
+</div>
+<div id="an-initial-recipe" class="section level2">
+<h2>An Initial Recipe</h2>
+<p>For a first recipe, let’s plan on centering and scaling the predictors. First, we will create a recipe from the original data and then specify the processing steps.</p>
+<p>Recipes can be created manually by sequentially adding roles to variables in a data set.</p>
+<p>If the analysis only requires <strong>outcomes</strong> and <strong>predictors</strong>, the easiest way to create the initial recipe is to use the standard formula method:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">rec_obj <-<span class="st"> </span><span class="kw">recipe</span>(Class <span class="op">~</span><span class="st"> </span>., <span class="dt">data =</span> seg_train)
+rec_obj
+<span class="co">#> Data Recipe</span>
+<span class="co">#> </span>
+<span class="co">#> Inputs:</span>
+<span class="co">#> </span>
+<span class="co">#>       role #variables</span>
+<span class="co">#>    outcome          1</span>
+<span class="co">#>  predictor         58</span></code></pre></div>
+<p>The data contained in the <code>data</code> argument need not be the training set; this data is only used to catalog the names of the variables and their types (e.g. numeric, etc.).</p>
+<p>(Note that the formula method here is used to declare the variables and their roles and nothing else. If you use inline functions (e.g. <code>log</code>) it will complain. These types of operations can be added later.)</p>
+</div>
+<div id="preprocessing-steps" class="section level2">
+<h2>Preprocessing Steps</h2>
+<p>From here, preprocessing steps can be added sequentially in one of two ways:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">rec_obj <-<span class="st"> </span><span class="kw">step_name</span>(rec_obj, arguments)    ## or
+rec_obj <-<span class="st"> </span>rec_obj <span class="op">%>%</span><span class="st"> </span><span class="kw">step_name</span>(arguments)</code></pre></div>
+<p><code>step_center</code> and the other functions will always return updated recipes.</p>
+<p>One other important facet of the code is the method for specifying which variables should be used in different steps. The manual page <code>?selections</code> has more details, but <a href="https://cran.r-project.org/package=dplyr"><code>dplyr</code></a>-like selector functions can be used:</p>
+<ul>
+<li>use basic variable names (e.g. <code>x1, x2</code>),</li>
+<li><a href="https://cran.r-project.org/package=dplyr"><code>dplyr</code></a> functions for selecting variables: <code>contains</code>, <code>ends_with</code>, <code>everything</code>, <code>matches</code>, <code>num_range</code>, and <code>starts_with</code>,</li>
+<li>functions that subset on the role of the variables that have been specified so far: <code>all_outcomes</code>, <code>all_predictors</code>, <code>has_role</code>, or</li>
+<li>similar functions for the type of data: <code>all_nominal</code>, <code>all_numeric</code>, and <code>has_type</code>.</li>
+</ul>
+<p>Note that the functions listed above are the only ones that can be used to select variables inside the steps. Also, minus signs can be used to deselect variables.</p>
+<p>For our data, we can add the two operations for all of the predictors:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">standardized <-<span class="st"> </span>rec_obj <span class="op">%>%</span>
+<span class="st">  </span><span class="kw">step_center</span>(<span class="kw">all_predictors</span>()) <span class="op">%>%</span>
+<span class="st">  </span><span class="kw">step_scale</span>(<span class="kw">all_predictors</span>()) 
+standardized
+<span class="co">#> Data Recipe</span>
+<span class="co">#> </span>
+<span class="co">#> Inputs:</span>
+<span class="co">#> </span>
+<span class="co">#>       role #variables</span>
+<span class="co">#>    outcome          1</span>
+<span class="co">#>  predictor         58</span>
+<span class="co">#> </span>
+<span class="co">#> Steps:</span>
+<span class="co">#> </span>
+<span class="co">#> Centering for all_predictors()</span>
+<span class="co">#> Scaling for all_predictors()</span></code></pre></div>
+<p>It is important to realize that the <em>specific</em> variables have not been declared yet (in this example). In some preprocessing steps, variables will be added or removed from the current list of possible variables.</p>
+<p>If these are the only preprocessing steps for the predictors, we can now estimate the means and standard deviations from the training set. The <code>prep</code> function is used with a recipe and a data set:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">trained_rec <-<span class="st"> </span><span class="kw">prep</span>(standardized, <span class="dt">training =</span> seg_train)
+<span class="co">#> step 1 center training </span>
+<span class="co">#> step 2 scale training</span></code></pre></div>
+<p>Now that the statistics have been estimated, the preprocessing can be applied to the training and test set:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">train_data <-<span class="st"> </span><span class="kw">bake</span>(trained_rec, <span class="dt">newdata =</span> seg_train)
+test_data  <-<span class="st"> </span><span class="kw">bake</span>(trained_rec, <span class="dt">newdata =</span> seg_test)</code></pre></div>
+<p><code>bake</code> returns a tibble:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">class</span>(test_data)
+<span class="co">#> [1] "tbl_df"     "tbl"        "data.frame"</span>
+test_data
+<span class="co">#> # A tibble: 1,010 x 58</span>
+<span class="co">#>    AngleCh1 AreaCh1 AvgIntenCh1 AvgIntenCh2 AvgIntenCh3 AvgIntenCh4</span>
+<span class="co">#>       <dbl>   <dbl>       <dbl>       <dbl>       <dbl>       <dbl></span>
+<span class="co">#>  1   1.0656  -0.647      -0.684      -1.177      -0.926     -0.9238</span>
+<span class="co">#>  2  -1.8040  -0.185      -0.632      -0.479      -0.809     -0.6666</span>
+<span class="co">#>  3  -1.0300  -0.707       1.207       3.035       0.348      1.3864</span>
+<span class="co">#>  4   1.6935  -0.684       0.806       2.664       0.296      0.8934</span>
+<span class="co">#>  5   1.8129  -0.342      -0.668      -1.172      -0.843     -0.9282</span>
+<span class="co">#>  6  -1.4759   0.784      -0.682      -0.628      -0.881     -0.5939</span>
+<span class="co">#>  7   1.2702   0.272      -0.672      -0.625      -0.809     -0.5156</span>
+<span class="co">#>  8  -1.5837   0.457       0.283       1.320      -0.613     -0.0891</span>
+<span class="co">#>  9  -0.7957  -0.412      -0.669      -1.168      -0.845     -0.9258</span>
+<span class="co">#> 10   0.0363  -0.638      -0.535       0.182      -0.555     -0.0253</span>
+<span class="co">#> # ... with 1,000 more rows, and 52 more variables:</span>
+<span class="co">#> #   ConvexHullAreaRatioCh1 <dbl>, ConvexHullPerimRatioCh1 <dbl>,</span>
+<span class="co">#> #   DiffIntenDensityCh1 <dbl>, DiffIntenDensityCh3 <dbl>,</span>
+<span class="co">#> #   DiffIntenDensityCh4 <dbl>, EntropyIntenCh1 <dbl>,</span>
+<span class="co">#> #   EntropyIntenCh3 <dbl>, EntropyIntenCh4 <dbl>, EqCircDiamCh1 <dbl>,</span>
+<span class="co">#> #   EqEllipseLWRCh1 <dbl>, EqEllipseOblateVolCh1 <dbl>,</span>
+<span class="co">#> #   EqEllipseProlateVolCh1 <dbl>, EqSphereAreaCh1 <dbl>,</span>
+<span class="co">#> #   EqSphereVolCh1 <dbl>, FiberAlign2Ch3 <dbl>, FiberAlign2Ch4 <dbl>,</span>
+<span class="co">#> #   FiberLengthCh1 <dbl>, FiberWidthCh1 <dbl>, IntenCoocASMCh3 <dbl>,</span>
+<span class="co">#> #   IntenCoocASMCh4 <dbl>, IntenCoocContrastCh3 <dbl>,</span>
+<span class="co">#> #   IntenCoocContrastCh4 <dbl>, IntenCoocEntropyCh3 <dbl>,</span>
+<span class="co">#> #   IntenCoocEntropyCh4 <dbl>, IntenCoocMaxCh3 <dbl>,</span>
+<span class="co">#> #   IntenCoocMaxCh4 <dbl>, KurtIntenCh1 <dbl>, KurtIntenCh3 <dbl>,</span>
+<span class="co">#> #   KurtIntenCh4 <dbl>, LengthCh1 <dbl>, NeighborAvgDistCh1 <dbl>,</span>
+<span class="co">#> #   NeighborMinDistCh1 <dbl>, NeighborVarDistCh1 <dbl>, PerimCh1 <dbl>,</span>
+<span class="co">#> #   ShapeBFRCh1 <dbl>, ShapeLWRCh1 <dbl>, ShapeP2ACh1 <dbl>,</span>
+<span class="co">#> #   SkewIntenCh1 <dbl>, SkewIntenCh3 <dbl>, SkewIntenCh4 <dbl>,</span>
+<span class="co">#> #   SpotFiberCountCh3 <dbl>, SpotFiberCountCh4 <dbl>, TotalIntenCh1 <dbl>,</span>
+<span class="co">#> #   TotalIntenCh2 <dbl>, TotalIntenCh3 <dbl>, TotalIntenCh4 <dbl>,</span>
+<span class="co">#> #   VarIntenCh1 <dbl>, VarIntenCh3 <dbl>, VarIntenCh4 <dbl>,</span>
+<span class="co">#> #   WidthCh1 <dbl>, XCentroid <dbl>, YCentroid <dbl></span></code></pre></div>
+</div>
+<div id="adding-steps" class="section level2">
+<h2>Adding Steps</h2>
+<p>After exploring the data, more preprocessing might be required. Steps can be added to the trained recipe. Suppose that we need to create PCA components but only from the predictors from channel 1 and any predictors that are areas:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">trained_rec <-<span class="st"> </span>trained_rec <span class="op">%>%</span>
+<span class="st">  </span><span class="kw">step_pca</span>(<span class="kw">ends_with</span>(<span class="st">"Ch1"</span>), <span class="kw">contains</span>(<span class="st">"area"</span>), <span class="dt">num =</span> <span class="dv">5</span>)
+trained_rec
+<span class="co">#> Data Recipe</span>
+<span class="co">#> </span>
+<span class="co">#> Inputs:</span>
+<span class="co">#> </span>
+<span class="co">#>       role #variables</span>
+<span class="co">#>    outcome          1</span>
+<span class="co">#>  predictor         58</span>
+<span class="co">#> </span>
+<span class="co">#> Training data contained 1009 data points and no missing data.</span>
+<span class="co">#> </span>
+<span class="co">#> Steps:</span>
+<span class="co">#> </span>
+<span class="co">#> Centering for AngleCh1, AreaCh1, ... [trained]</span>
+<span class="co">#> Scaling for AngleCh1, AreaCh1, ... [trained]</span>
+<span class="co">#> PCA extraction with ends_with("Ch1"), contains("area")</span></code></pre></div>
+<p>Note that only the last step has been estimated; the first two were previously trained and these activities are not duplicated. We can add the PCA estimates using <code>prep</code> again:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">trained_rec <-<span class="st"> </span><span class="kw">prep</span>(trained_rec, <span class="dt">training =</span> seg_train)
+<span class="co">#> step 1 center [pre-trained]</span>
+<span class="co">#> step 2 scale [pre-trained]</span>
+<span class="co">#> step 3 pca training</span></code></pre></div>
+<p><code>bake</code> can be reapplied to get the principal components in addition to the other variables:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">test_data  <-<span class="st"> </span><span class="kw">bake</span>(trained_rec, <span class="dt">newdata =</span> seg_test)
+<span class="kw">names</span>(test_data)
+<span class="co">#>  [1] "AvgIntenCh2"          "AvgIntenCh3"          "AvgIntenCh4"         </span>
+<span class="co">#>  [4] "DiffIntenDensityCh3"  "DiffIntenDensityCh4"  "EntropyIntenCh3"     </span>
+<span class="co">#>  [7] "EntropyIntenCh4"      "FiberAlign2Ch3"       "FiberAlign2Ch4"      </span>
+<span class="co">#> [10] "IntenCoocASMCh3"      "IntenCoocASMCh4"      "IntenCoocContrastCh3"</span>
+<span class="co">#> [13] "IntenCoocContrastCh4" "IntenCoocEntropyCh3"  "IntenCoocEntropyCh4" </span>
+<span class="co">#> [16] "IntenCoocMaxCh3"      "IntenCoocMaxCh4"      "KurtIntenCh3"        </span>
+<span class="co">#> [19] "KurtIntenCh4"         "SkewIntenCh3"         "SkewIntenCh4"        </span>
+<span class="co">#> [22] "SpotFiberCountCh3"    "SpotFiberCountCh4"    "TotalIntenCh2"       </span>
+<span class="co">#> [25] "TotalIntenCh3"        "TotalIntenCh4"        "VarIntenCh3"         </span>
+<span class="co">#> [28] "VarIntenCh4"          "XCentroid"            "YCentroid"           </span>
+<span class="co">#> [31] "PC1"                  "PC2"                  "PC3"                 </span>
+<span class="co">#> [34] "PC4"                  "PC5"</span></code></pre></div>
+<p>Note that the PCA components have replaced the original variables that were from channel 1 or measured an area aspect of the cells.</p>
+<p>There are a number of different steps included in the package:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">steps <-<span class="st"> </span><span class="kw">apropos</span>(<span class="st">"^step_"</span>)
+steps[<span class="op">!</span><span class="kw">grepl</span>(<span class="st">"new$"</span>, steps)]
+<span class="co">#>  [1] "step_BoxCox"       "step_YeoJohnson"   "step_bagimpute"   </span>
+<span class="co">#>  [4] "step_bin2factor"   "step_center"       "step_classdist"   </span>
+<span class="co">#>  [7] "step_corr"         "step_date"         "step_depth"       </span>
+<span class="co">#> [10] "step_discretize"   "step_dummy"        "step_holiday"     </span>
+<span class="co">#> [13] "step_hyperbolic"   "step_ica"          "step_interact"    </span>
+<span class="co">#> [16] "step_intercept"    "step_invlogit"     "step_isomap"      </span>
+<span class="co">#> [19] "step_knnimpute"    "step_kpca"         "step_lincomb"     </span>
+<span class="co">#> [22] "step_log"          "step_logit"        "step_meanimpute"  </span>
+<span class="co">#> [25] "step_modeimpute"   "step_ns"           "step_nzv"         </span>
+<span class="co">#> [28] "step_ordinalscore" "step_other"        "step_pca"         </span>
+<span class="co">#> [31] "step_percentile"   "step_poly"         "step_range"       </span>
+<span class="co">#> [34] "step_ratio"        "step_regex"        "step_rm"          </span>
+<span class="co">#> [37] "step_scale"        "step_shuffle"      "step_spatialsign" </span>
+<span class="co">#> [40] "step_sqrt"         "step_window"</span></code></pre></div>
+</div>
+
+<script type="text/javascript">
+window.onload = function() {
+  var i, fig = 1, caps = document.getElementsByClassName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.className !== 'figure' || cap.nodeName !== 'P')
+      continue;
+    cap.innerHTML = '<span>Figure ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+  fig = 1;
+  caps = document.getElementsByTagName('caption');
+  for (i = 0; i < caps.length; i++) {
+    var cap = caps[i];
+    if (cap.parentElement.nodeName !== 'TABLE') continue;
+    cap.innerHTML = '<span>Table ' + fig + ':</span> ' + cap.innerHTML;
+    fig++;
+  }
+}
+</script>
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/man/add_role.Rd b/man/add_role.Rd
new file mode 100644
index 0000000..7def0bc
--- /dev/null
+++ b/man/add_role.Rd
@@ -0,0 +1,48 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/roles.R
+\name{add_role}
+\alias{add_role}
+\title{Manually Add Roles}
+\usage{
+add_role(recipe, ..., new_role = "predictor")
+}
+\arguments{
+\item{recipe}{An existing \code{\link{recipe}}.}
+
+\item{...}{One or more selector functions to choose which variables are
+being assigned a role. See \code{\link{selections}} for more details.}
+
+\item{new_role}{A character string for a single role.}
+}
+\value{
+An updated recipe object.
+}
+\description{
+\code{add_role} can add a role definition to an existing variable in the
+  recipe.
+}
+\details{
+If a variable is selected that currently has a role, the role is
+  changed and a warning is issued.
+}
+\examples{
+
+data(biomass)
+
+# Create the recipe manually
+rec <- recipe(x = biomass)
+rec
+summary(rec)
+
+rec <- rec \%>\%
+  add_role(carbon, contains("gen"), sulfur, new_role = "predictor") \%>\%
+  add_role(sample, new_role = "id variable") \%>\%
+  add_role(dataset, new_role = "splitting variable") \%>\%
+  add_role(HHV, new_role = "outcome")
+rec
+
+}
+\concept{
+preprocessing model_specification
+}
+\keyword{datagen}
diff --git a/man/add_step.Rd b/man/add_step.Rd
new file mode 100644
index 0000000..2cc027c
--- /dev/null
+++ b/man/add_step.Rd
@@ -0,0 +1,23 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/misc.R
+\name{add_step}
+\alias{add_step}
+\title{Add a New Step to Current Recipe}
+\usage{
+add_step(rec, object)
+}
+\arguments{
+\item{rec}{A \code{\link{recipe}}.}
+
+\item{object}{A step object.}
+}
+\value{
+An updated \code{\link{recipe}} with the new step in the last slot.
+}
+\description{
+\code{add_step} adds a step to the last location in the recipe.
+}
+\concept{
+preprocessing
+}
+\keyword{datagen}
diff --git a/man/bake.Rd b/man/bake.Rd
new file mode 100644
index 0000000..d2495f4
--- /dev/null
+++ b/man/bake.Rd
@@ -0,0 +1,50 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/recipe.R
+\name{bake}
+\alias{bake}
+\alias{bake.recipe}
+\alias{bake.recipe}
+\title{Apply a Trained Data Recipe}
+\usage{
+bake(object, ...)
+
+\method{bake}{recipe}(object, newdata = object$template, ...)
+}
+\arguments{
+\item{object}{A trained object such as a \code{\link{recipe}} with at least
+one preprocessing step.}
+
+\item{...}{One or more selector functions to choose which variables will be
+returned by the function. See \code{\link{selections}} for more details.
+If no selectors are given, the default is to use
+\code{\link{all_predictors}}.}
+
+\item{newdata}{A data frame or tibble to which the preprocessing will be
+applied.}
+}
+\value{
+A tibble that may have different columns than the original columns
+  in \code{newdata}.
+}
+\description{
+For a recipe with at least one preprocessing step that has been trained by
+  \code{\link{prep.recipe}}, apply the computations to new data.
+}
+\details{
+\code{\link{bake}} takes a trained recipe and applies the
+  operations to a data set to create a design matrix.
+
+If the original data used to train the recipe are to be processed, time can be
+  saved by using the \code{retain = TRUE} option of \code{\link{prep}} to
+  avoid duplicating the same operations.
+
+A tibble is always returned but can be easily converted to a data frame or
+  matrix as needed.
+}
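+\examples{
+# a minimal sketch of the prep/bake cycle, in the spirit of the other
+# man pages (not the package's own example):
+data(biomass)
+rec <- recipe(HHV ~ carbon + hydrogen, data = biomass) \%>\%
+  step_center(all_predictors())
+rec <- prep(rec, training = biomass)
+bake(rec, newdata = biomass, all_predictors())
+}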
+\author{
+Max Kuhn
+}
+\concept{
+preprocessing model_specification
+}
+\keyword{datagen}
diff --git a/man/biomass.Rd b/man/biomass.Rd
new file mode 100644
index 0000000..4568240
--- /dev/null
+++ b/man/biomass.Rd
@@ -0,0 +1,25 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{biomass}
+\alias{biomass}
+\title{Biomass Data}
+\source{
+Ghugare, S. B., Tiwary, S., Elangovan, V., and Tambe, S. S. (2013). 
+Prediction of Higher Heating Value of Solid Biomass Fuels Using Artificial 
+Intelligence Formalisms. \emph{BioEnergy Research}, 1-12.
+}
+\value{
+\item{biomass}{a data frame}
+}
+\description{
+Ghugare et al. (2013) contains a data set where different biomass fuels are
+characterized by the amount of certain molecules (carbon, hydrogen, oxygen, 
+nitrogen, and sulfur) and the corresponding higher heating value (HHV). 
+These data are from Table S.2 of their Supplementary Materials.
+}
+\examples{
+data(biomass)
+str(biomass)
+}
+\keyword{datasets}
diff --git a/man/covers.Rd b/man/covers.Rd
new file mode 100644
index 0000000..3190c11
--- /dev/null
+++ b/man/covers.Rd
@@ -0,0 +1,23 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{covers}
+\alias{covers}
+\title{Raw Cover Type Data}
+\source{
+\url{https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.info}
+}
+\value{
+\item{covers}{a data frame}
+}
+\description{
+These are raw data describing different forest cover types, taken from the
+  UCI Machine Learning Database (see the link below). There is one 
+  column in the data that has a few different pieces of textual 
+  information (of variable lengths).
+}
+\examples{
+data(covers)
+str(covers)
+}
+\keyword{datasets}
diff --git a/man/credit_data.Rd b/man/credit_data.Rd
new file mode 100644
index 0000000..376ab79
--- /dev/null
+++ b/man/credit_data.Rd
@@ -0,0 +1,23 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{credit_data}
+\alias{credit_data}
+\title{Credit Data}
+\source{
+\url{https://github.com/gastonstat/CreditScoring}, 
+\url{http://bit.ly/2kkBFrk}
+}
+\value{
+\item{credit_data}{a data frame}
+}
+\description{
+These data are from the website of Dr. Lluís A. Belanche Muñoz, by way of a 
+GitHub repository of Dr. Gaston Sanchez. One data point with a missing outcome
+was removed from the original data.
+}
+\examples{
+data(credit_data)
+str(credit_data)
+}
+\keyword{datasets}
diff --git a/man/discretize.Rd b/man/discretize.Rd
new file mode 100644
index 0000000..48cc4bd
--- /dev/null
+++ b/man/discretize.Rd
@@ -0,0 +1,124 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/discretize.R
+\name{discretize}
+\alias{discretize}
+\alias{discretize.default}
+\alias{discretize.numeric}
+\alias{predict.discretize}
+\alias{step_discretize}
+\title{Discretize Numeric Variables}
+\usage{
+discretize(x, ...)
+
+\method{discretize}{default}(x, ...)
+
+\method{discretize}{numeric}(x, cuts = 4, labels = NULL, prefix = "bin",
+  keep_na = TRUE, infs = TRUE, min_unique = 10, ...)
+
+\method{predict}{discretize}(object, newdata, ...)
+
+step_discretize(recipe, ..., role = NA, trained = FALSE, objects = NULL,
+  options = list())
+}
+\arguments{
+\item{x}{A numeric vector}
+
+\item{...}{For \code{discretize}: options to pass to
+\code{\link[stats]{quantile}} that should not include \code{x} or
+\code{probs}. For \code{step_discretize}, the dots specify one or more
+selector functions to choose which variables are affected by the step. See
+\code{\link{selections}} for more details.}
+
+\item{cuts}{An integer defining how many cuts to make of the data.}
+
+\item{labels}{A character vector defining the factor levels that will be in
+the new factor (from smallest to largest). This should have length
+ \code{cuts+1} and should not include a level for missing (see
+ \code{keep_na} below).}
+
+\item{prefix}{A single parameter value to be used as a prefix for the factor
+levels (e.g. \code{bin1}, \code{bin2}, ...). If the string is not a valid
+R name, it is coerced to one.}
+
+\item{keep_na}{A logical for whether a factor level should be created to
+identify missing values in \code{x}.}
+
+\item{infs}{A logical indicating whether the smallest and largest cut point
+should be infinite.}
+
+\item{min_unique}{An integer defining the minimum number of unique values
+per bin required for the binning to be used. If (the number of unique
+values)\code{/(cuts+1)} is less than \code{min_unique}, no discretization
+takes place.}
+
+\item{object}{An object of class \code{discretize}.}
+
+\item{newdata}{A new numeric object to be binned.}
+
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{objects}{The \code{\link{discretize}} objects are stored here once
+the recipe has been trained by \code{\link{prep.recipe}}.}
+
+\item{options}{A list of options to \code{\link{discretize}}. A default is
+set for the argument \code{x}. Note that using the options
+\code{prefix} and \code{labels} when more than one variable is being
+transformed might be problematic, as all variables inherit those values.}
+}
+\value{
+\code{discretize} returns an object of class \code{discretize}.
+  \code{predict.discretize} returns a factor vector.
+}
+\description{
+\code{discretize} converts a numeric vector into a factor with bins having
+  approximately the same number of data points (based on a training set).
+}
+\details{
+\code{discretize} estimates the cut points from \code{x} using
+  percentiles. For example, if \code{cuts = 3}, the function estimates the
+ quartiles of \code{x} and uses these as the cut points. If \code{cuts = 2},
+ the bins are defined as being above or below the median of \code{x}.
+
+The \code{predict} method can then be used to turn numeric vectors into
+ factor vectors.
+
+If \code{keep_na = TRUE}, a suffix of "_missing" is used as a factor level
+ (see the examples below).
+
+If \code{infs = FALSE} and a new value is greater than the largest value of
+ \code{x}, a missing value will result.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+median(biomass_tr$carbon)
+discretize(biomass_tr$carbon, cuts = 2)
+discretize(biomass_tr$carbon, cuts = 2, infs = FALSE)
+discretize(biomass_tr$carbon, cuts = 2, infs = FALSE, keep_na = FALSE)
+discretize(biomass_tr$carbon, cuts = 2, prefix = "maybe a bad idea to bin")
+
+carbon_binned <- discretize(biomass_tr$carbon)
+table(predict(carbon_binned, biomass_tr$carbon))
+
+carbon_no_infs <- discretize(biomass_tr$carbon, infs = FALSE)
+predict(carbon_no_infs, c(50, 100))
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+rec <- rec \%>\% step_discretize(carbon, hydrogen)
+rec <- prep(rec, biomass_tr)
+binned_te <- bake(rec, biomass_te)
+table(binned_te$carbon)
+}
+\concept{
+preprocessing discretization factors
+}
+\keyword{datagen}
diff --git a/man/has_role.Rd b/man/has_role.Rd
new file mode 100644
index 0000000..0b93bf9
--- /dev/null
+++ b/man/has_role.Rd
@@ -0,0 +1,64 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/selections.R
+\name{has_role}
+\alias{has_role}
+\alias{all_predictors}
+\alias{all_outcomes}
+\alias{has_type}
+\alias{all_numeric}
+\alias{all_nominal}
+\alias{current_info}
+\title{Role Selection}
+\usage{
+has_role(match = "predictor", roles = current_info()$roles)
+
+all_predictors(roles = current_info()$roles)
+
+all_outcomes(roles = current_info()$roles)
+
+has_type(match = "numeric", types = current_info()$types)
+
+all_numeric(types = current_info()$types)
+
+all_nominal(types = current_info()$types)
+
+current_info()
+}
+\arguments{
+\item{match}{A single character string for the query. Exact matching is
+used (i.e. regular expressions won't work).}
+
+\item{roles}{A character string of roles for the current set of terms.}
+
+\item{types}{A character string of data types for the current set of terms.}
+}
+\value{
+Selector functions return an integer vector while
+  \code{current_info} returns an environment with vectors \code{vars},
+  \code{roles}, and \code{types}.
+}
+\description{
+\code{has_role}, \code{all_predictors}, and \code{all_outcomes} can be used
+  to select variables in a formula that have certain roles. Similarly,
+  \code{has_type}, \code{all_numeric}, and \code{all_nominal} are used to
+  select columns based on their data type. See \code{\link{selections}} for
+  more details. \code{current_info} is an internal function that is
+  unlikely to help users while the others have limited utility outside of
+  step function arguments.
+}
+\examples{
+data(biomass)
+
+rec <- recipe(biomass) \%>\%
+  add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
+           new_role = "predictor") \%>\%
+  add_role(HHV, new_role = "outcome") \%>\%
+  add_role(sample, new_role = "id variable") \%>\%
+  add_role(dataset, new_role = "splitting indicator")
+recipe_info <- summary(rec)
+recipe_info
+
+has_role("id variable", roles = recipe_info$role)
+all_outcomes(roles = recipe_info$role)
+}
+\keyword{datagen}
diff --git a/man/juice.Rd b/man/juice.Rd
new file mode 100644
index 0000000..f3ff1b1
--- /dev/null
+++ b/man/juice.Rd
@@ -0,0 +1,53 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/recipe.R
+\name{juice}
+\alias{juice}
+\title{Extract Finalized Training Set}
+\usage{
+juice(object, ...)
+}
+\arguments{
+\item{object}{A \code{recipe} object that has been prepared 
+with the option \code{retain = TRUE}.}
+
+\item{...}{One or more selector functions to choose which variables will be
+returned by the function. See \code{\link{selections}} for more details.
+If no selectors are given, the default is to use
+\code{\link{all_predictors}}.}
+}
+\value{
+A tibble.
+}
+\description{
+As steps are estimated by \code{prep}, these operations are
+ applied to the training set. Rather than running \code{bake} 
+ to duplicate this processing, this function will return
+ variables from the processed training set.
+}
+\details{
+When preparing a recipe, if the training data set is retained using \code{retain = TRUE}, there is no need to \code{bake} the recipe to get the preprocessed training set.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+sp_signed <- rec \%>\%
+  step_center(all_predictors()) \%>\%
+  step_scale(all_predictors()) \%>\%
+  step_spatialsign(all_predictors())
+
+sp_signed_trained <- prep(sp_signed, training = biomass_tr, retain = TRUE)
+
+tr_values <- bake(sp_signed_trained, newdata = biomass_tr, all_predictors())
+og_values <- juice(sp_signed_trained, all_predictors())
+
+all.equal(tr_values, og_values)
+}
+\seealso{
+\code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
diff --git a/man/names0.Rd b/man/names0.Rd
new file mode 100644
index 0000000..b200029
--- /dev/null
+++ b/man/names0.Rd
@@ -0,0 +1,25 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/misc.R
+\name{names0}
+\alias{names0}
+\title{Sequences of Names with Padded Zeros}
+\usage{
+names0(num, prefix = "x")
+}
+\arguments{
+\item{num}{A single integer for how many elements are created.}
+
+\item{prefix}{A character string that will start each name.}
+}
+\value{
+A character vector of length \code{num}.
+}
+\description{
+This function creates a series of \code{num} names with a common prefix.
+  The names are numbered with leading zeros (e.g.
+  \code{prefix01}-\code{prefix10} instead of \code{prefix1}-\code{prefix10}).
+}
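+\examples{
+# small illustrations (assuming the function is exported):
+names0(3, "x")   # "x1" "x2" "x3"  (no padding needed)
+names0(10, "x")  # "x01" "x02" ... "x10"
+}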
+\concept{
+string_functions naming_functions
+}
+\keyword{datagen}
diff --git a/man/okc.Rd b/man/okc.Rd
new file mode 100644
index 0000000..086e7d5
--- /dev/null
+++ b/man/okc.Rd
@@ -0,0 +1,24 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{okc}
+\alias{okc}
+\title{OkCupid Data}
+\source{
+Kim, A. Y., and A. Escobedo-Land. 2015. "OkCupid Data for 
+  Introductory Statistics and Data Science Courses." \emph{Journal of 
+  Statistics Education: An International Journal on the Teaching and 
+  Learning of Statistics}.
+}
+\value{
+\item{okc}{a data frame}
+}
+\description{
+These data are a sample of columns for users of the OkCupid dating website. The data
+are from Kim and Escobedo-Land (2015).
+}
+\examples{
+data(okc)
+str(okc)
+}
+\keyword{datasets}
diff --git a/man/prep.Rd b/man/prep.Rd
new file mode 100644
index 0000000..23946da
--- /dev/null
+++ b/man/prep.Rd
@@ -0,0 +1,73 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/recipe.R
+\name{prep}
+\alias{prep}
+\alias{prep.recipe}
+\alias{prep.recipe}
+\title{Train a Data Recipe}
+\usage{
+prep(x, ...)
+
+\method{prep}{recipe}(x, training = NULL, fresh = FALSE, verbose = TRUE,
+  retain = FALSE, stringsAsFactors = TRUE, ...)
+}
+\arguments{
+\item{x}{an object}
+
+\item{...}{further arguments passed to or from other methods (not currently
+used).}
+
+\item{training}{A data frame or tibble that will be used to estimate
+parameters for preprocessing.}
+
+\item{fresh}{A logical indicating whether already trained steps should be
+re-trained. If \code{TRUE}, you should pass in a data set to the argument
+\code{training}.}
+
+\item{verbose}{A logical that controls whether progress is reported as steps
+are executed.}
+
+\item{retain}{A logical: should the \emph{preprocessed} training set be saved
+into the \code{template} slot of the recipe after training? This is a good
+idea if you want to add more steps later but want to avoid re-training
+the existing steps.}
+
+\item{stringsAsFactors}{A logical: should character columns be converted to
+factors? This affects the preprocessed training set (when
+\code{retain = TRUE}) as well as the results of \code{bake.recipe}.}
+}
+\value{
+A recipe whose step objects have been updated with the required
+  quantities (e.g. parameter estimates, model objects, etc). Also, the
+  \code{term_info} object is likely to be modified as the steps are
+  executed.
+}
+\description{
+For a recipe with at least one preprocessing step, estimate the required
+  parameters from a training set that can be later applied to other data 
+  sets.
+}
+\details{
+Given a data set, this function estimates the quantities
+  and statistics required by any of the steps.
+
+\code{\link{prep}} returns an updated recipe with the estimates.
+
+Note that missing data are handled within the steps; there is no global
+  \code{na.rm} option at the recipe level or in \code{\link{prep}}.
+
+Also, if a recipe has been trained using \code{\link{prep}} and then steps
+  are added, \code{\link{prep}} will only update the new steps. If
+  \code{fresh = TRUE}, all of the steps will be (re)estimated.
+
+As the steps are executed, the \code{training} set is updated. For example,
+  if the first step is to center the data and the second is to scale the
+  data, the step for scaling is given the centered data.
+}
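+\examples{
+# a sketch of the sequential estimation described above: the scaling
+# step is given the centered data (not the package's own example):
+data(biomass)
+rec <- recipe(HHV ~ carbon + hydrogen, data = biomass) \%>\%
+  step_center(all_predictors()) \%>\%
+  step_scale(all_predictors())
+rec <- prep(rec, training = biomass, verbose = TRUE)
+}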
+\author{
+Max Kuhn
+}
+\concept{
+preprocessing model_specification
+}
+\keyword{datagen}
diff --git a/man/print.recipe.Rd b/man/print.recipe.Rd
new file mode 100644
index 0000000..e5f4feb
--- /dev/null
+++ b/man/print.recipe.Rd
@@ -0,0 +1,26 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/recipe.R
+\name{print.recipe}
+\alias{print.recipe}
+\title{Print a Recipe}
+\usage{
+\method{print}{recipe}(x, form_width = 30, ...)
+}
+\arguments{
+\item{x}{A \code{recipe} object}
+
+\item{form_width}{The number of characters used to print the variables or
+terms in a formula}
+
+\item{...}{further arguments passed to or from other methods (not currently
+used).}
+}
+\value{
+The original object (invisibly)
+}
+\description{
+Print a Recipe
+}
+\author{
+Max Kuhn
+}
diff --git a/man/recipe.Rd b/man/recipe.Rd
new file mode 100644
index 0000000..c637a5e
--- /dev/null
+++ b/man/recipe.Rd
@@ -0,0 +1,165 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/recipe.R
+\name{recipe}
+\alias{recipe}
+\alias{recipe.default}
+\alias{recipe.formula}
+\alias{recipe.default}
+\alias{recipe.data.frame}
+\alias{recipe.formula}
+\alias{recipe.matrix}
+\title{Create a Recipe for Preprocessing Data}
+\usage{
+recipe(x, ...)
+
+\method{recipe}{default}(x, ...)
+
+\method{recipe}{data.frame}(x, formula = NULL, ..., vars = NULL,
+  roles = NULL)
+
+\method{recipe}{formula}(formula, data, ...)
+
+\method{recipe}{matrix}(x, ...)
+}
+\arguments{
+\item{x, data}{A data frame or tibble of the \emph{template} data set
+(see below).}
+
+\item{...}{Further arguments passed to or from other methods (not currently
+used).}
+
+\item{formula}{A model formula. No in-line functions should be used here
+(e.g. \code{log(x)}, \code{x:y}, etc.). These types of transformations
+should be enacted using \code{step} functions in this package. Dots are
+allowed as are simple multivariate outcome terms (i.e. no need for
+\code{cbind}; see Examples).}
+
+\item{vars}{A character vector of column names corresponding to variables
+that will be used in any context (see below).}
+
+\item{roles}{A character vector (the same length as \code{vars}) that
+describes a single role that each variable will take. This value could be
+anything but common roles are \code{"outcome"}, \code{"predictor"},
+\code{"case_weight"}, or \code{"ID"}.}
+}
+\value{
+An object of class \code{recipe} with sub-objects:
+  \item{var_info}{A tibble containing information about the original data
+  set columns}
+  \item{term_info}{A tibble that contains the current set of terms in the
+  data set. This initially defaults to the same data contained in
+  \code{var_info}.}
+  \item{steps}{A list of \code{step} objects that define the sequence of
+  preprocessing steps that will be applied to data. The default value is
+  \code{NULL}}
+  \item{template}{A tibble of the data. This is initialized to be the same
+  as the data given in the \code{data} argument but can be different after
+  the recipe is trained.}
+}
+\description{
+A recipe is a description of what steps should be applied to a data set in
+  order to get it ready for data analysis.
+}
+\details{
+Recipes are alternative methods for creating design matrices and
+  for preprocessing data.
+
+Variables in recipes can have any type of \emph{role} in subsequent analyses
+  such as: outcome, predictor, case weights, stratification variables, etc.
+
+\code{recipe} objects can be created in several ways. If the analysis only
+  contains outcomes and predictors, the simplest way to create one is to use
+  a simple formula (e.g. \code{y ~ x1 + x2}) that does not contain inline
+  functions such as \code{log(x3)}. An example is given below.
+
+Alternatively, a \code{recipe} object can be created by first specifying
+  which variables in a data set should be used and then sequentially
+  defining their roles (see the last example).
+
+Steps to the recipe can be added sequentially. Steps can include common
+  operations like logging a variable, creating dummy variables or
+  interactions and so on. More computationally complex actions such as
+  dimension reduction or imputation can also be specified.
+
+Once a recipe has been defined, the \code{\link{prep}} function can be
+  used to estimate any quantities required by the steps from a data set (a.k.a. the
+  training data). \code{\link{prep}} returns another recipe.
+
+To apply the recipe to a data set, the \code{\link{bake}} function is
+  used in the same manner as \code{predict} would be for models. This
+  applies the steps to any data set.
+
+Note that the data passed to \code{recipe} need not be the complete data
+  that will be used to train the steps (by \code{\link{prep}}). The recipe
+  only needs to know the names and types of data that will be used. For
+  large data sets, \code{head} could be used to pass the recipe a smaller
+  data set to save time and memory.
+}
+\examples{
+
+###############################################
+# simple example:
+data(biomass)
+
+# split data
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+# When only predictors and outcomes, a simplified formula can be used.
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+# Now add preprocessing steps to the recipe.
+
+sp_signed <- rec \%>\%
+  step_center(all_predictors()) \%>\%
+  step_scale(all_predictors()) \%>\%
+  step_spatialsign(all_predictors())
+sp_signed
+
+# now estimate required parameters
+sp_signed_trained <- prep(sp_signed, training = biomass_tr)
+sp_signed_trained
+
+# apply the preprocessing to a data set
+test_set_values <- bake(sp_signed_trained, newdata = biomass_te)
+
+# or use pipes for the entire workflow:
+rec <- biomass_tr \%>\%
+  recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur) \%>\%
+  step_center(all_predictors()) \%>\%
+  step_scale(all_predictors()) \%>\%
+  step_spatialsign(all_predictors())
+
+###############################################
+# multivariate example
+
+# no need for `cbind(carbon, hydrogen)` for left-hand side
+multi_y <- recipe(carbon + hydrogen ~ oxygen + nitrogen + sulfur,
+                  data = biomass_tr)
+multi_y <- multi_y \%>\%
+  step_center(all_outcomes()) \%>\%
+  step_scale(all_predictors())
+
+multi_y_trained <- prep(multi_y, training = biomass_tr)
+
+results <- bake(multi_y_trained, biomass_te)
+
+###############################################
+# Creating a recipe manually with different roles
+
+rec <- recipe(biomass_tr) \%>\%
+  add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
+           new_role = "predictor") \%>\%
+  add_role(HHV, new_role = "outcome") \%>\%
+  add_role(sample, new_role = "id variable") \%>\%
+  add_role(dataset, new_role = "splitting indicator")
+rec
+}
+\author{
+Max Kuhn
+}
+\concept{
+preprocessing model_specification
+}
+\keyword{datagen}
diff --git a/man/recipes-internal.Rd b/man/recipes-internal.Rd
new file mode 100644
index 0000000..bd7cfa2
--- /dev/null
+++ b/man/recipes-internal.Rd
@@ -0,0 +1,18 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/YeoJohnson.R, R/misc.R
+\name{yj_trans}
+\alias{yj_trans}
+\alias{estimate_yj}
+\alias{prepare}
+\title{Internal Functions}
+\usage{
+yj_trans(x, lambda, eps = 0.001)
+
+estimate_yj(dat, limits = c(-5, 5), nunique = 5)
+
+prepare(x, ...)
+}
+\description{
+These functions are not intended to be used directly by users.
+}
+\keyword{internal}
diff --git a/man/recipes.Rd b/man/recipes.Rd
new file mode 100644
index 0000000..f23300a
--- /dev/null
+++ b/man/recipes.Rd
@@ -0,0 +1,40 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/pkg.R
+\docType{package}
+\name{recipes}
+\alias{recipes}
+\alias{recipes-package}
+\title{recipes: A package for computing and preprocessing design matrices.}
+\description{
+The \code{recipes} package can be used to create design matrices for modeling
+  and to conduct preprocessing of variables. It is meant to be a more
+  extensive framework than R's formula method. Some differences between
+  simple formula methods and recipes are:
+\enumerate{
+\item Variables can have arbitrary roles in the analysis beyond predictors
+ and outcomes.
+\item A recipe consists of one or more steps that define actions on the
+ variables.
+\item Recipes can be defined sequentially using pipes as well as being
+ modifiable and extensible.
+}
+}
+\section{Basic Functions}{
+
+The three main functions are \code{\link{recipe}}, \code{\link{prep}},
+  and \code{\link{bake}}.
+
+\code{\link{recipe}} defines the operations on the data and the associated
+  roles. Once the preprocessing steps are defined, any parameters are
+  estimated using \code{\link{prep}}. Once the data are ready for
+  transformation, the \code{\link{bake}} function applies the operations.
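+
+A minimal sketch of that workflow (using the package's \code{biomass} data):
+
+\preformatted{
+  rec <- recipe(HHV ~ ., data = biomass)
+  rec <- step_center(rec, all_numeric(), -all_outcomes())
+  trained <- prep(rec, training = biomass)
+  processed <- bake(trained, newdata = biomass)
+}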
+}
+
+\section{Step Functions}{
+
+These functions are used to add new actions to the recipe and have the
+  naming convention \code{"step_action"}. For example,
+  \code{\link{step_center}} centers the data to have a zero mean and
+  \code{\link{step_dummy}} is used to create dummy variables.
+}
+
diff --git a/man/reexports.Rd b/man/reexports.Rd
new file mode 100644
index 0000000..f40070f
--- /dev/null
+++ b/man/reexports.Rd
@@ -0,0 +1,16 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/misc.R
+\docType{import}
+\name{reexports}
+\alias{reexports}
+\alias{\%>\%}
+\title{Objects exported from other packages}
+\keyword{internal}
+\description{
+These objects are imported from other packages. Follow the links
+below to see their documentation.
+
+\describe{
+  \item{magrittr}{\code{\link[magrittr]{\%>\%}}}
+}}
+
diff --git a/man/selections.Rd b/man/selections.Rd
new file mode 100644
index 0000000..77ea0e4
--- /dev/null
+++ b/man/selections.Rd
@@ -0,0 +1,99 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/selections.R
+\name{selections}
+\alias{selections}
+\title{Methods for Selecting Variables in Step Functions}
+\description{
+When selecting variables or model terms in \code{step}
+  functions, \code{dplyr}-like tools are used. The \emph{selector}
+  functions can choose variables based on their name, current role, data
+  type, or any combination of these. The selectors are passed like any other
+  argument to the step. If the variables are stated explicitly in the step
+  function, the call might look like:
+
+\preformatted{
+  recipe( ~ ., data = USArrests) \%>\%
+    step_pca(Murder, Assault, UrbanPop, Rape, num = 3)
+}
+
+The first four arguments indicate which variables should be used in the
+  PCA while the last argument is a specific argument to
+  \code{\link{step_pca}}.
+
+Note that:
+
+  \enumerate{
+    \item The selector arguments should not contain functions beyond those
+      supported (see below).
+    \item These arguments are not evaluated until the \code{prep} function
+      for the step is executed.
+    \item The \code{dplyr}-like syntax allows for negative signs to exclude
+      variables (e.g. \code{-Murder}) and the set of selectors will be
+      processed in order.
+    \item A leading exclusion in these arguments (e.g. \code{-Murder}) has
+      the effect of adding all variables to the list except the excluded
+      variable(s).
+  }
+
+Select helpers from the \code{dplyr} package can also be used:
+  \code{\link[dplyr]{starts_with}}, \code{\link[dplyr]{ends_with}},
+  \code{\link[dplyr]{contains}}, \code{\link[dplyr]{matches}},
+  \code{\link[dplyr]{num_range}}, and \code{\link[dplyr]{everything}}.
+  For example:
+
+\preformatted{
+  recipe(Species ~ ., data = iris) \%>\%
+    step_center(starts_with("Sepal"), -contains("Width"))
+}
+
+would only select \code{Sepal.Length}.
+
+\bold{Inline} functions that specify computations, such as \code{log(x)},
+  should not be used in selectors and will produce an error. A list of
+  allowed selector functions is below.
+
+Columns of the design matrix that may not exist when the step is coded can
+  also be selected. For example, when using \code{step_pca}, the number of
+  columns created by feature extraction may not be known when subsequent
+  steps are defined. In this case, using \code{matches("^PC")} will select
+  all of the columns whose names start with "PC" \emph{once those columns
+  are created}.
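+
+For example, a sketch (assuming the default \code{"PC"} naming convention
+  of \code{step_pca}):
+
+\preformatted{
+  recipe(HHV ~ ., data = biomass) \%>\%
+    step_pca(all_numeric(), -all_outcomes(), num = 3) \%>\%
+    step_center(matches("^PC"))
+}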
+
+There are sets of functions that can be used to select variables based on
+  their role or type: \code{\link{has_role}} and \code{\link{has_type}}.
+  For convenience, there are also functions that are more specific:
+  \code{\link{all_numeric}}, \code{\link{all_nominal}},
+  \code{\link{all_predictors}}, and \code{\link{all_outcomes}}. These can
+  be used in conjunction with the previous functions described for
+  selecting variables using their names:
+
+\preformatted{
+  data(biomass)
+  recipe(HHV ~ ., data = biomass) \%>\%
+    step_center(all_numeric(), -all_outcomes())
+}
+
+This results in all the numeric predictors: carbon, hydrogen, oxygen,
+  nitrogen, and sulfur.
+
+If a role for a variable has not been defined, it will never be selected
+  using role-specific selectors.
+
+All steps use these techniques to select variables \emph{except one}:
+  \code{\link{step_interact}} requires a traditional model formula
+  representation of the interactions and takes a single formula
+  as the argument to select the variables.
+
+The complete list of allowable functions in steps:
+
+  \itemize{
+    \item \bold{By name}: \code{\link[dplyr]{starts_with}},
+      \code{\link[dplyr]{ends_with}}, \code{\link[dplyr]{contains}},
+      \code{\link[dplyr]{matches}}, \code{\link[dplyr]{num_range}}, and
+      \code{\link[dplyr]{everything}}
+    \item \bold{By role}: \code{\link{has_role}},
+      \code{\link{all_predictors}}, and \code{\link{all_outcomes}}
+    \item \bold{By type}: \code{\link{has_type}}, \code{\link{all_numeric}},
+      and \code{\link{all_nominal}}
+  }
+}
diff --git a/man/step.Rd b/man/step.Rd
new file mode 100644
index 0000000..b54e767
--- /dev/null
+++ b/man/step.Rd
@@ -0,0 +1,25 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/misc.R
+\name{step}
+\alias{step}
+\title{A General Step Wrapper}
+\usage{
+step(subclass, ...)
+}
+\arguments{
+\item{subclass}{A character string for the resulting class. For example,
+if \code{subclass = "blah"} the step object that is returned has class
+\code{step_blah}.}
+
+\item{...}{All arguments to the step that should be returned.}
+}
+\value{
+An updated step with the new class.
+}
+\description{
+\code{step} sets the class of the step.
+}
+\concept{
+preprocessing
+}
+\keyword{datagen}
diff --git a/man/step_BoxCox.Rd b/man/step_BoxCox.Rd
new file mode 100644
index 0000000..812bb4c
--- /dev/null
+++ b/man/step_BoxCox.Rd
@@ -0,0 +1,80 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/BoxCox.R
+\name{step_BoxCox}
+\alias{step_BoxCox}
+\title{Box-Cox Transformation for Non-Negative Data}
+\usage{
+step_BoxCox(recipe, ..., role = NA, trained = FALSE, lambdas = NULL,
+  limits = c(-5, 5), nunique = 5)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{lambdas}{A numeric vector of transformation values. This is
+\code{NULL} until computed by \code{\link{prep.recipe}}.}
+
+\item{limits}{A length 2 numeric vector defining the range to compute the
+transformation parameter lambda.}
+
+\item{nunique}{An integer; variables with fewer unique values than this
+will not be evaluated for a transformation.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_BoxCox} creates a \emph{specification} of a recipe step that will
+   transform data using a simple Box-Cox transformation.
+}
+\details{
+The Box-Cox transformation, which requires a strictly positive
+  variable, can be used to rescale a variable to be more similar to a
+ normal distribution. In this package, the partial log-likelihood function
+ is directly optimized within a reasonable set of transformation values
+ (which can be changed by the user).
+
+This transformation is typically done on the outcome variable using the
+  residuals for a statistical model (such as ordinary least squares).
+  Here, a simple null model (intercept only) is used to apply the
+  transformation to the \emph{predictor} variables individually. This can
+  have the effect of making the variable distributions more symmetric.
+
+If the transformation parameters are estimated to be very close to the
+  bounds, or if the optimization fails, a value of \code{NA} is used and
+  no transformation is applied.
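+
+For reference, the transformation with parameter \code{lambda} is, in R
+  notation:
+
+\preformatted{
+  x_new <- if (lambda == 0) log(x) else (x^lambda - 1) / lambda
+}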
+}
+\examples{
+
+rec <- recipe(~ ., data = as.data.frame(state.x77))
+
+bc_trans <- step_BoxCox(rec, all_numeric())
+
+bc_estimates <- prep(bc_trans, training = as.data.frame(state.x77))
+
+bc_data <- bake(bc_estimates, as.data.frame(state.x77))
+
+plot(density(state.x77[, "Illiteracy"]), main = "before")
+plot(density(bc_data$Illiteracy), main = "after")
+}
+\references{
+Sakia, R. M. (1992). The Box-Cox transformation technique:
+  A review. \emph{The Statistician}, 169-178.
+}
+\seealso{
+\code{\link{step_YeoJohnson}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing transformation_methods
+}
+\keyword{datagen}
diff --git a/man/step_YeoJohnson.Rd b/man/step_YeoJohnson.Rd
new file mode 100644
index 0000000..a8d27f9
--- /dev/null
+++ b/man/step_YeoJohnson.Rd
@@ -0,0 +1,86 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/YeoJohnson.R
+\name{step_YeoJohnson}
+\alias{step_YeoJohnson}
+\title{Yeo-Johnson Transformation}
+\usage{
+step_YeoJohnson(recipe, ..., role = NA, trained = FALSE, lambdas = NULL,
+  limits = c(-5, 5), nunique = 5)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{lambdas}{A numeric vector of transformation values. This is
+\code{NULL} until computed by \code{\link{prep.recipe}}.}
+
+\item{limits}{A length 2 numeric vector defining the range to compute the
+transformation parameter lambda.}
+
+\item{nunique}{An integer; variables with fewer unique values than this
+will not be evaluated for a transformation.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_YeoJohnson} creates a \emph{specification} of a recipe step that
+  will transform data using a simple Yeo-Johnson transformation.
+}
+\details{
+The Yeo-Johnson transformation is very similar to the Box-Cox but
+  does not require the input variables to be strictly positive. In the
+  package, the partial log-likelihood function is directly optimized within
+  a reasonable set of transformation values (which can be changed by the
+  user).
+
+This transformation is typically done on the outcome variable using the
+  residuals for a statistical model (such as ordinary least squares). Here,
+  a simple null model (intercept only) is used to apply the transformation
+  to the \emph{predictor} variables individually. This can have the effect
+  of making the variable distributions more symmetric.
+
+If the transformation parameters are estimated to be very close to the
+  bounds, or if the optimization fails, a value of \code{NA} is used and
+  no transformation is applied.
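+
+For reference, the transformation with parameter \code{lambda} acts
+  piecewise, in R notation:
+
+\preformatted{
+  # for x >= 0:
+  x_new <- if (lambda == 0) log(x + 1) else ((x + 1)^lambda - 1) / lambda
+  # for x < 0:
+  x_new <- if (lambda == 2) -log(-x + 1) else
+    -((-x + 1)^(2 - lambda) - 1) / (2 - lambda)
+}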
+}
+\examples{
+
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+yj_trans <- step_YeoJohnson(rec,  all_numeric())
+
+yj_estimates <- prep(yj_trans, training = biomass_tr)
+
+yj_te <- bake(yj_estimates, biomass_te)
+
+plot(density(biomass_te$sulfur), main = "before")
+plot(density(yj_te$sulfur), main = "after")
+}
+\references{
+Yeo, I. K., and Johnson, R. A. (2000). A new family of power
+  transformations to improve normality or symmetry. \emph{Biometrika}.
+}
+\seealso{
+\code{\link{step_BoxCox}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing transformation_methods
+}
+\keyword{datagen}
diff --git a/man/step_bagimpute.Rd b/man/step_bagimpute.Rd
new file mode 100644
index 0000000..0e60521
--- /dev/null
+++ b/man/step_bagimpute.Rd
@@ -0,0 +1,102 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/bag_imp.R
+\name{step_bagimpute}
+\alias{step_bagimpute}
+\alias{imp_vars}
+\title{Imputation via Bagged Trees}
+\usage{
+step_bagimpute(recipe, ..., role = NA, trained = FALSE, models = NULL,
+  options = list(nbagg = 25, keepX = FALSE),
+  impute_with = imp_vars(all_predictors()), seed_val = sample.int(10^4, 1))
+
+imp_vars(...)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose variables. For 
+\code{step_bagimpute}, this indicates the variables to be imputed. When 
+used with \code{imp_vars}, the dots indicates which variables are used to 
+predict the missing data in each variable. See \code{\link{selections}} 
+for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{models}{The \code{\link[ipred]{ipredbagg}} objects are stored here 
+once the bagged trees have been trained by \code{\link{prep.recipe}}.}
+
+\item{options}{A list of options to \code{\link[ipred]{ipredbagg}}. Defaults 
+are set for the arguments \code{nbagg} and \code{keepX} but others can be 
+passed in. \bold{Note} that the arguments \code{X} and \code{y} should not 
+be passed here.}
+
+\item{impute_with}{A call to \code{imp_vars} to specify which variables are
+used to impute the variables; this can include specific variable names
+separated by commas or different selectors (see
+\code{\link{selections}}).  If a column is included in both lists to be 
+imputed and to be an imputation predictor, it will be removed from the 
+latter and not used to impute itself.}
+
+\item{seed_val}{An integer used to create reproducible models. The same seed
+is used across all imputation models.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_bagimpute} creates a \emph{specification} of a recipe step that 
+  will create bagged tree models to impute missing data.
+}
+\details{
+For each variable requiring imputation, a bagged tree is created
+  where the outcome is the variable of interest and the predictors are any
+  other variables listed in the \code{impute_with} formula. One advantage of
+  the bagged tree is that it can accept predictors that have missing values
+  themselves. This imputation method can be used when the variable of 
+  interest (and predictors) are numeric or categorical. Imputed categorical 
+  variables will remain categorical.
+
+Note that if a variable that is to be imputed is also in \code{impute_with}, 
+  this variable will be ignored.
+
+It is possible that missing values will still occur after imputation if a 
+  large majority (or all) of the imputing variables are also missing.
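+
+For example, a sketch restricting the predictors used to impute
+  \code{Income} (columns taken from the \code{credit_data} example below):
+
+\preformatted{
+  rec \%>\%
+    step_bagimpute(Income, impute_with = imp_vars(Assets, Debt, Job))
+}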
+}
+\examples{
+data("credit_data")
+
+## missing data per column
+vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))
+
+set.seed(342)
+in_training <- sample(1:nrow(credit_data), 2000)
+
+credit_tr <- credit_data[ in_training, ]
+credit_te <- credit_data[-in_training, ]
+missing_examples <- c(14, 394, 565)
+
+rec <- recipe(Price ~ ., data = credit_tr)
+
+impute_rec <- rec \%>\%
+  step_bagimpute(Status, Home, Marital, Job, Income, Assets, Debt)
+
+imp_models <- prep(impute_rec, training = credit_tr)
+
+imputed_te <- bake(imp_models, newdata = credit_te, everything())
+
+credit_te[missing_examples,]
+imputed_te[missing_examples, names(credit_te)]
+}
+\references{
+Kuhn, M. and Johnson, K. (2013). 
+  \emph{Applied Predictive Modeling}. Springer Verlag.
+}
+\concept{
+preprocessing imputation
+}
+\keyword{datagen}
diff --git a/man/step_bin2factor.Rd b/man/step_bin2factor.Rd
new file mode 100644
index 0000000..a1de62a
--- /dev/null
+++ b/man/step_bin2factor.Rd
@@ -0,0 +1,61 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/bin2factor.R
+\name{step_bin2factor}
+\alias{step_bin2factor}
+\title{Create a Factor from a Dummy Variable}
+\usage{
+step_bin2factor(recipe, ..., role = NA, trained = FALSE, levels = c("yes",
+  "no"), columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{Selector functions that choose which variables will be converted.
+See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{levels}{A length-2 character vector that indicates the factor levels
+for the ones (in the first position) and the zeros (second).}
+
+\item{columns}{A vector with the selected variable names. This is
+\code{NULL} until computed by \code{\link{prep.recipe}}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_bin2factor} creates a \emph{specification} of a recipe step that
+will create a two-level factor from a single dummy variable.
+}
+\details{
+This operation may be useful for situations where a binary piece of
+  information may need to be represented as categorical instead of numeric.
+  For example, naive Bayes models would do better to have factor predictors
+  so that a binomial distribution is modeled instead of a Gaussian
+  probability density for the numeric binary data.
+
+Note that the input data are only verified to be numeric; the number of
+  distinct values is not checked.
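+
+For example, a sketch that overrides the default level labels:
+
+\preformatted{
+  step_bin2factor(rec, rocks, levels = c("present", "absent"))
+}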
+}
+\examples{
+data(covers)
+
+rec <- recipe(~ description, covers) \%>\%
+ step_regex(description, pattern = "(rock|stony)", result = "rocks") \%>\%
+ step_regex(description, pattern = "(rock|stony)", result = "more_rocks") \%>\%
+ step_bin2factor(rocks)
+
+rec <- prep(rec, training = covers)
+results <- bake(rec, newdata = covers)
+
+table(results$rocks, results$more_rocks)
+}
+\concept{
+preprocessing dummy_variables factors
+}
+\keyword{datagen}
diff --git a/man/step_center.Rd b/man/step_center.Rd
new file mode 100644
index 0000000..95fbfb3
--- /dev/null
+++ b/man/step_center.Rd
@@ -0,0 +1,69 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/center.R
+\name{step_center}
+\alias{step_center}
+\title{Centering Numeric Data}
+\usage{
+step_center(recipe, ..., role = NA, trained = FALSE, means = NULL,
+  na.rm = TRUE)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{means}{A named numeric vector of means. This is \code{NULL} until 
+computed by \code{\link{prep.recipe}}.}
+
+\item{na.rm}{A logical value indicating whether \code{NA} values should be 
+removed when averaging.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_center} creates a \emph{specification} of a recipe step that 
+  will normalize numeric data to have a mean of zero.
+}
+\details{
+Centering data means that the average of a variable is subtracted 
+  from the data. \code{step_center} estimates the variable means from the 
+  data used in the \code{training} argument of \code{prep.recipe}. 
+  \code{bake.recipe} then applies the centering to new data sets using 
+  these means.
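+
+In other words, a sketch of the computation applied to each selected column:
+
+\preformatted{
+  # the mean is estimated from the training data by prep
+  x_new <- x - mean(x_train)
+}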
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+center_trans <- rec \%>\%
+  step_center(carbon, contains("gen"), -hydrogen)
+
+center_obj <- prep(center_trans, training = biomass_tr)
+
+transformed_te <- bake(center_obj, biomass_te)
+
+biomass_te[1:10, names(transformed_te)]
+transformed_te
+}
+\seealso{
+\code{\link{recipe}} \code{\link{prep.recipe}} 
+  \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing normalization_methods
+}
+\keyword{datagen}
diff --git a/man/step_classdist.Rd b/man/step_classdist.Rd
new file mode 100644
index 0000000..2bacbdb
--- /dev/null
+++ b/man/step_classdist.Rd
@@ -0,0 +1,82 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/classdist.R
+\name{step_classdist}
+\alias{step_classdist}
+\title{Distances to Class Centroids}
+\usage{
+step_classdist(recipe, ..., class, role = "predictor", trained = FALSE,
+  mean_func = mean, cov_func = cov, pool = FALSE, log = TRUE,
+  objects = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{class}{A single character string that specifies a single categorical
+variable to be used as the class.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the resulting
+distances will be used as predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{mean_func}{A function to compute the center of the distribution.}
+
+\item{cov_func}{A function that computes the covariance matrix.}
+
+\item{pool}{A logical: should the covariance matrix be computed by pooling
+the data for all of the classes?}
+
+\item{log}{A logical: should the distances be transformed by the natural
+log function?}
+
+\item{objects}{Statistics are stored here once this step has been trained
+by \code{\link{prep.recipe}}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_classdist} creates a \emph{specification} of a recipe step
+  that will convert numeric data into Mahalanobis distance measurements to
+  the data centroid. This is done for each value of a categorical class
+  variable.
+}
+\details{
+The function will create a new column for every unique value of the
+  \code{class} variable. The resulting variables will not replace the
+  original values and have the prefix \code{classdist_}.
+
+Note that the default covariance function requires that each class
+  should have at least as many rows as variables listed in the
+  \code{terms} argument. If \code{pool = TRUE}, there must be at least
+  as many data points as variables overall.
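+
+For reference, the squared Mahalanobis distance from a point \code{x} to a
+  class centroid \code{m} with covariance matrix \code{S} is, in R notation:
+
+\preformatted{
+  d2 <- t(x - m) \%*\% solve(S) \%*\% (x - m)
+}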
+}
+\examples{
+
+# in case of missing data...
+mean2 <- function(x) mean(x, na.rm = TRUE)
+
+rec <- recipe(Species ~ ., data = iris) \%>\%
+  step_classdist(all_predictors(), class = "Species",
+                 pool = FALSE, mean_func = mean2)
+
+rec_dists <- prep(rec, training = iris)
+
+dists_to_species <- bake(rec_dists, newdata = iris, everything())
+## on log scale:
+dist_cols <- grep("classdist", names(dists_to_species), value = TRUE)
+dists_to_species[, c("Species", dist_cols)]
+}
+\concept{
+preprocessing dimension_reduction
+}
+\keyword{datagen}
diff --git a/man/step_corr.Rd b/man/step_corr.Rd
new file mode 100644
index 0000000..2e2df32
--- /dev/null
+++ b/man/step_corr.Rd
@@ -0,0 +1,83 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/corr.R
+\name{step_corr}
+\alias{step_corr}
+\title{High Correlation Filter}
+\usage{
+step_corr(recipe, ..., role = NA, trained = FALSE, threshold = 0.9,
+  use = "pairwise.complete.obs", method = "pearson", removals = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{threshold}{A value for the threshold of absolute correlation values.
+The step will try to remove the minimum number of columns so that all the
+resulting absolute correlations are less than this value.}
+
+\item{use}{A character string for the \code{use} argument to the
+\code{\link[stats]{cor}} function.}
+
+\item{method}{A character string for the \code{method} argument to the
+\code{\link[stats]{cor}} function.}
+
+\item{removals}{A character string that contains the names of columns that
+should be removed. These values are not determined until
+\code{\link{prep.recipe}} is called.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_corr} creates a \emph{specification} of a recipe step that will
+  potentially remove variables that have large absolute correlations with
+  other variables.
+}
+\details{
+This step attempts to remove variables to keep the largest absolute
+  correlation between the variables less than \code{threshold}.
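+
+A sketch of the greedy approach (an assumption based on the \code{caret}
+  filter credited below, not a verbatim description of the code):
+
+\preformatted{
+  # 1. compute the absolute correlation matrix
+  # 2. while any off-diagonal value exceeds the threshold:
+  #      take the pair with the largest absolute correlation and
+  #      remove the member with the larger mean absolute correlation
+}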
+}
+\examples{
+data(biomass)
+
+set.seed(3535)
+biomass$duplicate <- biomass$carbon + rnorm(nrow(biomass))
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
+                    sulfur + duplicate,
+              data = biomass_tr)
+
+corr_filter <- rec \%>\%
+  step_corr(all_predictors(), threshold = .5)
+
+filter_obj <- prep(corr_filter, training = biomass_tr)
+
+filtered_te <- bake(filter_obj, biomass_te)
+round(abs(cor(biomass_tr[, c(3:7, 9)])), 2)
+round(abs(cor(filtered_te)), 2)
+}
+\seealso{
+\code{\link{step_nzv}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\author{
+Original R code for filtering algorithm by Dong Li, modified by
+  Max Kuhn. Contributions by Reynald Lescarbeau (for original in
+  \code{caret} package). Max Kuhn for the \code{step} function.
+}
+\concept{
+preprocessing variable_filters
+}
+\keyword{datagen}
diff --git a/man/step_date.Rd b/man/step_date.Rd
new file mode 100644
index 0000000..d23c9fc
--- /dev/null
+++ b/man/step_date.Rd
@@ -0,0 +1,83 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/date.R
+\name{step_date}
+\alias{step_date}
+\title{Date Feature Generator}
+\usage{
+step_date(recipe, ..., role = "predictor", trained = FALSE,
+  features = c("dow", "month", "year"), abbr = TRUE, label = TRUE,
+  ordinal = FALSE, columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables that
+will be used to create the new variables. The selected variables should
+have class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for
+more details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new variable
+columns created by the original variables will be used as predictors in a
+model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{features}{A character string that includes at least one of the
+following values: \code{month}, \code{dow} (day of week), \code{doy}
+(day of year), \code{week}, \code{decimal} (decimal date, e.g. 2002.197),
+\code{quarter}, \code{semester}, \code{year}.}
+
+\item{abbr}{A logical. Only available for features \code{month} or
+\code{dow}. \code{FALSE} will display the day of the week as an ordered
+factor of character strings, such as "Sunday". \code{TRUE} will display
+an abbreviated version of the label, such as "Sun". \code{abbr} is
+disregarded if \code{label = FALSE}.}
+
+\item{label}{A logical. Only available for features \code{month} or
+\code{dow}. \code{TRUE} will display the day of the week as an ordered
+factor of character strings, such as "Sunday." \code{FALSE} will display
+the day of the week as a number.}
+
+\item{ordinal}{A logical: should factors be ordered? Only available for
+features \code{month} or \code{dow}.}
+
+\item{columns}{A character string of variables that will be used as
+inputs. This field is a placeholder and will be populated once
+ \code{\link{prep.recipe}} is used.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_date} creates a \emph{specification} of a recipe step that will
+  convert date data into one or more factor or numeric variables.
+}
+\details{
+Unlike other steps, \code{step_date} does \emph{not} remove the
+  original date variables. \code{\link{step_rm}} can be used for this
+  purpose.
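+
+For example, a sketch that drops the original columns once the features are
+  generated (using the \code{examples} data from below):
+
+\preformatted{
+  recipe(~ Dan + Stefan, examples) \%>\%
+    step_date(all_predictors()) \%>\%
+    step_rm(Dan, Stefan)
+}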
+}
+\examples{
+library(lubridate)
+
+examples <- data.frame(Dan = ymd("2002-03-04") + days(1:10),
+                       Stefan = ymd("2006-01-13") + days(1:10))
+date_rec <- recipe(~ Dan + Stefan, examples) \%>\%
+   step_date(all_predictors())
+
+date_rec <- prep(date_rec, training = examples)
+date_values <- bake(date_rec, newdata = examples)
+date_values
+}
+\seealso{
+\code{\link{step_holiday}} \code{\link{step_rm}} 
+  \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing model_specification variable_encodings dates
+}
+\keyword{datagen}
diff --git a/man/step_depth.Rd b/man/step_depth.Rd
new file mode 100644
index 0000000..7cc1c62
--- /dev/null
+++ b/man/step_depth.Rd
@@ -0,0 +1,87 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/depth.R
+\name{step_depth}
+\alias{step_depth}
+\title{Data Depths}
+\usage{
+step_depth(recipe, ..., class, role = "predictor", trained = FALSE,
+  metric = "halfspace", options = list(), data = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables that
+will be used to create the new features. See \code{\link{selections}} for
+more details.}
+
+\item{class}{A single character string that specifies a single categorical
+variable to be used as the class.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the resulting depth
+estimates will be used as predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{metric}{A character string specifying the depth metric. Possible
+values are "potential", "halfspace", "Mahalanobis", "simplicialVolume",
+"spatial", and "zonoid".}
+
+\item{options}{A list of options to pass to the underlying depth functions.
+See \code{\link[ddalpha]{depth.halfspace}},
+\code{\link[ddalpha]{depth.Mahalanobis}},
+\code{\link[ddalpha]{depth.potential}},
+\code{\link[ddalpha]{depth.projection}},
+\code{\link[ddalpha]{depth.simplicial}},
+\code{\link[ddalpha]{depth.simplicialVolume}},
+\code{\link[ddalpha]{depth.spatial}}, \code{\link[ddalpha]{depth.zonoid}}.}
+
+\item{data}{The training data are stored here once
+\code{\link{prep.recipe}} has been executed.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_depth} creates a \emph{specification} of a recipe step that
+  will convert numeric data into measurements of \emph{data depth}. This is
+  done for each value of a categorical class variable.
+}
+\details{
+Data depth metrics attempt to measure how close a data point
+  is to the center of its distribution. There are a number of methods for
+  calculating depth, but a simple example is the inverse of the distance of
+  a data point to the centroid of the distribution. Generally, small values
+  indicate that a data point is not close to the centroid. \code{step_depth}
+  can compute a class-specific depth for a new data point based on the
+  proximity of the new value to the training set distribution.
+
+Note that the entire training set is saved to compute future depth values.
+The saved data have been trained (i.e. prepared) and baked (i.e. processed)
+up to the point before the location that \code{step_depth} occupies in the
+recipe. Also, the data requirements for the different depth metrics may
+vary. For example, using
+\code{metric = "Mahalanobis"} requires that each class should have at least
+as many rows as variables listed in the \code{terms} argument.
+
+The function will create a new column for every unique value of the
+\code{class} variable. The resulting variables will not replace the
+original values and have the prefix \code{depth_}.
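+
+As a toy illustration of the idea (not one of the supported metrics):
+
+\preformatted{
+  # depth is largest when a point sits at the class centroid
+  toy_depth <- function(x, centroid) 1 / (1 + sqrt(sum((x - centroid)^2)))
+}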
+}
+\examples{
+
+# halfspace depth is the default
+rec <- recipe(Species ~ ., data = iris) \%>\%
+  step_depth(all_predictors(), class = "Species")
+
+rec_dists <- prep(rec, training = iris)
+
+dists_to_species <- bake(rec_dists, newdata = iris)
+dists_to_species
+}
+\concept{
+preprocessing dimension_reduction
+}
+\keyword{datagen}
diff --git a/man/step_dummy.Rd b/man/step_dummy.Rd
new file mode 100644
index 0000000..4c1cb19
--- /dev/null
+++ b/man/step_dummy.Rd
@@ -0,0 +1,84 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/dummy.R
+\name{step_dummy}
+\alias{step_dummy}
+\title{Dummy Variables Creation}
+\usage{
+step_dummy(recipe, ..., role = "predictor", trained = FALSE,
+  contrast = options("contrasts"), naming = function(var, lvl) paste(var,
+  make.names(lvl), sep = "_"), levels = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will
+be used to create the dummy variables. See \code{\link{selections}} for
+more details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the binary
+dummy variable columns created by the original variables will be used as
+predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{contrast}{A specification for which type of contrast should be used
+to make a set of full rank dummy variables. See
+\code{\link[stats]{contrasts}} for more details. \bold{not currently
+working}}
+
+\item{naming}{A function that defines the naming convention for new binary
+columns. See Details below.}
+
+\item{levels}{A list that contains the information needed to create dummy
+variables for each variable contained in \code{terms}. This is
+\code{NULL} until the step is trained by \code{\link{prep.recipe}}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_dummy} creates a \emph{specification} of a recipe step that
+  will convert nominal data (e.g. character or factors) into one or more
+  numeric binary model terms for the levels of the original data.
+}
+\details{
+\code{step_dummy} will create a set of binary dummy variables
+  from a factor variable. For example, if a factor column in the data set
+  has levels of "red", "green", "blue", baking the dummy variable step
+  will create two additional columns of 0/1 data for two of those three
+  values (and remove the original column).
+
+By default, the missing dummy variable will correspond to the first level
+  of the factor being converted.
+
+The function allows for non-standard naming of the resulting variables. For
+  a factor named \code{x}, with levels \code{"a"} and \code{"b"}, the
+  default naming convention would be to create a new variable called
+  \code{x_b}. Note that if the factor levels are not valid variable names
+  (e.g. "some text with spaces"), it will be changed by
+  \code{\link[base]{make.names}} to be valid (see the example below). The
+  naming format can be changed using the \code{naming} argument.
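+
+For example, a sketch of a custom naming function that uses a dot
+  separator instead of an underscore:
+
+\preformatted{
+  step_dummy(rec, diet, naming = function(var, lvl)
+    paste(var, make.names(lvl), sep = "."))
+}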
+}
+\examples{
+data(okc)
+okc <- okc[complete.cases(okc),]
+
+rec <- recipe(~ diet + age + height, data = okc)
+
+dummies <- rec \%>\% step_dummy(diet)
+dummies <- prep(dummies, training = okc)
+
+dummy_data <- bake(dummies, newdata = okc)
+
+unique(okc$diet)
+grep("^diet", names(dummy_data), value = TRUE)
+}
+\concept{
+preprocessing dummy_variables model_specification variable_encodings
+}
+\keyword{datagen}
diff --git a/man/step_holiday.Rd b/man/step_holiday.Rd
new file mode 100644
index 0000000..1be6d1e
--- /dev/null
+++ b/man/step_holiday.Rd
@@ -0,0 +1,68 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/holiday.R
+\name{step_holiday}
+\alias{step_holiday}
+\title{Holiday Feature Generator}
+\usage{
+step_holiday(recipe, ..., role = "predictor", trained = FALSE,
+  holidays = c("LaborDay", "NewYearsDay", "ChristmasDay"), columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+used to create the new variables. The selected variables should have
+class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for
+more details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new variable
+columns created by the original variables will be used as predictors in
+a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{holidays}{A character string that includes at least one holiday
+supported by the \code{timeDate} package. See
+\code{\link[timeDate]{listHolidays}} for a complete list.}
+
+\item{columns}{A character string of variables that will be used as
+inputs. This field is a placeholder and will be populated once
+\code{\link{prep.recipe}} is used.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_holiday} creates a \emph{specification} of a recipe step that
+  will convert date data into one or more binary indicator variables for
+  common holidays.
+}
+\details{
+Unlike other steps, \code{step_holiday} does \emph{not} remove the
+  original date variables. \code{\link{step_rm}} can be used for
+  this purpose.
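+
+Other holidays can be requested by name, for example (a sketch, assuming
+  these names appear in \code{\link[timeDate]{listHolidays}}):
+
+\preformatted{
+  step_holiday(rec, someday,
+               holidays = c("USThanksgivingDay", "GoodFriday"))
+}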
+}
+\examples{
+library(lubridate)
+
+examples <- data.frame(someday = ymd("2000-12-20") + days(0:40))
+holiday_rec <- recipe(~ someday, examples) \%>\%
+   step_holiday(all_predictors())
+
+holiday_rec <- prep(holiday_rec, training = examples)
+holiday_values <- bake(holiday_rec, newdata = examples)
+holiday_values
+}
+\seealso{
+\code{\link{step_date}} \code{\link{step_rm}}
+  \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}} \code{\link[timeDate]{listHolidays}}
+}
+\concept{
+preprocessing model_specification variable_encodings dates
+}
+\keyword{datagen}
diff --git a/man/step_hyperbolic.Rd b/man/step_hyperbolic.Rd
new file mode 100644
index 0000000..ec86ca1
--- /dev/null
+++ b/man/step_hyperbolic.Rd
@@ -0,0 +1,62 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/hyperbolic.R
+\name{step_hyperbolic}
+\alias{step_hyperbolic}
+\title{Hyperbolic Transformations}
+\usage{
+step_hyperbolic(recipe, ..., role = NA, trained = FALSE, func = "sin",
+  inverse = TRUE, columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{func}{A character value for the function. Valid values are "sin",
+"cos", or "tan".}
+
+\item{inverse}{A logical: should the inverse function be used?}
+
+\item{columns}{A character string of variable names that will be (eventually)
+populated by the \code{terms} argument.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_hyperbolic} creates a \emph{specification} of a recipe step that
+  will transform data using a hyperbolic function.
+}
+\examples{
+set.seed(313)
+examples <- matrix(rnorm(40), ncol = 2)
+examples <- as.data.frame(examples)
+
+rec <- recipe(~ V1 + V2, data = examples)
+
+cos_trans <- rec  \%>\%
+  step_hyperbolic(all_predictors(),
+                  func = "cos", inverse = FALSE)
+
+cos_obj <- prep(cos_trans, training = examples)
+
+transformed_te <- bake(cos_obj, examples)
+plot(examples$V1, transformed_te$V1)
+}
+\seealso{
+\code{\link{step_logit}} \code{\link{step_invlogit}}
+  \code{\link{step_log}}  \code{\link{step_sqrt}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing transformation_methods
+}
+\keyword{datagen}
diff --git a/man/step_ica.Rd b/man/step_ica.Rd
new file mode 100644
index 0000000..a9566ed
--- /dev/null
+++ b/man/step_ica.Rd
@@ -0,0 +1,103 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/ica.R
+\name{step_ica}
+\alias{step_ica}
+\title{ICA Signal Extraction}
+\usage{
+step_ica(recipe, ..., role = "predictor", trained = FALSE, num = 5,
+  options = list(), res = NULL, prefix = "IC")
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+used to compute the components. See \code{\link{selections}} for more
+details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new
+independent component columns created by the original variables will be
+used as predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{num}{The number of ICA components to retain as new predictors. If
+\code{num} is greater than the number of columns or the number of possible
+components, a smaller value will be used.}
+
+\item{options}{A list of options to \code{\link[fastICA]{fastICA}}. No
+defaults are set here. \bold{Note} that the arguments \code{X} and
+\code{n.comp} should not be passed here.}
+
+\item{res}{The \code{\link[fastICA]{fastICA}} object is stored here once
+this preprocessing step has been trained by \code{\link{prep.recipe}}.}
+
+\item{prefix}{A character string that will be the prefix to the resulting
+new variables. See notes below.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_ica} creates a \emph{specification} of a recipe step that will
+  convert numeric data into one or more independent components.
+}
+\details{
+Independent component analysis (ICA) is a transformation of a
+  group of variables that produces a new set of artificial features or
+  components. ICA assumes that the variables are mixtures of a set of
+  distinct, non-Gaussian signals and attempts to transform the data to
+  isolate these signals. The components are statistically independent of
+  one another, which means that they can be used to combat large
+  inter-variable correlations in a data set. As with PCA, it is
+  advisable to center and scale the variables prior to running ICA.
+
+This package produces components using the "FastICA" methodology (see
+  reference below).
+
+The argument \code{num} controls the number of components that will be
+  retained (the original variables that are used to derive the components
+  are removed from the data). The new components will have names that begin
+  with \code{prefix} and a sequence of numbers. The variable names are
+  padded with zeros. For example, if \code{num < 10}, their names will be
+  \code{IC1} - \code{IC9}. If \code{num = 101}, the names would be
+  \code{IC001} - \code{IC101}.
+}
+\examples{
+# from fastICA::fastICA
+set.seed(131)
+S <- matrix(runif(400), 200, 2)
+A <- matrix(c(1, 1, -1, 3), 2, 2, byrow = TRUE)
+X <- as.data.frame(S \%*\% A)
+
+tr <- X[1:100, ]
+te <- X[101:200, ]
+
+rec <- recipe( ~ ., data = tr)
+
+ica_trans <- step_center(rec, V1, V2)
+ica_trans <- step_scale(ica_trans, V1, V2)
+ica_trans <- step_ica(ica_trans, V1, V2, num = 2)
+ica_estimates <- prep(ica_trans, training = tr)
+ica_data <- bake(ica_estimates, te)
+
+plot(te$V1, te$V2)
+plot(ica_data$IC1, ica_data$IC2)
+}
+\references{
+Hyvarinen, A., and Oja, E. (2000). Independent component
+  analysis: algorithms and applications. \emph{Neural Networks}, 13(4-5),
+  411-430.
+}
+\seealso{
+\code{\link{step_pca}} \code{\link{step_kpca}}
+  \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing ica projection_methods
+}
+\keyword{datagen}
diff --git a/man/step_interact.Rd b/man/step_interact.Rd
new file mode 100644
index 0000000..a1b76de
--- /dev/null
+++ b/man/step_interact.Rd
@@ -0,0 +1,82 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/interactions.R
+\name{step_interact}
+\alias{step_interact}
+\title{Create Interaction Variables}
+\usage{
+step_interact(recipe, terms, role = "predictor", trained = FALSE,
+  objects = NULL, sep = "_x_")
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{terms}{A traditional R formula that contains interaction terms.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new columns
+created from the original variables will be used as predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{objects}{A list of \code{terms} objects for each individual interaction.}
+
+\item{sep}{A character value used to delineate variables in an interaction
+(e.g. \code{var1_x_var2} instead of the more traditional \code{var1:var2}).}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_interact} creates a \emph{specification} of a recipe step that
+  will create new columns that are interaction terms between two or more
+  variables.
+}
+\details{
+\code{step_interact} can create interactions between variables. It
+  is primarily intended for \bold{numeric data}; categorical variables
+  should probably be converted to dummy variables using
+  \code{\link{step_dummy}} prior to being used for interactions.
+
+Unlike other step functions, the \code{terms} argument should be a
+  traditional R model formula but should contain no inline functions (e.g.
+  \code{log}). For example, for predictors \code{A}, \code{B}, and \code{C},
+  a formula such as \code{~A:B:C} can be used to make a three-way
+  interaction between the variables. If the formula contains terms other
+  than interactions (e.g. \code{(A+B+C)^3}) only the interaction terms are
+  retained for the design matrix.
+
+The separator between the variables defaults to "\code{_x_}" so that the
+  three-way interaction shown previously would generate a column named
+  \code{A_x_B_x_C}. This can be changed using the \code{sep} argument.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+int_mod_1 <- rec \%>\%
+  step_interact(terms = ~ carbon:hydrogen)
+
+int_mod_2 <- int_mod_1 \%>\%
+  step_interact(terms = ~ (oxygen + nitrogen + sulfur)^3)
+
+int_mod_1 <- prep(int_mod_1, training = biomass_tr)
+int_mod_2 <- prep(int_mod_2, training = biomass_tr)
+
+dat_1 <- bake(int_mod_1, biomass_te)
+dat_2 <- bake(int_mod_2, biomass_te)
+
+names(dat_1)
+names(dat_2)
+}
+\concept{
+preprocessing model_specification
+}
+\keyword{datagen}
diff --git a/man/step_intercept.Rd b/man/step_intercept.Rd
new file mode 100644
index 0000000..654ef55
--- /dev/null
+++ b/man/step_intercept.Rd
@@ -0,0 +1,58 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/intercept.R
+\name{step_intercept}
+\alias{step_intercept}
+\title{Add intercept (or constant) column}
+\usage{
+step_intercept(recipe, ..., role = "predictor", trained = FALSE,
+  name = "intercept", value = 1)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of
+operations for this recipe.}
+
+\item{...}{Argument ignored; included for consistency with other step
+specification functions.}
+
+\item{role}{Defaults to "predictor".}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing
+have been estimated. Again included for consistency.}
+
+\item{name}{Character name for the newly added column.}
+
+\item{value}{A numeric constant to fill the intercept column. Defaults to 1.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_intercept} creates a \emph{specification} of a recipe step that
+  will add an intercept or constant term in the first column of a data
+  matrix. \code{step_intercept} defaults to the \emph{predictor} role so
+  that the column is created by default when \code{bake} is called. Be
+  careful to avoid unintentional transformations when calling steps with
+  \code{all_predictors}.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+rec_trans <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)]) \%>\%
+  step_intercept(value = 2)
+
+rec_obj <- prep(rec_trans, training = biomass_tr)
+
+with_intercept <- bake(rec_obj, biomass_te)
+with_intercept
+
+}
+\seealso{
+\code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
diff --git a/man/step_invlogit.Rd b/man/step_invlogit.Rd
new file mode 100644
index 0000000..434a58c
--- /dev/null
+++ b/man/step_invlogit.Rd
@@ -0,0 +1,65 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/invlogit.R
+\name{step_invlogit}
+\alias{step_invlogit}
+\title{Inverse Logit Transformation}
+\usage{
+step_invlogit(recipe, ..., role = NA, trained = FALSE, columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{columns}{A character string of variable names that will be (eventually)
+populated by the \code{terms} argument.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_invlogit} creates a \emph{specification} of a recipe step that
+  will transform the data from real values to be between zero and one.
+}
+\details{
+The inverse logit transformation takes values on the real line and
+  translates them to be between zero and one using the function
+  \code{f(x) = 1/(1+exp(-x))}.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+ilogit_trans <- rec  \%>\%
+  step_center(carbon, hydrogen) \%>\%
+  step_scale(carbon, hydrogen) \%>\%
+  step_invlogit(carbon, hydrogen)
+
+ilogit_obj <- prep(ilogit_trans, training = biomass_tr)
+
+transformed_te <- bake(ilogit_obj, biomass_te)
+plot(biomass_te$carbon, transformed_te$carbon)
+}
+\seealso{
+\code{\link{step_logit}} \code{\link{step_log}}
+  \code{\link{step_sqrt}}  \code{\link{step_hyperbolic}}
+  \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing transformation_methods
+}
+\keyword{datagen}
diff --git a/man/step_isomap.Rd b/man/step_isomap.Rd
new file mode 100644
index 0000000..57b0d6d
--- /dev/null
+++ b/man/step_isomap.Rd
@@ -0,0 +1,106 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/isomap.R
+\name{step_isomap}
+\alias{step_isomap}
+\title{Isomap Embedding}
+\usage{
+step_isomap(recipe, ..., role = "predictor", trained = FALSE, num = 5,
+  options = list(knn = 50, .mute = c("message", "output")), res = NULL,
+  prefix = "Isomap")
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+used to compute the dimensions. See \code{\link{selections}} for more
+details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new
+dimension columns created by the original variables will be used as
+predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{num}{The number of isomap dimensions to retain as new predictors. If
+\code{num} is greater than the number of columns or the number of
+possible dimensions, a smaller value will be used.}
+
+\item{options}{A list of options to \code{\link[dimRed]{Isomap}}.}
+
+\item{res}{The \code{\link[dimRed]{Isomap}} object is stored here once this
+preprocessing step has been trained by \code{\link{prep.recipe}}.}
+
+\item{prefix}{A character string that will be the prefix to the resulting
+new variables. See notes below.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_isomap} creates a \emph{specification} of a recipe step that will
+  convert numeric data into one or more new dimensions.
+}
+\details{
+Isomap is a form of multidimensional scaling (MDS). MDS methods
+  try to find a reduced set of dimensions such that the geometric distances
+  between the original data points are preserved. This version of MDS uses
+  nearest neighbors in the data as a method for increasing the fidelity of
+  the new dimensions to the original data values.
+
+It is advisable to center and scale the variables prior to running Isomap
+  (\code{step_center} and \code{step_scale} can be used for this purpose).
+
+The argument \code{num} controls the number of components that will be
+  retained (the original variables that are used to derive the components
+  are removed from the data). The new components will have names that begin
+  with \code{prefix} and a sequence of numbers. The variable names are
+  padded with zeros. For example, if \code{num < 10}, their names will be
+  \code{Isomap1} - \code{Isomap9}. If \code{num = 101}, the names would be
+  \code{Isomap001} - \code{Isomap101}.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+im_trans <- rec \%>\%
+  step_YeoJohnson(all_predictors()) \%>\%
+  step_center(all_predictors()) \%>\%
+  step_scale(all_predictors()) \%>\%
+  step_isomap(all_predictors(),
+              options = list(knn = 100),
+              num = 2)
+
+im_estimates <- prep(im_trans, training = biomass_tr)
+
+im_te <- bake(im_estimates, biomass_te)
+
+rng <- extendrange(c(im_te$Isomap1, im_te$Isomap2))
+plot(im_te$Isomap1, im_te$Isomap2,
+     xlim = rng, ylim = rng)
+}
+\references{
+De Silva, V., and Tenenbaum, J. B. (2003). Global versus local
+  methods in nonlinear dimensionality reduction. \emph{Advances in Neural
+  Information Processing Systems}. 721-728.
+
+\pkg{dimRed}, a framework for dimensionality reduction,
+  \url{https://github.com/gdkrmr}
+}
+\seealso{
+\code{\link{step_pca}} \code{\link{step_kpca}}
+  \code{\link{step_ica}} \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing isomap projection_methods
+}
+\keyword{datagen}
diff --git a/man/step_knnimpute.Rd b/man/step_knnimpute.Rd
new file mode 100644
index 0000000..e118460
--- /dev/null
+++ b/man/step_knnimpute.Rd
@@ -0,0 +1,105 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/knn_imp.R
+\name{step_knnimpute}
+\alias{step_knnimpute}
+\title{Imputation via K-Nearest Neighbors}
+\usage{
+step_knnimpute(recipe, ..., role = NA, trained = FALSE, K = 5,
+  impute_with = imp_vars(all_predictors()), ref_data = NULL,
+  columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose variables. For
+\code{step_knnimpute}, this indicates the variables to be imputed. When
+used with \code{imp_vars}, the dots indicate which variables are used to
+predict the missing data in each variable. See \code{\link{selections}}
+for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{K}{The number of neighbors.}
+
+\item{impute_with}{A call to \code{imp_vars} to specify which variables are
+used to impute the selected variables; this can include specific variable
+names separated by commas or different selectors (see
+\code{\link{selections}}). If a column is included both in the list to be
+imputed and in the list of imputation predictors, it will be removed from
+the latter and not used to impute itself.}
+
+\item{ref_data}{A tibble of data that will reflect the data preprocessing
+done up to the point of this imputation step. This is
+\code{NULL} until the step is trained by \code{\link{prep.recipe}}.}
+
+\item{columns}{The column names that will be imputed and used for
+imputation. This is  \code{NULL} until the step is trained by
+\code{\link{prep.recipe}}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_knnimpute} creates a \emph{specification} of a recipe step that
+  will impute missing data using nearest neighbors.
+}
+\details{
+The step uses the training set to impute any other data sets. The
+  only distance function available is Gower's distance, which can be used for
+  mixtures of nominal and numeric data.
+
+Once the nearest neighbors are determined, the mode is used to impute
+  nominal variables and the mean is used for numeric data.
+
+Note that if a variable that is to be imputed is also in \code{impute_with},
+  this variable will be ignored.
+
+It is possible that missing values will still occur after imputation if a
+  large majority (or all) of the imputing variables are also missing.
+}
+\examples{
+library(recipes)
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training", ]
+biomass_te <- biomass[biomass$dataset == "Testing", ]
+biomass_te_whole <- biomass_te
+
+# induce some missing data at random
+set.seed(9039)
+carb_missing <- sample(1:nrow(biomass_te), 3)
+nitro_missing <- sample(1:nrow(biomass_te), 3)
+
+biomass_te$carbon[carb_missing] <- NA
+biomass_te$nitrogen[nitro_missing] <- NA
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+ratio_recipe <- rec \%>\%
+  step_knnimpute(all_predictors(), K = 3)
+ratio_recipe2 <- prep(ratio_recipe, training = biomass_tr)
+imputed <- bake(ratio_recipe2, biomass_te)
+
+# how well did it work?
+summary(biomass_te_whole$carbon)
+cbind(before = biomass_te_whole$carbon[carb_missing],
+      after = imputed$carbon[carb_missing])
+
+summary(biomass_te_whole$nitrogen)
+cbind(before = biomass_te_whole$nitrogen[nitro_missing],
+      after = imputed$nitrogen[nitro_missing])
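+
+# Conceptually (a sketch with hypothetical values, not the internals):
+# once the K nearest training rows are found via Gower's distance, numeric
+# variables are imputed with the neighbors' mean, nominal ones with their mode
+neighbor_carbon <- c(46.3, 47.1, 45.8)  # hypothetical K = 3 neighbor values
+mean(neighbor_carbon)                   # the imputed value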
+}
+\references{
+Gower, J. C. (1971). A general coefficient of similarity and some
+  of its properties. \emph{Biometrics}, 27(4), 857-871.
+}
+\concept{
+preprocessing imputation
+}
+\keyword{datagen}
diff --git a/man/step_kpca.Rd b/man/step_kpca.Rd
new file mode 100644
index 0000000..f59d4be
--- /dev/null
+++ b/man/step_kpca.Rd
@@ -0,0 +1,118 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/kpca.R
+\name{step_kpca}
+\alias{step_kpca}
+\title{Kernel PCA Signal Extraction}
+\usage{
+step_kpca(recipe, ..., role = "predictor", trained = FALSE, num = 5,
+  res = NULL, options = list(kernel = "rbfdot", kpar = list(sigma = 0.2)),
+  prefix = "kPC")
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+used to compute the components. See \code{\link{selections}} for more
+details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new principal
+component columns created by the original variables will be used as
+predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{num}{The number of PCA components to retain as new predictors. If
+\code{num} is greater than the number of columns or the number of possible
+components, a smaller value will be used.}
+
+\item{res}{An S4 \code{\link[kernlab]{kpca}} object is stored here once this
+preprocessing step has been trained by \code{\link{prep.recipe}}.}
+
+\item{options}{A list of options to \code{\link[kernlab]{kpca}}. Defaults
+are set for the arguments \code{kernel} and \code{kpar} but others can be
+passed in. \bold{Note} that the arguments \code{x} and \code{features}
+should not be passed here (or at all).}
+
+\item{prefix}{A character string that will be the prefix to the resulting
+new variables. See notes below.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_kpca} creates a \emph{specification} of a recipe step that will
+  convert numeric data into one or more principal components using a kernel
+  basis expansion.
+}
+\details{
+Kernel principal component analysis (kPCA) is an extension of PCA
+  analysis that conducts the calculations in a broader dimensionality
+  defined by a kernel function. For example, if a quadratic kernel function
+  were used, each variable would be represented by its original values as
+  well as its square. This nonlinear mapping is used  during the PCA
+  analysis and can potentially help find better representations of the
+  original data.
+
+As with ordinary PCA, it is important to standardize the variables prior
+  to running PCA (\code{step_center} and \code{step_scale} can be used for
+  this purpose).
+
+When performing kPCA, the kernel function (and any important kernel
+  parameters) must be chosen. The \pkg{kernlab} package is used and the
+  reference below discusses the types of kernels available and their
+  parameter(s). These specifications can be made in the \code{kernel} and
+  \code{kpar} slots of the \code{options} argument to \code{step_kpca}.
+
+The argument \code{num} controls the number of components that will be
+  retained (the original variables that are used to derive the components
+  are removed from the data). The new components will have names that begin
+  with \code{prefix} and a sequence of numbers. The variable names are
+  padded with zeros. For example, if \code{num < 10}, their names will be
+  \code{kPC1} - \code{kPC9}. If \code{num = 101}, the names would be
+  \code{kPC001} - \code{kPC101}.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+kpca_trans <- rec \%>\%
+  step_YeoJohnson(all_predictors()) \%>\%
+  step_center(all_predictors()) \%>\%
+  step_scale(all_predictors()) \%>\%
+  step_kpca(all_predictors())
+
+kpca_estimates <- prep(kpca_trans, training = biomass_tr)
+
+kpca_te <- bake(kpca_estimates, biomass_te)
+
+rng <- extendrange(c(kpca_te$kPC1, kpca_te$kPC2))
+plot(kpca_te$kPC1, kpca_te$kPC2,
+     xlim = rng, ylim = rng)
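+
+# Choosing a different kernel, as a sketch (kernlab's polynomial kernel
+# "polydot" with a quadratic degree):
+kpca_quad <- rec \%>\%
+  step_kpca(all_predictors(),
+            options = list(kernel = "polydot", kpar = list(degree = 2)))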
+}
+\references{
+Scholkopf, B., Smola, A., and Muller, K. (1997). Kernel
+  principal component analysis. \emph{Lecture Notes in Computer Science},
+  1327, 583-588.
+
+Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004). kernlab -
+  An S4 package for kernel methods in R. \emph{Journal of Statistical
+  Software}, 11(9), 1-20.
+}
+\seealso{
+\code{\link{step_pca}} \code{\link{step_ica}}
+  \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing pca projection_methods kernel_methods
+}
+\keyword{datagen}
diff --git a/man/step_lincomb.Rd b/man/step_lincomb.Rd
new file mode 100644
index 0000000..40f3e7f
--- /dev/null
+++ b/man/step_lincomb.Rd
@@ -0,0 +1,74 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/lincombo.R
+\name{step_lincomb}
+\alias{step_lincomb}
+\title{Linear Combination Filter}
+\usage{
+step_lincomb(recipe, ..., role = NA, trained = FALSE, max_steps = 5,
+  removals = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{max_steps}{A value for the maximum number of times the algorithm is
+applied to the data (see Details below).}
+
+\item{removals}{A character string that contains the names of columns that
+should be removed. These values are not determined until
+\code{\link{prep.recipe}} is called.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_lincomb} creates a \emph{specification} of a recipe step that
+  will potentially remove numeric variables that have linear combinations
+  between them.
+}
+\details{
+This step finds exact linear combinations between two or more
+  variables and recommends which column(s) should be removed to resolve the
+  issue. This algorithm may need to be applied multiple times (as defined
+  by \code{max_steps}).
+}
+\examples{
+data(biomass)
+
+biomass$new_1 <- with(biomass,
+                      .1*carbon - .2*hydrogen + .6*sulfur)
+biomass$new_2 <- with(biomass,
+                      .5*carbon - .2*oxygen + .6*nitrogen)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
+                sulfur + new_1 + new_2,
+              data = biomass_tr)
+
+lincomb_filter <- rec \%>\%
+  step_lincomb(all_predictors())
+  
+prep(lincomb_filter, training = biomass_tr)
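+
+# The underlying idea, sketched with base R (not the step's exact code):
+# a rank-deficient QR decomposition signals an exact linear dependency
+X <- as.matrix(biomass_tr[, c("carbon", "hydrogen", "sulfur", "new_1")])
+qr(X)$rank < ncol(X)  # TRUE: new_1 is a combination of the others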
+}
+\seealso{
+\code{\link{step_nzv}} \code{\link{step_corr}}
+  \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
+\author{
+Max Kuhn, Kirk Mettler, and Jed Wing
+}
+\concept{
+preprocessing variable_filters
+}
+\keyword{datagen}
diff --git a/man/step_log.Rd b/man/step_log.Rd
new file mode 100644
index 0000000..58a74ed
--- /dev/null
+++ b/man/step_log.Rd
@@ -0,0 +1,59 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/log.R
+\name{step_log}
+\alias{step_log}
+\title{Logarithmic Transformation}
+\usage{
+step_log(recipe, ..., role = NA, trained = FALSE, base = exp(1),
+  columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{base}{A numeric value for the base.}
+
+\item{columns}{A character string of variable names that will be (eventually)
+populated by the \code{terms} argument.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_log} creates a \emph{specification} of a recipe step that will
+  log transform data.
+}
+\examples{
+set.seed(313)
+examples <- matrix(exp(rnorm(40)), ncol = 2)
+examples <- as.data.frame(examples)
+
+rec <- recipe(~ V1 + V2, data = examples)
+
+log_trans <- rec  \%>\%
+  step_log(all_predictors())
+
+log_obj <- prep(log_trans, training = examples)
+
+transformed_te <- bake(log_obj, examples)
+plot(examples$V1, transformed_te$V1)
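+
+# A base-10 variant, as a sketch, via the base argument:
+log10_trans <- rec \%>\%
+  step_log(all_predictors(), base = 10)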
+}
+\seealso{
+\code{\link{step_logit}} \code{\link{step_invlogit}}
+  \code{\link{step_hyperbolic}}  \code{\link{step_sqrt}}
+  \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing transformation_methods
+}
+\keyword{datagen}
diff --git a/man/step_logit.Rd b/man/step_logit.Rd
new file mode 100644
index 0000000..fca542f
--- /dev/null
+++ b/man/step_logit.Rd
@@ -0,0 +1,60 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/logit.R
+\name{step_logit}
+\alias{step_logit}
+\title{Logit Transformation}
+\usage{
+step_logit(recipe, ..., role = NA, trained = FALSE, columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{columns}{A character string of variable names that will be (eventually)
+populated by the \code{terms} argument.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_logit} creates a \emph{specification} of a recipe step that will
+  logit transform the data.
+}
+\details{
+The logit transformation takes values between zero and one
+  and translates them to be on the real line using the function
+  \code{f(p) = log(p/(1-p))}.
+}
+\examples{
+set.seed(313)
+examples <- matrix(runif(40), ncol = 2)
+examples <- data.frame(examples)
+
+rec <- recipe(~ X1 + X2, data = examples)
+
+logit_trans <- rec  \%>\%
+  step_logit(all_predictors())
+
+logit_obj <- prep(logit_trans, training = examples)
+
+transformed_te <- bake(logit_obj, examples)
+plot(examples$X1, transformed_te$X1)
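+
+# For reference (a sketch, not part of the original example):
+# stats::qlogis() computes the same function, f(p) = log(p/(1-p))
+qlogis(c(0.1, 0.5, 0.9))  # -2.197, 0.000, 2.197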
+}
+\seealso{
+\code{\link{step_invlogit}} \code{\link{step_log}}
+\code{\link{step_sqrt}}  \code{\link{step_hyperbolic}} \code{\link{recipe}}
+\code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing transformation_methods
+}
+\keyword{datagen}
diff --git a/man/step_meanimpute.Rd b/man/step_meanimpute.Rd
new file mode 100644
index 0000000..44e5cbe
--- /dev/null
+++ b/man/step_meanimpute.Rd
@@ -0,0 +1,72 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/meanimpute.R
+\name{step_meanimpute}
+\alias{step_meanimpute}
+\title{Impute Numeric Data Using the Mean}
+\usage{
+step_meanimpute(recipe, ..., role = NA, trained = FALSE, means = NULL,
+  trim = 0)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{means}{A named numeric vector of means. This is \code{NULL} until
+computed by \code{\link{prep.recipe}}.}
+
+\item{trim}{The fraction (0 to 0.5) of observations to be trimmed from each
+end of the variables before the mean is computed. Values of trim outside
+that range are taken as the nearest endpoint.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_meanimpute} creates a \emph{specification} of a recipe step that
+  will substitute missing values of numeric variables by the training set
+  mean of those variables.
+}
+\details{
+\code{step_meanimpute} estimates the variable means from the data
+  used in the \code{training} argument of \code{prep.recipe}.
+  \code{bake.recipe} then applies the new values to new data sets using
+  these averages.
+}
+\examples{
+data("credit_data")
+
+## missing data per column
+vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))
+
+set.seed(342)
+in_training <- sample(1:nrow(credit_data), 2000)
+
+credit_tr <- credit_data[ in_training, ]
+credit_te <- credit_data[-in_training, ]
+missing_examples <- c(14, 394, 565)
+
+rec <- recipe(Price ~ ., data = credit_tr)
+
+impute_rec <- rec \%>\%
+  step_meanimpute(Income, Assets, Debt)
+
+imp_models <- prep(impute_rec, training = credit_tr)
+
+imputed_te <- bake(imp_models, newdata = credit_te, everything())
+
+credit_te[missing_examples,]
+imputed_te[missing_examples, names(credit_te)]
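+
+# What prep() estimates, as a sketch: (possibly trimmed) training-set means
+mean(credit_tr$Income, na.rm = TRUE)
+mean(credit_tr$Income, trim = 0.1, na.rm = TRUE)  # with trim = 0.1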
+}
+\concept{
+preprocessing imputation
+}
+\keyword{datagen}
diff --git a/man/step_modeimpute.Rd b/man/step_modeimpute.Rd
new file mode 100644
index 0000000..fda149b
--- /dev/null
+++ b/man/step_modeimpute.Rd
@@ -0,0 +1,67 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/modeimpute.R
+\name{step_modeimpute}
+\alias{step_modeimpute}
+\title{Impute Nominal Data Using the Most Common Value}
+\usage{
+step_modeimpute(recipe, ..., role = NA, trained = FALSE, modes = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{modes}{A named character vector of modes. This is \code{NULL} until
+computed by \code{\link{prep.recipe}}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_modeimpute} creates a \emph{specification} of a recipe step that
+  will substitute missing values of nominal variables by the training set
+  mode of those variables.
+}
+\details{
+\code{step_modeimpute} estimates the variable modes from the data
+  used in the \code{training} argument of \code{prep.recipe}.
+  \code{bake.recipe} then applies the new values to new data sets using
+  these values. If the training set data has more than one mode, one is
+  selected at random.
+}
+\examples{
+data("credit_data")
+
+## missing data per column
+vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))
+
+set.seed(342)
+in_training <- sample(1:nrow(credit_data), 2000)
+
+credit_tr <- credit_data[ in_training, ]
+credit_te <- credit_data[-in_training, ]
+missing_examples <- c(14, 394, 565)
+
+rec <- recipe(Price ~ ., data = credit_tr)
+
+impute_rec <- rec \%>\%
+  step_modeimpute(Status, Home, Marital)
+
+imp_models <- prep(impute_rec, training = credit_tr)
+
+imputed_te <- bake(imp_models, newdata = credit_te, everything())
+
+table(credit_te$Home, imputed_te$Home, useNA = "always")
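+
+# What prep() estimates, as a sketch: the training-set mode
+names(which.max(table(credit_tr$Home)))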
+}
+\concept{
+preprocessing imputation
+}
+\keyword{datagen}
diff --git a/man/step_ns.Rd b/man/step_ns.Rd
new file mode 100644
index 0000000..7f51f58
--- /dev/null
+++ b/man/step_ns.Rd
@@ -0,0 +1,70 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/ns.R
+\name{step_ns}
+\alias{step_ns}
+\title{Natural Spline Basis Functions}
+\usage{
+step_ns(recipe, ..., role = "predictor", trained = FALSE, objects = NULL,
+  options = list(df = 2))
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new columns
+created from the original variables will be used as predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{objects}{A list of \code{\link[splines]{ns}} objects created once the
+step has been trained.}
+
+\item{options}{A list of options for \code{\link[splines]{ns}} which should
+not include \code{x}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_ns} creates a \emph{specification} of a recipe step that will
+  create new columns that are basis expansions of variables using natural
+  splines.
+}
+\details{
+\code{step_ns} can create new features from a single variable that enable
+  fitting routines to model this variable in a nonlinear manner. The extent
+  of the possible nonlinearity is determined by the \code{df} or
+  \code{knots} arguments of \code{\link[splines]{ns}}. The original variables are
+  removed from the data and new columns are added. The naming convention
+  for the new variables is \code{varname_ns_1} and so on.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+with_splines <- rec \%>\%
+  step_ns(carbon, hydrogen)
+with_splines <- prep(with_splines, training = biomass_tr)
+
+expanded <- bake(with_splines, biomass_te)
+expanded
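+
+# What the step wraps, as a sketch: splines::ns() with the df option
+library(splines)
+head(ns(biomass_tr$carbon, df = 2))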
+}
+\seealso{
+\code{\link{step_poly}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing basis_expansion
+}
+\keyword{datagen}
diff --git a/man/step_nzv.Rd b/man/step_nzv.Rd
new file mode 100644
index 0000000..de28d9e
--- /dev/null
+++ b/man/step_nzv.Rd
@@ -0,0 +1,86 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/nzv.R
+\name{step_nzv}
+\alias{step_nzv}
+\title{Near-Zero Variance Filter}
+\usage{
+step_nzv(recipe, ..., role = NA, trained = FALSE, options = list(freq_cut
+  = 95/5, unique_cut = 10), removals = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+evaluated by the filtering process. See \code{\link{selections}} for
+more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{options}{A list of options for the filter (see Details below).}
+
+\item{removals}{A character string that contains the names of columns that
+should be removed. These values are not determined until
+\code{\link{prep.recipe}} is called.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_nzv} creates a \emph{specification} of a recipe step that will
+  potentially remove variables that are highly sparse and unbalanced.
+}
+\details{
+This step diagnoses predictors that have one unique value (i.e.
+  are zero variance predictors) or predictors that have both of the
+  following characteristics:
+\enumerate{
+  \item they have very few unique values relative to the number of samples
+    and
+  \item the ratio of the frequency of the most common value to the
+    frequency of the second most common value is large.
+}
+
+For example, a near-zero variance predictor might be one that, for
+  1000 samples, has two distinct values and 999 of them are a single value.
+
+To be flagged, first the frequency of the most prevalent value over the
+  second most frequent value (called the "frequency ratio") must be above
+  \code{freq_cut}. Secondly, the "percent of unique values," the number of
+  unique values divided by the total number of samples (times 100), must
+  also be below \code{unique_cut}.
+
+In the above example, the frequency ratio is 999 and the unique value
+  percentage is 0.2.
+}
+\examples{
+data(biomass)
+
+biomass$sparse <- c(1, rep(0, nrow(biomass) - 1))
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur + sparse,
+              data = biomass_tr)
+
+nzv_filter <- rec \%>\%
+  step_nzv(all_predictors())
+
+filter_obj <- prep(nzv_filter, training = biomass_tr)
+
+filtered_te <- bake(filter_obj, biomass_te)
+any(names(filtered_te) == "sparse")
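+
+# The filter's two statistics computed by hand (a sketch, not the
+# internal code):
+tab <- sort(table(biomass$sparse), decreasing = TRUE)
+tab[1] / tab[2]                                       # frequency ratio
+100 * length(unique(biomass$sparse)) / nrow(biomass)  # percent unique values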
+}
+\seealso{
+\code{\link{step_corr}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing variable_filters
+}
+\keyword{datagen}
diff --git a/man/step_ordinalscore.Rd b/man/step_ordinalscore.Rd
new file mode 100644
index 0000000..2b184e4
--- /dev/null
+++ b/man/step_ordinalscore.Rd
@@ -0,0 +1,73 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/ordinalscore.R
+\name{step_ordinalscore}
+\alias{step_ordinalscore}
+\title{Convert Ordinal Factors to Numeric Scores}
+\usage{
+step_ordinalscore(recipe, ..., role = NA, trained = FALSE, columns = NULL,
+  convert = as.numeric)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{columns}{A character string of variables that will be converted. This
+is \code{NULL} until computed by \code{\link{prep.recipe}}.}
+
+\item{convert}{A function that takes an ordinal factor vector as an input
+and outputs a single numeric variable.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_ordinalscore} creates a \emph{specification} of a recipe step that
+  will convert ordinal factor variables into numeric scores.
+}
+\details{
+Dummy variables from ordered factors with \code{C} levels will create
+  polynomial basis functions with \code{C-1} terms. As an alternative, this
+  step can be used to translate the ordered levels into a single numeric
+  vector of values that represent (subjective) scores. By default, the
+  translation uses a linear scale (1, 2, 3, ... \code{C}) but custom score
+  functions can also be used (see the example below).
+}
+\examples{
+fail_lvls <- c("meh", "annoying", "really_bad")
+
+ord_data <- 
+  data.frame(item = c("paperclip", "twitter", "airbag"),
+             fail_severity = factor(fail_lvls,
+                                    levels = fail_lvls,
+                                    ordered = TRUE))
+
+model.matrix(~fail_severity, data = ord_data)
+
+linear_values <- recipe(~ item + fail_severity, data = ord_data) \%>\%
+  step_dummy(item) \%>\%
+  step_ordinalscore(fail_severity)
+
+linear_values <- prep(linear_values, training = ord_data, retain = TRUE)
+
+juice(linear_values, everything())
+
+custom <- function(x) {
+  new_values <- c(1, 3, 7)
+  new_values[as.numeric(x)]
+}
+
+nonlin_scores <- recipe(~ item + fail_severity, data = ord_data) \%>\%
+  step_dummy(item) \%>\%
+  step_ordinalscore(fail_severity, convert = custom)
+
+nonlin_scores <- prep(nonlin_scores, training = ord_data, retain = TRUE)
+
+juice(nonlin_scores, everything())
+}
+\concept{
+preprocessing ordinal_data
+}
+\keyword{datagen}
diff --git a/man/step_other.Rd b/man/step_other.Rd
new file mode 100644
index 0000000..8a6b7a8
--- /dev/null
+++ b/man/step_other.Rd
@@ -0,0 +1,76 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/other.R
+\name{step_other}
+\alias{step_other}
+\title{Collapse Some Categorical Levels}
+\usage{
+step_other(recipe, ..., role = NA, trained = FALSE, threshold = 0.05,
+  other = "other", objects = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables that
+will potentially be reduced. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{threshold}{A single numeric value in (0, 1); factor levels whose
+rate of occurrence in the training set is below this value are pooled.}
+
+\item{other}{A single character value for the "other" category.}
+
+\item{objects}{A list of objects that contain the information to pool
+infrequent levels that is determined by \code{\link{prep.recipe}}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_other} creates a \emph{specification} of a recipe step that will
+   potentially pool infrequently occurring values into an "other" category.
+}
+\details{
+The overall proportions of the categories are computed. The "other"
+  category is used in place of any categorical levels whose individual
+  proportion in the training set is less than \code{threshold}.
+
+If no pooling is done the data are unmodified (although character data may
+  be changed to factors based on the value of \code{stringsAsFactors} in
+  \code{\link{prep.recipe}}). Otherwise, a factor is always returned with
+  different factor levels.
+
+If \code{threshold} is greater than the largest category proportion, all
+  levels except for the most frequent are collapsed to the \code{other}
+  level.
+
+If the retained categories include the value of \code{other}, an error is
+  thrown. If \code{other} is in the list of discarded levels, no error
+  occurs.
+}
+\examples{
+data(okc)
+
+set.seed(19)
+in_train <- sample(1:nrow(okc), size = 30000)
+
+okc_tr <- okc[ in_train,]
+okc_te <- okc[-in_train,]
+
+rec <- recipe(~ diet + location, data = okc_tr)
+
+
+rec <- rec \%>\%
+  step_other(diet, location, threshold = .1, other = "other values")
+rec <- prep(rec, training = okc_tr)
+
+collapsed <- bake(rec, okc_te)
+table(okc_te$diet, collapsed$diet, useNA = "always")
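+
+# The training-set proportions that drive the pooling, as a sketch:
+head(sort(prop.table(table(okc_tr$diet)), decreasing = TRUE))
+# levels with a proportion below threshold = .1 collapse to "other values"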
+}
+\concept{
+preprocessing factors
+}
+\keyword{datagen}
diff --git a/man/step_pca.Rd b/man/step_pca.Rd
new file mode 100644
index 0000000..2dc324a
--- /dev/null
+++ b/man/step_pca.Rd
@@ -0,0 +1,113 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/pca.R
+\name{step_pca}
+\alias{step_pca}
+\title{PCA Signal Extraction}
+\usage{
+step_pca(recipe, ..., role = "predictor", trained = FALSE, num = 5,
+  threshold = NA, options = list(), res = NULL, prefix = "PC")
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+used to compute the components. See \code{\link{selections}} for more
+details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new principal
+component columns created by the original variables will be used as
+predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{num}{The number of PCA components to retain as new predictors. If
+\code{num} is greater than the number of columns or the number of
+possible components, a smaller value will be used.}
+
+\item{threshold}{A fraction of the total variance that should be covered
+by the components. For example, \code{threshold = .75} means that
+\code{step_pca} should generate enough components to capture 75\% of the
+variability in the variables. Note: using this argument will override and
+reset any value given to \code{num}.}
+
+\item{options}{A list of options to the default method for
+\code{\link[stats]{prcomp}}. Argument defaults are set to
+\code{retx = FALSE}, \code{center = FALSE}, \code{scale. = FALSE}, and
+\code{tol = NULL}. \bold{Note} that the argument \code{x} should not be
+passed here (or at all).}
+
+\item{res}{The \code{\link[stats]{prcomp.default}} object is stored here
+once this preprocessing step has been trained by \code{\link{prep.recipe}}.}
+
+\item{prefix}{A character string that will be the prefix to the resulting
+new variables. See notes below.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_pca} creates a \emph{specification} of a recipe step that will
+  convert numeric data into one or more principal components.
+}
+\details{
+Principal component analysis (PCA) is a transformation of a group of
+  variables that produces a new set of artificial features or components.
+  These components are designed to capture the maximum amount of information
+  (i.e. variance) in the original variables. Also, the components are
+  statistically uncorrelated with one another. This means that they can be
+  used to combat large inter-variable correlations in a data set.
+
+It is advisable to standardize the variables prior to running PCA. Here,
+  each variable will be centered and scaled prior to the PCA calculation.
+  This can be changed using the \code{options} argument or by using
+  \code{\link{step_center}} and \code{\link{step_scale}}.
+
+The argument \code{num} controls the number of components that will be
+  retained (the original variables that are used to derive the components
+  are removed from the data). The new components will have names that begin
+  with \code{prefix} and a sequence of numbers. The variable names are
+  padded with zeros. For example, if \code{num < 10}, their names will be
+  \code{PC1} - \code{PC9}. If \code{num = 101}, the names would be
+  \code{PC001} - \code{PC101}.
+
+Alternatively, \code{threshold} can be used to determine the number of
+  components that are required to capture a specified fraction of the total
+  variance in the variables.
+}
+\examples{
+rec <- recipe( ~ ., data = USArrests)
+pca_trans <- rec \%>\%
+  step_center(all_numeric()) \%>\%
+  step_scale(all_numeric()) \%>\%
+  step_pca(all_numeric(), num = 3)
+pca_estimates <- prep(pca_trans, training = USArrests)
+pca_data <- bake(pca_estimates, USArrests)
+
+rng <- extendrange(c(pca_data$PC1, pca_data$PC2))
+plot(pca_data$PC1, pca_data$PC2,
+     xlim = rng, ylim = rng)
+
+with_thresh <- rec \%>\%
+  step_center(all_numeric()) \%>\%
+  step_scale(all_numeric()) \%>\%
+  step_pca(all_numeric(), threshold = .99)
+with_thresh <- prep(with_thresh, training = USArrests)
+bake(with_thresh, USArrests)
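+
+# Inspecting the variance captured per component, as a sketch using the
+# same prcomp() machinery on standardized data:
+pc <- prcomp(USArrests, center = TRUE, scale. = TRUE)
+cumsum(pc$sdev^2) / sum(pc$sdev^2)  # cumulative fraction of total variance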
+}
+\references{
+Jolliffe, I. T. (2010). \emph{Principal Component Analysis}.
+  Springer.
+}
+\seealso{
+\code{\link{step_ica}} \code{\link{step_kpca}}
+  \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}}
+  \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing pca projection_methods
+}
+\keyword{datagen}
diff --git a/man/step_poly.Rd b/man/step_poly.Rd
new file mode 100644
index 0000000..6318dba
--- /dev/null
+++ b/man/step_poly.Rd
@@ -0,0 +1,72 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/poly.R
+\name{step_poly}
+\alias{step_poly}
+\title{Orthogonal Polynomial Basis Functions}
+\usage{
+step_poly(recipe, ..., role = "predictor", trained = FALSE,
+  objects = NULL, options = list(degree = 2))
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the new columns
+created from the original variables will be used as predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{objects}{A list of \code{\link[stats]{poly}} objects created once the
+step has been trained.}
+
+\item{options}{A list of options for  \code{\link[stats]{poly}} which should
+not include \code{x} or \code{simple}. Note that the option
+\code{raw = TRUE} will produce the regular polynomial values (not
+orthogonalized).}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_poly} creates a \emph{specification} of a recipe step that will
+  create new columns that are basis expansions of variables using orthogonal
+  polynomials.
+}
+\details{
+\code{step_poly} can create new features from a single variable that
+  enable fitting routines to model this variable in a nonlinear manner. The
+  extent of the possible nonlinearity is determined by the \code{degree}
+  argument of  \code{\link[stats]{poly}}. The original variables are
+  removed from the data and new columns are added. The naming convention
+  for the new variables is \code{varname_poly_1} and so on.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+quadratic <- rec \%>\%
+  step_poly(carbon, hydrogen)
+quadratic <- prep(quadratic, training = biomass_tr)
+
+expanded <- bake(quadratic, biomass_te)
+expanded
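+
+# What the step wraps, as a sketch: stats::poly() with the degree option
+head(poly(biomass_tr$carbon, degree = 2))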
+}
+\seealso{
+\code{\link{step_ns}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing basis_expansion
+}
+\keyword{datagen}
diff --git a/man/step_range.Rd b/man/step_range.Rd
new file mode 100644
index 0000000..0a0d3d7
--- /dev/null
+++ b/man/step_range.Rd
@@ -0,0 +1,67 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/range.R
+\name{step_range}
+\alias{step_range}
+\title{Scaling Numeric Data to a Specific Range}
+\usage{
+step_range(recipe, ..., role = NA, trained = FALSE, min = 0, max = 1,
+  ranges = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+scaled. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{min}{A single numeric value for the smallest value in the range.}
+
+\item{max}{A single numeric value for the largest value in the range.}
+
+\item{ranges}{A character vector of variables that will be normalized. Note
+that this is ignored until the values are determined by
+\code{\link{prep.recipe}}. Setting this value will be ineffective.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_range} creates a \emph{specification} of a recipe step that will
+  normalize numeric data to be within a pre-defined range of values.
+}
+\details{
+\code{step_range} estimates the minimum and maximum of each variable from
+  the data used in the \code{training} argument of \code{prep.recipe}.
+  \code{bake.recipe} then applies a linear rescaling to new data sets so
+  that these training-set extremes map to \code{min} and \code{max},
+  respectively.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+ranged_trans <- rec \%>\%
+  step_range(carbon, hydrogen)
+
+ranged_obj <- prep(ranged_trans, training = biomass_tr)
+
+transformed_te <- bake(ranged_obj, biomass_te)
+
+biomass_te[1:10, names(transformed_te)]
+transformed_te
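+
+# The min-max mapping itself, as a sketch (test values outside the
+# training range may fall outside [0, 1] here):
+rng_carbon <- range(biomass_tr$carbon)
+head((biomass_te$carbon - rng_carbon[1]) / (rng_carbon[2] - rng_carbon[1]))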
+}
+\concept{
+preprocessing normalization_methods
+}
+\keyword{datagen}
diff --git a/man/step_ratio.Rd b/man/step_ratio.Rd
new file mode 100644
index 0000000..39e66e7
--- /dev/null
+++ b/man/step_ratio.Rd
@@ -0,0 +1,78 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/ratio.R
+\name{step_ratio}
+\alias{step_ratio}
+\alias{denom_vars}
+\title{Ratio Variable Creation}
+\usage{
+step_ratio(recipe, ..., role = "predictor", trained = FALSE,
+  denom = denom_vars(), naming = function(numer, denom)
+  make.names(paste(numer, denom, sep = "_o_")), columns = NULL)
+
+denom_vars(...)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will
+be used in the \emph{numerator} of the ratio. When used with
+\code{denom_vars}, the dots indicates which variables are used in the
+\emph{denominator}. See \code{\link{selections}} for more details.}
+
+\item{role}{For terms created by this step, what analysis role should
+they be assigned? By default, the function assumes that the newly created
+ratios will be used as predictors in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{denom}{A call to \code{denom_vars} to specify which variables are
+used in the denominator; this can include specific variable names
+separated by commas or different selectors (see
+\code{\link{selections}}). If a column is included in both the numerator
+and denominator lists, it will be removed from the listing.}
+
+\item{naming}{A function that defines the naming convention for new ratio
+columns.}
+
+\item{columns}{The column names used in the ratios. This argument is
+not populated until \code{\link{prep.recipe}} is executed.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_ratio} creates a \emph{specification} of a recipe step that
+  will create one or more ratios out of numeric variables.
+}
+\examples{
+library(recipes)
+data(biomass)
+
+biomass$total <- apply(biomass[, 3:7], 1, sum)
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + 
+                    sulfur + total,
+              data = biomass_tr)
+
+ratio_recipe <- rec \%>\%
+  # all predictors over total
+  step_ratio(all_predictors(), denom = denom_vars(total)) \%>\%
+  # get rid of the original predictors 
+  step_rm(all_predictors(), -matches("_o_"))
+  
+
+ratio_recipe <- prep(ratio_recipe, training = biomass_tr)
+
+ratio_data <- bake(ratio_recipe, biomass_te)
+ratio_data
+}
+\concept{
+preprocessing
+}
+\keyword{datagen}
diff --git a/man/step_regex.Rd b/man/step_regex.Rd
new file mode 100644
index 0000000..783baa3
--- /dev/null
+++ b/man/step_regex.Rd
@@ -0,0 +1,66 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/regex.R
+\name{step_regex}
+\alias{step_regex}
+\title{Create Dummy Variables using Regular Expressions}
+\usage{
+step_regex(recipe, ..., role = "predictor", trained = FALSE,
+  pattern = ".", options = list(), result = make.names(pattern),
+  input = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{A single selector function to choose which variable will be
+searched for the pattern. The selector should resolve to a single
+variable. See \code{\link{selections}} for more details.}
+
+\item{role}{For the variable created by this step, what analysis role
+should it be assigned? By default, the function assumes that the new dummy
+variable column created from the original variable will be used as a
+predictor in a model.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{pattern}{A character string containing a regular expression (or
+character string for \code{fixed = TRUE}) to be matched in the given
+character vector. Coerced by \code{as.character} to a character string
+if possible.}
+
+\item{options}{A list of options to \code{\link{grepl}} that should not
+include \code{x} or \code{pattern}.}
+
+\item{result}{A single character value for the name of the new variable. It
+should be a valid column name.}
+
+\item{input}{A single character value for the name of the variable being
+searched. This is \code{NULL} until computed by
+\code{\link{prep.recipe}}.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_regex} creates a \emph{specification} of a recipe step that will
+  create a new dummy variable based on a regular expression.
+}
+\examples{
+data(covers)
+
+rec <- recipe(~ description, covers) \%>\%
+  step_regex(description, pattern = "(rock|stony)", result = "rocks") \%>\%
+  step_regex(description, pattern = "ratake families")
+
+rec2 <- prep(rec, training = covers)
+rec2
+
+with_dummies <- bake(rec2, newdata = covers)
+with_dummies
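+
+# Literal (non-regex) matching, as a sketch, via grepl()'s fixed option:
+rec_fixed <- recipe(~ description, covers) \%>\%
+  step_regex(description, pattern = "rock", options = list(fixed = TRUE))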
+}
+\concept{
+preprocessing dummy_variables regular_expressions
+}
+\keyword{datagen}
diff --git a/man/step_rm.Rd b/man/step_rm.Rd
new file mode 100644
index 0000000..9449c32
--- /dev/null
+++ b/man/step_rm.Rd
@@ -0,0 +1,55 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/rm.R
+\name{step_rm}
+\alias{step_rm}
+\title{General Variable Filter}
+\usage{
+step_rm(recipe, ..., role = NA, trained = FALSE, removals = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+evaluated by the filtering process. See \code{\link{selections}} for
+more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{removals}{A character string that contains the names of columns that
+should be removed. These values are not determined until
+\code{\link{prep.recipe}} is called.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_rm} creates a \emph{specification} of a recipe step that will
+  remove variables based on their name, type, or role.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+library(dplyr)
+smaller_set <- rec \%>\%
+  step_rm(contains("gen"))
+
+smaller_set <- prep(smaller_set, training = biomass_tr)
+
+filtered_te <- bake(smaller_set, biomass_te)
+filtered_te
+}
+\concept{
+preprocessing variable_filters
+}
+\keyword{datagen}
diff --git a/man/step_scale.Rd b/man/step_scale.Rd
new file mode 100644
index 0000000..6462d11
--- /dev/null
+++ b/man/step_scale.Rd
@@ -0,0 +1,65 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/scale.R
+\name{step_scale}
+\alias{step_scale}
+\title{Scaling Numeric Data}
+\usage{
+step_scale(recipe, ..., role = NA, trained = FALSE, sds = NULL,
+  na.rm = TRUE)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{sds}{A named numeric vector of standard deviations. This is \code{NULL}
+until computed by \code{\link{prep.recipe}}.}
+
+\item{na.rm}{A logical value indicating whether \code{NA} values should be
+removed when computing the standard deviation.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_scale} creates a \emph{specification} of a recipe step that
+  will normalize numeric data to have a standard deviation of one.
+}
+\details{
+Scaling data means that the standard deviation of a variable is
+  divided out of the data. \code{step_scale} estimates the variable
+  standard deviations from the data used in the \code{training} argument of
+  \code{prep.recipe}. \code{bake.recipe} then applies the scaling to
+  new data sets using these standard deviations.
+}
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+scaled_trans <- rec \%>\%
+  step_scale(carbon, hydrogen)
+
+scaled_obj <- prep(scaled_trans, training = biomass_tr)
+
+transformed_te <- bake(scaled_obj, biomass_te)
+
+biomass_te[1:10, names(transformed_te)]
+transformed_te
+}
+\concept{
+preprocessing normalization_methods
+}
+\keyword{datagen}
diff --git a/man/step_shuffle.Rd b/man/step_shuffle.Rd
new file mode 100644
index 0000000..5b928fa
--- /dev/null
+++ b/man/step_shuffle.Rd
@@ -0,0 +1,48 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/shuffle.R
+\name{step_shuffle}
+\alias{step_shuffle}
+\title{Shuffle Variables}
+\usage{
+step_shuffle(recipe, ..., role = NA, trained = FALSE, columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+permuted. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{columns}{A character string that contains the names of columns that
+should be shuffled. These values are not determined until
+\code{\link{prep.recipe}} is called.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_shuffle} creates a \emph{specification} of a recipe step that will
+  randomly change the order of rows for selected variables.
+}
+\examples{
+integers <- data.frame(A = 1:12, B = 13:24, C = 25:36)
+
+library(dplyr)
+rec <- recipe(~ A + B + C, data = integers) \%>\%
+  step_shuffle(A, B)
+
+rand_set <- prep(rec, training = integers)
+
+set.seed(5377)
+bake(rand_set, integers)
+}
+\concept{
+preprocessing randomization permutation
+}
+\keyword{datagen}
diff --git a/man/step_spatialsign.Rd b/man/step_spatialsign.Rd
new file mode 100644
index 0000000..0b695d7
--- /dev/null
+++ b/man/step_spatialsign.Rd
@@ -0,0 +1,72 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/spatialsign.R
+\name{step_spatialsign}
+\alias{step_spatialsign}
+\title{Spatial Sign Preprocessing}
+\usage{
+step_spatialsign(recipe, ..., role = "predictor", trained = FALSE,
+  columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+used for the normalization. See \code{\link{selections}} for more details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned?}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{columns}{A character string of variable names that will be (eventually)
+populated by the \code{terms} argument.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_spatialsign} is a \emph{specification} of a recipe step that
+  will convert numeric data into a projection onto a unit sphere.
+}
+\details{
+The spatial sign transformation projects the variables onto a unit
+  sphere and is related to global contrast normalization. The spatial sign
+  of a vector \code{w} is \code{w/norm(w)}.
+
+The variables should be centered and scaled prior to the computations.
+}
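+\note{
+As a minimal illustration of the formula above (and not the step's
+internal code), the projection of a single centered and scaled row
+\code{w} can be computed in base R as:
+\preformatted{
+w <- c(0.5, -1.2, 2.0)   # one observation across three predictors
+w / sqrt(sum(w^2))       # spatial sign: w / norm(w)
+}
+}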
+\examples{
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+ss_trans <- rec \%>\%
+  step_center(carbon, hydrogen) \%>\%
+  step_scale(carbon, hydrogen) \%>\%
+  step_spatialsign(carbon, hydrogen)
+
+ss_obj <- prep(ss_trans, training = biomass_tr)
+
+transformed_te <- bake(ss_obj, biomass_te)
+
+plot(biomass_te$carbon, biomass_te$hydrogen)
+
+plot(transformed_te$carbon, transformed_te$hydrogen)
+}
+\references{
+Serneels, S., De Nolf, E., and Van Espen, P. (2006). Spatial
+  sign preprocessing: a simple way to impart moderate robustness to
+  multivariate estimators. \emph{Journal of Chemical Information and
+  Modeling}, 46(3), 1402-1409.
+}
+\concept{
+preprocessing projection_methods
+}
+\keyword{datagen}
diff --git a/man/step_sqrt.Rd b/man/step_sqrt.Rd
new file mode 100644
index 0000000..4e46e57
--- /dev/null
+++ b/man/step_sqrt.Rd
@@ -0,0 +1,55 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/sqrt.R
+\name{step_sqrt}
+\alias{step_sqrt}
+\title{Square Root Transformation}
+\usage{
+step_sqrt(recipe, ..., role = NA, trained = FALSE, columns = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose which variables will be
+transformed. See \code{\link{selections}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{columns}{A character string of variable names that will be (eventually)
+populated by the \code{terms} argument.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_sqrt} creates a \emph{specification} of a recipe step that will
+  square root transform the data.
+}
+\examples{
+set.seed(313)
+examples <- matrix(rnorm(40)^2, ncol = 2)
+examples <- as.data.frame(examples)
+
+rec <- recipe(~ V1 + V2, data = examples)
+
+sqrt_trans <- rec  \%>\%
+  step_sqrt(all_predictors())
+
+sqrt_obj <- prep(sqrt_trans, training = examples)
+
+transformed_te <- bake(sqrt_obj, examples)
+plot(examples$V1, transformed_te$V1)
+}
+\seealso{
+\code{\link{step_logit}} \code{\link{step_invlogit}}
+  \code{\link{step_log}}  \code{\link{step_hyperbolic}} \code{\link{recipe}}
+  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
+}
+\concept{
+preprocessing transformation_methods
+}
+\keyword{datagen}
diff --git a/man/step_window.Rd b/man/step_window.Rd
new file mode 100644
index 0000000..3e07dcc
--- /dev/null
+++ b/man/step_window.Rd
@@ -0,0 +1,113 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/window.R
+\name{step_window}
+\alias{step_window}
+\title{Moving Window Functions}
+\usage{
+step_window(recipe, ..., role = NA, trained = FALSE, size = 3,
+  na.rm = TRUE, statistic = "mean", columns = NULL, names = NULL)
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of 
+operations for this recipe.}
+
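+# helper: apply a function to each column, returning a tibble to match bake()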
+\item{...}{One or more selector functions to choose which variables are 
+affected by the step. See \code{\link{selections}} for more details.}
+
+\item{role}{For model terms created by this step, what analysis role should
+they be assigned? If \code{names} is left as \code{NULL}, the rolling
+statistics replace the original columns and the roles are left unchanged.
+If \code{names} is set, those new columns will have a role of \code{NULL}
+unless this argument has a value.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing 
+have been estimated.}
+
+\item{size}{An odd integer \code{>= 3} for the window size.}
+
+\item{na.rm}{A logical for whether missing values should be removed from the
+calculations within each window.}
+
+\item{statistic}{A character string for the type of statistic that should
+be calculated for each moving window. Possible values are: \code{'max'},
+\code{'mean'}, \code{'median'}, \code{'min'}, \code{'prod'}, \code{'sd'},
+\code{'sum'}, \code{'var'}}
+
+\item{columns}{A character string that contains the names of columns that
+should be processed. These values are not determined until
+\code{\link{prep.recipe}} is called.}
+
+\item{names}{An optional character string that is the same length of the
+number of terms selected by \code{terms}. If you are not sure what columns
+will be selected, use the \code{summary} function (see the example below).
+These will be the names of the new columns created by the step.}
+}
+\value{
+An updated version of \code{recipe} with the
+  new step added to the sequence of existing steps (if any).
+}
+\description{
+\code{step_window} creates a \emph{specification} of a recipe step that will
+  create new columns that are the results of functions that compute
+  statistics across moving windows.
+}
+\details{
+The calculations use a somewhat atypical method for handling the
+  beginning and end parts of the rolling statistics. The process starts
+  with the center-justified window calculations, and the beginning and
+  ending parts of the series are then filled in using the first and
+  last rolling values, respectively. For example, if a column \code{x} with
+  12 values is smoothed with a 5-point moving median, the first three
+  smoothed values are estimated by \code{median(x[1:5])} and the fourth
+  uses \code{median(x[2:6])}.
+}
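+\note{
+A rough sketch of the padding rule described above, written in plain R
+for a hypothetical 12-value vector \code{x} (illustrative only, not the
+step's internal implementation):
+\preformatted{
+x <- 1:12
+size <- 5
+half <- (size - 1) / 2
+centers <- (half + 1):(length(x) - half)
+core <- sapply(centers, function(i) median(x[(i - half):(i + half)]))
+# pad both ends with the first and last computed values
+smoothed <- c(rep(core[1], half), core, rep(core[length(core)], half))
+# smoothed[1:3] are all median(x[1:5]); smoothed[4] is median(x[2:6])
+}
+}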
+\examples{
+library(recipes)
+library(dplyr)
+library(rlang)
+library(ggplot2, quietly = TRUE)
+
+set.seed(5522)
+sim_dat <- data.frame(x1 = (20:100) / 10)
+n <- nrow(sim_dat)
+sim_dat$y1 <- sin(sim_dat$x1) + rnorm(n, sd = 0.1)
+sim_dat$y2 <- cos(sim_dat$x1) + rnorm(n, sd = 0.1)
+sim_dat$x2 <- runif(n)
+sim_dat$x3 <- rnorm(n)
+
+rec <- recipe(y1 + y2 ~ x1 + x2 + x3, data = sim_dat) \%>\%
+  step_window(starts_with("y"), size = 7, statistic = "median",
+              names = paste0("med_7pt_", 1:2),
+              role = "outcome") \%>\%
+  step_window(starts_with("y"),
+              names = paste0("mean_3pt_", 1:2),
+              role = "outcome")
+rec <- prep(rec, training = sim_dat)
+
+# If you aren't sure how to set the names, see which variables are selected
+# and the order that they are selected:
+terms_select(info = summary(rec), terms = quos(starts_with("y")))
+
+smoothed_dat <- bake(rec, sim_dat, everything())
+
+ggplot(data = sim_dat, aes(x = x1, y = y1)) +
+  geom_point() +
+  geom_line(data = smoothed_dat, aes(y = med_7pt_1)) +
+  geom_line(data = smoothed_dat, aes(y = mean_3pt_1), col = "red") +
+  theme_bw()
+
+# If you want to replace the selected variables with the rolling statistic,
+# don't set `names`
+sim_dat$original <- sim_dat$y1
+rec <- recipe(y1 + y2 + original ~ x1 + x2 + x3, data = sim_dat) \%>\%
+  step_window(starts_with("y"))
+rec <- prep(rec, training = sim_dat)
+smoothed_dat <- bake(rec, sim_dat, everything())
+ggplot(smoothed_dat, aes(x = original, y = y1)) + 
+  geom_point() + 
+  theme_bw()
+}
+\concept{
+preprocessing moving_windows
+}
+\keyword{datagen}
diff --git a/man/summary.recipe.Rd b/man/summary.recipe.Rd
new file mode 100644
index 0000000..c339122
--- /dev/null
+++ b/man/summary.recipe.Rd
@@ -0,0 +1,40 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/recipe.R
+\name{summary.recipe}
+\alias{summary.recipe}
+\title{Summarize a Recipe}
+\usage{
+\method{summary}{recipe}(object, original = FALSE, ...)
+}
+\arguments{
+\item{object}{A \code{recipe} object}
+
+\item{original}{A logical: show the current set of variables or the original
+set when the recipe was defined.}
+
+\item{...}{further arguments passed to or from other methods (not currently
+used).}
+}
+\value{
+A tibble with columns \code{variable}, \code{type}, \code{role},
+  and \code{source}.
+}
+\description{
+This function prints the current set of variables/features and some of their
+  characteristics.
+}
+\details{
+Note that, until the recipe has been trained, the current and
+  original variables are the same.
+}
+\examples{
+rec <- recipe( ~ ., data = USArrests)
+summary(rec)
+rec <- step_pca(rec, all_numeric(), num = 3)
+summary(rec) # still the same since not yet trained
+rec <- prep(rec, training = USArrests)
+summary(rec)
+}
+\seealso{
+\code{\link{recipe}} \code{\link{prep.recipe}}
+}
diff --git a/man/terms_select.Rd b/man/terms_select.Rd
new file mode 100644
index 0000000..ae03508
--- /dev/null
+++ b/man/terms_select.Rd
@@ -0,0 +1,40 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/selections.R
+\name{terms_select}
+\alias{terms_select}
+\title{Select Terms in a Step Function.}
+\usage{
+terms_select(terms, info)
+}
+\arguments{
+\item{terms}{A list of formulas whose right-hand side contains quoted
+expressions. See \code{\link[rlang]{quos}} for examples.}
+
+\item{info}{A tibble with columns \code{variable}, \code{type}, \code{role},
+and \code{source} that represent the current state of the data. The
+function \code{\link{summary.recipe}} can be used to get this information
+from a recipe.}
+}
+\value{
+A character string of column names or an error if there are no
+  selectors or if no variables are selected.
+}
+\description{
+This function bakes the step function selectors and might be useful
+  when creating custom steps.
+}
+\examples{
+library(rlang)
+data(okc)
+rec <- recipe(~ ., data = okc)
+info <- summary(rec)
+terms_select(info = info, quos(all_predictors()))
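+
+# other selectors (see ?selections) are resolved the same way, for example:
+terms_select(info = info, quos(all_numeric()))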
+}
+\seealso{
+\code{\link{recipe}} \code{\link{summary.recipe}}
+  \code{\link{prep.recipe}}
+}
+\concept{
+preprocessing
+}
+\keyword{datagen}
diff --git a/tests/testthat.R b/tests/testthat.R
new file mode 100644
index 0000000..300c374
--- /dev/null
+++ b/tests/testthat.R
@@ -0,0 +1,6 @@
+library(testthat)
+library(recipes)
+
+test_check(package = "recipes")
+q("no")
+
diff --git a/tests/testthat/test-basics.R b/tests/testthat/test-basics.R
new file mode 100644
index 0000000..cffe5e0
--- /dev/null
+++ b/tests/testthat/test-basics.R
@@ -0,0 +1,45 @@
+library(testthat)
+context("Testing basic functionalities")
+library(tibble)
+
+library(recipes)
+data("biomass")
+
+test_that("Recipe correctly identifies output variable", {
+  raw_recipe <- recipe(HHV ~ ., data = biomass)
+  var_info <- raw_recipe$var_info
+  expect_true(is.tibble(var_info))
+  outcome_ind <- which(var_info$variable == "HHV")
+  expect_true(var_info$role[outcome_ind] == "outcome")
+  expect_true(all(var_info$role[-outcome_ind] == rep("predictor", ncol(biomass) - 1)))
+})
+
+test_that("Recipe fails on in-line functions", {
+  expect_error(recipe(HHV ~ log(nitrogen), data = biomass))
+  expect_error(recipe(HHV ~ (.)^2, data = biomass))
+  expect_error(recipe(HHV ~ nitrogen  + sulfur  + nitrogen:sulfur, data = biomass))
+  expect_error(recipe(HHV ~ nitrogen^2, data = biomass))
+})
+
+test_that("return character or factor values", {
+  raw_recipe <- recipe(HHV ~ ., data = biomass)
+  centered <- raw_recipe %>% 
+    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur)
+  
+  centered_char <- prep(centered, training = biomass, stringsAsFactors = FALSE, retain = TRUE)
+  char_var <- bake(centered_char, newdata = head(biomass))
+  expect_equal(class(char_var$sample), "character")
+  
+  centered_fac <- prep(centered, training = biomass, stringsAsFactors = TRUE, retain = TRUE)
+  fac_var <- bake(centered_fac, newdata = head(biomass))
+  expect_equal(class(fac_var$sample), "factor")  
+  expect_equal(levels(fac_var$sample), sort(unique(biomass$sample)))  
+})
+
+
+test_that("Using prepare", {
+  expect_error(prepare(recipe(HHV ~ ., data = biomass), 
+                       training = biomass),
+               paste0("As of version 0.0.1.9006, used `prep` ",
+                      "instead of `prepare`"))
+})
diff --git a/tests/testthat/test_BoxCox.R b/tests/testthat/test_BoxCox.R
new file mode 100644
index 0000000..0480f1d
--- /dev/null
+++ b/tests/testthat/test_BoxCox.R
@@ -0,0 +1,58 @@
+library(testthat)
+library(recipes)
+
+n <- 20
+set.seed(1)
+ex_dat <- data.frame(x1 = exp(rnorm(n, mean = .1)),
+                     x2 = 1/rnorm(n),
+                     x3 = rep(1:2, each = n/2),
+                     x4 = rexp(n))
+
+## from `car` package
+exp_lambda <- c(x1 = 0.2874304685, 
+                x2 = NA,
+                x3 = NA,
+                x4 = 0.06115365314)
+exp_dat <- structure(list(x1 = c(-0.48855792533959, 0.295526451871788, -0.66306066037752, 
+                                 2.18444062220084, 0.45714544418559, -0.650762952308473, 0.639934327981261, 
+                                 0.94795174900382, 0.745877376631664, -0.199443408020842, 2.05013184840922, 
+                                 0.526004196848377, -0.484073411411316, -1.5846209165316, 1.46827089088108, 
+                                 0.0555044880684726, 0.0848273579417863, 1.21733702306844, 1.05470177834901, 
+                                 0.76793945044649), 
+                          x2 = c(1.0881660755694, 1.27854953038913, 
+                                 13.4111208085756, -0.502676325196487, 1.61335666257264, -17.8161848705567, 
+                                 -6.41867035287092, -0.679924106156326, -2.09139367300257, 2.39267901359744, 
+                                 0.736008721758276, -9.72878791903891, 2.57950278065913, -18.5856192870844, 
+                                 -0.726185004156987, -2.40967012205861, -2.5362046143702, -16.8595975858421, 
+                                 0.909069940992826, 1.31031417340121), 
+                          x3 = c(1L, 1L, 1L, 1L, 
+                                 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
+                          x4 = c(-0.0299153493217198, -0.00545480495048682, -0.605467890118739, 
+                                 0.771791879612809, -0.763649380406524, 0.872804671752781, 1.38894407918253, 
+                                 -0.537364454265797, -0.482864603899052, -0.0227886234018179, 
+                                 -1.25797709152009, -0.995703197045091, 0.102163556869708, -0.246753343931442, 
+                                 -1.7395729395129, 0.104247324965852, -1.15077903230011, 0.48306309307708, 
+                                 1.99265865015763, -0.747338829803379)), 
+                     .Names = c("x1", "x2", "x3", "x4"), 
+                     row.names = c(NA, -20L), class = "data.frame")
+
+test_that('simple Box Cox', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_BoxCox(x1, x2, x3, x4)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+  
+  expect_equal(names(exp_lambda)[!is.na(exp_lambda)], names(rec_trained$steps[[1]]$lambdas))
+  expect_equal(exp_lambda[!is.na(exp_lambda)], rec_trained$steps[[1]]$lambdas, tol = .001)
+  expect_equal(as.matrix(exp_dat), as.matrix(rec_trans), tol = .05)
+})
+
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_BoxCox(x1, x2, x3, x4)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
+
diff --git a/tests/testthat/test_YeoJohnson.R b/tests/testthat/test_YeoJohnson.R
new file mode 100644
index 0000000..42628c0
--- /dev/null
+++ b/tests/testthat/test_YeoJohnson.R
@@ -0,0 +1,61 @@
+library(testthat)
+library(recipes)
+
+n <- 20
+set.seed(1)
+ex_dat <- data.frame(x1 = exp(rnorm(n, mean = .1)),
+                     x2 = 1/rnorm(n),
+                     x3 = rep(1:2, each = n/2),
+                     x4 = rexp(n))
+
+## from `car` package
+exp_lambda <- c(x1 = -0.2727204451, 
+                x2 =  1.139292543,
+                x3 = NA,
+                x4 = -1.012702061)
+exp_dat <- structure(list(x1 = c(0.435993557749438, 0.754696454247318, 0.371327932207827, 
+                                 1.46113017436327, 0.82204097731098, 0.375761562702297, 0.89751975937422, 
+                                 1.02175936118846, 0.940739811377902, 0.54984302797741, 1.41856737837093, 
+                                 0.850587387615876, 0.437701618670981, 0.112174615510591, 1.21942112715274, 
+                                 0.654589551748501, 0.666780580127795, 1.12625135443351, 1.0636850911955, 
+                                 0.949680956411546), 
+                          x2 = c(1.15307873387121, 1.36532999080347, 
+                                 17.4648439780388, -0.487746797875704, 1.74452440065935, -13.3640721541574, 
+                                 -5.35805967319061, -0.653901985285932, -1.90735599477338, 2.65253432454371, 
+                                 0.76771137336975, -7.79484535687973, 2.87484976680907, -13.8738947581599, 
+                                 -0.696856395842167, -2.17745353101028, -2.28384276604207, -12.7261652971783, 
+                                 0.95585544349634, 1.40099012093008), 
+                          x3 = c(1L, 1L, 1L, 1L, 1L, 
+                                 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
+                          x4 = c(0.49061104973894, 0.49670370366879, 0.338742419511653, 
+                                 0.663722100577351, 0.296260662322359, 0.681346128666408, 
+                                 0.757581280603711, 0.357148961119583, 0.371872889850153, 
+                                 0.49239057672598, 0.173259524331095, 0.235933290139909, 0.52297977893566, 
+                                 0.434927187456966, 0.0822501770191215, 0.523479652016858, 
+                                 0.197977570919824, 0.608108816144845, 0.821913792446345, 
+                                 0.300608495427594)), 
+                     .Names = c("x1", "x2", "x3", "x4"), 
+                     row.names = c(NA, 
+                                   -20L), 
+                     class = "data.frame")
+
+test_that('simple YJ trans', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_YeoJohnson(x1, x2, x3, x4)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+  
+  expect_equal(names(exp_lambda)[!is.na(exp_lambda)], names(rec_trained$steps[[1]]$lambdas))
+  expect_equal(exp_lambda[!is.na(exp_lambda)], rec_trained$steps[[1]]$lambdas, tol = .001)
+  expect_equal(as.matrix(exp_dat), as.matrix(rec_trans), tol = .05)
+})
+
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_YeoJohnson(x1, x2, x3, x4)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
+
diff --git a/tests/testthat/test_bagimpute.R b/tests/testthat/test_bagimpute.R
new file mode 100644
index 0000000..fe29ef5
--- /dev/null
+++ b/tests/testthat/test_bagimpute.R
@@ -0,0 +1,57 @@
+library(testthat)
+library(ipred)
+library(rpart)
+library(recipes)
+data("biomass")
+
+biomass$fac <- factor(sample(letters[1:2], size = nrow(biomass), replace = TRUE))
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur + fac,
+              data = biomass)
+
+test_that('imputation models', {
+  imputed <- rec %>%
+    step_bagimpute(carbon, fac, impute_with = imp_vars(hydrogen, oxygen), seed_val = 12)
+
+  imputed_trained <- prep(imputed, training = biomass, verbose = FALSE)
+
+
+  ## make sure we get the same trees given the same random samples
+  carb_samps <- lapply(imputed_trained$steps[[1]]$models[["carbon"]]$mtrees,
+                       function(x) x$bindx)
+  for(i in seq_along(carb_samps)) {
+    carb_data <- biomass[carb_samps[[i]], c("carbon", "hydrogen", "oxygen")]
+    carb_mod <- rpart(carbon ~ ., data = carb_data,
+                      control= rpart.control(xval=0))
+    expect_equal(carb_mod$splits,
+                 imputed_trained$steps[[1]]$models[["carbon"]]$mtrees[[i]]$btree$splits)
+
+  }
+
+  fac_samps <- lapply(imputed_trained$steps[[1]]$models[[1]]$mtrees,
+                      function(x) x$bindx)
+
+  fac_ctrl <- imputed_trained$steps[[1]]$models[["fac"]]$mtrees[[1]]$btree$control
+
+  ## make sure we get the same trees given the same random samples
+  for(i in seq_along(fac_samps)) {
+    fac_data <- biomass[fac_samps[[i]], c("fac", "hydrogen", "oxygen")]
+    fac_mod <- rpart(fac ~ ., data = fac_data, control= fac_ctrl)
+    expect_equal(fac_mod$splits,
+                 imputed_trained$steps[[1]]$models[["fac"]]$mtrees[[i]]$btree$splits)
+  }
+})
+
+
+test_that('printing', {
+  imputed <- rec %>%
+    step_bagimpute(carbon, impute_with = imp_vars(hydrogen), seed_val = 12)
+  
+  expect_output(print(imputed))
+  expect_output(prep(imputed, training = biomass))
+})
+
+
+
+
+
diff --git a/tests/testthat/test_bin2factor.R b/tests/testthat/test_bin2factor.R
new file mode 100644
index 0000000..e7168b2
--- /dev/null
+++ b/tests/testthat/test_bin2factor.R
@@ -0,0 +1,38 @@
+library(testthat)
+library(recipes)
+
+data(covers)
+rec <- recipe(~ description, covers) %>%
+  step_regex(description, pattern = "(rock|stony)", result = "rocks") %>%
+  step_regex(description, pattern = "(rock|stony)", result = "more_rocks") 
+
+test_that('default options', {
+  rec1 <- rec %>% step_bin2factor(rocks)
+  rec1 <- prep(rec1, training = covers)
+  res1 <- bake(rec1, newdata = covers)
+  expect_true(all(diag(table(res1$rocks, res1$more_rocks)) == 0))
+})
+
+
+test_that('nondefault options', {
+  rec2 <- rec %>% step_bin2factor(rocks, levels = letters[2:1])
+  rec2 <- prep(rec2, training = covers)
+  res2 <- bake(rec2, newdata = covers)
+  expect_true(all(diag(table(res2$rocks, res2$more_rocks)) == 0))
+})
+
+
+test_that('bad options', {
+  rec3 <- rec %>% step_bin2factor(description)
+  expect_error(prep(rec3, training = covers))
+  expect_error(rec %>% step_bin2factor(rocks, levels = letters[1:5]))
+  expect_error(rec %>% step_bin2factor(rocks, levels = 1:2))
+})
+
+
+test_that('printing', {
+  rec2 <- rec %>% step_bin2factor(rocks, levels = letters[2:1])
+  expect_output(print(rec2))
+  expect_output(prep(rec2, training = covers))
+})
+
diff --git a/tests/testthat/test_center_scale.R b/tests/testthat/test_center_scale.R
new file mode 100644
index 0000000..9f77483
--- /dev/null
+++ b/tests/testthat/test_center_scale.R
@@ -0,0 +1,68 @@
+library(testthat)
+context("Testing center and scale")
+
+library(recipes)
+
+means <- vapply(biomass[, 3:7], mean, c(mean = 0))
+sds <- vapply(biomass[, 3:7], sd, c(sd = 0))
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass)
+
+test_that('correct means and std devs', {
+  standardized <- rec %>%
+    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur) %>%
+    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
+
+  standardized_trained <- prep(standardized, training = biomass, verbose = FALSE)
+
+  expect_equal(standardized_trained$steps[[1]]$means, means)
+  expect_equal(standardized_trained$steps[[2]]$sds, sds)
+})
+
+test_that('training in stages', {
+  at_once <- rec %>%
+    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur) %>%
+    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
+
+  at_once_trained <- prep(at_once, training = biomass, verbose = FALSE)
+
+  ## now train in stages
+  center_first <- rec %>%
+    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur)
+  center_first_trained <- prep(center_first, training = biomass, verbose = FALSE)
+  in_stages <- center_first_trained %>%
+    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
+  in_stages_trained <- prep(in_stages, training = biomass, verbose = FALSE)
+  in_stages_retrained <- prep(in_stages, training = biomass, verbose = FALSE, fresh = TRUE)
+
+  expect_equal(at_once_trained, in_stages_trained)
+  expect_equal(at_once_trained, in_stages_retrained)
+
+})
+
+
+test_that('single predictor', {
+  standardized <- rec %>%
+    step_center(carbon) %>%
+    step_scale(hydrogen)
+
+  standardized_trained <- prep(standardized, training = biomass, verbose = FALSE)
+  results <- bake(standardized_trained, biomass)
+
+  exp_res <- biomass[, 3:8]
+  exp_res$carbon <- exp_res$carbon - mean(exp_res$carbon)
+  exp_res$hydrogen <- exp_res$hydrogen / sd(exp_res$hydrogen)
+
+  expect_equal(as.data.frame(results), exp_res[, colnames(results)])
+})
+
+
+test_that('printing', {
+  standardized <- rec %>%
+    step_center(carbon) %>%
+    step_scale(hydrogen)
+  expect_output(print(standardized))
+  expect_output(prep(standardized, training = biomass))
+})
+
diff --git a/tests/testthat/test_classdist.R b/tests/testthat/test_classdist.R
new file mode 100644
index 0000000..631d233
--- /dev/null
+++ b/tests/testthat/test_classdist.R
@@ -0,0 +1,47 @@
+library(testthat)
+library(recipes)
+
+test_that("defaults", {
+  rec <- recipe(Species ~ ., data = iris) %>%
+    step_classdist(all_predictors(), class = "Species", log = FALSE)
+  trained <- prep(rec, training = iris, verbose = FALSE)
+  dists <- bake(trained, newdata = iris)
+  dists <- dists[, grepl("classdist", names(dists))]
+  dists <- as.data.frame(dists)
+
+  split_up <- split(iris[, 1:4], iris$Species)
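+  # distance from each row of y to the center/covariance of one class subset x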
+  mahalanobis2 <- function(x, y)
+    mahalanobis(y, center = colMeans(x), cov = cov(x))
+
+  exp_res <- lapply(split_up, mahalanobis2, y = iris[, 1:4])
+  exp_res <- as.data.frame(exp_res)
+
+  for(i in 1:ncol(exp_res))
+    expect_equal(dists[, i], exp_res[, i])
+})
+
+test_that("alt args", {
+  rec <- recipe(Species ~ ., data = iris) %>%
+    step_classdist(all_predictors(), class = "Species", log = FALSE, mean_func = median)
+  trained <- prep(rec, training = iris, verbose = FALSE)
+  dists <- bake(trained, newdata = iris)
+  dists <- dists[, grepl("classdist", names(dists))]
+  dists <- as.data.frame(dists)
+
+  split_up <- split(iris[, 1:4], iris$Species)
+  mahalanobis2 <- function(x, y)
+    mahalanobis(y, center = apply(x, 2, median), cov = cov(x))
+
+  exp_res <- lapply(split_up, mahalanobis2, y = iris[, 1:4])
+  exp_res <- as.data.frame(exp_res)
+
+  for(i in 1:ncol(exp_res))
+    expect_equal(dists[, i], exp_res[, i])
+})
+
+test_that('printing', {
+  rec <- recipe(Species ~ ., data = iris) %>%
+    step_classdist(all_predictors(), class = "Species", log = FALSE)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = iris))
+})
diff --git a/tests/testthat/test_corr.R b/tests/testthat/test_corr.R
new file mode 100644
index 0000000..023951b
--- /dev/null
+++ b/tests/testthat/test_corr.R
@@ -0,0 +1,43 @@
+library(testthat)
+library(recipes)
+
+n <- 100
+set.seed(424)
+dat <- matrix(rnorm(n*5), ncol =  5)
+dat <- as.data.frame(dat)
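+# add a perfect duplicate of V1 and a strongly negatively correlated copy of V2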
+dat$duplicate <- dat$V1
+dat$V6 <- -dat$V2 + runif(n)*.2
+
+test_that('high filter', {
+  set.seed(1)
+  rec <- recipe(~ ., data = dat)
+  filtering <- rec %>% 
+    step_corr(all_predictors(), threshold = .5)
+  
+  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
+  
+  removed <- c("V6", "V1")
+  
+  expect_equal(filtering_trained$steps[[1]]$removals, removed)
+})
+
+test_that('low filter', {
+  rec <- recipe(~ ., data = dat)
+  filtering <- rec %>% 
+    step_corr(all_predictors(), threshold = 1)
+  
+  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
+
+  expect_equal(filtering_trained$steps[[1]]$removals, numeric(0))
+})
+
+
+test_that('printing', {
+  set.seed(1)
+  rec <- recipe(~ ., data = dat)
+  filtering <- rec %>% 
+    step_corr(all_predictors(), threshold = .5)
+  expect_output(print(filtering))
+  expect_output(prep(filtering, training = dat))
+})
+
diff --git a/tests/testthat/test_date.R b/tests/testthat/test_date.R
new file mode 100644
index 0000000..e81145c
--- /dev/null
+++ b/tests/testthat/test_date.R
@@ -0,0 +1,97 @@
+library(testthat)
+library(recipes)
+library(lubridate)
+library(tibble)
+
+examples <- data.frame(Dan = ymd("2002-03-04") + days(1:10),
+                       Stefan = ymd("2006-01-13") + days(1:10))
+
+examples$Dan <- as.POSIXct(examples$Dan)
+
+date_rec <- recipe(~ Dan + Stefan, examples) %>%
+  step_date(all_predictors())
+
+feats <- c("year", "doy", "week", "decimal", "semester", "quarter", "dow", "month")
+
+test_that('default option', {
+  date_rec <- recipe(~ Dan + Stefan, examples) %>%
+    step_date(all_predictors(), features = feats)
+  
+  date_rec <- prep(date_rec, training = examples)   
+  date_res <- bake(date_rec, newdata = examples)
+  
+  date_exp <- tibble(
+    Dan = examples$Dan,
+    Stefan = examples$Stefan,
+    Dan_year = year(examples$Dan),
+    Dan_doy = yday(examples$Dan),
+    Dan_week = week(examples$Dan),
+    Dan_decimal = decimal_date(examples$Dan),
+    Dan_semester = semester(examples$Dan),
+    Dan_quarter = quarter(examples$Dan),
+    Dan_dow = wday(examples$Dan, label = TRUE, abbr = TRUE),
+    Dan_month = month(examples$Dan, label = TRUE, abbr = TRUE),
+    Stefan_year = year(examples$Stefan),
+    Stefan_doy = yday(examples$Stefan),
+    Stefan_week = week(examples$Stefan),
+    Stefan_decimal = decimal_date(examples$Stefan),
+    Stefan_semester = semester(examples$Stefan),
+    Stefan_quarter = quarter(examples$Stefan),
+    Stefan_dow = wday(examples$Stefan, label = TRUE, abbr = TRUE),
+    Stefan_month = month(examples$Stefan, label = TRUE, abbr = TRUE)
+  )
+  date_exp$Dan_dow <- factor(as.character(date_exp$Dan_dow), levels = levels(date_exp$Dan_dow))
+  date_exp$Dan_month <- factor(as.character(date_exp$Dan_month), levels = levels(date_exp$Dan_month))  
+  date_exp$Stefan_dow <- factor(as.character(date_exp$Stefan_dow), levels = levels(date_exp$Stefan_dow))
+  date_exp$Stefan_month <- factor(as.character(date_exp$Stefan_month), levels = levels(date_exp$Stefan_month))
+  
+  expect_equal(date_res, date_exp)
+})
+
+
+test_that('nondefault options', {
+  date_rec <- recipe(~ Dan + Stefan, examples) %>%
+    step_date(all_predictors(), features = c("dow", "month"), label = FALSE)
+  
+  date_rec <- prep(date_rec, training = examples)   
+  date_res <- bake(date_rec, newdata = examples)
+  
+  date_exp <- tibble(
+    Dan = examples$Dan,
+    Stefan = examples$Stefan,
+    Dan_dow = wday(examples$Dan, label = FALSE),
+    Dan_month = month(examples$Dan, label = FALSE),
+    Stefan_dow = wday(examples$Stefan, label = FALSE),
+    Stefan_month = month(examples$Stefan, label = FALSE)
+  )
+
+  expect_equal(date_res, date_exp)
+})
+
+
+test_that('ordinal values', {
+  date_rec <- recipe(~ Dan + Stefan, examples) %>%
+    step_date(all_predictors(), features = c("dow", "month"), ordinal = TRUE)
+  
+  date_rec <- prep(date_rec, training = examples)   
+  date_res <- bake(date_rec, newdata = examples)
+  
+  date_exp <- tibble(
+    Dan = examples$Dan,
+    Stefan = examples$Stefan,
+    Dan_dow = wday(examples$Dan, label = TRUE),
+    Dan_month = month(examples$Dan, label = TRUE),
+    Stefan_dow = wday(examples$Stefan, label = TRUE),
+    Stefan_month = month(examples$Stefan, label = TRUE)
+  )
+  
+  expect_equal(date_res, date_exp)
+})
+
+
+test_that('printing', {
+  date_rec <- recipe(~ Dan + Stefan, examples) %>%
+    step_date(all_predictors(), features = feats)
+  expect_output(print(date_rec))
+  expect_output(prep(date_rec, training = examples))
+})
diff --git a/tests/testthat/test_depth.R b/tests/testthat/test_depth.R
new file mode 100644
index 0000000..18ad952
--- /dev/null
+++ b/tests/testthat/test_depth.R
@@ -0,0 +1,55 @@
+library(testthat)
+library(recipes)
+library(ddalpha)
+
+test_that("defaults", {
+  rec <- recipe(Species ~ ., data = iris) %>%
+    step_depth(all_predictors(), class = "Species", metric = "spatial")
+  trained <- prep(rec, training = iris, verbose = FALSE)
+  depths <- bake(trained, newdata = iris)
+  depths <- depths[, grepl("depth", names(depths))]
+  depths <- as.data.frame(depths)
+
+  split_up <- split(iris[, 1:4], iris$Species)
+  spatial <- function(x, y)
+    depth.spatial(x = y, data = x)
+
+  exp_res <- lapply(split_up, spatial, y = iris[, 1:4])
+  exp_res <- as.data.frame(exp_res)
+
+  for(i in 1:ncol(exp_res))
+    expect_equal(depths[, i], exp_res[, i])
+})
+
+test_that("alt args", {
+  rec <- recipe(Species ~ ., data = iris) %>%
+    step_depth(all_predictors(), class = "Species",
+               metric = "Mahalanobis",
+               options = list(mah.estimate = "MCD", mah.parMcd = .75))
+  trained <- prep(rec, training = iris, verbose = FALSE)
+  depths <- bake(trained, newdata = iris)
+  depths <- depths[, grepl("depth", names(depths))]
+  depths <- as.data.frame(depths)
+
+  split_up <- split(iris[, 1:4], iris$Species)
+  Mahalanobis <- function(x, y)
+    depth.Mahalanobis(x = y, data = x, mah.estimate = "MCD", mah.parMcd = .75)
+
+  exp_res <- lapply(split_up, Mahalanobis, y = iris[, 1:4])
+  exp_res <- as.data.frame(exp_res)
+
+
+  for(i in 1:ncol(exp_res))
+    expect_equal(depths[, i], exp_res[, i])
+})
+
+
+test_that('printing', {
+  rec <- recipe(Species ~ ., data = iris) %>%
+    step_depth(all_predictors(), class = "Species", metric = "spatial")
+  expect_output(print(rec))
+  expect_output(prep(rec, training = iris))
+})
+
diff --git a/tests/testthat/test_discretized.R b/tests/testthat/test_discretized.R
new file mode 100644
index 0000000..c25d54b
--- /dev/null
+++ b/tests/testthat/test_discretized.R
@@ -0,0 +1,39 @@
+library(testthat)
+library(recipes)
+
+
+ex_tr <- data.frame(x1 = 1:100, 
+                    x2 = rep(1:5, each = 20), 
+                    x3 = factor(rep(letters[1:2], each = 50)))
+ex_te <- data.frame(x1 = c(1, 50, 101, NA))
+                     
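+# with the default keep_na = TRUE, discretize adds a "bin_missing" level for NAs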
+lvls_breaks_4 <- c('bin_missing', 'bin1', 'bin2', 'bin3', 'bin4')
+
+test_that('default args', {
+  bin_1 <- discretize(ex_tr$x1)
+  pred_1 <- predict(bin_1, ex_te$x1)
+  exp_1 <- factor(c("bin1", "bin2", "bin4", "bin_missing"), levels = lvls_breaks_4)
+  expect_equal(pred_1, exp_1)
+})
+
+test_that('NA values', {
+  bin_2 <- discretize(ex_tr$x1, keep_na = FALSE)
+  pred_2 <- predict(bin_2, ex_te$x1)
+  exp_2 <- factor(c("bin1", "bin2", "bin4", NA), levels = lvls_breaks_4[-1])
+  expect_equal(pred_2, exp_2)
+})
+
+test_that('NA values from out of range', {
+  bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE)
+  pred_3 <- predict(bin_3, ex_te$x1)
+  exp_3 <- factor(c("bin1", "bin2", NA, NA), levels = lvls_breaks_4[-1])
+  expect_equal(pred_3, exp_3)
+})
+
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_tr) %>% 
+    step_discretize(x1)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_tr))
+})
diff --git a/tests/testthat/test_dummies.R b/tests/testthat/test_dummies.R
new file mode 100644
index 0000000..99fd460
--- /dev/null
+++ b/tests/testthat/test_dummies.R
@@ -0,0 +1,39 @@
+library(testthat)
+library(recipes)
+
+data(okc)
+
+okc$location <- gsub(", california", "", okc$location)
+okc$diet[is.na(okc$diet)] <- "missing"
+okc <- okc[complete.cases(okc), -5]
+
+okc_fac <- data.frame(okc)
+
+test_that('dummy variables with string inputs', {
+  rec <- recipe(age ~ ., data = okc)
+  dummy <- rec %>% step_dummy(diet, location)
+  dummy_trained <- prep(dummy, training = okc, verbose = FALSE, stringsAsFactors = FALSE)
+  dummy_pred <- bake(dummy_trained, newdata = okc)
+  dummy_pred <- dummy_pred[, order(colnames(dummy_pred))]
+  dummy_pred <- as.data.frame(dummy_pred)
+  rownames(dummy_pred) <- NULL
+  
+  exp_res <- model.matrix(age ~ ., data = okc_fac)[, -1]
+  exp_res <- exp_res[, colnames(exp_res) != "age"]
+  colnames(exp_res) <- gsub("^location", "location_", colnames(exp_res))
+  colnames(exp_res) <- gsub("^diet", "diet_", colnames(exp_res))
+  colnames(exp_res) <- make.names(colnames(exp_res))
+  exp_res <- exp_res[, order(colnames(exp_res))]
+  exp_res <- as.data.frame(exp_res)
+  rownames(exp_res) <- NULL
+  expect_equal(dummy_pred, exp_res)
+})
+
+
+test_that('printing', {
+  rec <- recipe(age ~ ., data = okc)
+  dummy <- rec %>% step_dummy(diet, location)
+  expect_output(print(dummy))
+  expect_output(prep(dummy, training = okc))
+})
+
diff --git a/tests/testthat/test_holiday.R b/tests/testthat/test_holiday.R
new file mode 100644
index 0000000..9e3383c
--- /dev/null
+++ b/tests/testthat/test_holiday.R
@@ -0,0 +1,57 @@
+library(testthat)
+library(recipes)
+library(lubridate)
+
+exp_dates <- data.frame(date  = ymd(c("2017-12-25", "2017-05-29", "2017-04-16")),
+                        holiday = c("ChristmasDay", "USMemorialDay", "Easter"),
+                        stringsAsFactors = FALSE)
+test_data <- data.frame(day  = ymd("2017-01-01") + days(0:364))
+
+test_that('Date class', {
+  holiday_rec <- recipe(~ day, test_data) %>%
+    step_holiday(all_predictors(), holidays = exp_dates$holiday)
+
+  holiday_rec <- prep(holiday_rec, training = test_data)
+  holiday_ind <- bake(holiday_rec, test_data)
+
+  expect_equal(holiday_ind$day[holiday_ind$day_USMemorialDay == 1],
+               exp_dates$date[exp_dates$holiday == "USMemorialDay"])
+  expect_equal(holiday_ind$day[holiday_ind$day_ChristmasDay == 1],
+               exp_dates$date[exp_dates$holiday == "ChristmasDay"])
+  expect_equal(holiday_ind$day[holiday_ind$day_Easter == 1],
+               exp_dates$date[exp_dates$holiday == "Easter"])
+})
+
+
+test_that('POSIXct class', {
+  test_data$day <- as.POSIXct(test_data$day)
+  exp_dates$date <- as.POSIXct(exp_dates$date)
+
+  holiday_rec <- recipe(~ day, test_data) %>%
+    step_holiday(all_predictors(), holidays = exp_dates$holiday)
+
+  holiday_rec <- prep(holiday_rec, training = test_data)
+  holiday_ind <- bake(holiday_rec, test_data)
+
+  expect_equal(holiday_ind$day[holiday_ind$day_USMemorialDay == 1],
+               exp_dates$date[exp_dates$holiday == "USMemorialDay"])
+  expect_equal(holiday_ind$day[holiday_ind$day_ChristmasDay == 1],
+               exp_dates$date[exp_dates$holiday == "ChristmasDay"])
+  expect_equal(holiday_ind$day[holiday_ind$day_Easter == 1],
+               exp_dates$date[exp_dates$holiday == "Easter"])
+})
+
+
+test_that('printing', {
+  holiday_rec <- recipe(~ day, test_data) %>%
+    step_holiday(all_predictors(), holidays = exp_dates$holiday)
+  expect_output(print(holiday_rec))
+  expect_output(prep(holiday_rec, training = test_data))
+})
+
diff --git a/tests/testthat/test_hyperbolic.R b/tests/testthat/test_hyperbolic.R
new file mode 100644
index 0000000..61ce981
--- /dev/null
+++ b/tests/testthat/test_hyperbolic.R
@@ -0,0 +1,45 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+n <- 20
+ex_dat <- data.frame(x1 = seq(0, 1, length = n),
+                     x2 = seq(1, 0, length = n))
+
+get_exp <- function(x, f) 
+  as_tibble(lapply(x, f))
+
+
+test_that('simple hyperbolic trans', {
+  
+  for(func in c("sin", "cos", "tan")) {
+    for(invf in c(TRUE, FALSE)) {
+      rec <- recipe(~., data = ex_dat) %>% 
+        step_hyperbolic(x1, x2, func = func, inverse = invf)
+      
+      rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+      rec_trans <- bake(rec_trained, newdata = ex_dat)
+      
+      if(invf) {
+        foo <- get(paste0("a", func))
+      } else {
+        foo <- get(func)
+      }
+      
+      exp_res <- get_exp(ex_dat, foo)
+      
+      expect_equal(rec_trans, exp_res)
+    }
+  }
+  
+})
+
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_hyperbolic(x1, x2, func = "sin", inverse = TRUE)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
+
+
diff --git a/tests/testthat/test_ica.R b/tests/testthat/test_ica.R
new file mode 100644
index 0000000..9710ffb
--- /dev/null
+++ b/tests/testthat/test_ica.R
@@ -0,0 +1,88 @@
+library(testthat)
+library(recipes)
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+## Generated using fastICA
+exp_comp <- structure(
+  c(-0.741586750989301, -0.473165319478851, -0.532724778033598, 
+    0.347336643017696, -0.523140911818999, 0.0839020928800183, -0.689112937865132, 
+    1.1905359062157, 2.87389193916233, 3.87655326677861, 0.662748883270711, 
+    0.108848159489063, 0.509384921516091, -0.708397515735095, -0.129606867389727, 
+    1.7900565287023, 0.171125628795304, 0.314289325954585, -0.142425199147843, 
+    -0.619509248504534, 0.38690051207701, -0.414352364956822, -0.609744599991299, 
+    -0.144705519030626, -0.293470631707416, -0.791746573929697, -0.634208572824357, 
+    1.36675934105489, -0.785855217530414, -0.730790987290872, -0.236417274868796, 
+    -0.210596011735952, -0.413793241941344, -0.511246150962085, -0.181254985021062, 
+    0.298659496162363, -0.757969803548959, -0.666845883775384, -0.240983277334825, 
+    -0.394806974813201, 1.44451054341856, 3.33833135277739, -0.54575996404394, 
+    -0.423145023192357, -0.388925027133234, -0.418629250017466, -0.463085718807788, 
+    -0.14499128867367, 0.323243757311295, -0.417689940076107, -0.777761367811451, 
+    -0.799107717902467, -0.548346133015069, 0.769235286712577, -0.40466870434895, 
+    -0.591389964794494, -0.208052301856056, -0.945352336400244, 0.919793619211536, 
+    -0.561549525440524, -0.535789943464846, -0.735536725127484, 3.7162236121338, 
+    0.459835444175181, 0.137984939011763, -0.755831873688866, -0.757751495230731, 
+    -0.512815283263682, 0.901123348803226, -0.755032174981781, -1.04745496967861, 
+    -0.481720409476034, -0.956534770489922, 2.39775097011864, -0.537189360991569, 
+    0.455171520278689, -0.764070183446535, -0.0133182183358093, 0.0084094629323547, 
+    -0.11887530759164, -0.50492491720854, -0.731237740789087, -0.810056304451282, 
+    -0.0654477889270799, -0.165218457853762, -0.384457532271443, 
+    -1.25744957888255, -0.164838366701182, -0.818591960610985, -0.577844253001226, 
+    0.159731749239493, -0.350242543749645, 3.22437340069565, -0.575271823706669, 
+    -0.171250094126726, 1.21819592885382, -0.303636775510361, 0.192247367642684, 
+    0.235728177283036, -0.768212986589321, 0.333147682813931, -0.403932170943429, 
+    -0.261749940045069, -0.331436881499356, -0.298793661022028, -0.255788540744319, 
+    -0.764483629396313, -0.162133725599773, -0.10676549266036, -0.349722429991475, 
+    -0.340728544016434, -0.358565693266084, 0.0242508678396987, -0.277425329351928, 
+    0.055217077863271, 0.146403703647814, -0.241268230680493, -0.283770652745491, 
+    -0.573657866580657, -0.224655195396099, 0.226079102614757, 2.03305968574443, 
+    -0.225655562941607, -0.155789455588855, -0.613828894885655, 0.480057477445702, 
+    0.277055812270816, -0.263765068404404, 0.0411239101983566, 0.30164066516454, 
+    -0.760891669412883, -0.478609196612072, -0.162692709808673, 3.12547570195871, 
+    -0.189300748528298, -0.16882558146447, -0.30745201359965, 2.77823976198232, 
+    -0.306599455530011, -0.979722296618571, -0.913952653732135, -0.608622766593967, 
+    -0.061561169157735, 0.0134953299517241, -0.111595843415483, -0.0995809192931606, 
+    -0.353150299985198, -0.173474040260694, -0.11913118533085, -0.268152445374219, 
+    -1.64524056576117, -0.052825674116391, 2.82692828099746, -0.257823074604271, 
+    -0.0316348082448068, -0.347414676200845, -0.237534967478309, 
+    -0.266298103195764, -0.0555773569483491, 2.35155293218832), 
+  .Dim = c(80L,  2L), 
+  .Dimnames = list(c("15", "20", "26", "31", "36", "41", "46", 
+                     "51", "55", "65", "69", "73", "76", "88", "91", "126", "132", 
+                     "136", "141", "147", "155", "162", "167", "173", "178", "183", 
+                     "190", "196", "203", "208", "213", "218", "223", "230", "235", 
+                     "241", "252", "257", "262", "267", "277", "282", "286", "294", 
+                     "299", "305", "309", "314", "319", "325", "330", "348", "353", 
+                     "357", "359", "370", "375", "385", "399", "407", "409", "414", 
+                     "419", "424", "429", "434", "439", "448", "467", "473", "477", 
+                     "482", "485", "493", "499", "516", "519", "527", "532", "535"
+  ), c("IC1", "IC2")))
+rownames(exp_comp) <- NULL
+
+test_that('correct ICA values', {
+  ica_extract <- rec %>% 
+    step_ica(carbon, hydrogen, oxygen, nitrogen, sulfur, num = 2)
+  
+  set.seed(12)
+  ica_extract_trained <- prep(ica_extract, training = biomass_tr, verbose = FALSE)
+  
+  ica_pred <- bake(ica_extract_trained, newdata = biomass_te)
+  ica_pred <- as.matrix(ica_pred)
+  
+  rownames(ica_pred) <- NULL
+  
+  expect_equal(ica_pred, exp_comp)
+})
+
+
+test_that('printing', {
+  ica_extract <- rec %>% 
+    step_ica(carbon, hydrogen, num = 2)
+  expect_output(print(ica_extract))
+  expect_output(prep(ica_extract, training = biomass_tr))
+})
diff --git a/tests/testthat/test_interact.R b/tests/testthat/test_interact.R
new file mode 100644
index 0000000..9808653
--- /dev/null
+++ b/tests/testthat/test_interact.R
@@ -0,0 +1,76 @@
+library(testthat)
+library(recipes)
+data("biomass")
+
+tr_biomass <- subset(biomass, dataset == "Training")[, -(1:2)]
+te_biomass <- subset(biomass, dataset == "Testing")[, -(1:2)]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = tr_biomass)
+
+test_that('non-factor variables with dot', {
+  int_rec <- rec %>% step_interact(~(.-HHV)^3, sep=":")
+  int_rec_trained <- prep(int_rec, training = tr_biomass, verbose = FALSE)
+  
+  te_new <- bake(int_rec_trained, newdata = te_biomass, all_predictors())
+  te_new <- te_new[, sort(names(te_new))]
+  te_new <- as.matrix(te_new)
+  
+  og_terms <- terms(~(.-HHV)^3, data = te_biomass)
+  te_og <- model.matrix(og_terms, data = te_biomass)[, -1]
+  te_og <- te_og[, sort(colnames(te_og))]
+  
+  rownames(te_new) <- NULL
+  rownames(te_og) <- NULL
+  
+  expect_equal(te_og, te_new)
+})
+
+
+test_that('non-factor variables with specific variables', {
+  int_rec <- rec %>% step_interact(~carbon:hydrogen + oxygen:nitrogen:sulfur, sep = ":")
+  int_rec_trained <- prep(int_rec, training = tr_biomass, verbose = FALSE)
+  
+  te_new <- bake(int_rec_trained, newdata = te_biomass, all_predictors())
+  te_new <- te_new[, sort(names(te_new))]
+  te_new <- as.matrix(te_new)
+  
+  og_terms <- terms(~carbon + hydrogen + oxygen + nitrogen + sulfur + 
+                      carbon:hydrogen + oxygen:nitrogen:sulfur, data = te_biomass)
+  te_og <- model.matrix(og_terms, data = te_biomass)[, -1]
+  te_og <- te_og[, sort(colnames(te_og))]
+  
+  rownames(te_new) <- NULL
+  rownames(te_og) <- NULL
+  
+  expect_equal(te_og, te_new)
+})
+
+
+test_that('printing', {
+  int_rec <- rec %>% step_interact(~carbon:hydrogen)
+  expect_output(print(int_rec))
+  expect_output(prep(int_rec, training = tr_biomass))
+})
+
+
+# currently failing; try to figure out why
+# test_that('with factors', {
+#   int_rec <- recipe(Sepal.Width ~ ., data = iris) %>% 
+#     step_interact(~ (. - Sepal.Width)^3, sep = ":")
+#   int_rec_trained <- prep(int_rec, iris)
+#   
+#   te_new <- bake(int_rec_trained, newdata = iris, role = "predictor")
+#   te_new <- te_new[, sort(names(te_new))]
+#   te_new <- as.matrix(te_new)
+#   
+#   og_terms <- terms(Sepal.Width ~ (.)^3, data = iris)
+#   te_og <- model.matrix(og_terms, data = iris)[, -1]
+#   te_og <- te_og[, sort(colnames(te_og))]
+#   
+#   rownames(te_new) <- NULL
+#   rownames(te_og) <- NULL
+#   
+#   all.equal(te_og, te_new)
+# })
+
diff --git a/tests/testthat/test_intercept.R b/tests/testthat/test_intercept.R
new file mode 100644
index 0000000..4dfea99
--- /dev/null
+++ b/tests/testthat/test_intercept.R
@@ -0,0 +1,61 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+ex_dat <- data.frame(cat = rep(c("A", "B"), each = 5), numer = 1:10)
+
+test_that('add appropriate column with default settings', {
+  rec <- recipe(~ ., data = ex_dat) %>%
+    step_intercept()
+
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+
+  exp_res <- tibble::add_column(ex_dat, "intercept" = 1, .before = TRUE)
+
+  expect_equal(rec_trans, exp_res)
+})
+
+test_that('adds arbitrary numeric column', {
+  rec <- recipe(~ ., data = ex_dat) %>%
+    step_intercept(name = "(Intercept)", value = 2.5)
+
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+
+  exp_res <- tibble::add_column(ex_dat, "(Intercept)" = 2.5, .before = TRUE)
+
+  expect_equal(rec_trans, exp_res)
+})
+
+
+test_that('deals with bad input', {
+  expect_error(
+    recipe(~ ., data = ex_dat) %>%
+      step_intercept(value = "Pie") %>%
+      prep(),
+    "Intercept value must be numeric."
+  )
+
+  expect_error(
+    recipe(~ ., data = ex_dat) %>%
+      step_intercept(name = 4) %>%
+      prep(),
+    "Intercept/constant column name must be a character value."
+  )
+
+  expect_warning(
+    recipe(~ ., data = ex_dat) %>%
+      step_intercept(all_predictors()) %>%
+      prep(),
+    "Selectors are not used for this step."
+  )
+})
+
+test_that('printing', {
+  rec <- recipe(~ ., data = ex_dat) %>%
+    step_intercept()
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
+
diff --git a/tests/testthat/test_invlogit.R b/tests/testthat/test_invlogit.R
new file mode 100644
index 0000000..156549b
--- /dev/null
+++ b/tests/testthat/test_invlogit.R
@@ -0,0 +1,27 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+n <- 20
+set.seed(12)
+ex_dat <- data.frame(x1 = rnorm(n),
+                     x2 = runif(n))
+
+test_that('simple logit trans', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_invlogit(x1)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+  
+  exp_res <- as_tibble(ex_dat)
+  exp_res$x1 <- binomial()$linkinv(exp_res$x1)
+  expect_equal(rec_trans, exp_res)
+})
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_invlogit(x1)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
diff --git a/tests/testthat/test_isomap.R b/tests/testthat/test_isomap.R
new file mode 100644
index 0000000..7faebcb
--- /dev/null
+++ b/tests/testthat/test_isomap.R
@@ -0,0 +1,43 @@
+library(testthat)
+library(recipes)
+
+## expected results from the `dimRed` package
+
+exp_res <- structure(list(Isomap1 = c(0.312570873898531, 0.371885353599467, 2.23124009833741,
+                                      0.248271457498181, -0.420128801874122),
+                          Isomap2 = c(-0.443724171391742, -0.407721529759647, 0.245721022395862,
+                                      3.112001672258, 0.0292770508011519),
+                          Isomap3 = c(0.761529345514676, 0.595015565588918, 1.59943072269788,
+                                      0.566884409484389, 1.53770327701819)),
+                     .Names = c("Isomap1","Isomap2", "Isomap3"),
+                     class = c("tbl_df", "tbl", "data.frame"),
+                     row.names = c(NA, -5L))
+
+set.seed(1)
+dat1 <- matrix(rnorm(15), ncol = 3)
+dat2 <- matrix(rnorm(15), ncol = 3)
+colnames(dat1) <- paste0("x", 1:3)
+colnames(dat2) <- paste0("x", 1:3)
+
+rec <- recipe( ~ ., data = dat1)
+
+test_that('correct Isomap values', {
+  skip_on_cran()
+  im_rec <- rec %>%
+    step_isomap(x1, x2, x3, options = list(knn = 3), num = 3)
+
+  im_trained <- prep(im_rec, training = dat1, verbose = FALSE)
+
+  im_pred <- bake(im_trained, newdata = dat2)
+
+  expect_equal(as.matrix(im_pred), as.matrix(exp_res))
+})
+
+
+test_that('printing', {
+  im_rec <- rec %>%
+    step_isomap(x1, x2, x3, options = list(knn = 3), num = 3)
+  expect_output(print(im_rec))
+  expect_output(prep(im_rec, training = dat1))
+})
+
diff --git a/tests/testthat/test_knnimpute.R b/tests/testthat/test_knnimpute.R
new file mode 100644
index 0000000..be9cac4
--- /dev/null
+++ b/tests/testthat/test_knnimpute.R
@@ -0,0 +1,63 @@
+library(testthat)
+library(gower)
+library(recipes)
+library(dplyr)
+data("biomass")
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training", ]
+biomass_te <- biomass[biomass$dataset == "Testing", ]
+
+# induce some missing data at random
+set.seed(9039)
+carb_missing <- sample(1:nrow(biomass_te), 3)
+nitro_missing <- sample(1:nrow(biomass_te), 3)
+
+biomass_te$carbon[carb_missing] <- NA
+biomass_te$nitrogen[nitro_missing] <- NA
+
+test_that('imputation values', {
+  discr_rec <- rec %>%
+    step_discretize(nitrogen, options = list(keep_na = FALSE))
+  impute_rec <- discr_rec %>%
+    step_knnimpute(carbon,
+                   nitrogen,
+                   impute_with = imp_vars(hydrogen, oxygen, nitrogen),
+                   K = 3)
+
+  discr_rec <- prep(discr_rec, training = biomass_tr, verbose = FALSE)
+  tr_data <- bake(discr_rec, newdata = biomass_tr)
+  te_data <- bake(discr_rec, newdata = biomass_te) %>%
+    dplyr::select(hydrogen, oxygen, nitrogen, carbon)
+  
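+  # locate each test row's 3 nearest training rows by Gower distance; the
+  # imputed values checked below should match their mean (numeric carbon)
+  # or mode (discretized nitrogen)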
+  nn <- gower_topn(te_data[, c("hydrogen", "oxygen", "nitrogen")],
+                   tr_data[, c("hydrogen", "oxygen", "nitrogen")],
+                   n = 3)$index
+  
+  impute_rec <- prep(impute_rec, training = biomass_tr, verbose = FALSE)
+  imputed_te <- bake(impute_rec, newdata = biomass_te)
+  
+  for(i in carb_missing) {
+    nn_tr_ind <- nn[, i]
+    nn_tr_data <- tr_data$carbon[nn_tr_ind]
+    expect_equal(imputed_te$carbon[i], mean(nn_tr_data))
+  }
+  
+  for(i in nitro_missing) {
+    nn_tr_ind <- nn[, i]
+    nn_tr_data <- tr_data$nitrogen[nn_tr_ind]
+    expect_equal(as.character(imputed_te$nitrogen[i]), 
+                 recipes:::mode_est(nn_tr_data))
+  }  
+})
+
+
+test_that('printing', {
+  discr_rec <- rec %>%
+    step_discretize(nitrogen, options = list(keep_na = FALSE))
+  expect_output(print(discr_rec))
+  expect_output(prep(discr_rec, training = biomass_tr))
+})
+
diff --git a/tests/testthat/test_kpca.R b/tests/testthat/test_kpca.R
new file mode 100644
index 0000000..17d5b38
--- /dev/null
+++ b/tests/testthat/test_kpca.R
@@ -0,0 +1,42 @@
+library(testthat)
+library(kernlab)
+library(recipes)
+
+set.seed(131)
+tr_dat <- matrix(rnorm(100*6), ncol = 6)
+te_dat <- matrix(rnorm(20*6), ncol = 6)
+colnames(tr_dat) <- paste0("X", 1:6)
+colnames(te_dat) <- paste0("X", 1:6)
+
+rec <- recipe(X1 ~ ., data = tr_dat)
+
+test_that('correct kernel PCA values', {
+  kpca_rec <- rec %>%
+    step_kpca(X2, X3, X4, X5, X6)
+  
+  kpca_trained <- prep(kpca_rec, training = tr_dat, verbose = FALSE)
+  
+  pca_pred <- bake(kpca_trained, newdata = te_dat)
+  pca_pred <- as.matrix(pca_pred)
+  
+  pca_exp <- kpca(as.matrix(tr_dat[, -1]), 
+                  kernel = kpca_rec$steps[[1]]$options$kernel,
+                  kpar = kpca_rec$steps[[1]]$options$kpar)
+
+  pca_pred_exp <- kernlab::predict(pca_exp, te_dat[, -1])[, 1:kpca_trained$steps[[1]]$num]
+  colnames(pca_pred_exp) <- paste0("kPC", 1:kpca_trained$steps[[1]]$num)
+  
+  rownames(pca_pred) <- NULL
+  rownames(pca_pred_exp) <- NULL
+  
+  expect_equal(pca_pred, pca_pred_exp)
+})
+
+
+test_that('printing', {
+  kpca_rec <- rec %>%
+    step_kpca(X2, X3, X4, X5, X6)
+  expect_output(print(kpca_rec))
+  expect_output(prep(kpca_rec, training = tr_dat))
+})
+
diff --git a/tests/testthat/test_lincomb.R b/tests/testthat/test_lincomb.R
new file mode 100644
index 0000000..4800fb4
--- /dev/null
+++ b/tests/testthat/test_lincomb.R
@@ -0,0 +1,67 @@
+library(testthat)
+library(recipes)
+
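+# the complete indicator sets for each factor are linearly dependent (each
+# set sums to a column of ones), so step_lincomb should flag redundant columns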
+dummies <- cbind(model.matrix( ~ block - 1, npk), 
+                 model.matrix( ~ N - 1, npk), 
+                 model.matrix( ~ P - 1, npk), 
+                 model.matrix( ~ K - 1, npk),
+                 yield = npk$yield)
+
+dummies <- as.data.frame(dummies)
+
+dum_rec <- recipe(yield ~ . , data = dummies)
+
+###################################################################
+
+data(biomass)
+biomass$new_1 <- with(biomass,
+                      .1*carbon - .2*hydrogen + .6*sulfur)
+biomass$new_2 <- with(biomass,
+                      .5*carbon - .2*oxygen + .6*nitrogen)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+biomass_rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + 
+                        sulfur + new_1 + new_2,
+                      data = biomass_tr)
+
+###################################################################
+
+test_that('example 1', {
+  dum_filtered <- dum_rec %>% 
+    step_lincomb(all_predictors())
+  dum_filtered <- prep(dum_filtered, training = dummies, verbose = FALSE)
+  removed <- c("N1", "P1", "K1")
+  expect_equal(dum_filtered$steps[[1]]$removals, removed)
+})
+
+test_that('example 2', {
+  lincomb_filter <- biomass_rec %>%
+    step_lincomb(all_predictors())
+  
+  filtering_trained <- prep(lincomb_filter, training = biomass_tr)
+  test_res <- bake(filtering_trained, newdata = biomass_te)
+
+  expect_true(all(!(paste0("new_", 1:2) %in% colnames(test_res))))
+})
+
+test_that('no exclusions', {
+  biomass_rec_2 <- recipe(HHV ~ carbon + hydrogen, data = biomass_tr)
+  lincomb_filter_2 <- biomass_rec_2 %>%
+    step_lincomb(all_predictors())
+  
+  filtering_trained_2 <- prep(lincomb_filter_2, training = biomass_tr)
+  test_res_2 <- bake(filtering_trained_2, newdata = biomass_te)
+  
+  expect_true(length(filtering_trained_2$steps[[1]]$removals) == 0)
+  expect_true(all(colnames(test_res_2) == c("carbon", "hydrogen")))
+})
+
+
+test_that('printing', {
+  dum_filtered <- dum_rec %>% 
+    step_lincomb(all_predictors())
+  expect_output(print(dum_filtered))
+  expect_output(prep(dum_filtered, training = dummies))
+})
diff --git a/tests/testthat/test_log.R b/tests/testthat/test_log.R
new file mode 100644
index 0000000..ad0f422
--- /dev/null
+++ b/tests/testthat/test_log.R
@@ -0,0 +1,44 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+n <- 20
+set.seed(1)
+ex_dat <- data.frame(x1 = exp(rnorm(n, mean = .1)),
+                     x2 = 1/abs(rnorm(n)),
+                     x3 = rep(1:2, each = n/2),
+                     x4 = rexp(n))
+
+test_that('simple log trans', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_log(x1, x2, x3, x4)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+  
+  exp_res <- as_tibble(lapply(ex_dat, log))
+  
+  expect_equal(rec_trans, exp_res)
+})
+
+
+test_that('alt base', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_log(x1, x2, x3, x4, base = pi)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+  
+  exp_res <- as_tibble(lapply(ex_dat, log, base = pi))
+  
+  expect_equal(rec_trans, exp_res)
+})
+
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_log(x1, x2, x3, x4)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
+
diff --git a/tests/testthat/test_logit.R b/tests/testthat/test_logit.R
new file mode 100644
index 0000000..e80e9d1
--- /dev/null
+++ b/tests/testthat/test_logit.R
@@ -0,0 +1,36 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+n <- 20
+set.seed(12)
+ex_dat <- data.frame(x1 = runif(n),
+                     x2 = rnorm(n))
+
+test_that('simple logit trans', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_logit(x1)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+  
+  exp_res <- as_tibble(ex_dat)
+  exp_res$x1 <- binomial()$linkfun(exp_res$x1)
+  expect_equal(rec_trans, exp_res)
+})
+
+
+test_that('out of bounds logit trans', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_logit(x1, x2)
+  
+  expect_error(prep(rec, training = ex_dat, verbose = FALSE))
+})
+
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_logit(x1)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
diff --git a/tests/testthat/test_meanimpute.R b/tests/testthat/test_meanimpute.R
new file mode 100644
index 0000000..45401e5
--- /dev/null
+++ b/tests/testthat/test_meanimpute.R
@@ -0,0 +1,56 @@
+library(testthat)
+library(recipes)
+data("credit_data")
+
+set.seed(342)
+in_training <- sample(1:nrow(credit_data), 2000)
+
+credit_tr <- credit_data[ in_training, ]
+credit_te <- credit_data[-in_training, ]
+
+test_that('simple mean', {
+  rec <- recipe(Price ~ ., data = credit_tr)
+  
+  impute_rec <- rec %>%
+    step_meanimpute(Age, Assets, Income)
+  imputed <- prep(impute_rec, training = credit_tr, verbose = FALSE)
+  te_imputed <- bake(imputed, newdata = credit_te)
+
+  expect_equal(te_imputed$Age, credit_te$Age)
+  expect_equal(te_imputed$Assets[is.na(credit_te$Assets)], 
+               rep(mean(credit_tr$Assets, na.rm = TRUE), 
+                   sum(is.na(credit_te$Assets))))
+  expect_equal(te_imputed$Income[is.na(credit_te$Income)], 
+               rep(mean(credit_tr$Income, na.rm = TRUE), 
+                   sum(is.na(credit_te$Income))))  
+})
+
+test_that('trimmed mean', {
+  rec <- recipe(Price ~ ., data = credit_tr)
+  
+  impute_rec <- rec %>%
+    step_meanimpute(Assets, trim = .1)
+  imputed <- prep(impute_rec, training = credit_tr, verbose = FALSE)
+  te_imputed <- bake(imputed, newdata = credit_te)
+  
+  expect_equal(te_imputed$Assets[is.na(credit_te$Assets)], 
+               rep(mean(credit_tr$Assets, na.rm = TRUE, trim = .1), 
+                   sum(is.na(credit_te$Assets))))
+})
+
+test_that('non-numeric', {
+  rec <- recipe(Price ~ ., data = credit_tr)
+  
+  impute_rec <- rec %>%
+    step_meanimpute(Assets, Job)
+  expect_error(prep(impute_rec, training = credit_tr, verbose = FALSE))
+})
+
+
+test_that('printing', {
+  impute_rec <- recipe(Price ~ ., data = credit_tr) %>%
+    step_meanimpute(Age, Assets, Income)
+  expect_output(print(impute_rec))
+  expect_output(prep(impute_rec, training = credit_tr))
+})
+
diff --git a/tests/testthat/test_modeimpute.R b/tests/testthat/test_modeimpute.R
new file mode 100644
index 0000000..c6252bd
--- /dev/null
+++ b/tests/testthat/test_modeimpute.R
@@ -0,0 +1,46 @@
+library(testthat)
+library(recipes)
+data("credit_data")
+
+set.seed(342)
+in_training <- sample(1:nrow(credit_data), 2000)
+
+credit_tr <- credit_data[ in_training, ]
+credit_te <- credit_data[-in_training, ]
+
+test_that('simple modes', {
+  rec <- recipe(Price ~ ., data = credit_tr)
+  
+  impute_rec <- rec %>%
+    step_modeimpute(Status, Home, Marital)
+  imputed <- prep(impute_rec, training = credit_tr, verbose = FALSE)
+  te_imputed <- bake(imputed, newdata = credit_te)
+
+  expect_equal(te_imputed$Status, credit_te$Status)
+  home_exp <- rep(recipes:::mode_est(credit_tr$Home), 
+                  sum(is.na(credit_te$Home)))
+  home_exp <- factor(home_exp, levels = levels(credit_te$Home))
+  expect_equal(te_imputed$Home[is.na(credit_te$Home)], 
+               home_exp)
+  marital_exp <- rep(recipes:::mode_est(credit_tr$Marital), 
+                  sum(is.na(credit_te$Marital)))
+  marital_exp <- factor(marital_exp, levels = levels(credit_te$Marital))
+  expect_equal(te_imputed$Marital[is.na(credit_te$Marital)], 
+               marital_exp)
+})
+
+
+test_that('non-nominal', {
+  rec <- recipe(Price ~ ., data = credit_tr)
+  
+  impute_rec <- rec %>%
+    step_modeimpute(Assets, Job)
+  expect_error(prep(impute_rec, training = credit_tr, verbose = FALSE))
+})
+
+test_that('printing', {
+  impute_rec <- recipe(Price ~ ., data = credit_tr) %>%
+    step_modeimpute(Status, Home, Marital)
+  expect_output(print(impute_rec))
+  expect_output(prep(impute_rec, training = credit_tr))
+})
diff --git a/tests/testthat/test_multivariate.R b/tests/testthat/test_multivariate.R
new file mode 100644
index 0000000..fb75313
--- /dev/null
+++ b/tests/testthat/test_multivariate.R
@@ -0,0 +1,28 @@
+library(testthat)
+library(tibble)
+library(recipes)
+data("biomass")
+
+
+test_that('multivariate outcome', {
+  raw_recipe <- recipe(carbon + hydrogen ~ oxygen + nitrogen + sulfur, data = biomass)
+  rec <- raw_recipe %>% 
+    step_center(all_outcomes()) %>%
+    step_scale(all_predictors())
+  
+  rec_trained <- prep(rec, training = biomass)
+  
+  results <- bake(rec_trained, head(biomass))
+  
+  exp_res <- biomass
+  
+  pred <- c("oxygen", "nitrogen", "sulfur")
+  outcome <- c("carbon", "hydrogen")
+  for(i in pred)
+    exp_res[,i] <- exp_res[,i]/sd(exp_res[,i])
+  for(i in outcome)
+    exp_res[,i] <- exp_res[,i]-mean(exp_res[,i])
+  
+  expect_equal(rec$term_info$variable[rec$term_info$role == "outcome"], outcome)
+  expect_equal(rec$term_info$variable[rec$term_info$role == "predictor"], pred)  
+  expect_equal(exp_res[1:6, colnames(results)], as.data.frame(results))
+})
diff --git a/tests/testthat/test_ns.R b/tests/testthat/test_ns.R
new file mode 100644
index 0000000..0952cfa
--- /dev/null
+++ b/tests/testthat/test_ns.R
@@ -0,0 +1,58 @@
+library(testthat)
+library(recipes)
+data(biomass)
+library(splines)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+test_that('correct basis functions', {
+  with_ns <- rec %>% 
+    step_ns(carbon, hydrogen)
+  
+  with_ns <- prep(with_ns, training = biomass_tr, verbose = FALSE)
+  
+  with_ns_pred_tr <- bake(with_ns, newdata = biomass_tr)
+  with_ns_pred_te <- bake(with_ns, newdata = biomass_te)
+  
+  carbon_ns_tr_exp <- ns(biomass_tr$carbon, df = 2)
+  hydrogen_ns_tr_exp <- ns(biomass_tr$hydrogen, df = 2)
+  carbon_ns_te_exp <- predict(carbon_ns_tr_exp, biomass_te$carbon)
+  hydrogen_ns_te_exp <- predict(hydrogen_ns_tr_exp, biomass_te$hydrogen)
+  
+  carbon_ns_tr_res <- as.matrix(with_ns_pred_tr[, grep("carbon", names(with_ns_pred_tr))])
+  colnames(carbon_ns_tr_res) <- NULL
+  hydrogen_ns_tr_res <- as.matrix(with_ns_pred_tr[, grep("hydrogen", names(with_ns_pred_tr))])
+  colnames(hydrogen_ns_tr_res) <- NULL
+  
+  carbon_ns_te_res <- as.matrix(with_ns_pred_te[, grep("carbon", names(with_ns_pred_te))])
+  colnames(carbon_ns_te_res) <- 1:ncol(carbon_ns_te_res)
+  hydrogen_ns_te_res <- as.matrix(with_ns_pred_te[, grep("hydrogen", names(with_ns_pred_te))])
+  colnames(hydrogen_ns_te_res) <- 1:ncol(hydrogen_ns_te_res)
+  
+  ## remove attributes
+  carbon_ns_tr_exp <- matrix(carbon_ns_tr_exp, ncol = 2)
+  carbon_ns_te_exp <- matrix(carbon_ns_te_exp, ncol = 2)
+  hydrogen_ns_tr_exp <- matrix(hydrogen_ns_tr_exp, ncol = 2)
+  hydrogen_ns_te_exp <- matrix(hydrogen_ns_te_exp, ncol = 2) 
+  dimnames(carbon_ns_tr_res) <- NULL
+  dimnames(carbon_ns_te_res) <- NULL  
+  dimnames(hydrogen_ns_tr_res) <- NULL
+  dimnames(hydrogen_ns_te_res) <- NULL  
+  
+  expect_equal(carbon_ns_tr_res, carbon_ns_tr_exp)
+  expect_equal(carbon_ns_te_res, carbon_ns_te_exp)
+  expect_equal(hydrogen_ns_tr_res, hydrogen_ns_tr_exp)
+  expect_equal(hydrogen_ns_te_res, hydrogen_ns_te_exp)  
+})
+
+
+test_that('printing', {
+  with_ns <- rec %>%  step_ns(carbon, hydrogen)
+  expect_output(print(with_ns))
+  expect_output(prep(with_ns, training = biomass_tr))
+})
+
diff --git a/tests/testthat/test_nzv.R b/tests/testthat/test_nzv.R
new file mode 100644
index 0000000..ee918bc
--- /dev/null
+++ b/tests/testthat/test_nzv.R
@@ -0,0 +1,58 @@
+library(testthat)
+library(recipes)
+
+n <- 50
+set.seed(424)
+dat <- data.frame(x1 = rnorm(n),
+                  x2 = rep(1:5, each = 10),
+                  x3 = factor(rep(letters[1:3], c(2, 2, 46))),
+                  x4 = 1,
+                  y = runif(n))
+
+
+## frequency ratio: count of the most common value over the second most common
+ratios <- function(x) {
+  tab <- sort(table(x), decreasing = TRUE)
+  if(length(tab) > 1) 
+    tab[1]/tab[2] else Inf
+}
+
+pct_uni <- vapply(dat[, -5], function(x) length(unique(x)), c(val = 0))/nrow(dat)*100
+f_ratio <- vapply(dat[, -5], ratios, c(val = 0))
+vars <- names(pct_uni)
+
+test_that('nzv filtering', {
+  rec <- recipe(y ~ ., data = dat)
+  filtering <- rec %>% 
+    step_nzv(x1, x2, x3, x4)
+  
+  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
+  
+  removed <- vars[
+    pct_uni <= filtering_trained$steps[[1]]$options$unique_cut & 
+      f_ratio >= filtering_trained$steps[[1]]$options$freq_cut]
+  
+  expect_equal(filtering_trained$steps[[1]]$removals, removed)
+})
+
+test_that('altered options', {
+  rec <- recipe(y ~ ., data = dat)
+  filtering <- rec %>% 
+    step_nzv(x1, x2, x3, x4, 
+             options = list(freq_cut = 50, unique_cut = 10))
+  
+  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
+  
+  removed <- vars[
+    pct_uni <= filtering_trained$steps[[1]]$options$unique_cut & 
+      f_ratio >= filtering_trained$steps[[1]]$options$freq_cut]
+  
+  expect_equal(filtering_trained$steps[[1]]$removals, removed)
+})
+
+
+test_that('printing', {
+  rec <- recipe(y ~ ., data = dat) %>%
+    step_nzv(x1, x2, x3, x4)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = dat))
+})
diff --git a/tests/testthat/test_ordinalscore.R b/tests/testthat/test_ordinalscore.R
new file mode 100644
index 0000000..137eead
--- /dev/null
+++ b/tests/testthat/test_ordinalscore.R
@@ -0,0 +1,72 @@
+library(testthat)
+library(recipes)
+
+n <- 20
+
+set.seed(752)
+ex_dat <- data.frame(
+  numbers = rnorm(n),
+  fact = factor(sample(letters[1:3], n, replace = TRUE)),
+  ord1 = factor(sample(LETTERS[1:3], n, replace = TRUE),
+               ordered = TRUE),
+  ord2 = factor(sample(LETTERS[4:8], n, replace = TRUE),
+                ordered = TRUE),
+  ord3 = factor(sample(LETTERS[10:20], n, replace = TRUE),
+                ordered = TRUE)
+)
+
+ex_miss <- ex_dat
+ex_miss$ord1[c(1, 5, 9)] <- NA
+ex_miss$ord3[2] <- NA
+
+score <- function(x) as.numeric(x)^2
+
+test_that('linear scores', {
+  rec1 <- recipe(~ ., data = ex_dat) %>%
+    step_ordinalscore(starts_with("ord"))
+  rec1 <- prep(rec1, training = ex_dat, retain = TRUE, 
+                  stringsAsFactors = FALSE, verbose = FALSE)
+  rec1_scores <- bake(rec1, newdata = ex_dat)
+  rec1_scores_NA <- bake(rec1, newdata = ex_miss)
+  
+  expect_equal(as.numeric(ex_dat$ord1), rec1_scores$ord1)
+  expect_equal(as.numeric(ex_dat$ord2), rec1_scores$ord2)
+  expect_equal(as.numeric(ex_dat$ord3), rec1_scores$ord3)
+  
+  expect_equal(as.numeric(ex_miss$ord1), rec1_scores_NA$ord1)
+  expect_equal(as.numeric(ex_miss$ord3), rec1_scores_NA$ord3)
+})
+
+test_that('nonlinear scores', {
+  rec2 <- recipe(~ ., data = ex_dat) %>%
+    step_ordinalscore(starts_with("ord"), 
+                      convert = score)
+  rec2 <- prep(rec2, training = ex_dat, retain = TRUE, 
+                  stringsAsFactors = FALSE, verbose = FALSE)
+  rec2_scores <- bake(rec2, newdata = ex_dat)
+  rec2_scores_NA <- bake(rec2, newdata = ex_miss)
+  
+  expect_equal(as.numeric(ex_dat$ord1)^2, rec2_scores$ord1)
+  expect_equal(as.numeric(ex_dat$ord2)^2, rec2_scores$ord2)
+  expect_equal(as.numeric(ex_dat$ord3)^2, rec2_scores$ord3)
+  
+  expect_equal(as.numeric(ex_miss$ord1)^2, rec2_scores_NA$ord1)
+  expect_equal(as.numeric(ex_miss$ord3)^2, rec2_scores_NA$ord3)
+})
+
+test_that('bad spec', {
+  rec3 <- recipe(~ ., data = ex_dat) %>%
+    step_ordinalscore(everything())
+  expect_error(prep(rec3, training = ex_dat, verbose = FALSE))
+  rec4 <- recipe(~ ., data = ex_dat) 
+  expect_error(rec4 %>% step_ordinalscore())  
+})
+
+
+test_that('printing', {
+  rec5 <- recipe(~ ., data = ex_dat) %>%
+    step_ordinalscore(starts_with("ord"))
+  expect_output(print(rec5))
+  expect_output(prep(rec5, training = ex_dat))
+})
+
diff --git a/tests/testthat/test_other.R b/tests/testthat/test_other.R
new file mode 100644
index 0000000..5e7935f
--- /dev/null
+++ b/tests/testthat/test_other.R
@@ -0,0 +1,135 @@
+library(testthat)
+library(recipes)
+
+data(okc)
+
+set.seed(19)
+in_train <- sample(1:nrow(okc), size = 30000)
+
+okc_tr <- okc[ in_train,]
+okc_te <- okc[-in_train,]
+
+rec <- recipe(~ diet + location, data = okc_tr)
+
+test_that('default inputs', {
+  others <- rec %>% step_other(diet, location)
+  others <- prep(others, training = okc_tr)
+  others_te <- bake(others, newdata = okc_te)
+  
+  diet_props <- table(okc_tr$diet)/sum(!is.na(okc_tr$diet))
+  diet_props <- sort(diet_props, decreasing = TRUE)
+  diet_levels <- names(diet_props)[diet_props >= others$steps[[1]]$threshold]
+  for(i in diet_levels)
+    expect_equal(sum(others_te$diet == i, na.rm =TRUE), 
+                 sum(okc_te$diet == i, na.rm =TRUE))
+  
+  diet_levels <- c(diet_levels, others$steps[[1]]$objects[["diet"]]$other)
+  expect_true(all(levels(others_te$diet) %in% diet_levels))
+  expect_true(all(diet_levels %in% levels(others_te$diet)))
+  
+  location_props <- table(okc_tr$location)/sum(!is.na(okc_tr$location))
+  location_props <- sort(location_props, decreasing = TRUE)
+  location_levels <- names(location_props)[location_props >= others$steps[[1]]$threshold]
+  for(i in location_levels)
+    expect_equal(sum(others_te$location == i, na.rm =TRUE), 
+                 sum(okc_te$location == i, na.rm =TRUE))
+  
+  location_levels <- c(location_levels, others$steps[[1]]$objects[["location"]]$other)
+  expect_true(all(levels(others_te$location) %in% location_levels))
+  expect_true(all(location_levels %in% levels(others_te$location)))
+
+  expect_equal(is.na(okc_te$diet), is.na(others_te$diet))
+  expect_equal(is.na(okc_te$location), is.na(others_te$location))
+})
+
+
+test_that('high threshold - many removals', {
+  others <- rec %>% step_other(diet, location, threshold = .5)
+  others <- prep(others, training = okc_tr)
+  others_te <- bake(others, newdata = okc_te)
+  
+  diet_props <- table(okc_tr$diet)
+  diet_levels <- others$steps[[1]]$objects$diet$keep
+  for(i in diet_levels)
+    expect_equal(sum(others_te$diet == i, na.rm =TRUE), 
+                 sum(okc_te$diet == i, na.rm =TRUE))
+  
+  diet_levels <- c(diet_levels, others$steps[[1]]$objects[["diet"]]$other)
+  expect_true(all(levels(others_te$diet) %in% diet_levels))
+  expect_true(all(diet_levels %in% levels(others_te$diet)))
+  
+  location_props <- table(okc_tr$location)
+  location_levels <- others$steps[[1]]$objects$location$keep
+  for(i in location_levels)
+    expect_equal(sum(others_te$location == i, na.rm =TRUE), 
+                 sum(okc_te$location == i, na.rm =TRUE))
+  
+  location_levels <- c(location_levels, others$steps[[1]]$objects[["location"]]$other)
+  expect_true(all(levels(others_te$location) %in% location_levels))
+  expect_true(all(location_levels %in% levels(others_te$location)))
+
+  expect_equal(is.na(okc_te$diet), is.na(others_te$diet))
+  expect_equal(is.na(okc_te$location), is.na(others_te$location))
+})
+
+
+test_that('low threshold - no removals', {
+  others <- rec %>% step_other(diet, location, threshold = 10^-10)
+  others <- prep(others, training = okc_tr, stringsAsFactors = FALSE)
+  others_te <- bake(others, newdata = okc_te)
+  
+  expect_equal(others$steps[[1]]$objects$diet$collapse, FALSE)
+  expect_equal(others$steps[[1]]$objects$location$collapse, FALSE)
+  
+  expect_equal(okc_te$diet, others_te$diet)
+  expect_equal(okc_te$location, others_te$location)
+})
+
+
+test_that('factor inputs', {
+  
+  okc$diet <- as.factor(okc$diet)
+  okc$location <- as.factor(okc$location)
+  
+  okc_tr <- okc[ in_train,]
+  okc_te <- okc[-in_train,]
+  
+  rec <- recipe(~ diet + location, data = okc_tr)
+  
+  others <- rec %>% step_other(diet, location)
+  others <- prep(others, training = okc_tr)
+  others_te <- bake(others, newdata = okc_te)
+  
+  diet_props <- table(okc_tr$diet)/sum(!is.na(okc_tr$diet))
+  diet_props <- sort(diet_props, decreasing = TRUE)
+  diet_levels <- names(diet_props)[diet_props >= others$steps[[1]]$threshold]
+  for(i in diet_levels)
+    expect_equal(sum(others_te$diet == i, na.rm =TRUE), 
+                 sum(okc_te$diet == i, na.rm =TRUE))
+  
+  diet_levels <- c(diet_levels, others$steps[[1]]$objects[["diet"]]$other)
+  expect_true(all(levels(others_te$diet) %in% diet_levels))
+  expect_true(all(diet_levels %in% levels(others_te$diet)))
+  
+  location_props <- table(okc_tr$location)/sum(!is.na(okc_tr$location))
+  location_props <- sort(location_props, decreasing = TRUE)
+  location_levels <- names(location_props)[location_props >= others$steps[[1]]$threshold]
+  for(i in location_levels)
+    expect_equal(sum(others_te$location == i, na.rm =TRUE), 
+                 sum(okc_te$location == i, na.rm =TRUE))
+  
+  location_levels <- c(location_levels, others$steps[[1]]$objects[["location"]]$other)
+  expect_true(all(levels(others_te$location) %in% location_levels))
+  expect_true(all(location_levels %in% levels(others_te$location)))
+  
+  expect_equal(is.na(okc_te$diet), is.na(others_te$diet))
+  expect_equal(is.na(okc_te$location), is.na(others_te$location))
+})
+
+
+test_that('printing', {
+  rec <- rec %>% step_other(diet, location)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = okc_tr))
+})
+
diff --git a/tests/testthat/test_pca.R b/tests/testthat/test_pca.R
new file mode 100644
index 0000000..074f776
--- /dev/null
+++ b/tests/testthat/test_pca.R
@@ -0,0 +1,74 @@
+library(testthat)
+library(recipes)
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+test_that('correct PCA values', {
+  pca_extract <- rec %>% 
+    step_center(carbon, hydrogen, oxygen ,nitrogen, sulfur) %>% 
+    step_scale(carbon, hydrogen, oxygen ,nitrogen, sulfur) %>%
+    step_pca(carbon, hydrogen, oxygen, nitrogen, sulfur, 
+             options = list(retx = TRUE))
+  
+  pca_extract_trained <- prep(pca_extract, training = biomass_tr, verbose = FALSE)
+  
+  pca_pred <- bake(pca_extract_trained, newdata = biomass_te)
+  pca_pred <- as.matrix(pca_pred)
+  
+  pca_exp <- prcomp(biomass_tr[, 3:7], center = TRUE, scale. = TRUE, retx = TRUE)
+  pca_pred_exp <- predict(pca_exp, biomass_te[, 3:7])[, 1:pca_extract$steps[[3]]$num]
+  
+  rownames(pca_pred) <- NULL
+  rownames(pca_pred_exp) <- NULL
+  
+  expect_equal(pca_pred, pca_pred_exp)
+})
+
+test_that('correct PCA values with threshold', {
+  pca_extract <- rec %>% 
+    step_center(carbon, hydrogen, oxygen ,nitrogen, sulfur) %>% 
+    step_scale(carbon, hydrogen, oxygen ,nitrogen, sulfur) %>%
+    step_pca(carbon, hydrogen, oxygen, nitrogen, sulfur, threshold = .5)
+  
+  pca_extract_trained <- prep(pca_extract, training = biomass_tr, verbose = FALSE)
+  pca_exp <- prcomp(biomass_tr[, 3:7], center = TRUE, scale. = TRUE, retx = TRUE)
+  # cumsum(pca_exp$sdev^2)/sum(pca_exp$sdev^2)
+
+  expect_equal(pca_extract_trained$steps[[3]]$num, 2)
+})
+
+
+test_that('Reduced rotation size', {
+  pca_extract <- rec %>% 
+    step_center(carbon, hydrogen, oxygen ,nitrogen, sulfur) %>% 
+    step_scale(carbon, hydrogen, oxygen ,nitrogen, sulfur) %>%
+    step_pca(carbon, hydrogen, oxygen, nitrogen, sulfur, num = 3)
+  
+  pca_extract_trained <- prep(pca_extract, training = biomass_tr, verbose = FALSE)
+  
+  pca_pred <- bake(pca_extract_trained, newdata = biomass_te)
+  pca_pred <- as.matrix(pca_pred)
+  
+  pca_exp <- prcomp(biomass_tr[, 3:7], center = TRUE, scale. = TRUE, retx = TRUE)
+  pca_pred_exp <- predict(pca_exp, biomass_te[, 3:7])[, 1:3]
+  
+  rownames(pca_pred) <- NULL
+  rownames(pca_pred_exp) <- NULL
+  
+  expect_equal(pca_pred, pca_pred_exp)
+})
+
+
+test_that('printing', {
+  pca_extract <- rec %>% 
+    step_pca(carbon, hydrogen, oxygen, nitrogen, sulfur)
+  expect_output(print(pca_extract))
+  expect_output(prep(pca_extract, training = biomass_tr))
+})
+
diff --git a/tests/testthat/test_poly.R b/tests/testthat/test_poly.R
new file mode 100644
index 0000000..5f40012
--- /dev/null
+++ b/tests/testthat/test_poly.R
@@ -0,0 +1,58 @@
+library(testthat)
+library(recipes)
+data(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass_tr)
+
+test_that('correct basis functions', {
+  with_poly <- rec %>% 
+    step_poly(carbon, hydrogen)
+  
+  with_poly <- prep(with_poly, training = biomass_tr, verbose = FALSE)
+  
+  with_poly_pred_tr <- bake(with_poly, newdata = biomass_tr)
+  with_poly_pred_te <- bake(with_poly, newdata = biomass_te)
+  
+  carbon_poly_tr_exp <- poly(biomass_tr$carbon, degree = 2)
+  hydrogen_poly_tr_exp <- poly(biomass_tr$hydrogen, degree = 2)
+  carbon_poly_te_exp <- predict(carbon_poly_tr_exp, biomass_te$carbon)
+  hydrogen_poly_te_exp <- predict(hydrogen_poly_tr_exp, biomass_te$hydrogen)
+  
+  carbon_poly_tr_res <- as.matrix(with_poly_pred_tr[, grep("carbon", names(with_poly_pred_tr))])
+  colnames(carbon_poly_tr_res) <- NULL
+  hydrogen_poly_tr_res <- as.matrix(with_poly_pred_tr[, grep("hydrogen", names(with_poly_pred_tr))])
+  colnames(hydrogen_poly_tr_res) <- NULL
+  
+  carbon_poly_te_res <- as.matrix(with_poly_pred_te[, grep("carbon", names(with_poly_pred_te))])
+  colnames(carbon_poly_te_res) <- 1:ncol(carbon_poly_te_res)
+  hydrogen_poly_te_res <- as.matrix(with_poly_pred_te[, grep("hydrogen", names(with_poly_pred_te))])
+  colnames(hydrogen_poly_te_res) <- 1:ncol(hydrogen_poly_te_res)
+  
+  ## remove attributes
+  carbon_poly_tr_exp <- matrix(carbon_poly_tr_exp, ncol = 2)
+  carbon_poly_te_exp <- matrix(carbon_poly_te_exp, ncol = 2)
+  hydrogen_poly_tr_exp <- matrix(hydrogen_poly_tr_exp, ncol = 2)
+  hydrogen_poly_te_exp <- matrix(hydrogen_poly_te_exp, ncol = 2) 
+  dimnames(carbon_poly_tr_res) <- NULL
+  dimnames(carbon_poly_te_res) <- NULL  
+  dimnames(hydrogen_poly_tr_res) <- NULL
+  dimnames(hydrogen_poly_te_res) <- NULL  
+  
+  expect_equal(carbon_poly_tr_res, carbon_poly_tr_exp)
+  expect_equal(carbon_poly_te_res, carbon_poly_te_exp)
+  expect_equal(hydrogen_poly_tr_res, hydrogen_poly_tr_exp)
+  expect_equal(hydrogen_poly_te_res, hydrogen_poly_te_exp)  
+})
+
+
+test_that('printing', {
+  with_poly <- rec %>% 
+    step_poly(carbon, hydrogen)
+  expect_output(print(with_poly))
+  expect_output(prep(with_poly, training = biomass_tr))
+})
+
diff --git a/tests/testthat/test_range.R b/tests/testthat/test_range.R
new file mode 100644
index 0000000..26feb7e
--- /dev/null
+++ b/tests/testthat/test_range.R
@@ -0,0 +1,105 @@
+library(testthat)
+library(recipes)
+data(biomass)
+
+biomass_tr <- biomass[1:10,]
+biomass_te <- biomass[c(13:14, 19, 522),]
+
+rec <- recipe(HHV ~ carbon + hydrogen,
+              data = biomass_tr)
+
+test_that('correct values', {
+  standardized <- rec %>% 
+    step_range(carbon, hydrogen, min = -12) 
+  
+  standardized_trained <- prep(standardized, training = biomass_tr, verbose = FALSE)
+  
+  obs_pred <- bake(standardized_trained, newdata = biomass_te)
+  obs_pred <- as.matrix(obs_pred)
+  
+  mins <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, min)
+  maxs <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, max)  
+  
+  new_min <- -12
+  new_max <- 1
+  new_range <- new_max - new_min
+  
+  carb <- ((new_range * (biomass_te$carbon - mins["carbon"])) / 
+           (maxs["carbon"] - mins["carbon"])) + new_min
+  carb <- ifelse(carb > new_max, new_max, carb)
+  carb <- ifelse(carb < new_min, new_min, carb)  
+  
+  hydro <- ((new_range * (biomass_te$hydrogen - mins["hydrogen"])) / 
+              (maxs["hydrogen"] - mins["hydrogen"])) + new_min
+  hydro <- ifelse(hydro > new_max, new_max, hydro)
+  hydro <- ifelse(hydro < new_min, new_min, hydro)  
+  
+  exp_pred <- cbind(carb, hydro)
+  colnames(exp_pred) <- c("carbon", "hydrogen")
+  expect_equal(exp_pred, obs_pred)
+})
+
+
+test_that('defaults', {
+  standardized <- rec %>% 
+    step_range(carbon, hydrogen) 
+  
+  standardized_trained <- prep(standardized, training = biomass_tr, verbose = FALSE)
+  
+  obs_pred <- bake(standardized_trained, newdata = biomass_te)
+  obs_pred <- as.matrix(obs_pred)
+  
+  mins <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, min)
+  maxs <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, max)  
+  
+  new_min <- 0
+  new_max <- 1
+  new_range <- new_max - new_min
+  
+  carb <- ((new_range * (biomass_te$carbon - mins["carbon"])) / 
+             (maxs["carbon"] - mins["carbon"])) + new_min
+  carb <- ifelse(carb > new_max, new_max, carb)
+  carb <- ifelse(carb < new_min, new_min, carb)  
+  
+  hydro <- ((new_range * (biomass_te$hydrogen - mins["hydrogen"])) / 
+              (maxs["hydrogen"] - mins["hydrogen"])) + new_min
+  hydro <- ifelse(hydro > new_max, new_max, hydro)
+  hydro <- ifelse(hydro < new_min, new_min, hydro)  
+  
+  exp_pred <- cbind(carb, hydro)
+  colnames(exp_pred) <- c("carbon", "hydrogen")
+  expect_equal(exp_pred, obs_pred)
+})
+
+
+test_that('one variable', {
+  standardized <- rec %>% 
+    step_range(carbon) 
+  
+  standardized_trained <- prep(standardized, training = biomass_tr, verbose = FALSE)
+  
+  obs_pred <- bake(standardized_trained, newdata = biomass_te)
+
+  mins <- min(biomass_tr$carbon)
+  maxs <- max(biomass_tr$carbon)
+  
+  new_min <- 0
+  new_max <- 1
+  new_range <- new_max - new_min
+  
+  carb <- ((new_range * (biomass_te$carbon - mins)) / 
+             (maxs - mins)) + new_min
+  carb <- ifelse(carb > new_max, new_max, carb)
+  carb <- ifelse(carb < new_min, new_min, carb)  
+
+  expect_equal(carb, obs_pred$carbon)
+})
+
+
+test_that('printing', {
+  standardized <- rec %>% 
+    step_range(carbon, hydrogen, min = -12) 
+  expect_output(print(standardized))
+  expect_output(prep(standardized, training = biomass_tr))
+})
+
diff --git a/tests/testthat/test_ratio.R b/tests/testthat/test_ratio.R
new file mode 100644
index 0000000..727d977
--- /dev/null
+++ b/tests/testthat/test_ratio.R
@@ -0,0 +1,96 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+ex_dat <- data.frame(
+  x1 = -1:8,
+  x2 = 1,
+  x3 = c(1:9, NA),
+  x4 = 11:20,
+  x5 = letters[1:10]
+)
+
+rec <- recipe( ~ x1 + x2 + x3 + x4 + x5, data = ex_dat)
+
+test_that('1:many', {
+  rec1 <- rec %>% 
+    step_ratio(x1, denom = denom_vars(all_numeric()))
+  rec1 <- prep(rec1, ex_dat, verbose = FALSE)
+  obs1 <- bake(rec1, ex_dat)
+  res1 <- tibble(
+    x1_o_x2   = ex_dat$x1/ex_dat$x2,
+    x1_o_x3   = ex_dat$x1/ex_dat$x3,
+    x1_o_x4   = ex_dat$x1/ex_dat$x4
+  )
+  for(i in names(res1)) 
+    expect_equal(res1[i], obs1[i])
+})
+
+
+test_that('many:1', {
+  rec2 <- rec %>% 
+    step_ratio(all_numeric(), denom = denom_vars(x1))
+  rec2 <- prep(rec2, ex_dat, verbose = FALSE)
+  obs2 <- bake(rec2, ex_dat)
+  res2 <- tibble(
+    x2_o_x1   = ex_dat$x2/ex_dat$x1,
+    x3_o_x1   = ex_dat$x3/ex_dat$x1,
+    x4_o_x1   = ex_dat$x4/ex_dat$x1
+  )
+  for(i in names(res2)) 
+    expect_equal(res2[i], obs2[i])
+})
+
+
+test_that('many:many', {
+  rec3 <- rec %>% 
+    step_ratio(all_numeric(), denom = denom_vars(all_numeric()))
+  rec3 <- prep(rec3, ex_dat, verbose = FALSE)
+  obs3 <- bake(rec3, ex_dat)
+  res3 <- tibble(
+    x2_o_x1   = ex_dat$x2/ex_dat$x1,
+    x3_o_x1   = ex_dat$x3/ex_dat$x1,
+    x4_o_x1   = ex_dat$x4/ex_dat$x1,
+
+    x1_o_x2   = ex_dat$x1/ex_dat$x2,
+    x3_o_x2   = ex_dat$x3/ex_dat$x2,
+    x4_o_x2   = ex_dat$x4/ex_dat$x2,
+
+    x1_o_x3   = ex_dat$x1/ex_dat$x3,
+    x2_o_x3   = ex_dat$x2/ex_dat$x3,
+    x4_o_x3   = ex_dat$x4/ex_dat$x3,   
+    
+    x1_o_x4   = ex_dat$x1/ex_dat$x4,
+    x2_o_x4   = ex_dat$x2/ex_dat$x4,
+    x3_o_x4   = ex_dat$x3/ex_dat$x4
+  )
+  for(i in names(res3)) 
+    expect_equal(res3[i], obs3[i])
+})
+
+
+
+test_that('wrong type', {
+  rec4 <- rec %>% 
+    step_ratio(x1, denom = denom_vars(all_predictors()))
+  expect_error(prep(rec4, ex_dat, verbose = FALSE))
+
+  rec5 <- rec %>% 
+    step_ratio(all_predictors(), denom = denom_vars(x1))
+  expect_error(prep(rec5, ex_dat, verbose = FALSE))
+  
+  rec6 <- rec %>% 
+    step_ratio(all_predictors(), denom = denom_vars(all_predictors()))
+  expect_error(prep(rec6, ex_dat, verbose = FALSE))  
+})
+
+
+test_that('printing', {
+  rec3 <- rec %>% 
+    step_ratio(all_numeric(), denom = denom_vars(all_numeric()))
+  expect_output(print(rec3))
+  expect_output(prep(rec3, training = ex_dat))
+})
+
+
diff --git a/tests/testthat/test_regex.R b/tests/testthat/test_regex.R
new file mode 100644
index 0000000..d2b7b27
--- /dev/null
+++ b/tests/testthat/test_regex.R
@@ -0,0 +1,47 @@
+library(testthat)
+library(recipes)
+
+data(covers)
+covers$rows <- 1:nrow(covers)
+covers$ch_rows <- paste(1:nrow(covers))
+
+rec <- recipe(~ description + rows + ch_rows, covers)
+
+test_that('default options', {
+  rec1 <- rec %>%
+    step_regex(description, pattern = "(rock|stony)") %>%
+    step_regex(description, result = "all ones")
+  rec1 <- prep(rec1, training = covers)
+  res1 <- bake(rec1, newdata = covers)
+  expect_equal(res1$X.rock.stony., 
+               as.numeric(grepl("(rock|stony)", covers$description)))
+  expect_equal(res1$`all ones`, rep(1, nrow(covers)))
+})
+
+
+test_that('nondefault options', {
+  rec2 <- rec %>%
+    step_regex(description, pattern = "(rock|stony)", 
+               result = "rocks",
+               options = list(fixed = TRUE)) 
+  rec2 <- prep(rec2, training = covers)
+  res2 <- bake(rec2, newdata = covers)
+  expect_equal(res2$rocks, rep(0, nrow(covers)))
+})
+
+
+test_that('bad selector(s)', {
+  expect_error(rec %>% step_regex(description, rows, pattern = "(rock|stony)"))
+  rec3 <- rec %>% step_regex(starts_with("b"), pattern = "(rock|stony)")
+  expect_error(prep(rec3, training = covers))
+  rec4 <- rec %>% step_regex(rows, pattern = "(rock|stony)")
+  expect_error(prep(rec4, training = covers))
+})
+
+
+test_that('printing', {
+  rec1 <- rec %>%
+    step_regex(description, pattern = "(rock|stony)")
+  expect_output(print(rec1))
+  expect_output(prep(rec1, training = covers))
+})
diff --git a/tests/testthat/test_retraining.R b/tests/testthat/test_retraining.R
new file mode 100644
index 0000000..0aaf25d
--- /dev/null
+++ b/tests/testthat/test_retraining.R
@@ -0,0 +1,27 @@
+context("Testing retraining")
+
+library(testthat)
+library(recipes)
+
+data(biomass)
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass)
+
+test_that('training in stages', {
+  skip_on_cran()
+  at_once <- rec %>% 
+    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur) %>% 
+    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur) 
+  
+  at_once_trained <- prep(at_once, training = biomass, verbose = FALSE)
+  
+  ## now train in stages
+  center_first <- rec %>% 
+    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur)
+  center_first_trained <- prep(center_first, training = biomass, verbose = FALSE)
+  in_stages <- center_first_trained %>%
+    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur) 
+  in_stages_trained <- prep(in_stages, training = biomass, verbose = FALSE)
+  in_stages_retrained <- prep(in_stages, training = biomass, verbose = FALSE, fresh = TRUE) 
+  
+  expect_equal(at_once_trained, in_stages_trained)
+  expect_equal(at_once_trained, in_stages_retrained)
+})
diff --git a/tests/testthat/test_rm.R b/tests/testthat/test_rm.R
new file mode 100644
index 0000000..3b00d8c
--- /dev/null
+++ b/tests/testthat/test_rm.R
@@ -0,0 +1,34 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+n <- 20
+set.seed(12)
+ex_dat <- data.frame(x1 = rnorm(n),
+                     x2 = runif(n))
+
+test_that('simple removal', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_rm(x1)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_rm <- bake(rec_trained, newdata = ex_dat)
+  
+  expect_equal(colnames(rec_rm), "x2")
+})
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_rm(x1)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
+
diff --git a/tests/testthat/test_roles.R b/tests/testthat/test_roles.R
new file mode 100644
index 0000000..f126e62
--- /dev/null
+++ b/tests/testthat/test_roles.R
@@ -0,0 +1,36 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+data(biomass)
+
+test_that('default method', {
+  rec <- recipe(x = biomass)
+  exp_res <- tibble(variable = colnames(biomass),
+                    type = rep(c("nominal", "numeric"), c(2, 6)),
+                    role = NA,
+                    source = "original")
+  expect_equal(summary(rec, TRUE), exp_res)
+})
+
+test_that('changing roles', {
+  rec <- recipe(x = biomass)
+  rec <- add_role(rec, sample, new_role = "some other role")
+  exp_res <- tibble(variable = colnames(biomass),
+                    type = rep(c("nominal", "numeric"), c(2, 6)),
+                    role = rep(c("some other role", NA), c(1, 7)),
+                    source = "original")
+  expect_equal(summary(rec, TRUE), exp_res)
+})
+
+test_that('change existing role', {
+  rec <- recipe(x = biomass)
+  rec <- add_role(rec, sample, new_role = "some other role")
+  rec <- add_role(rec, sample, new_role = "other other role")
+  exp_res <- tibble(variable = colnames(biomass),
+                    type = rep(c("nominal", "numeric"), c(2, 6)),
+                    role = rep(c("other other role", NA), c(1, 7)),
+                    source = "original")
+  expect_equal(summary(rec, TRUE), exp_res)
+})
+
diff --git a/tests/testthat/test_roll.R b/tests/testthat/test_roll.R
new file mode 100644
index 0000000..93260fb
--- /dev/null
+++ b/tests/testthat/test_roll.R
@@ -0,0 +1,75 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+set.seed(5522)
+sim_dat <- data.frame(x1 = (20:100) / 10)
+n <- nrow(sim_dat)
+sim_dat$y1 <- sin(sim_dat$x1) + rnorm(n, sd = 0.1)
+sim_dat$y2 <- cos(sim_dat$x1) + rnorm(n, sd = 0.1)
+sim_dat$x2 <- runif(n)
+sim_dat$x3 <- rnorm(n)
+sim_dat$fac <- sample(letters[1:3], size = n, replace = TRUE)
+
+rec <- recipe( ~ ., data = sim_dat)
+
+test_that('error checks', {
+  
+  expect_error(rec %>% step_window())
+  expect_error(rec %>% step_window(y1, size = 6))
+  expect_error(rec %>% step_window(y1, size = NA))
+  expect_error(rec %>% step_window(y1, size = NULL))
+  expect_error(rec %>% step_window(y1, statistic = "average"))
+  expect_error(rec %>% step_window(y1, size = 1)) 
+  expect_error(rec %>% step_window(y1, size = 2)) 
+  expect_error(rec %>% step_window(y1, size = -1))
+  expect_warning(rec %>% step_window(y1, size = pi)) 
+  expect_error(prep(rec %>% step_window(fac), training = sim_dat)) 
+  expect_error(prep(rec %>% step_window(y1, size = 1000L), training = sim_dat))   
+  bad_names <- rec %>%
+    step_window(starts_with("y"), names = "only_one_name")
+  expect_error(prep(bad_names, training = sim_dat))
+  
+})
+
+test_that('basic moving average', {
+  simple_ma <- rec %>%
+    step_window(starts_with("y"))
+  simple_ma <- prep(simple_ma, training = sim_dat)
+  simple_ma_res <- bake(simple_ma, newdata = sim_dat)
+  expect_equal(names(sim_dat), names(simple_ma_res))
+  
+  for (i in 2:(n - 1)) {
+    expect_equal(simple_ma_res$y1[i], mean(sim_dat$y1[(i - 1):(i + 1)]))
+    expect_equal(simple_ma_res$y2[i], mean(sim_dat$y2[(i - 1):(i + 1)]))
+  }
+  expect_equal(simple_ma_res$y1[1], mean(sim_dat$y1[1:3]))
+  expect_equal(simple_ma_res$y2[1], mean(sim_dat$y2[1:3]))
+  expect_equal(simple_ma_res$y1[n], mean(sim_dat$y1[(n - 2):n]))
+  expect_equal(simple_ma_res$y2[n], mean(sim_dat$y2[(n - 2):n]))  
+
+})
+
+test_that('creating new variables', {
+  new_names <- rec %>%
+    step_window(starts_with("y"), names = paste0("new", 1:2), role = "predictor")
+  new_names <- prep(new_names, training = sim_dat)
+  new_names_res <- bake(new_names, newdata = sim_dat)
+  
+  simple_ma <- rec %>%
+    step_window(starts_with("y"))
+  simple_ma <- prep(simple_ma, training = sim_dat)
+  simple_ma_res <- bake(simple_ma, newdata = sim_dat)
+  
+  expect_equal(new_names_res$new1, simple_ma_res$y1)
+  expect_equal(new_names_res$new2, simple_ma_res$y2)  
+})
+
+test_that('printing', {
+  new_names <- rec %>%
+    step_window(starts_with("y"), names = paste0("new", 1:2), role = "predictor")
+  expect_output(print(new_names))
+  expect_output(prep(new_names, training = sim_dat))
+})
+
+
diff --git a/tests/testthat/test_select_terms.R b/tests/testthat/test_select_terms.R
new file mode 100644
index 0000000..51f1768
--- /dev/null
+++ b/tests/testthat/test_select_terms.R
@@ -0,0 +1,106 @@
+library(testthat)
+library(recipes)
+library(tibble)
+library(tidyselect)
+library(rlang)
+
+data(okc)
+rec1 <- recipe(~ ., data = okc)
+info1 <- summary(rec1)
+
+data(biomass)
+rec2 <- recipe(biomass) %>%
+  add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
+           new_role = "predictor") %>%
+  add_role(HHV, new_role = "outcome") %>%
+  add_role(sample, new_role = "id variable") %>%
+  add_role(dataset, new_role = "splitting indicator")
+info2 <- summary(rec2)
+
+test_that('simple role selections', {
+  expect_equal(
+    terms_select(info = info1, quos(all_predictors())),
+    info1$variable
+  )
+  expect_error(terms_select(info = info1, quos(all_outcomes())))
+  expect_equal(
+    terms_select(info = info2, quos(all_outcomes())),
+    "HHV"
+  )
+  expect_equal(
+    terms_select(info = info2, quos(has_role("splitting indicator"))),
+    "dataset"
+  )
+})
+
+test_that('simple type selections', {
+  expect_equal(
+    terms_select(info = info1, quos(all_numeric())),
+    c("age", "height")
+  )
+  expect_equal(
+    terms_select(info = info1, quos(has_type("date"))),
+    "date"
+  )
+  expect_equal(
+    terms_select(info = info1, quos(all_nominal())),
+    c("diet", "location")
+  )
+})
+
+
+test_that('simple name selections', {
+  expect_equal(
+    terms_select(info = info1, quos(matches("e$"))),
+    c("age", "date")
+  )
+  expect_equal(
+    terms_select(info = info2, quos(contains("gen"))),
+    c("hydrogen", "oxygen", "nitrogen")
+  )
+  expect_equal(
+    terms_select(info = info2, quos(contains("gen"), -nitrogen)),
+    c("hydrogen", "oxygen")
+  )
+  expect_equal(
+    terms_select(info = info1, quos(date, age)),
+    c("date", "age")
+  )
+  ## This is weird but consistent with `dplyr::select_vars`
+  expect_equal(
+    terms_select(info = info1, quos(-age, date)),
+    c("diet", "height", "location", "date")
+  )
+  expect_equal(
+    terms_select(info = info1, quos(date, -age)),
+    "date"
+  )
+  expect_error(terms_select(info = info1, quos(log(date))))
+  expect_error(terms_select(info = info1, quos(date:age)))
+  expect_error(terms_select(info = info1, quos(I(date:age))))
+  expect_error(terms_select(info = info1, quos(matches("blahblahblah"))))
+  expect_error(terms_select(info = info1))
+})
+
+
+test_that('combinations', {
+  expect_equal(
+    terms_select(info = info2, quos(matches("[hH]"), -all_outcomes())),
+    "hydrogen"
+  )
+  expect_equal(
+    terms_select(info = info2, quos(all_numeric(), -all_predictors())),
+    "HHV"
+  )
+  expect_equal(
+    terms_select(info = info2, quos(all_numeric(), -all_predictors(), dataset)),
+    c("HHV", "dataset")
+  )
+  expect_equal(
+    terms_select(info = info2, quos(all_numeric(), -all_predictors(), dataset, -dataset)),
+    "HHV"
+  )
+})
+
+
+
diff --git a/tests/testthat/test_shuffle.R b/tests/testthat/test_shuffle.R
new file mode 100644
index 0000000..ceca18c
--- /dev/null
+++ b/tests/testthat/test_shuffle.R
@@ -0,0 +1,74 @@
+library(testthat)
+library(recipes)
+
+n <- 50
+set.seed(424)
+dat <- data.frame(
+  x1 = sort(rnorm(n)),
+  x2 = sort(rep(1:5, each = 10)),
+  x3 = sort(factor(rep(letters[1:3], c(2, 2, 46)))),
+  x4 = 1,
+  y = sort(runif(n))
+  )
+
+test_that('numeric data', {
+  rec1 <- recipe(y ~ ., data = dat) %>%
+    step_shuffle(all_numeric())
+  
+  rec1 <- prep(rec1, training = dat, verbose = FALSE)
+  set.seed(7046)
+  dat1 <- bake(rec1, dat)
+  exp1 <- c(FALSE, FALSE, TRUE, TRUE)
+  obs1 <- rep(NA, 4)
+  for (i in 1:ncol(dat1))
+    obs1[i] <- isTRUE(all.equal(dat[, i], getElement(dat1, names(dat)[i])))
+  expect_equal(exp1, obs1)
+})
+
+test_that('nominal data', {
+  rec2 <- recipe(y ~ ., data = dat) %>%
+    step_shuffle(all_nominal())
+  
+  rec2 <- prep(rec2, training = dat, verbose = FALSE)
+  set.seed(804)
+  dat2 <- bake(rec2, dat)
+  exp2 <- c(TRUE, TRUE, FALSE, TRUE)
+  obs2 <- rep(NA, 4)
+  for (i in 1:ncol(dat2))
+    obs2[i] <- isTRUE(all.equal(dat[, i], getElement(dat2, names(dat)[i])))
+  expect_equal(exp2, obs2)
+})
+
+test_that('all data', {
+  rec3 <- recipe(y ~ ., data = dat) %>%
+    step_shuffle(everything())
+  
+  rec3 <- prep(rec3, training = dat, verbose = FALSE)
+  set.seed(2516)
+  dat3 <- bake(rec3, dat)
+  exp3 <- c(FALSE, FALSE, FALSE, TRUE)
+  obs3 <- rep(NA, 4)
+  for (i in 1:ncol(dat3))
+    obs3[i] <- isTRUE(all.equal(dat[, i], getElement(dat3, names(dat)[i])))
+  expect_equal(exp3, obs3)
+})
+
+
+test_that('printing', {
+  rec3 <- recipe(y ~ ., data = dat) %>%
+    step_shuffle(everything())
+  expect_output(print(rec3))
+  expect_output(prep(rec3, training = dat))
+})
+
+test_that('bake a single row', {
+  rec4 <- recipe(y ~ ., data = dat) %>%
+    step_shuffle(everything())
+  
+  rec4 <- prep(rec4, training = dat, verbose = FALSE)
+  expect_warning(dat4 <- bake(rec4, dat[1,], everything()))
+  expect_equal(dat4, dat[1,])
+})
diff --git a/tests/testthat/test_spatialsign.R b/tests/testthat/test_spatialsign.R
new file mode 100644
index 0000000..821f396
--- /dev/null
+++ b/tests/testthat/test_spatialsign.R
@@ -0,0 +1,35 @@
+library(testthat)
+library(recipes)
+data("biomass")
+
+rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
+              data = biomass)
+
+test_that('spatial sign', {
+  sp_sign <- rec %>% 
+    step_center(carbon, hydrogen) %>% 
+    step_scale(carbon, hydrogen) %>%
+    step_spatialsign(carbon, hydrogen)
+  
+  sp_sign_trained <- prep(sp_sign, training = biomass, verbose = FALSE)
+  
+  sp_sign_pred <- bake(sp_sign_trained, newdata = biomass)
+  sp_sign_pred <- as.matrix(sp_sign_pred)[, c("carbon", "hydrogen")]
+  
+  x <- as.matrix(scale(biomass[, 3:4], center = TRUE, scale = TRUE))
+  x <- t(apply(x, 1, function(x) x/sqrt(sum(x^2))))
+  
+  expect_equal(sp_sign_pred, x)
+})
+
+
+test_that('printing', {
+  sp_sign <- rec %>% 
+    step_center(carbon, hydrogen) %>% 
+    step_scale(carbon, hydrogen) %>%
+    step_spatialsign(carbon, hydrogen)
+  expect_output(print(sp_sign))
+  expect_output(prep(sp_sign, training = biomass))
+})
+
+
diff --git a/tests/testthat/test_sqrt.R b/tests/testthat/test_sqrt.R
new file mode 100644
index 0000000..8d57974
--- /dev/null
+++ b/tests/testthat/test_sqrt.R
@@ -0,0 +1,29 @@
+library(testthat)
+library(recipes)
+library(tibble)
+
+n <- 20
+ex_dat <- data.frame(x1 = seq(0, 1, length = n),
+                     x2 = rep(1:5, 4))
+
+test_that('simple sqrt trans', {
+  
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_sqrt(x1, x2)
+  
+  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
+  rec_trans <- bake(rec_trained, newdata = ex_dat)
+
+  exp_res <- as_tibble(lapply(ex_dat, sqrt))
+  expect_equal(rec_trans, exp_res)
+  
+})
+
+
+test_that('printing', {
+  rec <- recipe(~., data = ex_dat) %>% 
+    step_sqrt(x1, x2)
+  expect_output(print(rec))
+  expect_output(prep(rec, training = ex_dat))
+})
+
diff --git a/tests/testthat/test_stringsAsFactors.R b/tests/testthat/test_stringsAsFactors.R
new file mode 100644
index 0000000..be67712
--- /dev/null
+++ b/tests/testthat/test_stringsAsFactors.R
@@ -0,0 +1,42 @@
+library(testthat)
+library(recipes)
+
+n <- 20
+
+set.seed(752)
+as_fact <- data.frame(
+  numbers = rnorm(n),
+  fact = factor(sample(letters[1:3], n, replace = TRUE)),
+  ord = factor(sample(LETTERS[22:26], n, replace = TRUE),
+               ordered = TRUE)
+)
+as_str <- as_fact
+as_str$fact <- as.character(as_str$fact)
+as_str$ord <- as.character(as_str$ord)
+
+test_that('stringsAsFactors = FALSE', {
+  rec1 <- recipe(~ ., data = as_fact) %>%
+    step_center(numbers)
+  rec1 <- prep(rec1, training = as_fact, retain = TRUE, 
+                  stringsAsFactors = FALSE, verbose = FALSE)
+  rec1_as_fact <- bake(rec1, newdata = as_fact)
+  rec1_as_str <- bake(rec1, newdata = as_str) 
+  expect_equal(as_fact$fact, rec1_as_fact$fact)
+  expect_equal(as_fact$ord, rec1_as_fact$ord)  
+  expect_equal(as_str$fact, rec1_as_str$fact)
+  expect_equal(as_str$ord, rec1_as_str$ord)    
+  
+})
+
+test_that('stringsAsFactors = TRUE', {
+  rec2 <- recipe(~ ., data = as_fact) %>%
+    step_center(numbers)
+  rec2 <- prep(rec2, training = as_fact, retain = TRUE, 
+                  stringsAsFactors = TRUE, verbose = FALSE)
+  rec2_as_fact <- bake(rec2, newdata = as_fact)
+  rec2_as_str <- bake(rec2, newdata = as_str) 
+  expect_equal(as_fact$fact, rec2_as_fact$fact)
+  expect_equal(as_fact$ord, rec2_as_fact$ord)  
+  expect_equal(as_fact$fact, rec2_as_str$fact)
+  expect_equal(as_fact$ord, rec2_as_str$ord)    
+})
diff --git a/vignettes/Custom_Steps.Rmd b/vignettes/Custom_Steps.Rmd
new file mode 100644
index 0000000..a1cc3e1
--- /dev/null
+++ b/vignettes/Custom_Steps.Rmd
@@ -0,0 +1,247 @@
+---
+title: "Creating Custom Step Functions"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Custom Steps}
+  %\VignetteEncoding{UTF-8}  
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+```{r ex_setup, include=FALSE}
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+```
+
+The `recipes` package contains a number of different steps:
+
+```{r step_list}
+library(recipes)
+steps <- apropos("^step_")
+steps[!grepl("new$", steps)]
+```
+
+You might want to make your own step, and this page describes how to do that. If you are looking for good examples of existing steps, I would suggest looking at the code for [centering](https://github.com/topepo/recipes/blob/master/R/center.R) or [PCA](https://github.com/topepo/recipes/blob/master/R/pca.R) to start. 
+
+
+# A new step definition
+
+As an example, let's create a step that replaces the value of a variable with its percentile from the training set. The data that I'll use are from the `recipes` package:
+
+```{r initial}
+data(biomass)
+str(biomass)
+
+biomass_tr <- biomass[biomass$dataset == "Training",]
+biomass_te <- biomass[biomass$dataset == "Testing",]
+```
+
+To illustrate the transformation with the `carbon` variable, the training set distribution of that variable is shown below with a vertical line for the first value of the test set. 
+
+```{r carbon_dist}
+library(ggplot2)
+theme_set(theme_bw())
+ggplot(biomass_tr, aes(x = carbon)) + 
+  geom_histogram(binwidth = 5, col = "blue", fill = "blue", alpha = .5) + 
+  geom_vline(xintercept = biomass_te$carbon[1], lty = 2)
+```
+
+Based on the training set, `r round(mean(biomass_tr$carbon <= biomass_te$carbon[1])*100, 1)`% of the data are less than a value of `r biomass_te$carbon[1]`. There are some applications where it might be advantageous to represent the predictor values as percentiles rather than their original values. 
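+
+As a quick check of that claim (a minimal sketch using the objects created above, not part of the new step itself):
+
+```{r pctl_check}
+## empirical percentile of the first test set value of `carbon`
+mean(biomass_tr$carbon <= biomass_te$carbon[1])
+```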
+
+Our new step will do this computation for any numeric variables of interest. We will call this `step_percentile`. The code below is designed for illustration and not speed or best practices. I've left out a lot of error trapping that we would want in a real implementation.  
+
+# Create the initial function
+
+The user-exposed function `step_percentile` is just a simple wrapper around an internal function called `add_step`. This function takes the same arguments as your function and simply adds it to a new recipe. The `...` signifies the variable selectors that can be used.
+
+```{r initial_def}
+step_percentile <- function(recipe, ..., role = NA, 
+                            trained = FALSE, ref_dist = NULL,
+                            approx = FALSE, 
+                            options = list(probs = (0:100)/100, names = TRUE)) {
+## capture but do not evaluate the variable selectors with
+## the `quos` function in `rlang`
+  terms <- rlang::quos(...) 
+  if(length(terms) == 0)
+    stop("Please supply at least one variable specification. See ?selections.")
+  add_step(
+    recipe, 
+    step_percentile_new(
+      terms = terms, 
+      trained = trained,
+      role = role, 
+      ref_dist = ref_dist,
+      approx = approx,
+      options = options))
+}
+```
+
+You should always keep the first four arguments (`recipe` through `trained`) the same as listed above. Some notes:
+
+ * the `role` argument is used when you either 1) create new variables and want their role to be pre-set or 2) replace the existing variables with new values. The latter is what we will be doing and using `role = NA` will leave the existing role intact. 
+ * `trained` is set by the package when the estimation step has been run. You should default your function definition's argument to `FALSE`.  
+
+I've added extra arguments specific to this step. In order to calculate the percentile, the training data for the relevant columns will need to be saved. This data will be saved in the `ref_dist` object. 
+However, this might be problematic if the data set is large. `approx` would be used when you want to save a grid of pre-computed percentiles from the training set and use these to estimate the percentile for a new data point. If `approx = TRUE`, the argument `ref_dist` will contain the grid for each variable. 
+
+We will use `stats::quantile` to compute the grid. However, we might also want to have control over the granularity of this grid, so the `options` argument will be used to define how those calculations are done. We could just use the ellipses (aka `...`) so that any options passed to `step_percentile` that are not one of its arguments will then be passed to `stats::quantile`. We recommend making a separate list object with the options and using it inside the function. 
+
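+As a minimal sketch of that pattern (the `opts` object and chunk name here are illustrative only):
+
+```{r quantile_opts}
+## pass a list of options to stats::quantile via do.call
+opts <- list(probs = (0:4)/4, names = TRUE)
+do.call("quantile", c(list(x = biomass_tr$carbon), opts))
+```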
+
+# Initialization of new objects
+
+Next, you can utilize the internal function `step` that sets the class of new objects. Using `subclass = "percentile"` will set the class of new objects to `"step_percentile"`. 
+
+```{r initialize}
+step_percentile_new <- function(terms = NULL, role = NA, trained = FALSE, 
+                                ref_dist = NULL, approx = NULL, options = NULL) {
+  step(
+    subclass = "percentile", 
+    terms = terms,
+    role = role,
+    trained = trained,
+    ref_dist = ref_dist,
+    approx = approx,
+    options = options
+  )
+}
+```
+
+# Define the estimation procedure
+
+You will need to create a new `prep` method for your step's class. To do this, the method should have three arguments:
+
+```r
+function(x, training, info = NULL)
+```
+
+where
+
+ * `x` will be the `step_percentile` object
+ * `training` will be a _tibble_ that has the training set data
+ * `info` will also be a tibble that has information on the current set of data available. This information is updated as each step is evaluated by its specific `prep` method so it may not have the variables from the original data. The columns in this tibble are `variable` (the variable name), `type` (currently either "numeric" or "nominal"), `role` (defining the variable's role), and `source` (either "original" or "derived" depending on where it originated).
+
+You can define other options. 
+
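+As a point of reference, `summary()` on a recipe returns a tibble of the same form; this sketch assumes the `biomass_tr` data from above:
+
+```{r info_example}
+## columns: variable, type, role, source
+summary(recipe(HHV ~ ., data = biomass_tr[, -(1:2)]))
+```
+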
+The first thing that you might want to do in the `prep` function is to translate the specification listed in the `terms` argument to column names in the current data. There is an internal function called `terms_select` that can be used to obtain this. 
+
+```{r prep_1, eval = FALSE}
+prep.step_percentile <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(terms = x$terms, info = info) 
+}
+```
+
+Once we have this, we can either save the original data columns or estimate the approximation grid. For the grid, we will use a helper function that enables us to run `do.call` on a list of arguments that include the `options` list.  
+
+```{r prep_2}
+get_pctl <- function(x, args) {
+  args$x <- x
+  do.call("quantile", args)
+}
+
+prep.step_percentile <- function(x, training, info = NULL, ...) {
+  col_names <- terms_select(terms = x$terms, info = info) 
+  ## You can add error trapping for non-numeric data here and so on.
+  ## We'll use the percentile names later, so they must be kept
+  if(x$options$names == FALSE)
+    stop("`names` should be set to TRUE")
+  
+  if(!x$approx) {
+    x$ref_dist <- training[, col_names]
+  } else {
+    pctl <- lapply(
+      training[, col_names],  
+      get_pctl, 
+      args = x$options
+    )
+    x$ref_dist <- pctl
+  }
+  ## Always return the updated step
+  x
+}
+```
+
+# Create the `bake` method
+
+Remember that the `prep` function does not _apply_ the step to the data; it only estimates any required values such as `ref_dist`. We will need to create a new `bake` method for our `step_percentile` class. The minimum arguments for this are
+
+```r
+function(object, newdata, ...)
+```
+
+where `object` is the updated step function that has been through the corresponding `prep` code and `newdata` is a tibble of data to be processed. 
+
+Here is the code to convert the new data to percentiles. Two initial helper functions handle the two cases (approximation or not). We always return a tibble as the output. 
+
+```{r bake}
+## Two helper functions
+pctl_by_mean <- function(x, ref) mean(ref <= x)
+
+pctl_by_approx <- function(x, ref) {
+  ## go from 1 column tibble to vector
+  x <- getElement(x, names(x))
+  ## get the percentile values from the names (e.g. "10%")
+  p_grid <- as.numeric(gsub("%$", "", names(ref))) 
+  approx(x = ref, y = p_grid, xout = x)$y/100
+}
+
+bake.step_percentile <- function(object, newdata, ...) {
+  require(tibble)
+  ## For illustration (and not speed), we will loop through the affected variables
+  ## and do the computations
+  vars <- names(object$ref_dist)
+  
+  for(i in vars) {
+    if(!object$approx) {
+      ## We can use `apply` since tibbles do not drop dimensions:
+      newdata[, i] <- apply(newdata[, i], 1, pctl_by_mean, 
+                            ref = object$ref_dist[, i])
+    } else 
+      newdata[, i] <- pctl_by_approx(newdata[, i], object$ref_dist[[i]])
+  }
+  ## Always convert to tibbles on the way out
+  as_tibble(newdata)
+}
+```
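+
+As a quick sanity check (a hypothetical example; the result depends on the training data), the mean-based helper simply computes the proportion of reference values at or below a point:
+
+```r
+## the fraction of training carbon values <= 50; a result of 0.75 would
+## mean that carbon = 50 sits at the 75th percentile of the training set
+pctl_by_mean(50, ref = biomass_tr$carbon)
+```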
+
+# Running the example
+
+Let's use the example data to make sure that it works: 
+
+```{r example}
+rec_obj <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)])
+rec_obj <- rec_obj %>%
+  step_percentile(all_predictors(), approx = TRUE) 
+
+rec_obj <- prep(rec_obj, training = biomass_tr)
+
+percentiles <- bake(rec_obj, biomass_te)
+percentiles
+```
+
+The plot below shows how the original data line up with the percentiles for each split of the data for one of the predictors:
+
+```{r cdf_plot, echo = FALSE}
+grid_pct <- rec_obj$steps[[1]]$options$probs
+plot_data <- data.frame(
+  carbon = c(
+    quantile(biomass_tr$carbon, probs = grid_pct), 
+    biomass_te$carbon
+  ),
+  percentile = c(grid_pct, percentiles$carbon),
+  dataset = rep(
+    c("Training", "Testing"), 
+    c(length(grid_pct), nrow(percentiles))
+  )
+)
+
+ggplot(plot_data, 
+       aes(x = carbon, y = percentile, col = dataset)) + 
+  geom_point(alpha = .4, cex = 2) + 
+  theme(legend.position = "top")
+```
diff --git a/vignettes/Ordering.Rmd b/vignettes/Ordering.Rmd
new file mode 100644
index 0000000..1a1c513
--- /dev/null
+++ b/vignettes/Ordering.Rmd
@@ -0,0 +1,28 @@
+---
+title: "Ordering of Steps"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Ordering of Steps}
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+In recipes, there are no constraints related to the order in which steps are added to the recipe. However, there are some general suggestions that you should consider:
+
+* If using a Box-Cox transformation, don't center the data first or do any operations that might make the data non-positive. Alternatively, use the Yeo-Johnson transformation so you don't have to worry about this. 
+* Recipes do not automatically create dummy variables (unlike _most_ formula methods). If you want to center, scale, or do any other operations on _all_ of the predictors, run `step_dummy` first so that numeric columns are in the data set instead of factors. 
+* As noted in the help file for `step_interact`, you should make dummy variables _before_ creating the interactions.
+* If you are lumping infrequently occurring categories together with `step_other`, call `step_other` before `step_dummy`.
+
+While your project's needs may vary, here is a suggested order of _potential_ steps that should work for most problems (a code sketch follows the list):
+
+1. Impute
+1. Individual transformations for skewness and other issues
+1. Discretize (if needed and if you have no other choice) 
+1. Create dummy variables
+1. Create interactions
+1. Normalization steps (center, scale, range, etc) 
+1. Multivariate transformation (e.g. PCA, spatial sign, etc) 
+
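+To make the ordering concrete, here is a hypothetical recipe that follows the list above (step 3 is omitted). It assumes a data frame `dat` with an outcome `y` plus numeric and factor predictors; the specific steps and column names are stand-ins, not requirements:
+
+```r
+library(recipes)
+rec <- recipe(y ~ ., data = dat) %>%
+  step_knnimpute(all_predictors()) %>%                 ## 1. impute missing data
+  step_YeoJohnson(all_numeric(), - all_outcomes()) %>% ## 2. resolve skewness
+  step_other(all_nominal()) %>%                        ## lump rare levels first
+  step_dummy(all_nominal()) %>%                        ## 4. dummy variables
+  step_interact(~ x_a:x_b) %>%                         ## 5. interactions (x_a and
+                                                       ##    x_b are hypothetical
+                                                       ##    dummy columns)
+  step_center(all_predictors()) %>%                    ## 6. normalize
+  step_scale(all_predictors()) %>%
+  step_pca(all_predictors())                           ## 7. multivariate transform
+```
+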
+Again, your mileage may vary for your particular problem. 
diff --git a/vignettes/Selecting_Variables.Rmd b/vignettes/Selecting_Variables.Rmd
new file mode 100644
index 0000000..9f7b6ea
--- /dev/null
+++ b/vignettes/Selecting_Variables.Rmd
@@ -0,0 +1,73 @@
+---
+title: "Selecting Variables"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Selecting Variables}
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+```{r ex_setup, include=FALSE}
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+```
+
+When recipe steps are used, there are different approaches for selecting which variables or features a step should affect. 
+
+The three main characteristics of variables that can be queried are: 
+
+ * the name of the variable
+ * the data type (e.g. numeric or nominal)
+ * the role that was declared by the recipe
+ 
+The manual pages for `?selections` and `?has_role` have details about the available selection methods; a short sketch of each approach appears after the recipe below. 
+ 
+To illustrate this, the credit data will be used: 
+
+```{r credit}
+library(recipes)
+data("credit_data")
+str(credit_data)
+
+rec <- recipe(Status ~ Seniority + Time + Age + Records, data = credit_data)
+rec
+```
+
+Before any steps are used, the information on the original variables is:
+
+```{r var_info_orig}
+summary(rec, original = TRUE)
+```
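+
+Each of the three characteristics above has a matching selector. For example (a hypothetical sketch; each line starts a separate specification using the recipe above):
+
+```r
+rec %>% step_center(Age, Time)              ## by name
+rec %>% step_center(all_numeric())          ## by type
+rec %>% step_center(has_role("predictor"))  ## by role
+```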
+
+We can add a step to compute dummy variables on the non-numeric data after we impute any missing data:
+
+```{r dummy_1}
+dummied <- rec %>% step_dummy(all_nominal())
+```
+
+This will capture _any_ variables that are either character strings or factors: `Status` and `Records`. However, since `Status` is our outcome, we might want to keep it as a factor so we can _subtract_ that variable out either by name or by role:
+
+```{r dummy_2}
+dummied <- rec %>% step_dummy(Records) # or
+dummied <- rec %>% step_dummy(all_nominal(), - Status) # or
+dummied <- rec %>% step_dummy(all_nominal(), - all_outcomes()) 
+```
+
+Using the last definition: 
+
+```{r dummy_3}
+dummied <- prep(dummied, training = credit_data)
+with_dummy <- bake(dummied, newdata = credit_data)
+with_dummy
+```
+
+`Status` is unaffected. 
+
+One important aspect of selecting variables in steps is that the variable names and types may change as the steps are executed. In the example above, `Records` is a factor variable before the step is executed. Afterwards, `Records` is gone and the binary variable `Records_yes` is in its place. One reason to have general selection routines, like `all_predictors` or `contains`, is to be able to select variables that have not been created yet. 
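+
+For example (a hypothetical sketch), a selector such as `contains` is not resolved until its step is trained, so it can match columns created by an earlier step:
+
+```r
+rec %>% 
+  step_dummy(all_nominal(), - all_outcomes()) %>%
+  ## `Records_yes` does not exist when this is written, but it will by the
+  ## time this step is trained:
+  step_center(contains("Records"))
+```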
+
diff --git a/vignettes/Simple_Example.Rmd b/vignettes/Simple_Example.Rmd
new file mode 100644
index 0000000..1828330
--- /dev/null
+++ b/vignettes/Simple_Example.Rmd
@@ -0,0 +1,134 @@
+---
+title: "Basic Recipes"
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Basic Recipes}
+output:
+  knitr:::html_vignette:
+    toc: yes
+---
+
+```{r ex_setup, include=FALSE}
+knitr::opts_chunk$set(
+  message = FALSE,
+  digits = 3,
+  collapse = TRUE,
+  comment = "#>"
+  )
+options(digits = 3)
+```
+
+This document demonstrates some basic uses of recipes. First, some definitions are required: 
+
+ * __variables__ are the original (raw) data columns in a data frame or tibble. For example, in a traditional formula `Y ~ A + B + A:B`, the variables are `A`, `B`, and `Y`. 
+ * __roles__ define how variables will be used in the model. Examples are: `predictor` (independent variables), `response`, and `case weight`. This is meant to be open-ended and extensible. 
+ * __terms__ are columns in a design matrix such as `A`, `B`, and `A:B`. These can be other derived entities that are grouped, such as a set of principal components or a set of columns that define a basis function for a variable. These are synonymous with features in machine learning. Variables that have `predictor` roles would automatically be main effect terms. 
+
+## An Example
+
+The cell segmentation data will be used. It has 58 predictor columns, a factor variable `Class` (the outcome), and two extra labelling columns. Each of the predictors has a suffix for the optical channel (`"Ch1"`-`"Ch4"`). We will first separate the data into a training and test set, then remove unimportant variables:
+
+```{r data}
+library(recipes)
+library(caret)
+data(segmentationData)
+
+seg_train <- segmentationData %>% 
+  filter(Case == "Train") %>% 
+  select(-Case, -Cell)
+seg_test  <- segmentationData %>% 
+  filter(Case == "Test")  %>% 
+  select(-Case, -Cell)
+```
+
+The idea is that the preprocessing operations will all be created using the training set and then these steps will be applied to both the training and test set. 
+
+## An Initial Recipe
+
+For a first recipe, let's plan on centering and scaling the predictors. First, we will create a recipe from the original data and then specify the processing steps. 
+
+Recipes can be created manually by sequentially adding roles to variables in a data set. 
+
+If the analysis only requires **outcomes** and **predictors**, the easiest way to create the initial recipe is to use the standard formula method:
+
+```{r first_rec}
+rec_obj <- recipe(Class ~ ., data = seg_train)
+rec_obj
+```
+
+The data contained in the `data` argument need not be the training set; these data are only used to catalog the names of the variables and their types (e.g. numeric).  
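+
+For example (a hypothetical sketch), a small slice of the data is enough to enumerate the variables:
+
+```r
+## only column names and types are recorded at this point, so a few rows suffice:
+rec_obj_small <- recipe(Class ~ ., data = head(seg_train))
+```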
+
+(Note that the formula method here is used to declare the variables and their roles and nothing else. If you use inline functions (e.g. `log`) it will complain. These types of operations can be added later.)
+
+## Preprocessing Steps
+
+From here, preprocessing steps can be added sequentially in one of two ways:
+```{r step_code, eval = FALSE}
+rec_obj <- step_name(rec_obj, arguments)    ## or
+rec_obj <- rec_obj %>% step_name(arguments)
+```
+`step_center` and the other functions will always return updated recipes. 
+
+One other important facet of the code is the method for specifying which variables should be used in different steps. The manual page `?selections` has more details but [`dplyr`](https://cran.r-project.org/package=dplyr)-like selector functions can be used: 
+
+ * use basic variable names (e.g. `x1, x2`),
+ * [`dplyr`](https://cran.r-project.org/package=dplyr) functions for selecting variables: `contains`, `ends_with`, `everything`, `matches`, `num_range`, and `starts_with`,
+ * functions that subset on the role of the variables that have been specified so far: `all_outcomes`, `all_predictors`, `has_role`, or 
+ * similar functions for the type of data: `all_nominal`, `all_numeric`, and `has_type`. 
+
+Note that the functions listed above are the only ones that can be used to select variables inside the steps. Also, minus signs can be used to deselect variables. 
+
+For our data, we can add the two operations for all of the predictors:
+```{r center_scale}
+standardized <- rec_obj %>%
+  step_center(all_predictors()) %>%
+  step_scale(all_predictors()) 
+standardized
+```
+
+It is important to realize that the _specific_ variables have not been declared yet (in this example). In some preprocessing steps, variables will be added or removed from the current list of possible variables. 
+
+If these are the only preprocessing steps for the predictors, we can now estimate the means and standard deviations from the training set. The `prep` function is used with a recipe and a data set:
+```{r trained}
+trained_rec <- prep(standardized, training = seg_train)
+```
+Now that the statistics have been estimated, the preprocessing can be applied to the training and test set:
+```{r apply}
+train_data <- bake(trained_rec, newdata = seg_train)
+test_data  <- bake(trained_rec, newdata = seg_test)
+```
+`bake` returns a tibble: 
+```{r tibbles}
+class(test_data)
+test_data
+```
+
+
+## Adding Steps
+
+After exploring the data, more preprocessing might be required. Steps can be added to the trained recipe. Suppose that we need to create PCA components but only from the predictors from channel 1 and any predictors that are areas: 
+```{r pca}
+trained_rec <- trained_rec %>%
+  step_pca(ends_with("Ch1"), contains("area"), num = 5)
+trained_rec
+```
+Note that only the last step has been estimated; the first two were previously trained and these activities are not duplicated. We can add the PCA estimates using `prep` again:
+```{r pca_training}
+trained_rec <- prep(trained_rec, training = seg_train)
+```
+`bake` can be reapplied to get the principal components in addition to the other variables:
+
+```{r pca_bake}
+test_data  <- bake(trained_rec, newdata = seg_test)
+names(test_data)
+```
+
+Note that the PCA components have replaced the original variables that were from channel 1 or measured an area aspect of the cells. 
+
+
+There are a number of different steps included in the package:
+
+```{r step_list}
+steps <- apropos("^step_")
+steps[!grepl("new$", steps)]
+```

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/r-cran-recipes.git


