Title: An Interface to Statistical Modeling Independent of Model Architecture
Description: Provides functions for evaluating, displaying, and interpreting statistical models. The goal is to abstract the operations on models from the particular architecture of the model. For instance, calculating effect sizes rather than looking at coefficients. The package includes interfaces to both regression and classification architectures, including lm(), glm(), rlm() in 'MASS', random forests and recursive partitioning, k-nearest neighbors, linear and quadratic discriminant analysis, and models produced by the 'caret' package's train(). It's straightforward to add other model architectures.
Authors: Daniel Kaplan [aut, cre], Randall Pruim [aut]
Maintainer: Daniel Kaplan <[email protected]>
License: MIT + file LICENSE
Version: 0.3.0
Built: 2024-10-27 04:21:08 UTC
Source: https://github.com/cran/mosaicModel
The AARP (originally named the "American Association of Retired Persons" but now just AARP) offers life insurance to its members. The data come from a full-page advertisement (circa 2012) in the "AARP Bulletin", which has the second-largest circulation of any magazine in the world, with upward of 40 million subscribers. (Only the "AARP Magazine" has a larger circulation.)
data(AARP)
A data frame with 36 observations on the following variables.
Age
The age of the person covered by the insurance policy.
Sex
The sex of the person covered by the insurance policy.
Coverage
The "death benefit" in 1000 USD.
Cost
Monthly cost in USD.
Life insurance provides a "death benefit", money paid out to the insured person's survivors upon death of the insured. There is a cost for the insurance. Among other factors, the cost depends on both age and sex. (For this type of insurance, called "term insurance", the cost changes as the insured person ages.)
The "AARP Bulletin". A copy of the ad is available at this link.
mod_1 <- lm(Cost ~ Age + Coverage, data = AARP)
mod_effect(mod_1, ~ Coverage)
These data were collected by biologist Mike Anderson in a study of nitrogen fixation by bacteria growing on the root nodules of alder bushes.
data(Alder)
A data frame Alder with 196 rows and 24 variables:
LAND. landscape, floodplain vs. upland.
SAMPPER. sampling period: early, mid, late
SPECIES. host species, Alnus tenuifolia (AT) vs. A. crispa (AC)
STAGE. successional stage, early vs. late in floodplain and upland landscapes. This is not equivalent across landscapes.
JULDAY. Julian day
PERNODN. nodule percent nitrogen by mass
RF. bacterial genotype
SNF. nitrogen fixation rate of nodule tissue, umol N2/gram of nodule dry weight/hr
SLA. specific leaf weight, grams of leaf weight/square-meter, dry
ONECM. soil temperature at 1 cm depth
FIVECM. soil temperature at 5 cm depth
PERH2O. soil moisture, percent H2O by mass
DEL. del15N of leaf tissue
DELNOD. del15N of nodule tissue
NPERAREA. leaf nitrogen content per unit leaf area
NDiff. nitrogen content difference between leaf and nodule of the same plant
delDiff. del15N difference between leaf and nodule of the same plant
SITE. Site designations: 1A,B,C for replicate early succession floodplain sites, 4A,B,C for late succession floodplain, UP1A,B,C for early succession upland and UP3A,B,C for late succession upland
HABSPEC. habitat+species, concatenated LAND, STAGE, SPECIES
SITESPEC. concatenated SITE, SPECIES
REP. replicate site within a given level of HABSPEC
PLNO. plant number, unique for individuals of each species (AT1-180, AC1-270)
Two questions that Anderson wanted to answer are: (1) Can any variation in nitrogen fixation (variable SNF) be attributed to genotype (variable RF)? (2) What are the major sources of variation in SNF and PERLEAFN? Variables of biological interest are seasonality (SAMPPER or JULDAY), soil temperature and moisture, and habitat differences (STAGE for host species AT, and STAGE and LAND for host species AC).
Three replicate sites were sampled for each landscape/stage combination in three sampling periods across the growing season. Site sampling was arranged in a Latin Square design in order to systematize any effects of seasonality on N2-fixation rates.
Michael Anderson
Anderson MD, Ruess RW, Myrold DD, Taylor DL. "Host species and habitat affect nodulation by specific Frankia genotypes in interior Alaska." Oecologia (2009) 160:619-630.
mod <- lm(logSNF ~ RF + SITESPEC, data = Alder)
Birth weight, date, and gestational period collected as part of the Child Health and Development Studies in 1961 and 1962. Information about the baby's parents (age, education, height, weight, and whether the mother smoked) is also recorded. The data were presented by Nolan and Speed to address the question of whether there is a link between maternal smoking and the baby's health.
data(Birth_weight)
A data frame with 886 observations on the following variables.
baby_wt
Birth weight of baby, in ounces.
mother_wt
The mother's weight, in pounds.
gestation
Length of the pregnancy, in days.
smoker
Whether the mother smoked during the pregnancy.
income
Family yearly income in 2500 USD increments: 0 = under 2500, 1 = 2500-4999, ..., 8 = 12,500-14,999, 9 = 15,000+.
D. Nolan and T.P. Speed (2009) "Stat Labs: Mathematical Statistics Through Applications"
mod_1 <- lm(baby_wt ~ gestation + mother_wt, data = Birth_weight)
mod_effect(mod_1, ~ gestation)
Similar to proportion, but returns a confidence interval on the proportion.
ci.proportion(nm = NULL, level = 0.95)
nm: The level for which to find the proportion.
level: The confidence level for the interval. (Default: 0.95)
## Not run: 
df_stats(mtcars, ~ cyl, cyl_prop = ci.proportion(6, level = 0.90))
## End(Not run)
These are the actual grades for 400+ individual students in the courses they took at a small, liberal-arts college in the midwest US. All the students graduated in 2006. Each row corresponds to a single student in a single course. The data have been de-identified by translating the student ID, the instructor ID, and the name of the department. Typically a graduating student has taken about 32 courses. As another form of de-identification, only half of the courses each student took, selected at random, are included. Only courses with 10 or more students enrolled were included.
data(College_grades)
A data frame with 6146 grades for 443 students.
grade
The letter grade for the student in this course: A is the highest.
sessionID
An identifier for the course taken. Courses offered multiple times in one semester or across semesters have individual IDs.
sid
The student ID
dept
The department in which the course was offered. 100 is entry-level, 200 sophomore-level, 300 junior-level, 400 senior-level.
enroll
Student enrollment in the course. This includes students who are not part of this sample.
iid
Instructor ID
gradepoint
A translation of the letter grade into a numerical scale. 4 is high. Some letter grades are not counted in a student's gradepoint average; these have NA for the gradepoint.
The data were helpfully provided by the registrar of the college with the proviso that the de-identification steps outlined above be performed.
## Not run: 
GPA <- lm(gradepoint ~ sid - 1, data = College_grades)
## End(Not run)
Calculate measures of collinearity
collinearity(formula, data, format = c("SeIF", "degrees", "radians", "VIF"))
formula: a formula giving, on the right-hand side, the explanatory variables to be considered. Interactions, etc. may also be specified.
data: a data frame from which to draw the variables in the formula.
format: choice of "SeIF", "degrees", "radians", or "VIF", as listed in the function signature.
collinearity( ~ cyl * disp * hp, data = mtcars)
collinearity( ~ cyl * disp * hp, data = mtcars, format = "degrees")
This will typically not be used by the end-user.
construct_fitting_call(model, data_name = "training", ...)
model: the model in question.
data_name: character string specifying the name of the data frame used for the refitting. This object must be defined in the environment in which the call is being made.
...: (not used)
This provides a way to refit a model on either resampled or sub-sampled data. Not all model architectures support this. If not, then you can't use mod_ensemble or mod_cv, or use the bootstrap = argument in any of the other functions.
Function builders for calculating intervals. These must always be evaluated, with the result being handed as an argument to df_stats().
coverage(level = 0.95)
level: Number in 0 to 1 specifying the confidence level for the interval. (Default: 0.95)
cover <- coverage(0.95)
df_stats(hp ~ cyl, data = mtcars, c95 = cover)
A report of the number of offenses reported to police per million population, and many other social and demographic variables. Each case corresponds to a state in the US.
data(Crime)
A data frame with 47 cases, each of which is a US state, with observations on the following variables.
R
Crime rate: number of offenses reported to police per million population.
Age
Number of males aged 14-24 per 1000 population
N
State population (in 100,000s)
W
State-wise median value of transferable goods and assets or family income in tens of dollars.
X
Number of families per 1000 earning below half the median income.
ExDiff
Change in per capita expenditure on police by state and local government from 1950 to 1960
Ex0
1960 per capita expenditures on police.
FBI Uniform Crime Report via DASL: Data and Story Library
mod_1 <- lm(R ~ W, data = Crime)
mod_2 <- lm(R ~ X, data = Crime)
mod_3 <- lm(R ~ W + X, data = Crime)
mod_effect(mod_1, ~ W)
mod_effect(mod_3, ~ W)
mod_effect(mod_2, ~ X)
mod_effect(mod_3, ~ X)
This typically will not be used by an end-user.
data_from_mod(model, ...)
model: the model from which to extract the training data.
...: additional arguments (not used)
Not all model architectures keep track of the training data. If a model architecture isn't recognized, you'll have to add a method for that class. See the package vignette.
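For instance (a sketch; lm objects carry their training data, so no extra arguments should be needed):
mod <- lm(mpg ~ hp + wt, data = mtcars)
head(data_from_mod(mod)) # the first rows of the data used to fit the model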
Counts the number of cases in a data frame broken down by the variables in the formula.
df_counts(formula, data, wide = FALSE, margins = FALSE)
formula: the formula describing the relationship.
data: a data frame (or you can pipe this in).
wide: reformat the output as a cross-tabulation. This makes sense only when there are just two variables.
margins: show the marginal counts. Makes the most sense if wide = TRUE.
See also df_props.
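A sketch of plausible usage, mirroring the df_props() examples below (not from the package's own examples):
df_counts(~ cyl + gear, data = mtcars)
df_counts(~ cyl + gear, data = mtcars, wide = TRUE)
df_counts(~ cyl + gear, data = mtcars, margins = TRUE)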
Uses a formula interface to specify how the proportions are to be calculated.
df_props(formula, data, as.percent = FALSE, ..., wide = FALSE, margins = FALSE, format = c("proportion", "percent", "count"))
formula: the formula describing the relationship.
data: a data frame (or you can pipe this in).
as.percent: show proportions in percent (that is, multiplied by 100).
...: statistics functions to be applied to the data, e.g. mean, sd, confidence(0.95).
wide: reformat the output as a cross-tabulation. This makes sense only when there are just two variables.
margins: show the marginal probabilities. Makes the most sense if wide = TRUE.
format: for internal use only.
Using | in the formula specifies a conditional proportion:
~ A : proportion of cases in each level of A
~ A + B: joint proportion: each level of A crossed with B
~ A | B: conditional proportion: for each level of B, what fraction are in each level of A
A ~ B: another way of specifying the conditional proportion
df_props(mtcars, ~ cyl + gear)
df_props(mtcars, ~ cyl | gear)
df_props(mtcars, ~ cyl + gear, wide = TRUE)
df_props(mtcars, ~ cyl + gear, margins = TRUE)
df_props(mtcars, ~ cyl | gear, margins = TRUE)
This function tries to choose sensible values of the explanatory variables from the data used to build a model, or from any other data specified with the data = argument.
df_typical(data = NULL, nlevels = 3, at = list(), model = NULL, ...)
data: optional data frame from which to extract levels for explanatory variables.
nlevels: how many levels to construct for input variables. For quantitative variables, this is a suggestion. Set to Inf to use all the values in the data.
at: named list giving specific values at which to hold the variables. Use this to override the automatic generation of levels for any or all explanatory variables.
model: the model whose explanatory variables are to be used.
...: a more concise mechanism for passing desired values for variables.
For categorical variables, the most populated levels are used: the nlevels most popular levels, unless the levels are specified explicitly in an argument. For quantitative variables, a sequence of pretty() values is generated. The result is a data frame containing all combinations of the selected values for the explanatory variables. If there are p explanatory variables, there will be about nlevels^p cases.
## Not run: 
df_typical(mosaicData::Galton, nlevels = 2, father = 70, mother = 68, nkids = 3)
df_typical(mosaicData::Galton, nlevels = 2)
mod1 <- lm(wage ~ age * sex + sector, data = mosaicData::CPS85)
df_typical(model = mod1, nlevels = 3)
## End(Not run)
This will typically not be used by the end-user. These functions let you interrogate any model architecture that's covered by mosaicModel about the response and explanatory variables. They are used internally in functions such as mod_plot.
explanatory_vars(model, ...)
model: the model in question.
...: (not used)
This typically will not be used by an end-user.
formula_from_mod(model, ...)
model: the model.
...: (not used)
Not all model architectures support this. If a model architecture isn't recognized, you'll have to add a method for that class. See the package vignette.
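A minimal sketch for a model type that does support this (illustrative only):
mod <- lm(mpg ~ hp + wt, data = mtcars)
formula_from_mod(mod) # should give mpg ~ hp + wt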
A "heating degree day" is a measure of weather coldness. It's defined to be the difference between the outdoor ambient temperature and 65 degrees F, but has a value of zero when the ambient temperature is above 65 degrees. This difference is averaged over time and multiplied by the number of days in the time period covered. The heating degree day is often used as a measure of the demand for domestic heating in a locale.
data(HDD_Minneapolis)
A data frame HDD_Minneapolis with 1412 rows and 4 variables:
year
The year.
month
The month.
hdd
The number of heating degree days for that period.
loc
The location at which the temperature was measured. In the early years, this was downtown Minneapolis. Later, the site was moved to the Minneapolis/Saint-Paul International Airport.
These data report monthly heating degree days. For teaching purposes, the data give an extreme example of how a relationship (hdd vs year) can be revealed by including a covariate (month). Although interest focuses on the change in temperature over the century the data cover, there is such regular seasonal variation that no systematic trend over the years is evident unless month is taken into account.
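To make the definition concrete, a small arithmetic sketch (the temperatures are hypothetical, not from the dataset):
daily_temp <- c(30, 42, 65, 70, 58) # hypothetical daily mean temperatures, degrees F
sum(pmax(0, 65 - daily_temp)) # heating degree days accumulated over these five days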
mod_1 <- lm(hdd ~ year, data = HDD_Minneapolis)
mod_2 <- lm(hdd ~ year + month, data = HDD_Minneapolis)
A random sample of 1,728 homes taken from public records of Saratoga County (http://www.saratogacountyny.gov/departments/real-property-tax-service-agency/). Collected by Candice Corvetti (Williams College '07) for her senior thesis.
data(Houses_for_sale)
A data frame with 1728 cases, each of which is a house for sale.
These data are part of a case study developed by Prof. Dick de Veaux at Williams. They are available from the American Statistical Association's Stat 101 collection of case studies and included in this package for convenience.
Dick De Veaux (2015) "How much is a fireplace worth?" Stats 101: A resource for teaching introductory statistics, American Statistical Association
mod_1 <- lm(price ~ fireplaces, data = Houses_for_sale)
mod_2 <- lm(price ~ fireplaces + living_area, data = Houses_for_sale)
mod_effect(mod_1, ~ fireplaces)
mod_effect(mod_2, ~ fireplaces)
mod_plot(mod_2, ~ living_area + fireplaces)
Compare models with k-fold cross validation
mod_cv(..., k = 10, ntrials = 5, error_type = c("default", "mse", "sse", "mad", "LL", "mLL", "dev", "class_error"))
...: one or more models on which to perform the cross validation.
k: the k in k-fold cross-validation. Each trial uses (k-1)/k of the data for training.
ntrials: how many random partitions to make. Each partition will be one case in the output of the function.
error_type: The kind of output to produce from each cross-validation. See mod_error() for the choices.
The purpose of cross-validation is to provide "new" data on which to test a model's performance. In k-fold cross-validation, the data set used to train the model is broken into new training and testing data. This is accomplished simply by using most of the data for training, while reserving the remaining data for evaluating the model: testing. Rather than training a single model, k models are trained, each with its own particular testing set. The testing sets in the k models are arranged to cover the whole of the data set. On each of the k testing sets, a performance output is calculated. Which output is most appropriate depends on the kind of model: regression model or classifier. The most basic measure is the mean square error: the mean square difference between the actual response variable in the testing data and the output of the model when presented with inputs from the testing data. This is appropriate in many regression models.
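A sketch of plausible usage, comparing two nested models (not from the package's own examples):
mod_1 <- lm(mpg ~ hp, data = mtcars)
mod_2 <- lm(mpg ~ hp + wt, data = mtcars)
mod_cv(mod_1, mod_2, k = 10, ntrials = 5) # one row of output per model per trial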
Calculate effect sizes in a model. Like a derivative or finite difference, an effect size measures how a change in an input variable changes the model output.
mod_effect(model, formula, step = NULL, bootstrap = 0, to = step, nlevels = 1, data = NULL, at = NULL, class_level = NULL, ...)
model: the model from which the effect size is to be calculated.
formula: a formula whose right-hand side is the variable with respect to which the effect size is to be calculated.
step: the numerical stepsize for the change variable, or a comparison category for a categorical change variable. This will be either a character string or a number, depending on the type of variable specified in the formula.
bootstrap: The number of bootstrap replications to construct. If greater than 1, calculate a standard error using that number of replications.
to: a synonym for step. (In English, "to" is more appropriate for a categorical input, "step" for a quantitative one. But you can use either.)
nlevels: integer specifying the number of levels to use for "typical" inputs. (Default: up to 3)
data: Specifies exactly the cases at which you want to calculate the effect size.
at: named list giving specific values at which to hold the explanatory variables.
class_level: Name of the categorical level for which the probability is to be used. Applies only to classifiers. (Default: Use the first level.)
...: additional arguments giving evaluation levels of explanatory variables.
When you want to force or restrict the effect size calculation to specific values for explanatory variables, list those variables and levels as a vector in ... . For example, educ = c(10, 12, 16) will cause the effect size to be calculated at each of those three levels of education. Any variables whose levels are not specified in ... will have values selected automatically.
A data frame giving the effect size and the values of the explanatory variables at which the effect size was calculated. There will also be a column to_ showing the value jumped to for the variable with respect to which the effect size is calculated. When bootstrap is greater than 1, there will be a standard error reported on the effect size; see the variable ending in _se.
mod1 <- lm(wage ~ age * sex * educ + sector, data = mosaicData::CPS85)
mod_effect(mod1, ~ sex)
mod_effect(mod1, ~ sector)
mod_effect(mod1, ~ age, sex = "M", educ = c(10, 12, 16), age = c(30, 40))
mod_effect(mod1, ~ age, sex = "F", age = 34, step = 1)
mod_effect(mod1, ~ sex, age = 35, sex = "M", to = "F")
# For classifiers, the change in *probability* of a level is reported.
mod2 <- rpart::rpart(sector ~ age + sex + educ + wage, data = mosaicData::CPS85)
mod_effect(mod2, ~ educ)
mod_effect(mod2, ~ educ, class_level = "manag")
Create bootstrapped ensembles of a model
mod_ensemble(model, nreps = 2, data = NULL)
model: a model whose data will be used for resampling.
nreps: how many resampling trials should be created.
data: a data table to use for resampling. This is not needed for many common model types, such as lm and glm.
The approach to bootstrapping implemented by this function is to create a set of bootstrap trials all in one go. Then, other functions such as mod_effect() and mod_eval() will be used to extract the information from each of the bootstrap replicates. Many model types in R carry the data used to train the model as part of the model object produced. For these types of models, e.g. lm and glm, there is no need to provide a value for the data argument. But there are some types of models for which the training data cannot be extracted from the model object. In such situations, you use data = to provide the data set to use for resampling.
Compares model predictions to the actual value of the response variable. To do this, testing data must be provided with both the input variables and the corresponding response variable. The measure calculated for a quantitative response variable is the mean square prediction error (MSPE). For categorical response variables, an analog of MSPE can be calculated (see details) but by default, a mean log-likelihood (mean per case) is computed instead.
mod_error(model, testdata, error_type = c("default", "mse", "sse", "mad", "LL", "mLL", "dev", "class_error"))
model: The model whose prediction error is to be estimated.
testdata: A data frame giving both model inputs and the actual value of the response variable. If no testing data is provided, the training data will be used and a warning issued.
error_type: The measure of error you are interested in. By default, this is mean-square error for regression models and log-likelihood for classifiers. The choices are "mse", "sse", "mad", "LL", "mLL", "dev", and "class_error", as listed in the function signature.
When the response variable is categorical, the model (called a 'classifier' in such situations) must be capable of computing probabilities for each output rather than just a bare category. This is true for many commonly encountered classifier model architectures.
The analog of the mean squared error for classifiers is the mean of (1-p)^2, where p is the probability assigned by the model to the actual output. This is a rough approximation to the log-likelihood. By default, the log-likelihood will be calculated, but for pedagogical reasons you may prefer (1-p)^2, in which case set error_type = "mse". Classifiers can assign a probability of zero to the actual output, in which case the log-likelihood is -Inf. The "mse" error type avoids this.
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_error(mod) # In-sample prediction error.
## Not run: 
classifier <- rpart::rpart(Species ~ ., data = iris)
mod_error(classifier)
mod_error(classifier, error_type = "LL")
# More typically
inds <- sample(1:nrow(iris), size = 100)
Training <- iris[inds, ]
Testing <- iris[ - inds, ]
classifier <- rpart::rpart(Species ~ ., data = Training)
# This may well assign zero probability to events that appeared in the
# Testing data
mod_error(classifier, testdata = Testing)
mod_error(classifier, testdata = Testing, error_type = "mse")
## End(Not run)
Find the model outputs for specified inputs. This is equivalent to the generic predict() function, except it will choose sensible input values by default. This simplifies getting a quick look at model values.
mod_eval(model = NULL, data = NULL, append = TRUE, interval = c("none", "prediction", "confidence"), nlevels = 2, bootstrap = 0, ..., on_training = FALSE)
model: the model to evaluate.
data: optional set of cases from which to extract levels for explanatory variables.
append: flag whether to include the inputs to the model along with the calculated model value in the output. (Default: TRUE)
interval: the type of interval to use: "none", "prediction", or "confidence". Not all types are available for all model architectures.
nlevels: how many levels to construct for input variables. (Default: 2) For quantitative variables, this is a suggestion; an attempt is made to have the levels equally spaced. If you're dissatisfied with the result, use the ... to specify exactly what levels you want for any variable.
bootstrap: if > 1, the number of bootstrap trials to run to construct a standard error on the model output for each value of the inputs. This is an alternative to interval.
...: arguments about or values at which to evaluate the model, or the kind of output, to be passed along to predict().
on_training: flag whether to use the training data for evaluation. Only needed when there are random terms, e.g. from mosaic::rand() or mosaic::shuffle().
There are four distinct ways to specify the values at which the model is to be evaluated. (1) Look for some "typical values" in the data to create a handful of inputs. This is useful for getting a quick look at what the output of the model looks like, and is the default behavior. (2) Set data = to a data frame containing the explanatory variables; the model will be evaluated at all of the cases contained in that data frame. (3) Set input variables explicitly by using arguments of the form var_name = values, e.g. sex = "F". If not all input variables are specified in this way, the ones that are not will have values set per (1). All combinations of the various variables will be created; see the nlevels argument. (4) Evaluate the model on the training data. There are two ways to do this: set the data argument to the same data frame used to train the model, or use the on_training = TRUE argument. These are equivalent unless there is some random component among the explanatory terms, as with mosaic::rand(), mosaic::shuffle(), and so on.
A data frame containing both the explanatory variable inputs and the resulting model output (in the model_value field). This differs from the output of predict(), which for many model classes/architectures may be a bare vector or matrix.
## Not run: 
mod1 <- lm(wage ~ age * sex + sector, data = mosaicData::CPS85)
mod_eval(mod1)
mod2 <- glm(married == "Married" ~ age + sex * sector, data = mosaicData::CPS85, family = "binomial")
mod_eval(mod2, nlevels = 2)
mod_eval(mod2, nlevels = 2, sex = "F")
## End(Not run)
These functions are the interface to the various model types for mod_eval(), and through that to all the other mod_ functions that need to evaluate models, e.g. mod_effect(), mod_cv(), and so on.
mod_eval_fun(model, data = NULL, interval = "none", ...)
model: A model object of one of the classes handled by mosaicModel.
data: Usually, a data table specifying the inputs to the model. If not specified, the training data will be used.
interval: One of "none", "confidence", or "prediction". Not all model types support "prediction" or even "confidence".
...: additional arguments.
These functions return a numerical vector (for regression types) or a matrix of probabilities (for classifiers).
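A minimal sketch of the two output types just described (illustrative models, not from the package's own examples):
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_eval_fun(mod, data = mtcars[1:3, ]) # numerical vector, one value per row
classifier <- rpart::rpart(Species ~ ., data = iris)
mod_eval_fun(classifier, data = iris[1:3, ]) # matrix of class probabilities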
Implicit in many statistical models is a function that takes the explanatory variables as inputs and returns the corresponding model value at those inputs. mod_fun creates an R function that works this way. The function returned by mod_fun has arguments named for each of the explanatory variables. In calling that returned function, you can specify as many or as few of these as you like.
mod_fun(mod, nlevels = 1)
mod: the model to be rendered in a functional form.
nlevels: the number of levels for which to find "typical levels" for those arguments not specified in the call to the returned function.
When you evaluate the function, you can set the values of all, any, or none of the arguments. Any arguments that you do not set will automatically be set to "typical values" as in mod_eval.
There's nothing essential about the behavior of mod_fun that explicitly names the arguments to the model function with the names of the explanatory variables. This has been done purely for pedagogical reasons, as a reminder of what those variables are and to make it possible to spot mistaken inputs to models.
a function whose arguments are the explanatory variables used in the model
my_mod <- lm(mpg ~ hp * cyl, data = mtcars)
f <- mod_fun(my_mod)
names(formals(f)) # the arguments will be the explanatory variables
f(hp = 1:2)
f(hp = 1:2, cyl = 3:4)
f() # typical values for inputs
Plot out model values
mod_plot(model = NULL, formula = NULL, data = NULL, bootstrap = 0, nlevels = 3, at = list(), class_level = NULL, interval = c("none", "confidence", "prediction"), post_transform = NULL, size = 1, alpha = 0.8, ...)
model: the model to display graphically. Can also be an ensemble produced with mod_ensemble().
formula: setting the y ~ x + color variables.
data: optional data set from which to extract levels for explanatory variables.
bootstrap: when > 1, this will generate bootstrap replications of the model and plot all of them. Use as an alternative to an ensemble made with mod_ensemble().
nlevels: how many levels to display for those variables shown at discrete levels.
at: named list giving specific values at which to hold the variables. You can accomplish this without forming a list by using ... .
class_level: character string. If a probability for a classifier is being shown, which level of the response variable to use in the plot. (Default: the first one.)
interval: show confidence or prediction intervals: values "none", "confidence", "prediction".
post_transform: a scalar transformation and new name for the response variable, e.g. post_transform = c(wage = exp) to undo a log transformation of the response.
size: numerical value for line width. (Default: 1)
alpha: numerical value in 0 to 1 for transparency. (Default: 0.8)
...: specific values for explanatory variables.
## Not run: 
mod1 <- lm(wage ~ age * sex + sector, data = mosaicData::CPS85)
mod_plot(mod1)
mod_plot(mod1, n = Inf, interval = "confidence")
mod_plot(mod1, ~ sector + sex + age) # not necessarily a good ordering
mod_plot(mod1, ~ age + sex + sector, nlevels = 8)
mod2 <- lm(log(wage) ~ age + sex + sector, data = mosaicData::CPS85)
mod_plot(mod2, post_transform = c(wage = exp), interval = "confidence") # undo the log in the display
mod3 <- glm(married == "Married" ~ age + sex * sector, data = mosaicData::CPS85, family = "binomial")
mod_plot(mod3)
E3 <- mod_ensemble(mod3, 10)
mod_plot(E3)
mod4 <- rpart::rpart(sector ~ age + sex + married, data = mosaicData::CPS85)
mod_plot(mod4)
mod_plot(mod4, class_level = "manag")
mod5 <- randomForest::randomForest(sector ~ age + sex + married, data = mosaicData::CPS85)
mod_plot(mod5)
mod_plot(mod5, class_level = "manag")
## End(Not run)
The mosaicModel package: functions for teaching about modeling.
The package offers a handful of high-level functions for evaluating, displaying, and interpreting models that work in a consistent way across model architectures, e.g. lm, glm, rpart, randomForest, knn3, caret-train, and so on.
mod_eval() – evaluate a model, that is, turn inputs into model values. For many model architectures, you can also get prediction or confidence intervals on the outputs.
mod_plot() – produce a graphical display of the "shape" of a model. There can be as many as 4 input variables shown, along with the output.
mod_effect() – calculate effect sizes, that is, how a change in an input variable changes the output.
mod_error() – find the mean square prediction error (or the log likelihood).
mod_ensemble() – create an ensemble of bootstrap replications of the model, that is, models fit to resampled data from the original model.
mod_cv() – carry out cross validation on one or more models.
mod_fun() – extract a function from a model that implements the inputs-to-output relationship.
mosaicModel stays out of the business of training models. You do that using functions such as:
the familiar lm or glm provided by the stats package
train from the caret package for machine learning
rpart, randomForest, rlm, and other functions provided by other packages
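To illustrate the architecture-independence, a sketch applying the same high-level calls to two different model architectures (the particular models are illustrative, not from the package documentation):
mod_lin <- lm(wage ~ age + sector, data = mosaicData::CPS85)
mod_tree <- rpart::rpart(wage ~ age + sector, data = mosaicData::CPS85)
mod_eval(mod_lin, nlevels = 2)
mod_eval(mod_tree, nlevels = 2) # same interface, different architecture
mod_effect(mod_lin, ~ age)
mod_effect(mod_tree, ~ age)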
Zebra mussels are a small, fast-reproducing species of freshwater mussel native to the lakes of southeast Russia. They have accidentally been introduced in other areas, competing with native species and creating problems for people as they cover the undersides of docks and boats, clog water intakes and other underwater structures. Zebra mussels even attach themselves to other mussels, sometimes starving those mussels.
data(Mussels)
A data frame Mussels with 30 rows and 11 variables.
GroupID
ID for the cluster of mussels growing on a substrate.
dry.mass
The mass of the mussels (as a group) after dehydration.
count
How many mussels were in the cluster.
attachment
The substrate to which the mussels were attached.
lipid
Percentage of dry mass that is lipid.
protein
Percentage of dry mass that is protein.
carbo
Percentage of dry mass that is carbohydrate.
ash
Percentage of dry mass that is ash.
ammonia
Nitrogen excretion measured as ammonia in mg per hour for the group.
Kcal
Total calorific value of the tissue in kilo-calories per gram.
O2
Oxygen uptake in mg per hour for the group.
Ecologists Shirley Baker and Daniel Hornbach examined whether zebra mussels gain an advantage by attaching to other mussels rather than to rocks (Baker and Hornbach, 2008). The ecologists collected samples of small rocks and Amblema plicata mussels, each of which had a collection of zebra mussels attached. The samples were transported to a laboratory where the group of mussels from each individual rock or Amblema were removed and placed in an aquarium equipped to measure oxygen uptake and ammonia excretion. After these physiological measurements were made, the biochemical composition of the mussel tissue was determined: the percentage of protein, lipid, carbohydrate, and ash. Baker and Hornbach found that zebra mussels attached to Amblema had greater physiological activity than those attached to rocks, as measured by oxygen uptake and ammonia excretion. But this appears to be a sign of extra effort for the Amblema-attached zebra mussels, since they had lower carbohydrate and lipid levels. In other words, attaching to Amblema appears to be disadvantageous to the zebra mussels compared to attaching to a rock.
Mussels$ind.mass <- with(Mussels, dry.mass/count)
mod_1 <- lm(O2/count ~ attachment, data = Mussels)
mod_2 <- lm(ammonia/count ~ attachment, data = Mussels)
mod_3 <- lm(O2/count ~ ind.mass + attachment, data = Mussels)
mod_4 <- lm(ammonia/count ~ ind.mass + attachment, data = Mussels)
The data come from a National Cancer Institute study of gene expression in cell lines drawn from various sorts of cancer. Each row corresponds to a different cell line. The type of cancer is identified by the first two or three letters of the row names, e.g. CO is colon, ME is melanoma, RE is kidney.
data(NCI60_snippet)
A data frame NCI60_snippet with 60 rows and 6000 variables.
For each row, there are 6000 measurements of the gene expression as identified by activity on a microarray probe. The variable names are the names of the probes.
data(NCI60_snippet)
Annual production of crude oil, in millions of barrels (mbbl).
data(Oil_history)
A data frame with one row per year, with observations on the following variables.
year
the year for which production is reported
mbbl
oil production in millions of barrels
Assembled from older information in RH Romer (1976) "Energy: An Introduction to Physics" and more recent data from data.oecd.org.
model <- lm(log(mbbl) ~ year, data = Oil_history)
mod_plot(model)
Evaluate this and hand the result to df_stats()
proportion(nm = NULL)
nm: The level for which to find the proportion.
## Not run: 
df_stats(mtcars, ~ cyl, proportion(6))
## End(Not run)
Compute sensible values from a data set for use as a baseline
reference_values(data, n = 1, at = list())
data: a data frame.
n: number of values for specified variables: can be a single number or a list assigning a number of levels for individual variables.
at: optional list of values at which to hold variables; the names are the variables whose values are to be set.
Variables not listed in at will be assigned levels using these principles:
Categorical variables: the most populated levels.
Quantitative variables: central quantiles, e.g. the median for n = 1, ...
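A sketch of plausible usage (illustrative only, not from the package's own examples):
## Not run: 
reference_values(mosaicData::CPS85) # one typical value per variable
reference_values(mosaicData::CPS85, n = list(age = 3)) # three levels for age
reference_values(mosaicData::CPS85, at = list(sex = "F")) # hold sex at "F"
## End(Not run)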
These data are assembled from the records of the "Cherry Blossom Ten Miler" held in Washington, DC each April. The records span the years 1999 to 2008.
data(Runners)
A data frame Runners with 24,334 rows and 8 variables.
age
The runners age at the time of the race.
net
The time (in min.) elapsed from when the runner crossed the start line until the finish.
gun
The time (in min.) from when the starter's gun was fired to when the runner finished the race. With so many participants, some runners do not reach the start line until several minutes after the gun.
sex
The runner's sex.
previous
How many times the runner participated before this year's race.
nruns
How many of the 10 years' runs the runner eventually participated in.
start_position
A made-up categorical description of where the runner was situated in the lineup for the race.
There are about 10,000 participants each year, of whom roughly half have run the race previously. During the ten years of these records, some 25 runners ran in each of the years, 73 ran in nine of the years, and so on. The data allow you to track the performance of runners as they age. This simplified version of the data does not include personal identifying data other than sex, age, and number of previous runs in any given year. (Runs before 1999 are not considered.)
mod_1 <- lm(net ~ age + sex, data = Runners)
In the US, there have been controversial proposals to provide vouchers to students in failing public schools. The vouchers would allow the students to attend private schools. There are arguments pro and con that are often rooted in political philosophy (free choice!) and politics. The presumption behind the pro arguments is that attending private schools would create better outcomes for students.
data(School_data)
A data frame with 500 cases, each of which is a simulated student, with observations on the following variables.
test_score
a simulated test score for the student
school
whether the student attended public or private school
lottery
whether the student was entered into a lottery for a private-school voucher
group
the racial/ethnic group of the student
acad_motivation
the overall level of involvement and concern of the student's parents for the student's academic performance
relig_motivation
the overall level of interest motivated by religion. This is potentially an issue because a large majority of urban private schools are Catholic.
A reasonable way to test this presumption is to compare test scores for students in public and private schools. One famous analysis (Howell and Peterson, 2003, "The Education Gap: Vouchers and Urban Schools") found that voucher schools are most helpful for African-American students, and not so much for white or Hispanic students.
The School_data data frame comes from a simulation designed by the package author to replicate the overall results but supporting a very different policy recommendation. WARNING: This is just a simulation, reflecting one hypothesis about how the world might work. Don't be tempted to draw conclusions about the actual factors involved in school performance from such simulated data.
lm(test_score ~ school, data = School_data)
# the simulation mechanism itself:
nstudents <- 500
acad_motivation <- rnorm(nstudents)
group <- sample(c("black", "hispanic", "white"), replace = TRUE, size = nstudents)
relig_motivation <- ifelse(group == "black", -1, ifelse(group == "white", 0, 1))
relig_motivation <- rnorm(nstudents, mean = relig_motivation)
lottery <- (acad_motivation + relig_motivation) > 0
school <- ifelse((runif(nstudents) + .8 * lottery) > 1, "private", "public")
test_score <- rnorm(nstudents, mean = 100 - 5 * (school == "private") + 20 * acad_motivation)
School_data <- data.frame(test_score, acad_motivation, group, relig_motivation, lottery, school)
Tim Watkins examined the swimming speed of tadpoles as a function of the water temperature, and the water temperature at which the tadpoles had been raised. Since size is a major determinant of speed, the tadpole's length was measured as well.
data(Tadpoles)
A data frame Tadpoles with observations on the following variables.
group
Whether the tadpole was raised in cold water ("c") or warm ("w").
rtemp
The temperature (C) of the water in which the swimming speed was measured. (The swimming channel is called a "race".)
length
The tadpole's length, in mm.
vmax
The maximum swimming speed attained in one trial, in mm/sec.
It was hypothesized that tadpoles would swim faster in the temperature of water close to that in which they had been raised.
mod_1 <- lm(vmax ~ poly(rtemp, 2) * group + length, data = Tadpoles)
A dataset from a mid-western US trucking company on annual earnings of its employees in 2007. Datasets like this are used in audits by the Federal Government to look for signs of discrimination.
data(Trucking_jobs)
A data frame Trucking_jobs with 129 rows and 11 variables.
sex
The employee's sex: M or F
earnings
Annual earnings, in dollars. Hourly wages have been converted to a full-time basis.
age
The employee's age, in years.
title
The employee's job title.
hiredyears
How long the employee has been working for the company.
mod_1 <- lm(earnings ~ age + hiredyears + sex, data = Trucking_jobs)
These data were compiled by Macalester College students for a class project. Then-undergraduates Elise delMas, Emiliano Urbina, and Candace Groth collected the data from cars.com.
data(Used_Fords)
A data frame Used_Fords with 635 rows and 7 variables.
Price
The asking price for the car in USD.
Year
The model year of the car.
Mileage
The reported odometer reading.
Location
Which of the several regions the car was marketed in.
Color
The car's body color.
Age
The age of the car at the time the data were collected in 2009. This is directly related to Year.
mod_1 <- lm(Price ~ Mileage, data = Used_Fords)
mod_2 <- lm(Price ~ Mileage + Age, data = Used_Fords)