Title: Create Tidy Data Frames of Marginal Effects for 'ggplot' from Model Outputs
Description: Compute marginal effects and adjusted predictions from statistical models and returns the result as tidy data frames. These data frames are ready to use with the 'ggplot2'-package. Effects and predictions can be calculated for many different models. Interaction terms, splines and polynomial terms are also supported. The main functions are ggpredict(), ggemmeans() and ggeffect(). There is a generic plot()-method to plot the results using 'ggplot2'.
Authors: Daniel Lüdecke [aut, cre], Frederik Aust [ctb], Sam Crawley [ctb], Mattan S. Ben-Shachar [ctb], Sean C. Anderson [ctb]
Maintainer: Daniel Lüdecke <[email protected]>
License: MIT + file LICENSE
Version: 2.0.0.7
Built: 2024-12-19 11:25:59 UTC
Source: https://github.com/strengejacke/ggeffects
After fitting a model, it is useful to generate model-based estimates (expected values, or adjusted predictions) of the response variable for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables.
The ggeffects package computes marginal means and adjusted predicted values for the response, at the margin of specific values or levels from certain model terms. The package is built around three core functions: predict_response() (understanding results), test_predictions() (testing results for statistically significant differences) and plot() (communicating results).
By default, adjusted predictions or marginal means are returned on the response scale, which is the easiest and most intuitive scale for interpreting the results. There are other options for specific models as well, e.g. models with a zero-inflation component (see documentation of the type argument). The result is returned as a consistent data frame, which is nicely printed by default. plot() can be used to easily create figures.
The main function to calculate marginal means and adjusted predictions is predict_response(). In previous versions of ggeffects, the functions ggpredict(), ggemmeans(), ggeffect() and ggaverage() were used to calculate marginal means and adjusted predictions. These functions are still available, but predict_response(), as a "wrapper" around these functions, is now the preferred way to do this.
- ggpredict() calls get_predictions() (which in turn calls stats::predict())
- ggemmeans() calls emmeans::emmeans()
- ggaverage() calls marginaleffects::avg_predictions()
- ggeffect() calls effects::Effect()
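Since predict_response() is a wrapper, the backend is selected through its margin argument. A minimal sketch (assuming the efc example data that ships with ggeffects; the margin values correspond to the ggpredict()/ggemmeans()/ggaverage() behaviours):

```r
library(ggeffects)
data(efc, package = "ggeffects")
fit <- lm(barthtot ~ c12hour + c161sex, data = efc)

# like ggpredict(): non-focal terms held at mean/reference values
predict_response(fit, terms = "c12hour", margin = "mean_reference")
# like ggemmeans(): estimated marginal means via the emmeans package
predict_response(fit, terms = "c12hour", margin = "marginalmeans")
# like ggaverage(): averaged over the sample via marginaleffects
predict_response(fit, terms = "c12hour", margin = "empirical")
```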
## S3 method for class 'ggeffects'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  ...,
  stringsAsFactors = FALSE,
  terms_to_colnames = FALSE
)

ggaverage(model, terms, ci_level = 0.95, type = "fixed", typical = "mean",
  condition = NULL, back_transform = TRUE, vcov = NULL, vcov_args = NULL,
  weights = NULL, verbose = TRUE, ...)

ggeffect(model, terms, ci_level = 0.95, bias_correction = FALSE,
  verbose = TRUE, ...)

ggemmeans(model, terms, ci_level = 0.95, type = "fixed", typical = "mean",
  condition = NULL, interval = "confidence", back_transform = TRUE,
  vcov = NULL, vcov_args = NULL, bias_correction = FALSE, weights = NULL,
  verbose = TRUE, ...)

ggpredict(model, terms, ci_level = 0.95, type = "fixed", typical = "mean",
  condition = NULL, interval = "confidence", back_transform = TRUE,
  vcov = NULL, vcov_args = NULL, bias_correction = FALSE, verbose = TRUE, ...)
x |
An object of class |
row.names |
|
optional |
logical. If |
... |
Arguments are passed down to |
stringsAsFactors |
logical: should the character vector be converted to a factor? |
terms_to_colnames |
Logical, if |
model |
A model object, or a list of model objects. |
terms |
Names of those terms from
|
ci_level |
Numeric, the level of the confidence intervals. Use
|
type |
Character, indicating whether predictions should be conditioned
on specific model components or not, or whether population or unit-level
predictions are desired. Consequently, most options only apply for survival
models, mixed effects models and/or models with zero-inflation (and their
Bayesian counter-parts); only exception is Note 1: For Note 2: If Note 3: If
When |
typical |
Character vector, naming the function to be applied to the
covariates (non-focal terms) over which the effect is "averaged". The
default is |
condition |
Named character vector, which indicates covariates that
should be held constant at specific values. Unlike |
back_transform |
Logical, if |
vcov |
Variance-covariance matrix used to compute uncertainty estimates (e.g., for confidence intervals based on robust standard errors). This argument accepts a covariance matrix, a function which returns a covariance matrix, or a string which identifies the function to be used to compute the covariance matrix.
If See details in this vignette. |
vcov_args |
List of arguments to be passed to the function identified by
the |
weights |
This argument is used in two different ways, depending on the
|
verbose |
Toggle messages or warnings. |
bias_correction |
Logical, if |
interval |
Type of interval calculation, can either be |
Please see ?predict_response for details and examples.
A data frame (with ggeffects class attribute) with consistent data columns:

- "x": the values of the first term in terms, used as x-position in plots.
- "predicted": the predicted values of the response, used as y-position in plots.
- "std.error": the standard error of the predictions. Note that the standard errors are always on the link-scale, and not back-transformed for non-Gaussian models!
- "conf.low": the lower bound of the confidence interval for the predicted values.
- "conf.high": the upper bound of the confidence interval for the predicted values.
- "group": the grouping level from the second term in terms, used as grouping-aesthetics in plots.
- "facet": the grouping level from the third term in terms, used to indicate facets in plots.
The estimated marginal means (or predicted values) are always on the response scale! For proportional odds logistic regression (see ?MASS::polr) or cumulative link models (e.g., see ?ordinal::clm), an additional column "response.level" is returned, which indicates the grouping of predictions based on the level of the model's response.
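As a sketch of such an ordinal model (assuming the MASS package and its housing example data are available), the predictions are returned per response level:

```r
library(ggeffects)
library(MASS)
data(housing, package = "MASS")

# proportional odds logistic regression on the housing data
m <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
pr <- predict_response(m, terms = "Infl")

# predictions are grouped by the levels of the response ("Low", "Medium", "High")
pr$response.level
```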
Note that for convenience reasons, the columns for the intervals are always named "conf.low" and "conf.high", even though for Bayesian models credible or highest posterior density intervals are returned.

There is an as.data.frame() method for objects of class ggeffects, which has a terms_to_colnames argument, to use the term names as column names instead of the standardized names "x" etc.
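A minimal sketch of this method (assuming the efc example data that ships with ggeffects):

```r
library(ggeffects)
data(efc, package = "ggeffects")
fit <- lm(barthtot ~ c12hour, data = efc)
pr <- predict_response(fit, terms = "c12hour")

# standardized column names: "x", "predicted", ...
head(as.data.frame(pr))
# use the term name ("c12hour") as column name instead of "x"
head(as.data.frame(pr, terms_to_colnames = TRUE))
```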
A sample data set from a course about the analysis of factorial designs, by Mattan S. Ben-Shachar. See following link for more information: https://github.com/mattansb/Analysis-of-Factorial-Designs-foR-Psychologists
The data consists of five variables from 120 observations:

- ID: A unique identifier for each participant
- sex: The participant's sex
- time: The time of day the participant was tested (morning, noon, or afternoon)
- coffee: Group indicator, whether the participant drank coffee or not ("coffee" or "control")
- alertness: The participant's alertness score
# Attach coffee-data
data(coffee_data)
This function extracts the raw data points (i.e. the data that was used to fit the model) and "averages" (i.e. "collapses") the response variable over the levels of the grouping factor given in collapse_by. It only works with mixed models.
collapse_by_group(grid, model, collapse_by = NULL, residuals = FALSE)
grid |
A data frame representing the data grid, or an object of class
|
model |
The model for which to compute partial residuals. The data grid
|
collapse_by |
Name of the (random effects) grouping factor. Data is collapsed by the levels of this factor. |
residuals |
Logical, if |
A data frame with raw data points, averaged over the levels of the given grouping factor from the random effects. The group level of the random effect is saved in the column "random".
library(ggeffects)
data(efc, package = "ggeffects")
efc$e15relat <- as.factor(efc$e15relat)
efc$c161sex <- as.factor(efc$c161sex)
levels(efc$c161sex) <- c("male", "female")
model <- lme4::lmer(neg_c_7 ~ c161sex + (1 | e15relat), data = efc)
me <- predict_response(model, terms = "c161sex")
head(attributes(me)$rawdata)
collapse_by_group(me, model, "e15relat")
An SPSS sample data set, imported with the sjlabelled::read_spss()
function. Consists of 28 variables from 908 observations. The data set is part
of the EUROFAMCARE project, a cross-national survey on informal caregiving in
Europe.
# Attach EFC-data
data(efc)
# Show structure
str(efc)
# show first rows
head(efc)
A sample data set, used in tests and some examples. Useful for demonstrating count models (with or without zero-inflation component). It consists of nine variables from 250 observations.
A generic print method for ggeffects objects.
## S3 method for class 'ggeffects'
format(x, variable_labels = FALSE, value_labels = FALSE, group_name = FALSE,
  row_header_separator = ", ", digits = 2, collapse_ci = FALSE,
  collapse_tables = FALSE, n, ...)

## S3 method for class 'ggcomparisons'
format(x, collapse_ci = FALSE, collapse_p = FALSE, combine_levels = FALSE, ...)

## S3 method for class 'ggeffects'
print(x, group_name = TRUE, digits = 2, verbose = TRUE, ...)

## S3 method for class 'ggeffects'
print_md(x, group_name = TRUE, digits = 2, ...)

## S3 method for class 'ggeffects'
print_html(x, group_name = TRUE, digits = 2, theme = NULL,
  engine = c("tt", "gt"), ...)

## S3 method for class 'ggcomparisons'
print(x, collapse_tables = FALSE, ...)

## S3 method for class 'ggcomparisons'
print_html(x, collapse_ci = FALSE, collapse_p = FALSE, theme = NULL,
  engine = c("tt", "gt"), ...)

## S3 method for class 'ggcomparisons'
print_md(x, collapse_ci = FALSE, collapse_p = FALSE, theme = NULL, ...)
x |
An object of class |
variable_labels |
Logical, if |
value_labels |
Logical, if |
group_name |
Logical, if |
row_header_separator |
Character, separator between the different subgroups in the table output. |
digits |
Number of digits to print. |
collapse_ci |
Logical, if |
collapse_tables |
Logical, if |
n |
Number of rows to print per subgroup. If |
... |
Further arguments passed down to |
collapse_p |
Logical, if |
combine_levels |
Logical, if |
verbose |
Toggle messages. |
theme |
The theme to apply to the table. One of |
engine |
The engine to use for printing. One of |
format() returns a formatted data frame, print() prints a formatted data frame to the console. print_html() returns a tinytable object by default (unless changed with engine = "gt"), which is printed as HTML, markdown or LaTeX table (depending on the context from which print_html() is called; see tinytable::tt() for details).

The verbose argument can be used to display or silence messages and warnings. Furthermore, options() can be used to set defaults for the print() and print_html() methods. The following options are available, which can simply be run in the console:
- ggeffects_ci_brackets: Define a character vector of length two, indicating the opening and closing parentheses that encompass the confidence interval values, e.g. options(ggeffects_ci_brackets = c("[", "]")).
- ggeffects_collapse_ci: Logical, if TRUE, the columns with predicted values (or contrasts) and confidence intervals are collapsed into one column, e.g. options(ggeffects_collapse_ci = TRUE).
- ggeffects_collapse_p: Logical, if TRUE, the columns with predicted values (or contrasts) and p-values are collapsed into one column, e.g. options(ggeffects_collapse_p = TRUE). Note that p-values are replaced by asterisk-symbols (stars) or empty strings when ggeffects_collapse_p = TRUE, depending on the significance level.
- ggeffects_collapse_tables: Logical, if TRUE, multiple tables for subgroups are combined into one table. Only works when there is more than one focal term, e.g. options(ggeffects_collapse_tables = TRUE).
- ggeffects_output_format: String, either "text", "markdown" or "html". Defines the default output format from predict_response(). If "html", a formatted HTML table is created and printed to the view pane. "markdown" creates a markdown-formatted table inside Rmarkdown documents, and prints a text-format table to the console when used interactively. If "text" or NULL, a formatted table is printed to the console, e.g. options(ggeffects_output_format = "html").
- ggeffects_html_engine: String, either "tt" or "gt". Defines the default engine to use for printing HTML tables. If "tt", the tinytable package is used, if "gt", the gt package is used, e.g. options(ggeffects_html_engine = "gt").

Use options(<option_name> = NULL) to remove the option.
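Setting and removing such an option is plain base R, for example:

```r
# set a default for the CI brackets used when printing
options(ggeffects_ci_brackets = c("[", "]"))
getOption("ggeffects_ci_brackets")

# remove the option again, restoring the package default
options(ggeffects_ci_brackets = NULL)
is.null(getOption("ggeffects_ci_brackets"))
```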
data(efc, package = "ggeffects")
fit <- lm(barthtot ~ c12hour + e42dep, data = efc)

# default print
predict_response(fit, "e42dep")

# surround CI values with parentheses
print(predict_response(fit, "e42dep"), ci_brackets = c("(", ")"))
# you can also use `options(ggeffects_ci_brackets = c("[", "]"))`
# to set this globally

# collapse CI columns into column with predicted values
print(predict_response(fit, "e42dep"), collapse_ci = TRUE)

# include value labels
print(predict_response(fit, "e42dep"), value_labels = TRUE)

# include variable labels in column headers
print(predict_response(fit, "e42dep"), variable_labels = TRUE)

# include value labels and variable labels
print(predict_response(fit, "e42dep"), variable_labels = TRUE, value_labels = TRUE)

data(iris)
m <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)

# default print with subgroups
predict_response(m, c("Petal.Length", "Species"))

# omit name of grouping variable in subgroup table headers
print(predict_response(m, c("Petal.Length", "Species")), group_name = FALSE)

# collapse tables into one
print(predict_response(m, c("Petal.Length", "Species")), collapse_tables = TRUE, n = 3)

# increase number of digits
print(predict_response(fit, "e42dep"), digits = 5)
get_predictions() is the core function to return adjusted predictions for a model when calling ggpredict() or predict_response() with margin = "mean_reference" (the default option for margin). Basically, the input contains the model object and a data grid that is typically used for the newdata argument of the predict() method. get_predictions() can be used as an S3 method for your own classes, to add support for new models in ggeffects, and is only relevant for package developers.

There are no S3 class definitions for ggemmeans() or ggaverage(), because these functions simply call methods from the emmeans or marginaleffects packages. Hence, methods should be written for those packages, too, if a model object should work with ggemmeans() or ggaverage().
get_predictions(model, ...)

## Default S3 method:
get_predictions(
  model,
  data_grid = NULL,
  terms = NULL,
  ci_level = 0.95,
  type = NULL,
  typical = NULL,
  vcov = NULL,
  vcov_args = NULL,
  condition = NULL,
  interval = "confidence",
  bias_correction = FALSE,
  link_inverse = insight::link_inverse(model),
  model_info = NULL,
  verbose = TRUE,
  ...
)
model , terms , ci_level , type , typical , vcov , vcov_args , condition , interval , bias_correction , verbose
|
Arguments
from the call to |
... |
Further arguments, passed to |
data_grid |
A data frame containing the data grid (or reference grid)
with all relevant values of predictors for which the adjusted predictions
should be made. Typically the data frame that is passed to the |
link_inverse |
The model's family link-inverse function. Can be retrieved
using |
model_info |
An object returned by |
Adding support for ggeffects is quite easy. The user-level function is predict_response(), which either calls ggpredict(), ggemmeans() or ggaverage(). These functions, in turn, call predict(), emmeans::emmeans() or marginaleffects::avg_predictions(). The following needs to be done to add support for new model classes:

- emmeans: if your model is supported by emmeans, it is automatically supported by ggemmeans(). Thus, you need to add the corresponding methods to your package so that your model class is supported by emmeans.
- marginaleffects: similar to emmeans, if your package is supported by the marginaleffects package, it works with ggaverage().
- predict: in order to make your model class work with ggpredict(), you need to add a get_predictions() method. The arguments documented here are all passed from predict_response() to get_predictions(), no matter whether they are required to calculate predictions or not. Thus, it is not necessary to process all of those arguments, but they can be used to modulate certain settings when calculating predictions. Note that if your method does not define all mentioned arguments, these are still passed via ... - make sure that further methods called inside your get_predictions() method still work when they process the .... It is important that the function returns a data frame with a specific structure, namely the data grid and the columns predicted, conf.low, and conf.high. Predictions and intervals should be on the response scale.
A simple example of an implementation for your own class, for Gaussian-like models, could look like this:
get_predictions.own_class <- function(model, data_grid, ci_level = 0.95, ...) {
  predictions <- predict(
    model,
    newdata = data_grid,
    type = "response",
    se.fit = !is.na(ci_level),
    ...
  )
  # do we have standard errors?
  if (is.na(ci_level)) {
    # copy predictions
    data_grid$predicted <- as.vector(predictions)
  } else {
    # copy predictions
    data_grid$predicted <- predictions$fit
    # calculate CI
    data_grid$conf.low <- predictions$fit - qnorm(0.975) * predictions$se.fit
    data_grid$conf.high <- predictions$fit + qnorm(0.975) * predictions$se.fit
    # optional: copy standard errors
    attr(data_grid, "std.error") <- predictions$se.fit
  }
  data_grid
}
A simple example of an implementation for your own class, for non-Gaussian models, could look like this (note the use of the link-inverse function link_inverse(), which is passed to the link_inverse argument):
get_predictions.own_class <- function(model,
                                      data_grid,
                                      ci_level = 0.95,
                                      link_inverse = insight::link_inverse(model),
                                      ...) {
  predictions <- predict(
    model,
    newdata = data_grid,
    type = "link", # for non-Gaussian, return on link-scale
    se.fit = !is.na(ci_level),
    ...
  )
  # do we have standard errors?
  if (is.na(ci_level)) {
    # copy predictions
    data_grid$predicted <- link_inverse(as.vector(predictions))
  } else {
    # copy predictions, use link-inverse to back-transform
    data_grid$predicted <- link_inverse(predictions$fit)
    # calculate CI
    data_grid$conf.low <- link_inverse(
      predictions$fit - qnorm(0.975) * predictions$se.fit
    )
    data_grid$conf.high <- link_inverse(
      predictions$fit + qnorm(0.975) * predictions$se.fit
    )
    # optional: copy standard errors
    attr(data_grid, "std.error") <- predictions$se.fit
  }
  data_grid
}
A data frame that contains

- the data grid (from the argument data_grid)
- the columns predicted, conf.low, and conf.high
- optionally, the attribute "std.error" with the standard errors

Note that predictions and confidence intervals should already be transformed to the response scale (e.g., by using insight::link_inverse()). The standard errors are always on the link scale (not transformed). If values are not available (for example, confidence intervals), set their value to NA.
Get variable and value labels from ggeffects
-objects. predict_response()
saves information on variable names and value labels as additional attributes
in the returned data frame. This is especially helpful for labelled data
(see sjlabelled), since these labels can be used to set axis labels and
titles.
get_title(x, case = NULL)

get_x_title(x, case = NULL)

get_y_title(x, case = NULL)

get_legend_title(x, case = NULL)

get_legend_labels(x, case = NULL)

get_x_labels(x, case = NULL)

get_complete_df(x, case = NULL)
x |
An object of class |
case |
Desired target case. Labels will automatically converted into the
specified character case. See |
The titles or labels as character string, or NULL if variables had no labels; get_complete_df() returns the input list x as a single data frame, where the grouping variable indicates the predicted values for each term.
library(ggeffects)
library(ggplot2)
data(efc, package = "ggeffects")
efc$c172code <- datawizard::to_factor(efc$c172code)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

mydf <- predict_response(fit, terms = c("c12hour", "c161sex", "c172code"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm") +
  facet_wrap(~facet, ncol = 2) +
  labs(
    x = get_x_title(mydf),
    y = get_y_title(mydf),
    colour = get_legend_title(mydf)
  )

# adjusted predictions, a list of data frames (one data frame per term)
eff <- ggeffect(fit)
eff
get_complete_df(eff)

# adjusted predictions for education only, and get x-axis-labels
mydat <- eff[["c172code"]]
ggplot(mydat, aes(x = x, y = predicted, group = group)) +
  stat_summary(fun = sum, geom = "line") +
  scale_x_discrete(labels = get_x_labels(mydat))
This function can be used to install the latest package version of ggeffects, either the development version (from R-universe/GitHub) or the current version from CRAN.
install_latest(
  source = c("development", "cran"),
  force = FALSE,
  verbose = TRUE
)
source |
Character. Either |
force |
Logical, if |
verbose |
Toggle messages. |
Invisible NULL
.
# install latest development-version of ggeffects from the
# r-universe repository
install_latest()
This function conducts a spotlight analysis to create so-called Johnson-Neyman intervals. The plot() method can be used to visualize the results of the Johnson-Neyman test.
johnson_neyman(x, precision = 500, p_adjust = NULL, ...)

spotlight_analysis(x, precision = 500, p_adjust = NULL, ...)

## S3 method for class 'ggjohnson_neyman'
plot(
  x,
  colors = c("#f44336", "#2196F3"),
  show_association = TRUE,
  show_rug = FALSE,
  verbose = TRUE,
  ...
)
x |
An object of class |
precision |
Number of values used for the range of the moderator variable
to calculate the Johnson-Neyman interval. This argument is passed down to
|
p_adjust |
Character vector, if not |
... |
Arguments passed down to |
colors |
Colors used for the plot. Must be a vector with two color
values. Only used if |
show_association |
Logical, if |
show_rug |
Logical, if |
verbose |
Show/hide printed message for plots. |
The Johnson-Neyman intervals help to understand where slopes are significant in the context of interactions in regression models. Thus, the interval is only useful if the model contains at least one interaction term. The function accepts the results of a call to predict_response(). The first and the last focal term used in the terms argument of predict_response() must be numeric. The function will then test the slopes of the first focal term against zero, for different moderator values of the last focal term. If only one numeric focal term is given, the function will create contrasts by levels of the categorical focal term. Use plot() to create a plot of the results.
To avoid misleading interpretations of the plot, we speak of "positive" and "negative" associations, respectively, and "no clear" associations (instead of "significant" or "non-significant"). This should prevent the user from considering a non-significant range of values of the moderator as "accepting the null hypothesis".
A data frame including contrasts of the test_predictions()
for the
given interaction terms; for plot()
, returns a Johnson-Neyman plot.
Note that for p-value adjustment with methods supported by p.adjust() (see also p.adjust.methods), each row is considered as one set of comparisons, no matter which test was specified. That is, for instance, when test_predictions() returns eight rows of predictions (when test = NULL) and p_adjust = "bonferroni", the p-values are adjusted in the same way as if we had a test of pairwise comparisons (test = "pairwise") where eight rows of comparisons are returned. For the methods "tukey" or "sidak", a rank adjustment is done based on the number of combinations of levels from the focal predictors in terms. Thus, the latter two methods may be useful for certain tests only, in particular pairwise comparisons.
For johnson_neyman()
, the only available adjustment methods are "fdr"
(or "bh"
) (Benjamini & Hochberg (1995)) and "esarey"
(or "es"
)
(Esarey and Sumner 2017). These usually return similar results. The major
difference is that "fdr"
can be slightly faster and more stable in edge
cases, however, confidence intervals are not updated. Only the p-values are
adjusted. "esarey"
is slower, but confidence intervals are updated as well.
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40(3), 373-400. doi: 10.1207/s15327906mbr4003_5
Esarey, J., & Sumner, J. L. (2017). Marginal effects in interaction models: Determining and controlling the false positive rate. Comparative Political Studies, 1–33. Advance online publication. doi: 10.1177/0010414017730080
Johnson, P.O. & Fay, L.C. (1950). The Johnson-Neyman technique, its theory and application. Psychometrika, 15, 349-367. doi: 10.1007/BF02288864
McCabe, C. J., Kim, D. S., & King, K. M. (2018). Improving Present Practices in the Visual Display of Interactions. Advances in Methods and Practices in Psychological Science, 1(2), 147-165. doi: 10.1177/2515245917746792
Spiller, S. A., Fitzsimons, G. J., Lynch, J. G., & McClelland, G. H. (2013). Spotlights, Floodlights, and the Magic Number Zero: Simple Effects Tests in Moderated Regression. Journal of Marketing Research, 50(2), 277–288. doi:10.1509/jmr.12.0420
## Not run:
data(efc, package = "ggeffects")
efc$c172code <- as.factor(efc$c172code)
m <- lm(neg_c_7 ~ c12hour * barthtot * c172code, data = efc)

pr <- predict_response(m, c("c12hour", "barthtot"))
johnson_neyman(pr)
plot(johnson_neyman(pr))

pr <- predict_response(m, c("c12hour", "c172code", "barthtot"))
johnson_neyman(pr)
plot(johnson_neyman(pr))

# robust standard errors
if (requireNamespace("sandwich")) {
  johnson_neyman(pr, vcov = sandwich::vcovHC)
}
## End(Not run)
A sample data set, used in tests and examples for survival models. This dataset is originally included in the survival package, but for convenience it is also included in this package.
Create a data frame for the "newdata"-argument that contains
all combinations of values from the terms in question. Similar to
expand.grid()
. The terms
-argument accepts all shortcuts
for representative values as in predict_response()
.
new_data(model, terms, typical = "mean", condition = NULL, ...)

data_grid(model, terms, typical = "mean", condition = NULL, ...)
model |
A fitted model object. |
terms |
Character vector with the names of those terms from |
typical |
Character vector, naming the function to be applied to the
covariates (non-focal terms) over which the effect is "averaged". The
default is |
condition |
Named character vector, which indicates covariates that
should be held constant at specific values. Unlike |
... |
Currently not used. |
A data frame containing one row for each combination of values of the supplied variables.
data(efc, package = "ggeffects")
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

new_data(fit, c("c12hour [meansd]", "c161sex"))

nd <- new_data(fit, c("c12hour [meansd]", "c161sex"))
pr <- predict(fit, type = "response", newdata = nd)
nd$predicted <- pr
nd
# compare to
predict_response(fit, c("c12hour [meansd]", "c161sex"))
plot
is a generic plot-method for ggeffects
-objects.
ggeffects_palette()
returns the colour codes of the predefined palettes, and show_palettes()
shows all available colour palettes.
## S3 method for class 'ggeffects'
plot(
  x,
  show_ci = TRUE,
  ci_style = c("ribbon", "errorbar", "dash", "dot"),
  show_data = FALSE,
  show_residuals = FALSE,
  show_residuals_line = FALSE,
  data_labels = FALSE,
  limit_range = FALSE,
  collapse_group = FALSE,
  show_legend = TRUE,
  show_title = TRUE,
  show_x_title = TRUE,
  show_y_title = TRUE,
  case = NULL,
  colors = NULL,
  alpha = 0.15,
  dot_size = NULL,
  dot_alpha = 0.35,
  dot_shape = NULL,
  line_size = NULL,
  jitter = NULL,
  dodge = 0.25,
  use_theme = TRUE,
  log_y = FALSE,
  connect_lines = FALSE,
  facets,
  grid,
  one_plot = TRUE,
  n_rows = NULL,
  verbose = TRUE,
  ...
)

theme_ggeffects(base_size = 11, base_family = "")

ggeffects_palette(palette = "metro", n = NULL)

show_palettes()
x |
An object of class |
show_ci |
Logical, if |
ci_style |
Character vector, indicating the style of the confidence
bands. May be either |
show_data |
Logical, if |
show_residuals |
Logical, if |
show_residuals_line |
Logical, if |
data_labels |
Logical, if |
limit_range |
Logical, if |
collapse_group |
For mixed effects models, name of the grouping variable
of random effects. If |
show_legend |
Logical, shows or hides the plot legend. |
show_title |
Logical, shows or hides the plot title. |
show_x_title |
Logical, shows or hides the plot title for the x-axis. |
show_y_title |
Logical, shows or hides the plot title for the y-axis. |
case |
Desired target case. Labels will automatically be converted into the
specified character case. See |
colors |
Character vector with color values in hex-format, valid
color value names. The following options are valid for
|
alpha |
Alpha value for the confidence bands. |
dot_size |
Numeric, size of the point geoms. |
dot_alpha |
Alpha value for data points, when |
dot_shape |
Shape of data points, when |
line_size |
Numeric, size of the line geoms. |
jitter |
Numeric, between 0 and 1. If not |
dodge |
Value for offsetting or shifting error bars, to avoid overlapping.
Only applies, if a factor is plotted at the x-axis (in such cases, the
confidence bands are replaced by error bars automatically), or if
|
use_theme |
Logical, if |
log_y |
Logical, if |
connect_lines |
Logical, if |
facets , grid
|
Logical, defaults to |
one_plot |
Logical, if |
n_rows |
Number of rows to align plots. By default, all plots are aligned in one row. For facets, or multiple panels, plots can also be aligned in multiple rows, to avoid plots that are too small. |
verbose |
Logical, toggle warnings and messages. |
... |
Further arguments passed down to |
base_size |
Base font size. |
base_family |
Base font family. |
palette |
Name of a pre-defined color-palette as string. See
|
n |
Number of color-codes from the palette that should be returned. |
For proportional odds logistic regression (see ?MASS::polr
)
or cumulative link models in general, plots are automatically facetted
by response.level
, which indicates the grouping of predictions based on
the level of the model's response.
A ggplot2-object.
For generalized linear models (glms), residualized scores are
computed as inv.link(link(Y) + r)
where Y
are the predicted
values on the response scale, and r
are the working residuals.
For (generalized) linear mixed models, the random effects are also
partialled out.
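As a sketch only (an illustrative model, not the package's internal code), the residualized scores described above can be computed by hand for a GLM:

```r
# Sketch: residualized scores for a Poisson GLM, following inv.link(link(Y) + r)
m <- glm(carb ~ mpg, data = mtcars, family = poisson())
Y <- predict(m, type = "response")    # predicted values on the response scale
r <- residuals(m, type = "working")   # working residuals
fam <- family(m)
res_scores <- fam$linkinv(fam$linkfun(Y) + r)
head(res_scores)
```

Here `family(m)$linkfun` and `family(m)$linkinv` give the link and inverse-link functions, so the same sketch works for any link.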
Load library(ggplot2)
and use theme_set(theme_ggeffects())
to set
the ggeffects-theme as default plotting theme. You can then use further
plot-modifiers, e.g. from sjPlot, like legend_style()
or font_size()
without losing the theme-modifications.
There are pre-defined colour palettes in this package. Use show_palettes()
to show all available colour palettes as plot, or
ggeffects_palette(palette = NULL)
to show the color codes.
library(sjlabelled)
data(efc)
efc$c172code <- as_label(efc$c172code)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

dat <- predict_response(fit, terms = "c12hour")
plot(dat)

# facet by group, use pre-defined color palette
dat <- predict_response(fit, terms = c("c12hour", "c172code"))
plot(dat, facet = TRUE, colors = "hero")

# don't use facets, b/w figure, w/o confidence bands
dat <- predict_response(fit, terms = c("c12hour", "c172code"))
plot(dat, colors = "bw", show_ci = FALSE)

# factor at x axis, plot exact data points and error bars
dat <- predict_response(fit, terms = c("c172code", "c161sex"))
plot(dat)

# for three variables, automatic facetting
dat <- predict_response(fit, terms = c("c12hour", "c172code", "c161sex"))
plot(dat)

# show color codes of specific palette
ggeffects_palette("okabe-ito")

# show all color palettes
show_palettes()
test_predictions()
This function "pools" (i.e. combines) multiple ggcomparisons
objects, returned
by test_predictions()
, in a similar fashion as mice::pool()
.
pool_comparisons(x, ...)
x |
A list of |
... |
Currently not used. |
Averaging of parameters follows Rubin's rules (Rubin, 1987, p. 76).
A data frame with pooled comparisons or contrasts of predictions.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
data("nhanes2", package = "mice")
imp <- mice::mice(nhanes2, printFlag = FALSE)
comparisons <- lapply(1:5, function(i) {
  m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i))
  test_predictions(m, "age")
})
pool_comparisons(comparisons)
This function "pools" (i.e. combines) multiple ggeffects
objects, in
a similar fashion as mice::pool()
.
pool_predictions(x, ...)
x |
A list of |
... |
Currently not used. |
Averaging of parameters follows Rubin's rules (Rubin, 1987, p. 76).
Pooling is applied to the predicted values on the scale of the linear predictor,
not on the response scale, in order to have accurate pooled estimates and
standard errors. The final pooled predicted values are then transformed to
the response scale, using insight::link_inverse()
.
A data frame with pooled predictions.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
# example for multiple imputed datasets
data("nhanes2", package = "mice")
imp <- mice::mice(nhanes2, printFlag = FALSE)
predictions <- lapply(1:5, function(i) {
  m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i))
  predict_response(m, "age")
})
pool_predictions(predictions)
After fitting a model, it is useful to generate model-based estimates (expected values, or adjusted predictions) of the response variable for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables.
The ggeffects package computes marginal means and adjusted predicted
values for the response, at the margin of specific values or levels from
certain model terms. The package is built around three core functions:
predict_response()
(understanding results), test_predictions()
(testing results for statistically significant differences) and plot()
(communicate results).
By default, adjusted predictions or marginal means are returned on the
response scale, which is the easiest and most intuitive scale to interpret
the results. There are other options for specific models as well, e.g. with
zero-inflation component (see documentation of the type
-argument). The
result is returned as structured data frame, which is nicely printed by
default. plot()
can be used to easily create figures.
The main function to calculate marginal means and adjusted predictions is
predict_response()
, which returns adjusted predictions, marginal means
or averaged counterfactual predictions, depending on the value of the
margin
-argument.
In previous versions of ggeffects, the functions ggpredict()
, ggemmeans()
,
ggeffect()
and ggaverage()
were used to calculate marginal means and
adjusted predictions. These functions are still available, but predict_response()
as a "wrapper" around these functions is the preferred way to calculate marginal
means and adjusted predictions now.
predict_response(
  model,
  terms,
  margin = "mean_reference",
  ci_level = 0.95,
  type = "fixed",
  condition = NULL,
  interval = "confidence",
  back_transform = TRUE,
  vcov = NULL,
  vcov_args = NULL,
  weights = NULL,
  bias_correction = FALSE,
  verbose = TRUE,
  ...
)
model |
A model object. |
terms |
Names of those terms from
|
margin |
Character string, indicating how to marginalize over the
non-focal predictors, i.e. those variables that are not specified in
|
ci_level |
Numeric, the level of the confidence intervals. Use
|
type |
Character, indicating whether predictions should be conditioned
on specific model components or not, or whether population or unit-level
predictions are desired. Consequently, most options only apply for survival
models, mixed effects models and/or models with zero-inflation (and their
Bayesian counter-parts); only exception is Note 1: For Note 2: If Note 3: If
When |
condition |
Named character vector, which indicates covariates that
should be held constant at specific values. Unlike |
interval |
Type of interval calculation, can either be |
back_transform |
Logical, if |
vcov |
Variance-covariance matrix used to compute uncertainty estimates (e.g., for confidence intervals based on robust standard errors). This argument accepts a covariance matrix, a function which returns a covariance matrix, or a string which identifies the function to be used to compute the covariance matrix.
If See details in this vignette. |
vcov_args |
List of arguments to be passed to the function identified by
the |
weights |
This argument is used in two different ways, depending on the
|
bias_correction |
Logical, if |
verbose |
Toggle messages or warnings. |
... |
If |
A data frame (with ggeffects
class attribute) with consistent data columns:
"x"
: the values of the first term in terms
, used as x-position in plots.
"predicted"
: the predicted values of the response, used as y-position in plots.
"std.error"
: the standard error of the predictions. Note that the standard
errors are always on the link-scale, and not back-transformed for non-Gaussian
models!
"conf.low"
: the lower bound of the confidence interval for the predicted values.
"conf.high"
: the upper bound of the confidence interval for the predicted values.
"group"
: the grouping level from the second term in terms
, used as
grouping-aesthetics in plots.
"facet"
: the grouping level from the third term in terms
, used to indicate
facets in plots.
The estimated marginal means (or predicted values) are always on the response scale!
For proportional odds logistic regression (see ?MASS::polr
)
or cumulative link models (e.g., see ?ordinal::clm
),
an additional column "response.level"
is returned, which indicates
the grouping of predictions based on the level of the model's response.
Note that for convenience reasons, the columns for the intervals
are always named "conf.low"
and "conf.high"
, even though
for Bayesian models credible or highest posterior density intervals
are returned.
There is an as.data.frame()
method for objects of class ggeffects
,
which has a terms_to_colnames
argument, to use the term names as column
names instead of the standardized names "x"
etc.
A list of supported models can be found at the package website.
Support for models varies by marginalization method (the margin
argument),
i.e. although predict_response()
supports most models, some models are only
supported exclusively by one of the four downstream functions (ggpredict()
,
ggemmeans()
, ggeffect()
or ggaverage()
). This means that not all models
work for every margin
option of predict_response()
.
predict_response()
is a wrapper around ggpredict()
, ggemmeans()
and
ggaverage()
. Depending on the value of the margin
argument,
predict_response()
calls one of those functions. The margin
argument
indicates how to marginalize over the non-focal predictors, i.e. those
variables that are not specified in terms
. Possible values are:
"mean_reference"
and "mean_mode"
: For "mean_reference"
, non-focal
predictors are set to their mean (numeric variables), reference level
(factors), or "most common" value (mode) in case of character vectors.
For "mean_mode"
, non-focal predictors are set to their mean (numeric
variables) or mode (factors, or "most common" value in case of character
vectors).
These predictions represent a rather "theoretical" view on your data, which does not necessarily exactly reflect the characteristics of your sample. It helps answer the question, "What is the predicted (or: expected) value of the response at meaningful values or levels of my focal terms for a 'typical' observation in my data?", where 'typical' refers to certain characteristics of the remaining predictors.
"marginalmeans"
: non-focal predictors are set to their mean (numeric
variables) or averaged over the levels or "values" for factors and
character vectors. Averaging over the factor levels of non-focal terms
computes a kind of "weighted average" for the values at which these terms
are held constant. Thus, non-focal categorical terms are conditioned on
"weighted averages" of their levels. There are different weighting
options that can be altered using the weights
argument.
These predictions come closer to the sample, because all possible values and levels of the non-focal predictors are taken into account. It answers the question, "What is the predicted (or: expected) value of the response at meaningful values or levels of my focal terms for an 'average' observation in my data?". This corresponds to randomly picking a subject from your sample and the result you would get on average.
"empirical"
(or "counterfactual"
or "average"
): non-focal predictors
are averaged over the observations in the sample. The response is predicted
for each subject in the data and predicted values are then averaged across
all subjects, aggregated/grouped by the focal terms. In particular,
averaging is applied to counterfactual predictions (Dickerman and Hernan
2020). There is a more detailed description in
this vignette.
Counterfactual predictions are useful insofar as the results can also be transferred to other contexts. It answers the question, "What is the predicted (or: expected) value of the response at meaningful values or levels of my focal terms for the 'average' observation in the population?". It refers not only to the actual data in your sample, but also to "what would be if" we had more data, or if we had data from a different population. This is what "counterfactual" refers to.
You can set a default-option for the margin
argument via options()
, e.g.
options(ggeffects_margin = "empirical")
, so you don't have to specify your
"default" marginalization method each time you call predict_response()
.
Use options(ggeffects_margin = NULL)
to remove that setting.
The condition
argument can be used to fix non-focal terms to specific
values.
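A minimal sketch (using the efc example data shipped with ggeffects) of fixing a non-focal term via condition:

```r
library(ggeffects)
data(efc, package = "ggeffects")
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex, data = efc)

# hold the non-focal covariate "neg_c_7" constant at 20
# instead of its mean value
predict_response(fit, terms = "c12hour", condition = c(neg_c_7 = 20))
```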
Meaningful values of focal terms can be specified via the terms
argument.
Specifying meaningful or representative values as string pattern is the
preferred way in the ggeffects package. However, it is also possible to
use a list()
for the focal terms if prefer the "classical" R way. terms
can also be a data (or reference) grid provided as data frame. All options
are described in this vignette.
Indicating levels in square brackets allows for selecting only certain
groups, values, or value ranges. The term name and the start of the
levels in brackets must be separated by a whitespace character, e.g.
terms = c("age", "education [1,3]")
. Numeric ranges, separated with a colon,
are also allowed: terms = c("education", "age [30:60]")
. The stepsize for
ranges can be adjusted using by
, e.g. terms = "age [30:60 by=5]"
.
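For example (a sketch using the efc example data shipped with ggeffects), the by tag yields predictions at every fifth value of the range:

```r
library(ggeffects)
data(efc, package = "ggeffects")
fit <- lm(barthtot ~ c12hour + neg_c_7, data = efc)

# predictions at c12hour = 40, 45, 50, 55 and 60
predict_response(fit, terms = "c12hour [40:60 by=5]")
```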
The terms
argument also supports the same shortcuts as the values
argument
in values_at()
. So terms = "age [meansd]"
would return predictions for
the values one standard deviation below the mean age, the mean age and one SD
above the mean age. terms = "age [quart2]"
would calculate predictions at
the value of the lower, median and upper quartile of age.
Furthermore, it is possible to specify a function name. Values for predictions
will then be transformed, e.g. terms = "income [exp]"
. This is useful when
model predictors were transformed for fitting the model and should be
back-transformed to the original scale for predictions. It is also possible
to define own functions (see
this vignette).
Instead of a function, it is also possible to define the name of a variable
with specific values, e.g. to define a vector v = c(1000, 2000, 3000)
and
then use terms = "income [v]"
.
You can take a random sample of any size with sample=n, e.g.
, e.g
terms = "income [sample=8]"
, which will sample eight values from
all possible values of the variable income
. This option is especially
useful for plotting predictions at certain levels of a random-effects
grouping factor, where the factor has too many levels to be completely
plotted. For more details, see
this vignette.
Finally, for numeric vectors for which no specific values are given, a "pretty range"
is calculated (see pretty_range()
), to avoid memory allocation problems
for vectors with many unique values. If a numeric vector is specified as
second or third term (i.e. if this focal term is used for "stratification"),
representative values (see values_at()
) are chosen (unless other values
are specified), which are typically the mean value, as well as one standard
deviation below and above the mean. If all values for a numeric vector should
be used to compute predictions, you may use e.g. terms = "age [all]"
. See
also package vignettes.
To create a pretty range that should be smaller or larger than the default
range (i.e. if no specific values would be given), use the n
tag, e.g.
terms="age [n=5]"
or terms="age [n=12]"
. Larger values for n
return a
larger range of predicted values.
predict_response()
also works with Stan-models from the rstanarm or
brms-packages. The predicted values are the median value of all drawn
posterior samples. Standard errors are the median absolute deviation of the
posterior samples. The confidence intervals for Stan-models are Bayesian
predictive intervals. By default, the predictions are based on
rstantools::posterior_epred()
and hence have the limitations that the
uncertainty of the error term (residual variance) is not taken into account.
The recommendation is to use the posterior predictive distribution
(rstantools::posterior_predict()
), i.e. setting interval = "prediction"
.
For mixed models, the following options are possible:
Predictions can be made on the population-level (type = "fixed"
, the
default) or for each level of the grouping variable (unit-level). If
unit-level predictions are requested, you need to set type = "random"
and specify the grouping variable(s) in the terms argument.
Population-level predictions can be either conditional (predictions
for a "typical" group) or marginal (average predictions across all
groups). The default in predict_response()
calculated conditional
predictions. Set margin = "empirical"
for marginal predictions.
Prediction intervals, i.e. when interval = "prediction", also account
also account
for the uncertainty in the random effects.
See more details in this vignette.
Models of class brmsfit
always condition on the zero-inflation component,
if the model has such a component. Hence, there is no type = "zero_inflated"
nor type = "zi_random"
for brmsfit
-models, because predictions are based
on draws of the posterior distribution, which already account for the
zero-inflation part of the model.
Zero-Inflated and Zero-Inflated Mixed Models with glmmTMB
If model
is of class glmmTMB
, hurdle
, zeroinfl
or zerotrunc
, and
margin
is not set to "empirical", simulations from a multivariate
, simulations from a multivariate
normal distribution (see ?MASS::mvrnorm
) are drawn to calculate mu*(1-p)
.
Confidence intervals are then based on quantiles of these results.
For type = "zi_random"
, prediction intervals also take the uncertainty in
the random-effect parameters into account (see also Brooks et al. 2017,
pp.391-392 for details).
An alternative for models fitted with glmmTMB that take all model
uncertainties into account are simulations based on simulate()
, which
is used when type = "simulate"
(see Brooks et al. 2017, pp.392-393 for
details).
Finally, if margin = "empirical"
, the returned predictions are already
conditioned on the zero-inflation part (and possible random effects) of the
model, thus these are most comparable to the type = "simulate"
option. In
other words, if all model components should be taken into account for
predictions, you should consider using margin = "empirical"
.
Predicted values for the fixed effects component (type = "fixed"
or
type = "zero_inflated"
) are based on predict(..., type = "mean_subject")
,
while predicted values for random effects components (type = "random"
or
type = "zi_random"
) are calculated with predict(..., type = "subject_specific")
(see ?GLMMadaptive::predict.MixMod
for details). The latter option
requires the response variable to be defined in the newdata
-argument
of predict()
, which will be set to its typical value (see
values_at()
).
polr
, clm
models, or more generally speaking, models with ordinal or
multinomial outcomes, have an additional column response.level
, which
indicates with which level of the response variable the predicted values are
associated.
For averaged model predictions, i.e. when the input model is an object of
class "averaging"
(MuMIn::model.avg()
), predictions are made with the
full averaged coefficients.
Printing Results
The print()
method gives a clean output (especially for predictions by
groups), and indicates at which values covariates were held constant.
Furthermore, the print()
method has several arguments to customize the
output. See this vignette
for details.
Limitations
The support for some models, for example from package MCMCglmm, is not fully tested and may fail for certain models. If you encounter any errors, please file an issue at Github.
Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., et al. (2017). glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9, 378-400.
Johnson, P. C. (2014). Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods in Ecology and Evolution, 5, 944-946.
Dickerman, B. A., & Hernan, M. A. (2020). Counterfactual prediction is not only for causal inference. European Journal of Epidemiology, 35, 615-617.
library(sjlabelled)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

predict_response(fit, terms = "c12hour")
predict_response(fit, terms = c("c12hour", "c172code"))

# more compact table layout for printing
out <- predict_response(fit, terms = c("c12hour", "c172code", "c161sex"))
print(out, collapse_table = TRUE)

# specified as formula
predict_response(fit, terms = ~ c12hour + c172code + c161sex)

# only range of 40 to 60 for variable 'c12hour'
predict_response(fit, terms = "c12hour [40:60]")

# terms as named list
predict_response(fit, terms = list(c12hour = 40:60))

# covariate "neg_c_7" is held constant at a value of 11.84 (its mean value).
# To use a different value, use "condition"
predict_response(fit, terms = "c12hour [40:60]", condition = c(neg_c_7 = 20))

# to plot ggeffects-objects, you can use the 'plot()'-function.
# the following examples show how to build your ggplot by hand.

# plot predicted values, remaining covariates held constant
library(ggplot2)
mydf <- predict_response(fit, terms = "c12hour")
ggplot(mydf, aes(x, predicted)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.1)

# three variables, so we can use facets and groups
mydf <- predict_response(fit, terms = c("c12hour", "c161sex", "c172code"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~facet, ncol = 2)

# select specific levels for grouping terms
mydf <- predict_response(fit, terms = c("c12hour", "c172code [1,3]", "c161sex"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~facet) +
  labs(
    y = get_y_title(mydf),
    x = get_x_title(mydf),
    colour = get_legend_title(mydf)
  )

# level indication also works for factors with non-numeric levels
# and in combination with numeric levels for other variables
data(efc)
efc$c172code <- sjlabelled::as_label(efc$c172code)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
predict_response(fit, terms = c(
  "c12hour",
  "c172code [low level of education, high level of education]",
  "c161sex [1]"
))

# when "terms" is a named list
predict_response(fit, terms = list(
  c12hour = seq(0, 170, 30),
  c172code = c("low level of education", "high level of education"),
  c161sex = 1
))

# use categorical value on x-axis, use axis-labels, add error bars
dat <- predict_response(fit, terms = c("c172code", "c161sex"))
ggplot(dat, aes(x, predicted, colour = group)) +
  geom_point(position = position_dodge(0.1)) +
  geom_errorbar(
    aes(ymin = conf.low, ymax = conf.high),
    position = position_dodge(0.1)
  ) +
  scale_x_discrete(breaks = 1:3, labels = get_x_labels(dat))

# 3-way-interaction with 2 continuous variables
data(efc)
# make categorical
efc$c161sex <- as_factor(efc$c161sex)
fit <- lm(neg_c_7 ~ c12hour * barthtot * c161sex, data = efc)
# select only levels 30, 50 and 70 from continuous variable Barthel-Index
dat <- predict_response(fit, terms = c("c12hour", "barthtot [30,50,70]", "c161sex"))
ggplot(dat, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  facet_wrap(~facet) +
  labs(
    colour = get_legend_title(dat),
    x = get_x_title(dat),
    y = get_y_title(dat),
    title = get_title(dat)
  )

# or with ggeffects' plot-method
plot(dat, show_ci = FALSE)

# predictions for polynomial terms
data(efc)
fit <- glm(
  tot_sc_e ~ c12hour + e42dep + e17age + I(e17age^2) + I(e17age^3),
  data = efc,
  family = poisson()
)
predict_response(fit, terms = "e17age")
Creates an evenly spaced, pretty sequence of numbers for a range of a vector.
pretty_range(x, n = NULL, length = NULL)
x |
A numeric vector. |
n |
Numeric value, indicating how many values are used to create a
pretty sequence. If |
length |
Integer value, as alternative to |
A numeric vector with a range corresponding to the minimum and
maximum values of x. If x is missing, a function, pre-programmed
with n and length, is returned. See examples.
data(iris)

# pretty range for vectors with decimal points
pretty_range(iris$Petal.Length)

# pretty range for large range, increasing by 50
pretty_range(1:1000)

# increasing by 20
pretty_range(1:1000, n = 7)

# return 10 intervals
pretty_range(1:1000, length = 10)

# same result
pretty_range(1:1000, n = 2.5)

# function factory
range_n_5 <- pretty_range(n = 5)
range_n_5(1:1000)
This function computes partial residuals based on a data grid,
where the data grid is usually a data frame from all combinations of factor
variables or certain values of numeric vectors. Such a data grid is typically
used as the newdata argument in predict(), and can be created with
new_data().
residualize_over_grid(grid, model, ...)

## S3 method for class 'data.frame'
residualize_over_grid(grid, model, predictor_name, ...)

## S3 method for class 'ggeffects'
residualize_over_grid(grid, model, protect_names = TRUE, ...)
grid |
A data frame representing the data grid, or an object of class
|
model |
The model for which to compute partial residuals. The data grid
|
... |
Currently not used. |
predictor_name |
The name of the focal predictor, for which partial residuals are computed. |
protect_names |
Logical, if |
A data frame with residuals for the focal predictor.
For generalized linear models (glms), residualized scores are
computed as inv.link(link(Y) + r), where Y are the predicted
values on the response scale, and r are the working residuals.
For (generalized) linear mixed models, the random effects are also
partialled out.
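As an illustration of the computation described above (a minimal sketch, not the package's internal implementation), the residualized scores for a GLM could be reproduced by hand like this:

```r
# sketch: partial residuals computed as inv.link(link(Y) + r)
data(mtcars)
model <- glm(am ~ hp + wt, data = mtcars, family = binomial())

link <- family(model)$linkfun
inv_link <- family(model)$linkinv

# predicted values on the response scale, and working residuals
y_hat <- predict(model, type = "response")
r <- residuals(model, type = "working")

# residualized scores, back on the response scale
partial_res <- inv_link(link(y_hat) + r)
head(partial_res)
```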
Fox, J., & Weisberg, S. (2018). Visualizing Fit and Lack of Fit in Complex Regression Models with Predictor Effect Plots and Partial Residuals. Journal of Statistical Software, 87.
library(ggeffects)
set.seed(1234)
x <- rnorm(200)
z <- rnorm(200)
# quadratic relationship
y <- 2 * x + x^2 + 4 * z + rnorm(200)

d <- data.frame(x, y, z)
model <- lm(y ~ x + z, data = d)

pr <- predict_response(model, c("x [all]", "z"))
head(residualize_over_grid(pr, model))
Function to test differences of adjusted predictions for
statistical significance. This is usually called contrasts or (pairwise)
comparisons, or "marginal effects". hypothesis_test() is an alias.
test_predictions(object, ...)

hypothesis_test(object, ...)

## Default S3 method:
test_predictions(
  object,
  terms = NULL,
  by = NULL,
  test = "pairwise",
  test_args = NULL,
  equivalence = NULL,
  scale = "response",
  p_adjust = NULL,
  df = NULL,
  ci_level = 0.95,
  collapse_levels = FALSE,
  margin = "mean_reference",
  condition = NULL,
  engine = "marginaleffects",
  verbose = TRUE,
  ...
)

## S3 method for class 'ggeffects'
test_predictions(
  object,
  by = NULL,
  test = "pairwise",
  equivalence = NULL,
  scale = "response",
  p_adjust = NULL,
  df = NULL,
  collapse_levels = FALSE,
  engine = "marginaleffects",
  verbose = TRUE,
  ...
)
object |
A fitted model object, or an object of class |
... |
Arguments passed down to To define a heteroscedasticity-consistent variance-covariance matrix, you can
either use the same arguments as for |
terms |
If |
by |
Character vector specifying the names of predictors to condition on.
Hypothesis test is then carried out for focal terms by each level of |
test |
Hypothesis to test, defined as character string, formula, or data frame. Can be one of:
Technical details about the packages used as back-end to calculate contrasts and pairwise comparisons are provided in the section Packages used as back-end to calculate contrasts and pairwise comparisons below. |
test_args |
Optional arguments passed to |
equivalence |
ROPE's lower and higher bounds. Should be |
scale |
Character string, indicating the scale on which the contrasts or comparisons are represented. Can be one of:
Note: If the |
p_adjust |
Character vector, if not |
df |
Degrees of freedom that will be used to compute the p-values and
confidence intervals. If |
ci_level |
Numeric, the level of the confidence intervals. If |
collapse_levels |
Logical, if |
margin |
Character string, indicates the method how to marginalize over
non-focal terms. See |
condition |
Named character vector, which indicates covariates that
should be held constant at specific values, for instance
|
engine |
Character string, indicates the package to use for computing
contrasts and comparisons. Usually, this argument can be ignored, unless you
want to explicitly use another package than marginaleffects to calculate
contrasts and pairwise comparisons. |
verbose |
Toggle messages and warnings. |
A data frame containing predictions (e.g. for test = NULL),
contrasts, or pairwise comparisons of adjusted predictions or estimated
marginal means.
There are many ways to test contrasts or pairwise comparisons. A detailed introduction with many (visual) examples is shown in this vignette.
A simple workflow includes calculating adjusted predictions and passing the
results directly to test_predictions(), e.g.:
# 1. fit your model
model <- lm(mpg ~ hp + wt + am, data = mtcars)

# 2. calculate adjusted predictions
pr <- predict_response(model, "am")
pr

# 3. test pairwise comparisons
test_predictions(pr)
See also this vignette.
The test argument is used to define which kind of contrast or comparison
should be calculated. The default is to use the marginaleffects package.
Here are some technical details about the packages used as back-end. When
test is...

- "pairwise" (default), pairwise comparisons are based on the marginaleffects package.
- "trend" or "slope", the marginaleffects package is also used.
- "contrast", the emmeans package is used, i.e. emmeans::contrast(method = "eff") is called.
- "exclude", the emmeans package is used, i.e. emmeans::contrast(method = "del.eff") is called.
- "polynomial", the emmeans package is used, i.e. emmeans::contrast(method = "poly") is called.
- "interaction", the emmeans package is used, i.e. emmeans::contrast(interaction = ...) is called.
- "consecutive", the emmeans package is used, i.e. emmeans::contrast(method = "consec") is called.
- a character string with a custom hypothesis, the marginaleffects package is used.
- a data frame with custom contrasts, emmeans is used again.
- a formula, the marginaleffects package is used.
- NULL, functions from the marginaleffects package are called with hypothesis = NULL.
If all focal terms are only present as random effects in a mixed model, or if predicted probabilities for the zero-inflation component of a model should be tested, functions from the ggeffects package are used. There is an example for pairwise comparisons of random effects in this vignette.
Note that for p-value adjustment methods supported by p.adjust() (see also
p.adjust.methods), each row is considered as one set of comparisons, no
matter which test was specified. That is, for instance, when test_predictions()
returns eight rows of predictions (when test = NULL) and p_adjust = "bonferroni",
the p-values are adjusted in the same way as if we had a test of pairwise
comparisons (test = "pairwise") where eight rows of comparisons are
returned. For the methods "tukey" and "sidak", a rank adjustment is done
based on the number of combinations of levels from the focal predictors
in terms. Thus, the latter two methods may be useful for certain tests
only, in particular pairwise comparisons.
For johnson_neyman(), the only available adjustment methods are "fdr"
(or "bh") (Benjamini & Hochberg 1995) and "esarey" (or "es")
(Esarey and Sumner 2017). These usually return similar results. The major
difference is that "fdr" can be slightly faster and more stable in edge
cases; however, confidence intervals are not updated, only the p-values are
adjusted. "esarey" is slower, but confidence intervals are updated as well.
ggeffects_test_engine can be used as option to either use the marginaleffects
package for computing contrasts and comparisons (default), or the emmeans
package (e.g. options(ggeffects_test_engine = "emmeans")). The latter is
useful when the marginaleffects package is not available, or when the
emmeans package is preferred. You can also provide the engine directly, e.g.
test_predictions(..., engine = "emmeans"). Note that using emmeans as
backend is currently not as feature rich as the default (marginaleffects).

If engine = "emmeans", the test argument can also be "interaction"
to calculate interaction contrasts (difference-in-difference contrasts),
"consecutive" to calculate contrasts between consecutive levels of a predictor,
or a data frame with custom contrasts. If test is one of the latter options,
and engine is not specified, the engine is automatically set to "emmeans".
Additionally, the test_args argument can be used to specify further options
for those contrasts. See 'Examples' and documentation of test_args.

If the marginaleffects package is not installed, the emmeans package is
used automatically. If this package is not installed as well,
engine = "ggeffects" is used.
The verbose argument can be used to display or silence messages and
warnings. Furthermore, options() can be used to set defaults for the
print() and print_html() method. The following options are available,
which can simply be run in the console:

- ggeffects_ci_brackets: Define a character vector of length two, indicating
  the opening and closing parentheses that encompass the confidence interval
  values, e.g. options(ggeffects_ci_brackets = c("[", "]")).
- ggeffects_collapse_ci: Logical, if TRUE, the columns with predicted
  values (or contrasts) and confidence intervals are collapsed into one
  column, e.g. options(ggeffects_collapse_ci = TRUE).
- ggeffects_collapse_p: Logical, if TRUE, the columns with predicted
  values (or contrasts) and p-values are collapsed into one column, e.g.
  options(ggeffects_collapse_p = TRUE). Note that p-values are replaced
  by asterisk-symbols (stars) or empty strings when ggeffects_collapse_p = TRUE,
  depending on the significance level.
- ggeffects_collapse_tables: Logical, if TRUE, multiple tables for
  subgroups are combined into one table. Only works when there is more than
  one focal term, e.g. options(ggeffects_collapse_tables = TRUE).
- ggeffects_output_format: String, either "text", "markdown" or "html".
  Defines the default output format from predict_response(). If "html", a
  formatted HTML table is created and printed to the view pane. "markdown"
  creates a markdown-formatted table inside Rmarkdown documents, and prints
  a text-format table to the console when used interactively. If "text" or
  NULL, a formatted table is printed to the console, e.g.
  options(ggeffects_output_format = "html").
- ggeffects_html_engine: String, either "tt" or "gt". Defines the default
  engine to use for printing HTML tables. If "tt", the tinytable package
  is used, if "gt", the gt package is used, e.g.
  options(ggeffects_html_engine = "gt").

Use options(<option_name> = NULL) to remove the option.
Esarey, J., & Sumner, J. L. (2017). Marginal effects in interaction models: Determining and controlling the false positive rate. Comparative Political Studies, 1–33. Advance online publication. doi: 10.1177/0010414017730080
There is also an equivalence_test() method in the parameters
package (parameters::equivalence_test.lm()), which can be used to
test contrasts or comparisons for practical equivalence. This method also
has a plot() method, hence it is possible to do something like:
library(parameters)
predict_response(model, focal_terms) |>
  equivalence_test() |>
  plot()
data(efc)
efc$c172code <- as.factor(efc$c172code)
efc$c161sex <- as.factor(efc$c161sex)
levels(efc$c161sex) <- c("male", "female")
m <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

# direct computation of comparisons
test_predictions(m, "c172code")

# passing a `ggeffects` object
pred <- predict_response(m, "c172code")
test_predictions(pred)

# test for slope
test_predictions(m, "c12hour")

# interaction - contrasts by groups
m <- lm(barthtot ~ c12hour + c161sex * c172code + neg_c_7, data = efc)
test_predictions(m, c("c161sex", "c172code"), test = NULL)

# interaction - pairwise comparisons by groups
test_predictions(m, c("c161sex", "c172code"))

# equivalence testing
test_predictions(m, c("c161sex", "c172code"), equivalence = c(-2.96, 2.96))

# equivalence testing, using the parameters package
pr <- predict_response(m, c("c161sex", "c172code"))
parameters::equivalence_test(pr)

# interaction - collapse unique levels
test_predictions(m, c("c161sex", "c172code"), collapse_levels = TRUE)

# p-value adjustment
test_predictions(m, c("c161sex", "c172code"), p_adjust = "tukey")

# not all comparisons, only by specific group levels
test_predictions(m, "c172code", by = "c161sex")

# specific comparisons
test_predictions(m, c("c161sex", "c172code"), test = "b2 = b1")

# interaction - slope by groups
m <- lm(barthtot ~ c12hour + neg_c_7 * c172code + c161sex, data = efc)
test_predictions(m, c("neg_c_7", "c172code"))

# Interaction and consecutive contrasts -----------------
# -------------------------------------------------------
data(coffee_data, package = "ggeffects")
m <- lm(alertness ~ time * coffee + sex, data = coffee_data)

# consecutive contrasts
test_predictions(m, "time", by = "coffee", test = "consecutive")

# same as (using formula):
pr <- predict_response(m, c("time", "coffee"))
test_predictions(pr, test = difference ~ sequential | coffee)

# interaction contrasts - difference-in-difference comparisons
pr <- predict_response(m, c("time", "coffee"), margin = "marginalmeans")
test_predictions(pr, test = "interaction")

# Ratio contrasts ---------------------------------------
# -------------------------------------------------------
test_predictions(pr, test = ratio ~ reference | coffee)

# Custom contrasts --------------------------------------
# -------------------------------------------------------
wakeup_time <- data.frame(
  "wakeup vs later" = c(-2, 1, 1) / 2, # make sure each "side" sums to (+/-)1!
  "start vs end of day" = c(-1, 0, 1)
)
test_predictions(m, "time", by = "coffee", test = wakeup_time)

# Example: marginal effects -----------------------------
# -------------------------------------------------------
data(iris)
m <- lm(Petal.Width ~ Petal.Length + Species, data = iris)

# we now want the marginal effects for "Species". We can calculate
# the marginal effect using the "marginaleffects" package
marginaleffects::avg_slopes(m, variables = "Species")

# finally, test_predictions() returns the same. While the previous results
# report the marginal effect compared to the reference level "setosa",
# test_predictions() returns the marginal effects for all pairwise comparisons
test_predictions(m, "Species")
This function calculates representative values of a vector, like minimum/maximum values, or lower, median and upper quartiles, which can be used to plot adjusted predictions of a numeric focal term at these representative values.
values_at(x, values = "meansd")

representative_values(x, values = "meansd")
x |
A numeric vector. |
values |
Character vector, naming a pattern for which representative values should be calculated.
|
A numeric vector, representing the required values from x, like
minimum/maximum value or mean and +/- 1 SD. If x is missing, a function,
pre-programmed with values, is returned. See examples.
data(efc)
values_at(efc$c12hour)
values_at(efc$c12hour, "quartiles2")

mean_sd <- values_at(values = "meansd")
mean_sd(efc$c12hour)
Returns the variance-covariance matrix for the predicted values from object.
## S3 method for class 'ggeffects'
vcov(object, vcov = NULL, vcov_args = NULL, verbose = TRUE, ...)
object |
An object of class |
vcov |
Variance-covariance matrix used to compute uncertainty estimates (e.g., for confidence intervals based on robust standard errors). This argument accepts a covariance matrix, a function which returns a covariance matrix, or a string which identifies the function to be used to compute the covariance matrix.
If See details in this vignette. |
vcov_args |
List of arguments to be passed to the function identified by
the |
verbose |
Toggle messages or warnings. |
... |
Currently not used. |
The returned matrix has as many rows (and columns) as possible combinations
of predicted values from the predict_response() call. For example, if there
are two variables in the terms argument of predict_response() with 3 and 4
levels each, there will be 3*4 combinations of predicted values, so the returned
matrix has a 12x12 dimension. In short, nrow(object) is always equal to
nrow(vcov(object)). See also 'Examples'.
The variance-covariance matrix for the predicted values from object.
data(efc)
model <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
result <- predict_response(model, c("c12hour [meansd]", "c161sex"))

vcov(result)

# compare standard errors
sqrt(diag(vcov(result)))
as.data.frame(result)

# only two predicted values, no further terms
# vcov() returns a 2x2 matrix
result <- predict_response(model, "c161sex")
vcov(result)

# 2 levels for c161sex multiplied by 3 levels for c172code
# result in 6 combinations of predicted values
# thus vcov() returns a 6x6 matrix
result <- predict_response(model, c("c161sex", "c172code"))
vcov(result)