The main function in stylest, stylest_fit, fits a model using a corpus of texts labeled by speaker.

stylest_fit(
  x,
  speaker,
  terms = NULL,
  filter = NULL,
  smooth = 0.5,
  term_weights = NULL,
  fill_method = "value",
  fill_weight = 0,
  weight_varname = "mean_distance"
)

Arguments

x

Text vector. May be a corpus_frame object

speaker

Vector of speaker labels. Should be the same length as x

terms

If not NULL, terms to be used in the model. If NULL, use all terms

filter

If not NULL, a text filter to specify the tokenization. See corpus for more information about specifying filter

smooth

Numeric value used to smooth term frequencies. The default is 0.5

term_weights

Data frame of distances (or any other weights) per word in the vocabulary. This data frame should have one column, $word, and a second column, named by weight_varname, containing the weight for each word. See the vignette for details.

fill_method

If "value" (default), fill_weight is used to fill any terms with NA weight. If "mean", the mean of the weights in term_weights is used as the fill value

fill_weight

Numeric value used as the weight for any term that does not have a weight specified in term_weights. The default, 0.0, drops any words without weights

weight_varname

Name of the column in term_weights containing the weights, default="mean_distance"
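
The weighting arguments above can be illustrated with a small base R sketch. It is not the package's internal code: the words, weight values, and the vocab vector are made up for illustration, and the "mean" case assumes the fill value is the mean of the weights supplied in term_weights.

```r
# Hypothetical weights for a few vocabulary terms (values are made up).
# The weight column uses the default weight_varname, "mean_distance".
term_weights <- data.frame(
  word = c("the", "and", "whale"),
  mean_distance = c(0.10, 0.20, 0.90),
  stringsAsFactors = FALSE
)

# Hypothetical model vocabulary; "ship" has no entry in term_weights.
vocab <- c("the", "and", "whale", "ship")

# Look up each vocabulary term; terms without a match get NA.
weights <- term_weights$mean_distance[match(vocab, term_weights$word)]
names(weights) <- vocab

# fill_method = "value": replace NA with fill_weight (default 0).
fill_weight <- 0
filled_value <- weights
filled_value[is.na(weights)] <- fill_weight

# fill_method = "mean" (assumed semantics): replace NA with the mean
# of the weights that were supplied.
filled_mean <- weights
filled_mean[is.na(weights)] <- mean(term_weights$mean_distance)

print(filled_value)
print(filled_mean)
```

A data frame shaped like term_weights here is what the term_weights argument describes. With fill_weight left at 0, an unweighted term such as "ship" contributes nothing, which is why the default is described as dropping words without weights.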

Value

An S3 stylest_model object containing:

speakers

Vector of unique speakers

filter

text_filter used

terms

Terms used in fitting the model

ntoken

Vector of number of tokens per speaker

smooth

Smoothing value

weights

If not NULL, a named matrix of weights for each term in the vocab

rate

Matrix of speaker rates for each term in vocabulary

Details

If terms is not specified, all terms will be used.

Examples

data(novels_excerpts)
speaker_mod <- stylest_fit(novels_excerpts$text, novels_excerpts$author)
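
As a further sketch, the fit can be repeated with a non-default smooth value and the result inspected through the components listed under Value. The requireNamespace guard is an addition for illustration, so the snippet does nothing if stylest is not installed; the choice of smooth = 1 is arbitrary.

```r
# Assumes the stylest package is installed; the guard skips the fit otherwise.
speaker_mod <- NULL
if (requireNamespace("stylest", quietly = TRUE)) {
  data("novels_excerpts", package = "stylest")

  # Fit with a smoothing value other than the default of 0.5.
  speaker_mod <- stylest::stylest_fit(
    novels_excerpts$text,
    novels_excerpts$author,
    smooth = 1
  )

  # Components documented under Value.
  print(speaker_mod$speakers)  # unique speaker labels
  print(speaker_mod$smooth)    # smoothing value stored in the model
}
```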