Get or measure the set of types (unique token values).

text_types(x, filter = NULL, collapse = FALSE, ...)

text_ntype(x, filter = NULL, collapse = FALSE, ...)

Arguments

x

a text or character vector.

filter

if non-NULL, a text filter to use instead of the default text filter for x.

collapse

a logical value indicating whether to aggregate over all rows of the input instead of reporting a separate result for each text.

...

additional properties to set on the text filter.

Details

text_ntype counts the number of unique types in each text; text_types returns the set of unique types, as a character vector. Types are determined according to the filter argument.
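As a minimal sketch of how the filter determines the types, the snippet below compares the default filter with a custom one. It assumes the corpus package is installed and that text_filter accepts the drop_punct and map_case properties; the sample sentences are illustrative only.

```r
library(corpus)

text <- c("I saw Mr. Jones today.", "What. Are. You. Doing????")

# types under the default text filter
text_types(text)

# build a custom filter: drop punctuation and keep the original case
# (drop_punct and map_case are assumed here to be text_filter properties)
f <- text_filter(drop_punct = TRUE, map_case = FALSE)
text_types(text, filter = f)

# filter properties can also be passed through `...`
n_default <- text_ntype(text)
n_nopunct <- text_ntype(text, drop_punct = TRUE)
```

Passing properties through `...` is convenient for one-off changes; an explicit filter object is easier to reuse across several calls.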

Value

If collapse = FALSE, then text_ntype produces a numeric vector with the same length and names as the input text, with each element giving the number of unique types in the corresponding text. For text_types, the result is a list of character vectors, with each vector giving the unique types in the corresponding text, ordered according to the sort function.

If collapse = TRUE, then we aggregate over all rows of the input. In this case, text_ntype produces a scalar indicating the number of unique types in x, and text_types produces a character vector with the unique types.
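The relationship between the two functions and the two collapse settings can be checked directly. The following is a sketch, assuming the corpus package is installed; the variable names are illustrative.

```r
library(corpus)

text <- c("I saw Mr. Jones today.", "Split across\na line.")

# collapse = FALSE: one result per text
types  <- text_types(text)   # list of character vectors
counts <- text_ntype(text)   # numeric vector, one count per text

# each count should equal the size of the matching type set
per_text_ok <- all(counts == lengths(types))

# collapse = TRUE: a single result aggregated over all texts
all_types <- text_types(text, collapse = TRUE)
total     <- text_ntype(text, collapse = TRUE)

# the collapsed set should be the union of the per-text sets
union_ok <- setequal(all_types, unique(unlist(types)))
total_ok <- total == length(all_types)
```

Note that the collapsed count is generally smaller than sum(counts), since a type that appears in several texts is counted only once.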

Examples

text <- c("I saw Mr. Jones today.",
          "Split across\na line.",
          "What. Are. You. Doing????",
          "She asked 'do you really mean that?' and I said 'yes.'")

# count the number of unique types
text_ntype(text)
#> [1] 6 5 6 14
text_ntype(text, collapse = TRUE)
#> [1] 25
# get the type sets
text_types(text)
#> [[1]]
#> [1] "."     "i"     "jones" "mr"    "saw"   "today"
#>
#> [[2]]
#> [1] "."      "a"      "across" "line"   "split"
#>
#> [[3]]
#> [1] "."     "?"     "are"   "doing" "what"  "you"
#>
#> [[4]]
#>  [1] "'"      "."      "?"      "and"    "asked"  "do"     "i"      "mean"
#>  [9] "really" "said"   "she"    "that"   "yes"    "you"
#>
text_types(text, collapse = TRUE)
#>  [1] "'"      "."      "?"      "a"      "across" "and"    "are"    "asked"
#>  [9] "do"     "doing"  "i"      "jones"  "line"   "mean"   "mr"     "really"
#> [17] "said"   "saw"    "she"    "split"  "that"   "today"  "what"   "yes"
#> [25] "you"