Searching for Terms

Look for instances of one or more terms in a set of texts.

text_locate(x, terms, filter = NULL, ...)

text_count(x, terms, filter = NULL, ...)

text_detect(x, terms, filter = NULL, ...)

text_match(x, terms, filter = NULL, ...)

text_sample(x, terms, size = NULL, filter = NULL, ...)

text_subset(x, terms, filter = NULL, ...)

Arguments

x	a text or character vector.
terms	a character vector of search terms.
filter	if non-`NULL`, a text filter to use instead of the default text filter for `x`.
size	the maximum number of results to return, or `NULL`.
...	additional properties to set on the text filter.

Details

text_locate finds all instances of the search terms in the input text, along with their contexts.

text_count counts the number of search term instances in each element of the text vector.
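
As a rough illustration of what is being counted, the sketch below approximates `text_count()` for a single term with base R regular expressions. `count_term()` is a hypothetical helper, not part of this package. It matches whole words case-insensitively and assumes the term contains no regular-expression metacharacters, whereas the documented functions match tokens as processed by the text filter.

```r
# Hypothetical helper (not part of the package): approximate text_count() for
# one term by counting case-insensitive, whole-word regex matches.
# Assumes `term` contains no regular-expression metacharacters.
count_term <- function(x, term) {
  pattern <- paste0("\\b", term, "\\b")
  hits <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  # gregexpr() returns -1 for a text with no match, so count positive starts only
  vapply(hits, function(h) sum(h > 0), numeric(1))
}

text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Snow White and Rose Red")
count_term(text, "rose")
#> [1] 4 1 1
```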

text_detect indicates whether each text contains at least one of the search terms.
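
A text is detected when any one of the terms matches. The hypothetical `detect_terms()` below approximates this in base R by joining the terms into a single regex alternation. It is a case-insensitive, whole-word approximation, not the package's token-based matching, and it assumes the terms contain no regular-expression metacharacters.

```r
# Hypothetical helper (not part of the package): approximate text_detect() by
# testing each text against an alternation of all terms.
# Assumes the terms contain no regular-expression metacharacters.
detect_terms <- function(x, terms) {
  pattern <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")
  grepl(pattern, x, ignore.case = TRUE, perl = TRUE)
}

text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Snow White and Rose Red")
detect_terms(text, c("snow white", "sweet"))
#> [1] FALSE  TRUE  TRUE
```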

text_match reports which term each matching instance corresponds to, as a factor with levels equal to the terms argument.

text_subset returns the texts that contain the search terms.
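
For a plain character vector, the selection amounts to indexing with a detection mask. The hypothetical `subset_terms()` sketch below shows this in base R with a case-insensitive, whole-word regex. It keeps any names on `x`, but unlike `text_subset()` it does not set a text filter on the result.

```r
# Hypothetical helper (not part of the package): keep the elements of x that
# contain at least one term, using a case-insensitive whole-word regex.
# Assumes the terms contain no regular-expression metacharacters.
subset_terms <- function(x, terms) {
  pattern <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")
  x[grepl(pattern, x, ignore.case = TRUE, perl = TRUE)]
}

text <- c(rose = "Rose is a rose is a rose is a rose.",
          name = "A rose by any other name would smell as sweet.",
          snow = "Snow White and Rose Red")
subset_terms(text, "a rose")   # keeps the elements named "rose" and "name"
```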

text_sample returns a random sample of the results from text_locate, in random order. This is useful for hand-inspecting a subset of the text_locate matches.
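
The sampling step itself can be sketched in base R as drawing row indices without replacement. `sample_results()` is hypothetical and assumes `size = NULL` means "return every result". The small data frame is a stand-in for a `text_locate()` result, with its ‘text’ and ‘instance’ columns copied from the first `text_locate()` call in the Examples section.

```r
# Hypothetical helper (not part of the package): return up to `size` rows of a
# search-result data frame, drawn without replacement and in random order.
# Assumes size = NULL means "return all rows".
sample_results <- function(results, size = NULL) {
  n <- nrow(results)
  if (is.null(size) || size > n) size <- n
  results[sample.int(n, size), , drop = FALSE]
}

# Stand-in for a text_locate() result (columns abbreviated)
results <- data.frame(text = c("1", "1", "1", "1", "2", "3"),
                      instance = c("Rose", "rose", "rose", "rose", "rose", "Rose"),
                      stringsAsFactors = FALSE)
set.seed(42)
sample_results(results, 3)
```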

Value

text_count and text_detect return a numeric vector and a logical vector, respectively, with length equal to the number of input texts and names equal to the text names.

text_locate and text_sample both return a data frame with one row for each search result and columns named ‘text’, ‘before’, ‘instance’, and ‘after’. The ‘text’ column gives the name of the text containing the instance; ‘before’ and ‘after’ are text vectors giving the text before and after the instance. The ‘instance’ column gives the token or tokens matching the search term.
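
To make the ‘before’, ‘instance’, and ‘after’ columns concrete, the following base-R sketch builds a similar data frame for a single term. `locate_term()` is a hypothetical helper, not the package's implementation: it uses case-insensitive whole-word regex matches, labels unnamed texts by position, returns plain character columns rather than text vectors, and trims surrounding whitespace from the context strings.

```r
# Hypothetical helper (not part of the package): one row per case-insensitive,
# whole-word match of `term`, with the context before and after each match.
# Assumes `term` contains no regular-expression metacharacters.
locate_term <- function(x, term) {
  ids <- if (is.null(names(x))) as.character(seq_along(x)) else names(x)
  hits <- gregexpr(paste0("\\b", term, "\\b"), x, ignore.case = TRUE, perl = TRUE)
  rows <- lapply(seq_along(x), function(i) {
    start <- as.vector(hits[[i]])               # drop match attributes
    if (start[1] == -1) return(NULL)            # no match in this text
    end <- start + attr(hits[[i]], "match.length") - 1
    data.frame(text = ids[i],
               before = trimws(substring(x[i], 1, start - 1)),
               instance = substring(x[i], start, end),
               after = trimws(substring(x[i], end + 1)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)                          # NULL entries are dropped
}

text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Snow White and Rose Red")
locate_term(text, "rose")
```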

text_match returns a data frame with one row for each search result and columns named ‘text’ and ‘term’. Both columns are factors. The ‘text’ column has levels equal to the text labels, and the ‘term’ column has levels equal to the terms argument.
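
Because both columns are factors with fixed levels, a term that never matches still appears as a level, so tabulating the ‘term’ column gives a count for every term. The base-R sketch below builds a data frame of that shape. `match_terms()` is hypothetical, uses case-insensitive whole-word regex matching, and groups rows by term rather than ordering them by position in the text.

```r
# Hypothetical helper (not part of the package): one row per match, with 'text'
# and 'term' stored as factors whose levels are the text labels and the terms.
# Each term is searched separately, so overlapping terms each produce a row.
# Assumes the terms contain no regular-expression metacharacters.
match_terms <- function(x, terms) {
  ids <- if (is.null(names(x))) as.character(seq_along(x)) else names(x)
  rows <- lapply(terms, function(term) {
    hits <- gregexpr(paste0("\\b", term, "\\b"), x, ignore.case = TRUE, perl = TRUE)
    n <- vapply(hits, function(h) sum(h > 0), integer(1))   # matches per text
    data.frame(text = rep(ids, n), term = rep(term, sum(n)),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out$text <- factor(out$text, levels = ids)
  out$term <- factor(out$term, levels = terms)
  out
}

text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Snow White and Rose Red")
m <- match_terms(text, c("rose", "snow white", "violet"))
table(m$term)   # "violet" is kept as a level with a count of 0
```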

text_subset returns the subset of texts that contain the given search terms. The result has its text_filter set to the passed-in filter argument.

Examples

text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Snow White and Rose Red")

text_count(text, "rose")
#> [1] 4 1 1
text_detect(text, "rose")
#> [1] TRUE TRUE TRUE
text_locate(text, "rose")
#>   text             before              instance              after              
#> 1 1                                      Rose    is a rose is a rose is a rose. 
#> 2 1                         Rose is a    rose    is a rose is a rose.           
#> 3 1               Rose is a rose is a    rose    is a rose.                     
#> 4 1     Rose is a rose is a rose is a    rose   .                               
#> 5 2                                 A    rose    by any other name would smell …
#> 6 3                    Snow White and    Rose    Red                            
text_match(text, "rose")
#>   text term
#> 1 1    rose
#> 2 1    rose
#> 3 1    rose
#> 4 1    rose
#> 5 2    rose
#> 6 3    rose
text_sample(text, "rose", 3)
#>   text             before              instance              after              
#> 1 2                                 A    rose    by any other name would smell …
#> 2 1     Rose is a rose is a rose is a    rose   .                               
#> 3 1               Rose is a rose is a    rose    is a rose.                     
text_subset(text, "a rose")
#> [1] "Rose is a rose is a rose is a rose."           
#> [2] "A rose by any other name would smell as sweet."

# search for multiple terms
text_locate(text, c("rose", "rose red", "snow white"))
#>   text             before              instance               after             
#> 1 1                                      Rose     is a rose is a rose is a rose…
#> 2 1                        Rose is a     rose     is a rose is a rose.          
#> 3 1              Rose is a rose is a     rose     is a rose.                    
#> 4 1    …ose is a rose is a rose is a     rose    .                              
#> 5 2                                A     rose     by any other name would smell…
#> 6 3                                   Snow White  and Rose Red                  
#> 7 3                   Snow White and     Rose     Red                           
#> 8 3                   Snow White and   Rose Red
