NAME

Koha::SearchEngine::Elasticsearch::QueryBuilder - constructs elasticsearch query objects from user-supplied queries

DESCRIPTION

This provides the functions that take a user-supplied search query and turn it into something that can be given to elasticsearch to get answers.

SYNOPSIS

    use Koha::SearchEngine::Elasticsearch::QueryBuilder;
    my $builder = Koha::SearchEngine::Elasticsearch::QueryBuilder->new({ index => $index });
    my $simple_query = $builder->build_query("hello");
    # This is currently undocumented because the original code is undocumented
    my $adv_query = $builder->build_advanced_query($indexes, $operands, $operators);

METHODS

get_index_field_convert

    my @index_params = Koha::SearchEngine::Elasticsearch::QueryBuilder->get_index_field_convert();

Converts zebra-style search index notation into elasticsearch-style.

@indexes is an array of index names, as presented to build_query_compat, and it returns something that can be sent to build_query.

TODO: this will pull from the elasticsearch mappings table to figure out types.

build_query

    my $simple_query = $builder->build_query("hello", %options);

This will build a query that can be issued to elasticsearch from the provided string input. This expects a lucene-style search string (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax for details).

It'll make an attempt to respect the various query options.

Additional options can be provided with the %options hash.

sort: This should be an arrayref of hashrefs, each containing a field and a direction (optional, defaults to asc). The results will be sorted according to these values. Valid values for direction are 'asc' and 'desc'.
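A minimal sketch of what a sort option could look like, assuming the hashref keys are called field and direction as described above. The field names title and pubdate are purely illustrative, and normalise_sort is a hypothetical helper written for this example, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in the default direction and reject bad values.
sub normalise_sort {
    my (@sort) = @_;
    return map {
        my $direction = $_->{direction} // 'asc';    # direction is optional
        die "Invalid direction: $direction\n"
            unless $direction eq 'asc' || $direction eq 'desc';
        +{ field => $_->{field}, direction => $direction };
    } @sort;
}

my %options = (
    sort => [
        { field => 'title' },                           # defaults to asc
        { field => 'pubdate', direction => 'desc' },
    ],
);

my @sort = normalise_sort( @{ $options{sort} } );
print "$_->{field} $_->{direction}\n" for @sort;
```

With a real builder, the same %options hash would be passed as the trailing arguments to build_query.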

build_query_compat

    my (
        $error,             $query, $simple_query, $query_cgi,
        $query_desc,        $limit, $limit_cgi,    $limit_desc,
        $stopwords_removed, $query_type
      )
      = $builder->build_query_compat( \@operators, \@operands, \@indexes,
        \@limits, \@sort_by, $scan, $lang, $params );

This handles a search using the same api as C4::Search::buildQuery does.

A very simple query will go in with $operands set to ['query'] and $sort_by set to ['pubdate_dsc']. In this simple case, $query is returned as something that can perform the search, $simple_query as just the search term, $query_cgi as something that can reproduce this search, and $query_desc as a description of the query.

build_authorities_query

    my $query = $builder->build_authorities_query(\%search);

This takes a nice description of an authority search and turns it into a black-box query that can then be passed to the appropriate searcher.

The search description is a hashref that looks something like:

    {
        searches => [
            {
                where    => 'Heading',      # search the main entry
                operator => 'exact',        # require an exact match
                value    => 'frogs',        # the search string
            },
            {
                where    => '',             # search all entries
                operator => '',             # default keyword, right truncation
                value    => 'pond',
            },
        ],
        sort => {
            field => 'Heading',
            order => 'desc',
        },
        authtypecode => 'TOPIC_TERM',
    }

build_authorities_query_compat

    my ($query) =
      $builder->build_authorities_query_compat( \@marclist, \@and_or,
        \@excluding, \@operator, \@value, $authtypecode, $orderby );

This builds a query for searching for authorities, in the style of C4::AuthoritiesMarc::SearchAuthorities.

Arguments:

marclist: An arrayref containing where the particular term should be searched for. Options are: mainmainentry, mainentry, match, match-heading, see-from, and thesaurus. If left blank, any field is used.
and_or: Totally ignored. It is never used in C4::AuthoritiesMarc::SearchAuthorities.
excluding: Also ignored.
operator: What form of search to do. Options are: is (phrase, no truncation, whole field must match), = (number exact match), exact (phrase, no truncation, whole field must match). If left blank, then word list, right truncated, anywhere is used.
value: The actual user-provided string value to search for.
authtypecode: The authority type code to search within. If blank, then all will be searched.
orderby: The order to sort the results by. Options are Relevance, HeadingAsc, HeadingDsc, AuthidAsc, AuthidDsc.

marclist, operator, and value must be the same length, and the values at index /i/ all relate to each other.

This returns a query, which is a black box object that can be passed to the appropriate search object.
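One way to picture the parallel arrays is to zip them into one hashref per search clause. This is only a sketch: the keys where, operator and value are borrowed from the build_authorities_query description, and the sample values are invented for the example. It does not claim to be what the method does internally.

```perl
use strict;
use warnings;

my @marclist = ( 'mainentry', '' );
my @operator = ( 'exact',     '' );
my @value    = ( 'frogs',     'pond' );

die "marclist, operator and value must be the same length\n"
    unless @marclist == @operator && @operator == @value;

# Position $i in each array describes one search clause.
my @searches = map {
    +{
        where    => $marclist[$_],
        operator => $operator[$_],
        value    => $value[$_],
    }
} 0 .. $#value;

printf "where='%s' operator='%s' value='%s'\n", @{$_}{qw(where operator value)}
    for @searches;
```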

_build_scan_query

    my ($query, $query_str) = $builder->_build_scan_query(\@operands, \@indexes);

This will build an aggregation scan query that can be issued to elasticsearch from the provided string input.

_create_regex_filter

    my $filter = $builder->_create_regex_filter('term');

This will create a regex filter that can be used with an aggregation query.

_convert_sort_fields

    my @sort_params = _convert_sort_fields(@sort_by);

Converts the zebra-style sort index information into elasticsearch-style.

@sort_by is the same as presented to build_query_compat, and it returns something that can be sent to build_query.
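A rough sketch of the idea, using the 'pubdate_dsc' value mentioned under build_query_compat. The suffix table is an assumption, and field names are passed through unchanged here, whereas the real method may also rename them. convert_sort_fields_sketch is a hypothetical stand-in, not the module's code.

```perl
use strict;
use warnings;

# Assumed suffix table; the real module may recognise other suffixes.
my %direction_for = ( asc => 'asc', dsc => 'desc' );

# Hypothetical stand-in for _convert_sort_fields.
sub convert_sort_fields_sketch {
    my (@sort_by) = @_;
    my @converted;
    for my $spec (@sort_by) {
        # Split on the last underscore: "call_number_asc" -> call_number, asc
        my ( $field, $suffix ) = $spec =~ /^(.+)_([^_]+)$/ or next;
        my $direction = $direction_for{$suffix} or next;
        push @converted, { field => $field, direction => $direction };
    }
    return @converted;
}

my @sort_params = convert_sort_fields_sketch('pubdate_dsc');
print "$_->{field} $_->{direction}\n" for @sort_params;
```

The result has the same shape as the sort option that build_query accepts.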

_convert_index_strings

    my @searches = $self->_convert_index_strings(@searches);

Similar to _convert_index_fields, this takes strings of the form field:search term and rewrites the field from zebra-style to elasticsearch-style. Anything it doesn't understand is returned verbatim.

_convert_index_strings_freeform

    my $search = $self->_convert_index_strings_freeform($search);

This is similar to _convert_index_strings, however it'll search out the things to change within the string. So it can handle strings such as (su:foo) AND (su:bar), converting the su appropriately.

If an index carries a modifier, as in "su,complete-subfield", the part after the comma is stripped off as we can't yet handle that. Making it work will have to wait for a real query parser.
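The in-string rewriting can be sketched with a single substitution. The prefix table below is hypothetical (the real mapping lives in the module), and convert_freeform_sketch is an illustrative name only.

```perl
use strict;
use warnings;

# Hypothetical mapping from zebra-style prefixes to elasticsearch fields.
my %field_for = ( su => 'subject', au => 'author', ti => 'title' );

sub convert_freeform_sketch {
    my ($search) = @_;
    # Match "prefix:" or "prefix,modifier:" and drop the modifier.
    # Unknown prefixes are put back verbatim.
    $search =~ s{\b([a-z][\w-]*)(?:,[\w-]+)?:}{
        exists $field_for{$1} ? "$field_for{$1}:" : $&
    }ge;
    return $search;
}

print convert_freeform_sketch('(su:foo) AND (su,complete-subfield:bar)'), "\n";
# prints "(subject:foo) AND (subject:bar)"
```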

_modify_string_by_type

    my $str = $self->_modify_string_by_type(%index_field);

If you have a search term (operand) and a type (phrase, right-truncated), this will rewrite the string so that it has that behaviour in lucene search syntax, e.g. by wrapping quotes around it.
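As an illustration only, the two types mentioned above could be handled like this. The hash keys operand and type, and the exact type names, are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _modify_string_by_type.
sub modify_string_by_type_sketch {
    my (%index_field) = @_;
    my $str  = $index_field{operand};
    my $type = $index_field{type} // '';
    return qq{"$str"} if $type eq 'phrase';             # exact phrase
    return "$str*"    if $type eq 'right-truncated';    # trailing wildcard
    return $str;                                        # anything else: unchanged
}

print modify_string_by_type_sketch( operand => 'pond life', type => 'phrase' ), "\n";
print modify_string_by_type_sketch( operand => 'frog', type => 'right-truncated' ), "\n";
```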

_join_queries

    my $query_str = $self->_join_queries(@query_parts);

This takes a list of query parts, that might be search terms on their own, or booleaned together, or specifying fields, or whatever, wraps them in parentheses, and ANDs them all together. Suitable for feeding to the ES query string query.

Note: doesn't AND them together if they specify an index that starts with "mc" as that was a special case in the original code for dealing with multiple choice options (you can't search for something that has an itype of A and an itype of B otherwise).
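A simplified sketch of the joining logic. It assumes that the "mc" parts are ORed with each other and that the resulting group is ANDed with the rest, which is one reading of the note above. The prefix is left in place here, and join_queries_sketch is a hypothetical name.

```perl
use strict;
use warnings;

sub join_queries_sketch {
    my (@parts) = @_;
    my @plain = grep { !/^mc/ } @parts;
    my @mc    = grep { /^mc/ } @parts;

    # Ordinary parts are each wrapped in parentheses.
    my @groups = map { "($_)" } @plain;

    # Assumption: multiple choice parts are ORed so either value can match.
    push @groups, '(' . join( ' OR ', map { "($_)" } @mc ) . ')' if @mc;

    return join ' AND ', @groups;
}

print join_queries_sketch( 'title:frogs', 'mc-itype:A', 'mc-itype:B' ), "\n";
# prints "(title:frogs) AND ((mc-itype:A) OR (mc-itype:B))"
```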

_make_phrases

    my @phrased_queries = $self->_make_phrases(@query_parts);

This takes the supplied queries and forces them to be phrases by wrapping quotes around them. It understands field prefixes, e.g. 'subject:', and when one is present it keeps the prefix outside the quotes.
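The prefix handling can be sketched as follows. This is an assumption about the behaviour based on the description above, with make_phrases_sketch as a hypothetical name.

```perl
use strict;
use warnings;

sub make_phrases_sketch {
    my (@parts) = @_;
    my @phrased;
    for my $part (@parts) {
        if ( $part =~ /^(\w+:)(.*)$/ ) {
            # Keep the field prefix, quote only the term after it.
            push @phrased, qq{$1"$2"};
        }
        else {
            push @phrased, qq{"$part"};
        }
    }
    return @phrased;
}

print "$_\n" for make_phrases_sketch( 'subject:pond life', 'frogs' );
```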

_create_query_string

    my @query_strings = $self->_create_query_string(@queries);

Given a list of hashrefs, it will turn them into a lucene-style query string. The hash should contain field, type (both for the indexes), operator, and operand.

clean_search_term

    my $term = $self->clean_search_term($term);

This cleans a search term by removing any funny characters that may upset ES and give us an error. It also calls _convert_index_strings_freeform to ensure those parts are correct.

_query_regex_escape_process

    my $query = $self->_query_regex_escape_process($query);

Processes query in accordance with current "QueryRegexEscapeOptions" system preference setting.

_fix_limit_special_cases

    my $limits = $self->_fix_limit_special_cases($limits);

This converts any special cases that the limit specifications have into things that are more readily processable by the rest of the code.

The argument should be an arrayref, and it'll return an arrayref.

_sort_field

    my $field = $self->_sort_field($field);

Given a field name, this works out what the actual name of the field to sort on should be. A '__sort' suffix is added for fields with a sort version, and for text fields either '.phrase' (for sortable versions) or '.raw' is appended to avoid sorting on a tokenized value.

_truncate_terms

    my $query = $self->_truncate_terms($query);

Given a string query, this function appends the '*' wildcard to all terms except operators and double-quoted strings.

_split_query

    my @tokens = $self->_split_query($query_str);

Given a string query, this function splits it into tokens, taking into account any field prefixes and quoted strings.
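A sketch of how splitting and truncation might fit together. Both functions are hypothetical stand-ins: the tokenizer treats an optionally prefixed quoted string as one token, and the truncation step skips the boolean operators and, loosely following the description of _truncate_terms, anything that ends in a non-word character.

```perl
use strict;
use warnings;

# A token is an optionally prefixed quoted string, or a run of
# non-whitespace characters.
sub split_query_sketch {
    my ($query) = @_;
    return $query =~ /((?:[\w-]+:)?"[^"]*"|\S+)/g;
}

# Append '*' unless the token is a boolean operator or already ends in a
# non-word character such as '"', '*' or ')'.
sub truncate_terms_sketch {
    my ($query) = @_;
    my @terms;
    for my $token ( split_query_sketch($query) ) {
        my $keep = $token =~ /^(?:and|or|not)$/i || $token =~ /\W$/;
        push @terms, $keep ? $token : "$token*";
    }
    return join ' ', @terms;
}

print truncate_terms_sketch('title:"pond life" AND frog'), "\n";
# prints: title:"pond life" AND frog*
```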

_search_fields

    my $weighted_fields = $self->_search_fields({ is_opac => 0, weighted_fields => 1, subfield => 'raw' });

Generate a list of searchable fields to be used for Elasticsearch queries applied to multiple fields.

Returns an arrayref of field names for either OPAC or staff interface, with possible weights and subfield appended to each field name depending on the options provided.

$params: Hashref with options. The parameter is_opac indicates whether the searchable fields for the OPAC or the staff interface should be retrieved. If weighted_fields is set, field weights will be applied to the returned fields. subfield can be used to provide a subfield that will be appended to fields as "field_name.subfield".
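To show the shape of the return value, here is a sketch over an invented field list. The names, weights and the search_fields_sketch function are hypothetical, and is_opac is ignored here. The "field^weight" form is the boost notation elasticsearch accepts in multi-field queries.

```perl
use strict;
use warnings;

# Hypothetical field list; the real one comes from the search field
# configuration.
my @fields = (
    { name => 'title',  weight => 10 },
    { name => 'author', weight => 5 },
    { name => 'note' },
);

sub search_fields_sketch {
    my ($params) = @_;
    my @names;
    for my $field (@fields) {
        my $name = $field->{name};
        $name .= ".$params->{subfield}" if $params->{subfield};
        # Only fields that have a weight get a boost suffix.
        $name .= "^$field->{weight}"
            if $params->{weighted_fields} && $field->{weight};
        push @names, $name;
    }
    return \@names;
}

my $weighted = search_fields_sketch( { weighted_fields => 1, subfield => 'raw' } );
print join( ', ', @$weighted ), "\n";
# prints "title.raw^10, author.raw^5, note.raw"
```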

_is_safe_to_auto_truncate

    _is_safe_to_auto_truncate($index_field, $oand);

Checks if it is safe to auto truncate a search term within a given search field.

The search term should not be auto truncated when searching for identifiers, e.g. koha-auth-number, record-control-number, local-number etc. Also, non-text fields must not be auto truncated (doing so would generate an ES exception).
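The two rules can be sketched as a pair of lookups. The identifier list comes from the text above, but the field type table is hypothetical, and this sketch ignores the second argument of the real method.

```perl
use strict;
use warnings;

# Identifier fields named in the description above.
my %is_identifier = map { ( $_ => 1 ) }
    qw(koha-auth-number record-control-number local-number);

# Hypothetical field types; the real ones come from the search mappings.
my %type_of = ( title => 'text', 'date-of-publication' => 'date' );

sub is_safe_to_auto_truncate_sketch {
    my ($index_field) = @_;
    return 0 if $is_identifier{$index_field};
    # Fields of unknown type are treated as unsafe in this sketch.
    return 0 if ( $type_of{$index_field} // '' ) ne 'text';
    return 1;
}

for my $field (qw(title local-number date-of-publication)) {
    printf "%s: %s\n", $field,
        is_safe_to_auto_truncate_sketch($field) ? 'truncate' : 'leave as is';
}
```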