A query describes which documents you want to extract from Summa for further processing. The stream of matched documents is then passed to collectors, which are responsible for final processing. It is therefore important to understand the difference between the two: queries tell which documents to take from the database, while collectors describe how to process them and what to return to the user.
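
The split between queries and collectors can be pictured with a small conceptual sketch in Python. This is not Summa's API: `term_query` and `top_k_collector` are hypothetical names, and the score is a plain term count standing in for a real relevance score.

```python
import heapq
from typing import Callable, Iterable, Iterator

Doc = dict
Scored = tuple[float, Doc]

def term_query(field: str, value: str) -> Callable[[Iterable[Doc]], Iterator[Scored]]:
    """Hypothetical query: yields (score, doc) for documents whose field contains the term."""
    def run(docs: Iterable[Doc]) -> Iterator[Scored]:
        for doc in docs:
            # Toy score: how many times the term occurs in the field
            count = doc.get(field, "").lower().split().count(value)
            if count:
                yield float(count), doc
    return run

def top_k_collector(stream: Iterable[Scored], k: int) -> list[Doc]:
    """Hypothetical collector: keeps the k best-scored documents."""
    best = heapq.nlargest(k, stream, key=lambda pair: pair[0])
    return [doc for _, doc in best]

docs = [
    {"title": "General astronomy"},
    {"title": "Cooking basics"},
    {"title": "Astronomy and more astronomy"},
]
matched = term_query("title", "astronomy")(docs)  # the query selects documents
result = top_k_collector(matched, k=1)            # the collector decides what is returned
```

The query only decides which documents enter the stream; swapping the collector (for example, for one that counts matches) would change the output without touching the query.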

Summa supports several types of queries. The simplest kind scores and returns documents. For example, TermQuery matches all documents containing the specified term and associates a score with every matched document. These scores can be used to rank documents by relevance. Other queries, such as BooleanQuery or DisjunctionMaxQuery, combine the documents and scores matched by multiple sub-queries.

TermQuery

The most basic kind of query is a TermQuery. It matches all documents that contain the specified term (word) within the specified field. Every matched document is also assigned a BM25 score that reflects its relevance to the query.

{
  "term": {
    "field": "title", 
    "value": "astronomy"
  }
}

BooleanQuery

Combines multiple queries into a single one. Every sub-query has a property named occur that describes how it takes part in the combination.

  • must: every matched document must match this sub-query.
  • must_not: documents that match this sub-query are excluded from the results.
  • should: matched documents may match this sub-query, but are not required to.
{
  "boolean": {
    "subqueries": [{
      "occur": "should",
      "query": {
        "term": {
          "field": "title",
          "value": "astronomy"
        }
      }
    }, {
      "occur": "must",
      "query": {
        "term": {
          "field": "title",
          "value": "nebula"
        }
      }
    }, {
      "occur": "must_not",
      "query": {
        "phrase": {
          "field": "author",
          "value": "tony igy"
        }
      }
    }]
  }
}
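
The occur rules can be illustrated with a toy evaluator in Python. It is only a sketch of the semantics described above, not Summa's implementation: `has_term`, `has_phrase` and `matches_boolean` are hypothetical helpers, and the behaviour when no must clause is present (at least one should clause has to match) is an assumption borrowed from common search-engine practice.

```python
def has_term(field, value):
    # Hypothetical predicate: the field contains the word
    return lambda doc: value in doc.get(field, "").split()

def has_phrase(field, value):
    # Hypothetical predicate: the field contains the exact phrase
    return lambda doc: value in doc.get(field, "")

def matches_boolean(doc, subqueries):
    """Toy evaluation of occur semantics; each sub-query is an (occur, predicate) pair."""
    must = [p for occur, p in subqueries if occur == "must"]
    must_not = [p for occur, p in subqueries if occur == "must_not"]
    should = [p for occur, p in subqueries if occur == "should"]
    if any(p(doc) for p in must_not):
        return False  # excluded by a must_not clause
    if must:
        return all(p(doc) for p in must)  # should clauses stay optional
    # Assumption: without must clauses, at least one should clause has to match
    return any(p(doc) for p in should)

subqueries = [
    ("should", has_term("title", "astronomy")),
    ("must", has_term("title", "nebula")),
    ("must_not", has_phrase("author", "tony igy")),
]

docs = [
    {"title": "astronomy nebula", "author": "jane roe"},
    {"title": "nebula", "author": "tony igy"},
    {"title": "astronomy", "author": "jane roe"},
    {"title": "nebula atlas", "author": "jane roe"},
]
flags = [matches_boolean(doc, subqueries) for doc in docs]
```

Only the first and last documents match: the second is excluded by must_not and the third lacks the required term.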
    

DisjunctionMaxQuery

Combines multiple queries into a single one. It is similar to BooleanQuery, but scores are calculated differently: instead of summing the scores of all sub-queries, it takes the maximum score among them. This approach may be useful in specific cases, such as searching for documents with synonyms.

{
  "disjunction_max": {
    "subqueries": [{
      "occur": "should",
      "query": {
        "term": {
          "field": "title",
          "value": "astronomy"
        }
      }
    }, {
      "occur": "must", 
      "query": {
        "term": {
          "field": "title",
          "value": "astronomia"
        }
      }
    }, {
      "occur": "must_not", 
      "query": {
        "phrase": {
          "field": "author",
          "value": "tony igy"
        }
      }
    }]
  }
}
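
The difference between summing and taking the maximum can be shown with a few lines of Python. The per-sub-query scores below are made-up numbers used only for illustration, and both function names are hypothetical.

```python
def boolean_score(sub_scores):
    # BooleanQuery: scores of matching sub-queries are summed
    return sum(sub_scores)

def disjunction_max_score(sub_scores):
    # DisjunctionMaxQuery: only the best sub-query score counts
    return max(sub_scores)

# Hypothetical per-sub-query scores for two documents
both_synonyms = [1.5, 1.0]  # matches "astronomy" and "astronomia"
one_strong = [2.0]          # matches only "astronomy", but strongly

# Does the document with both synonyms outrank the single strong match?
sum_ranking = boolean_score(both_synonyms) > boolean_score(one_strong)
max_ranking = disjunction_max_score(both_synonyms) > disjunction_max_score(one_strong)
```

With summing, the document that happens to contain both synonyms wins (2.5 against 2.0); with the maximum, repeating the same concept in two spellings gives no extra credit (1.5 against 2.0).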

BoostQuery

Modifies scores produced by a nested query. Useful in BooleanQuery to penalize or boost parts of the query.

{
  "boolean": {
    "subqueries": [{
      "occur": "should",
      "query": {
        "boost": {
          "query": {
            "term": {
              "field": "title",
              "value": "astronomy"
            }
          },
          "score": "2.0"
        }
      }
    }, {
      "occur": "must", 
      "query": {
        "term": {
          "field": "title",
          "value": "nebula"
        }
      }
    }]
  }
}
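
Nested queries like this one are easier to assemble programmatically. The following Python sketch builds the same JSON structure; the helper names `term`, `boost` and `boolean` are hypothetical and not part of any Summa client library.

```python
import json

def term(field, value):
    return {"term": {"field": field, "value": value}}

def boost(query, score):
    # The score is passed as a string, as in the example above
    return {"boost": {"query": query, "score": str(score)}}

def boolean(*subqueries):
    # Each argument is an (occur, query) pair
    return {"boolean": {"subqueries": [{"occur": o, "query": q} for o, q in subqueries]}}

query = boolean(
    ("should", boost(term("title", "astronomy"), 2.0)),
    ("must", term("title", "nebula")),
)
print(json.dumps(query, indent=2))
```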

MatchQuery

MatchQuery is a special query. Summa takes a value written in SummaQL format, parses it, and produces a tree of queries that the search engine can execute. MatchQuery may be used for parsing queries written in natural language. For example, the following query

{
  "match": {
      "value": "astronomy +nebula -\"tony igy\""
  }
}

will be parsed into

{
  "boolean": {
    "subqueries": [{
      "occur": "should",
      "query": {
        "term": {
          "field": "title",
          "value": "astronomy"
        }
      }
    }, {
      "occur": "must", 
      "query": {
        "term": {
          "field": "title",
          "value": "nebula"
        }
      }
    }, {
      "occur": "must_not", 
      "query": {
        "phrase": {
          "field": "author",
          "value": "tony igy"
        }
      }
    }]
  }
}
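
To give a feel for this kind of translation, here is a toy parser in Python. It is not the SummaQL grammar: it understands only bare words, a leading + or -, and double-quoted phrases, and it sends every clause to a single hypothetical `default_field`, so unlike the example above it cannot route the phrase to the author field.

```python
import re

# Optional +/- sign, followed by a quoted phrase or a bare word
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')
OCCUR = {"+": "must", "-": "must_not", "": "should"}

def parse(value, default_field="title"):
    """Toy parser: turns a query string into a boolean query tree."""
    subqueries = []
    for sign, phrase, word in TOKEN.findall(value):
        if phrase:
            query = {"phrase": {"field": default_field, "value": phrase}}
        else:
            query = {"term": {"field": default_field, "value": word}}
        subqueries.append({"occur": OCCUR[sign], "query": query})
    return {"boolean": {"subqueries": subqueries}}

tree = parse('astronomy +nebula -"tony igy"')
occurs = [sub["occur"] for sub in tree["boolean"]["subqueries"]]
```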

SummaQL also supports:

  • RegexQuery: phone_number://7916.*//
  • ExistsQuery: phone_number:*

PhraseQuery

Matches documents containing an exact occurrence of the phrase.

{
  "phrase": {
    "field": "title",
    "value": "general astronomy"
  }
}

RegexQuery

Matches documents whose field value matches the regular expression.

{
  "regex": {
    "field": "category",
    "value": "book.*"
  }
}
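
The pattern above can be tried out locally with Python's `re` module. This sketch assumes the expression has to match the whole field value, which is why it uses `fullmatch`; whether Summa anchors the pattern this way, and how closely its regex dialect follows Python's, is not stated here and should be checked.

```python
import re

pattern = re.compile("book.*")

# Hypothetical values of the "category" field
categories = ["book", "bookmark", "ebook", "audio book"]

# Assumption: the whole value must match, hence fullmatch rather than search
matched = [c for c in categories if pattern.fullmatch(c)]
```

Under that assumption only "book" and "bookmark" match; a value such as "ebook" would need a pattern like ".*book.*".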

RangeQuery

Matches documents whose value in the requested field lies within the given range.

{
  "range": {
    "field": "create_timestamp",
    "range": {
      "left": "2021-01-01",
      "right": "2022-01-01"
    }  
  }
}

MoreLikeThisQuery

Matches documents that are similar to the passed document.

{
  "more_like_this": {
    "document": "{\"title\": \"astronomy\"}"
  }
}
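
Note that in the example the document is passed as a JSON-encoded string rather than as a nested object, which is why its quotes are escaped. A short Python sketch of producing such a payload, using only the standard `json` module:

```python
import json

document = {"title": "astronomy"}

query = {
    "more_like_this": {
        # First encoding: the document becomes a JSON string
        "document": json.dumps(document),
    }
}

# Second encoding: the whole query, with the inner quotes escaped automatically
payload = json.dumps(query, indent=2)
print(payload)
```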

AllQuery

Matches all documents.

{
  "all": {}
}