- builtins.object
  - QryParser
class QryParser(builtins.object)
QryParser is an embarrassingly simplistic query parser. It has
two primary methods: getQuery and tokenizeString. getQuery
converts a query string into an optimized Qry tree. tokenizeString
converts a flat (unstructured) query string into a list of terms;
it is used to create learning-to-rank feature vectors.
Add new operators to the query parser by modifying the following
methods:
- createOperator: Use a string (e.g., #and) to create a node
  (e.g., QrySopAnd).
- parseString: If the operator supports term weights
  (e.g., #wsum (0.5 apple 1 pie)), you must modify this method.
  For these operators, two substrings (weight and term) are
  popped from the query string at each step, instead of one.
Add new document fields to the parser by modifying createTerms.
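The weight/term pairing that parseString must handle for weighted operators can be illustrated with a small standalone sketch. The helper `parse_weighted_args` is hypothetical and not part of QryParser; it assumes the argument list has already been extracted from the parentheses and contains only flat, whitespace-separated weight/term pairs (no nested operators).

```python
def parse_weighted_args(arg_string: str) -> list[tuple[float, str]]:
    """Split the arguments of a weighted operator such as
    '#wsum (0.5 apple 1 pie)' into (weight, term) pairs.

    Hypothetical helper: handles only flat arguments (no nested
    operators) separated by whitespace.
    """
    tokens = arg_string.split()
    if len(tokens) % 2 != 0:
        raise ValueError("weighted operator needs weight/term pairs")
    pairs = []
    # Pop two substrings (weight, then term) at each step instead of one.
    while tokens:
        weight = float(tokens.pop(0))  # raises ValueError if not numeric
        term = tokens.pop(0)
        pairs.append((weight, term))
    return pairs


print(parse_weighted_args("0.5 apple 1 pie"))
# [(0.5, 'apple'), (1.0, 'pie')]
```

In the real parser an argument may itself be a nested operator rather than a single term, so the "term" half of each pair would be parsed recursively.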
Static methods defined here:
- getQuery(queryString)
- Parse a query string into a query tree.
queryString: The query string, in an Indri-style query language.
Returns: The query tree for the parsed query.
throws IOException: Error accessing the Lucene index.
throws IllegalArgumentException: Query syntax error.
- optimizeQuery(q)
- Optimize the query by removing degenerate nodes produced during
query parsing (for example, '#NEAR/1 (of the)' becomes
'#NEAR/1 ()' after stopwords are removed) and by collapsing
unnecessary nodes or subtrees (for example, #AND (#AND (a)) can
be replaced by 'a').
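The two optimizations described above can be sketched on a toy tree. The `Node` class and `optimize` function are hypothetical stand-ins, not the Qry classes: leaves are plain term strings, empty operators are deleted, and any operator left with a single argument is replaced by that argument. Collapsing every one-argument operator is an assumption; the real method may exempt some operator types.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Node:
    """Hypothetical query-tree node; leaves are plain term strings."""
    op: str
    args: list[Union["Node", str]] = field(default_factory=list)


Tree = Union[Node, str]


def optimize(q: Tree) -> Optional[Tree]:
    """Return a simplified copy of q, or None if q is degenerate."""
    if isinstance(q, str):
        return q  # terms cannot be simplified further
    # Depth-first: optimize the arguments, dropping any that vanished.
    args = [a for a in (optimize(c) for c in q.args) if a is not None]
    if not args:
        # e.g. '#NEAR/1 ()' after stopword removal: delete the node.
        return None
    if len(args) == 1:
        # e.g. '#AND (#AND (a))' -> 'a': a one-argument operator is
        # collapsed (assumption: the real method may exempt some operators).
        return args[0]
    return Node(q.op, args)


query = Node("#or", [Node("#near/1", []), Node("#and", [Node("#and", ["a"])]), "b"])
print(optimize(query))
# Node(op='#or', args=['a', 'b'])
```

Working bottom-up matters: the inner #AND must collapse to 'a' before the outer #AND can see that it has only one argument.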
- parseString(queryString)
- Parse a query string into a query tree.
queryString: The query string, in an Indri-style query language.
Returns: The query tree for the parsed query.
throws IOException: Error accessing the Lucene index.
throws IllegalArgumentException: Query syntax error.
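A minimal recursive-descent sketch of parsing Indri-style nested operators is shown below. The function `parse` and its tuple output are hypothetical illustrations, not parseString's actual implementation. The sketch assumes the input is a single expression, normalizes operator names to lowercase (assumed case-insensitive), leaves terms untouched, and ignores weights, document fields, stopwords, and any default operator for unstructured queries. Like parseString, it reports syntax errors, here with Python's ValueError.

```python
import re
from typing import Union

# A parsed node is either a term string or an (operator, arguments) tuple.
Tree = Union[str, tuple]

# Tokens are '(', ')', or runs of characters that are neither space nor paren.
_TOKEN = re.compile(r"[()]|[^\s()]+")


def parse(query: str) -> Tree:
    """Recursive-descent parse of a single Indri-style expression."""
    tokens = _TOKEN.findall(query)
    pos = 0

    def parse_expr() -> Tree:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok in ("(", ")"):
            raise ValueError(f"unexpected {tok!r}")
        if not tok.startswith("#"):
            return tok  # a bare term
        # An operator must be followed by a parenthesized argument list.
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError(f"expected '(' after {tok}")
        pos += 1
        args = []
        while pos < len(tokens) and tokens[pos] != ")":
            args.append(parse_expr())  # arguments may be nested operators
        if pos >= len(tokens):
            raise ValueError(f"missing ')' for {tok}")
        pos += 1
        return (tok.lower(), args)  # assumed case-insensitive operator names

    tree = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the query expression")
    return tree


print(parse("#AND (apple #or(pie tart))"))
# ('#and', ['apple', ('#or', ['pie', 'tart'])])
```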
- tokenizeString(query)
- Given part of a query string, return a list of terms with
stopwords removed and each term stemmed with the Krovetz
stemmer. Use this method to process raw query terms.
query: String containing the query.
Returns: A list of query tokens.
throws IOException: Error accessing the Lucene index.
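The tokenize/stop/stem pipeline can be sketched in plain Python. This is not tokenizeString's implementation: the stopword set is a tiny hypothetical sample, and `crude_stem` is a placeholder that only strips a plural "s". It is NOT the Krovetz stemmer, which is dictionary-based and is not in the Python standard library. The sketch also assumes lowercasing and splitting on non-alphanumeric characters.

```python
import re

# Hypothetical, tiny stopword list; the real parser uses its own list.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def crude_stem(term: str) -> str:
    """Stand-in for the Krovetz stemmer (NOT Krovetz): strips a plural 's'."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def tokenize_string(query: str) -> list[str]:
    # Lowercase, then split on anything that is not a letter or digit.
    terms = re.findall(r"[a-z0-9]+", query.lower())
    # Drop stopwords, then stem the surviving terms.
    return [crude_stem(t) for t in terms if t not in STOPWORDS]


print(tokenize_string("The Pies of Apples"))
# ['pie', 'apple']
```

Swapping in a real Krovetz implementation would change only `crude_stem`; the order of lowercasing, stopping, and stemming stays the same.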
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)