FAQ

Contents

  1. What service properties are available?
  2. What index properties are available?
  3. How do I perform queries using UTF-8 character encoding?

What service properties are available?

service.title
The title of the web service.
service.defaultfield
The service-wide default field used when performing searches.
service.defaultoperator
The service-wide default operator used when performing searches ('OR', 'AND').
service.debugging
A boolean property which determines whether or not the service is in debugging mode.
query.expand
A boolean property determining whether or not the expansion mechanism should be enabled for providing related search queries. Defaults to "false".
query.spellcheck
A boolean property determining whether or not the spell checking mechanism should be enabled for providing related search queries. Defaults to "false".
query.suggest
A boolean property determining whether or not the similar document mechanism should be enabled for providing related search queries. Defaults to "false".

What index properties are available?

index.analyzer
The class name of the preferred analyzer for the index.
index.author
Given any document from this index, this field will contain the name of its author.
index.defaultoperator
The preferred default operator when performing search requests on this index ('OR' or 'AND').
index.image
The URL of an image representing the index. Typically a small icon.
index.image.height
The pixel height of the image representing the index.
index.image.width
The pixel width of the image representing the index.
index.image.type
The content type of the image representing the index.
index.readonly
Specifies whether or not the index may only be read.
index.title
The title of the index. This is typically set if the index name is not satisfactory.
document.defaultfield
The default field used when searching this index.
document.identifier
Given any document (within this index), this field contains its unique identifier.
document.identifier.validator
A regular expression for the purposes of validating identifiers being associated with documents.
document.author
A template for building the author of a particular document (e.g. '[last_name], [first_name]').
document.title
A template for building the title of a particular document (e.g. '[item_id]: [title]').
document.updated
The name of a document field containing an epoch number corresponding to the last time it was updated.
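
The template and validator properties above can be pictured with a short sketch. This is Python written for illustration, not the service's own code. It assumes that each bracketed name in a template is replaced by the matching document field value and that the validator must match the whole identifier. The field names, the sample pattern, and the helpers `render_template` and `is_valid_identifier` are hypothetical, and Python's regular expression syntax is not identical to Java's.

```python
import re

# Hypothetical document fields; the names are illustrative only.
document = {
    "item_id": "42",
    "title": "Gâteau recipes",
    "first_name": "Ada",
    "last_name": "Lovelace",
}

def render_template(template, fields):
    """Replace each [name] placeholder with the matching field value.

    Assumption: placeholders with no matching field are left untouched.
    """
    def substitute(match):
        name = match.group(1)
        return str(fields.get(name, match.group(0)))
    return re.sub(r"\[([^\[\]]+)\]", substitute, template)

def is_valid_identifier(pattern, identifier):
    """Assumption: the validator must match the entire identifier."""
    return re.fullmatch(pattern, identifier) is not None

print(render_template("[item_id]: [title]", document))
print(render_template("[last_name], [first_name]", document))
print(is_valid_identifier(r"[0-9]+", "42"))
print(is_valid_identifier(r"[0-9]+", "42a"))
```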

How do I perform queries using UTF-8 character encoding?

Because of the limitations of the Java Servlet API, the Lucene Web Service is not responsible for parsing incoming HTTP requests into meaningful text. This is handled by whatever deployment platform (e.g. Tomcat) it has been deployed with. Please consult your deployment platform's documentation to find out how to enable UTF-8 encoding on incoming requests.

How to enable UTF-8 encoding with Apache Tomcat

By default, Tomcat may not decode the query string of a GET request as UTF-8 (older releases assume ISO-8859-1). Here is how to change that:

  1. Open the file at TomcatDirectory/conf/server.xml. This is the configuration file for the Tomcat server. Scan down to around line 77 (in my copy of the file) until you see an XML tag called "Connector". It should already have several attributes such as "port", "maxHttpHeaderSize", "maxThreads", etc. You must add another attribute called "URIEncoding", setting its value to "UTF-8". My copy looks something like this:
    <Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8"/>
  2. Save the file and restart your Tomcat server.
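
To double-check the edit, a small script can list every Connector element and report whether it carries a URIEncoding attribute. The sketch below is Python and works on a trimmed stand-in for server.xml. The sample XML and the helper name `connector_encodings` are made up for illustration; a real file contains more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for conf/server.xml; a real file has more content.
SERVER_XML = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" redirectPort="8443" URIEncoding="UTF-8"/>
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443"/>
  </Service>
</Server>
"""

def connector_encodings(xml_text):
    """Return (port, URIEncoding or None) for every Connector element."""
    root = ET.fromstring(xml_text)
    return [(c.get("port"), c.get("URIEncoding"))
            for c in root.iter("Connector")]

for port, encoding in connector_encodings(SERVER_XML):
    status = encoding if encoding else "not set"
    print(f"Connector on port {port}: URIEncoding {status}")
```

To inspect an actual installation, read the text of TomcatDirectory/conf/server.xml and pass it to the same function.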

Now, whenever Tomcat receives a request via HTTP, it will attempt to parse UTF-8 encoded strings from it. What you must do is take whatever query you're searching for and break it down into its underlying bytes (the UTF-8 bytes representing the string). Your requested query must now be the URL-encoded (percent-encoded) representation of those bytes, not of the Unicode character codes themselves.
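
The byte-level step can be sketched in a few lines of Python. The helper name `percent_encode_utf8` and the sample word "café" are only illustrative. The function leaves unreserved ASCII characters alone and writes every other UTF-8 byte as a percent sign followed by two hexadecimal digits.

```python
def percent_encode_utf8(text):
    """Percent-encode every byte outside the unreserved ASCII set."""
    unreserved = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                  b"abcdefghijklmnopqrstuvwxyz"
                  b"0123456789-._~")
    pieces = []
    for byte in text.encode("utf-8"):  # iterate over the raw UTF-8 bytes
        if byte in unreserved:
            pieces.append(chr(byte))
        else:
            pieces.append(f"%{byte:02X}")
    return "".join(pieces)

print(percent_encode_utf8("café"))  # prints caf%C3%A9
```

In practice a library routine would normally do this work; the loop is spelled out here only to make the bytes visible.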

For example, suppose I want to search for "gâteau". The non-ASCII character in question is the "â". According to the Unicode standard, this character's code point is U+00E2 (226 in decimal). According to the UTF-8 standard, this character is stored as two bytes: 0xC3 0xA2 (195 162 in decimal). The query that gets sent to the server must look as follows:

GET /lucene/some_index?query=g%C3%A2teau

This way, Tomcat will understand that what you're submitting to it are UTF-8 encoded strings and the Lucene Web Service will behave correctly.
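
The following Python sketch reproduces the request above on the client side and then imitates the decoding step under two different charsets. It is only a simulation using the standard library's `urllib.parse`, not Tomcat's own code, but it shows why the server-side encoding setting matters.

```python
from urllib.parse import quote, unquote

query = "gâteau"

# quote() percent-encodes the UTF-8 bytes of the string by default.
encoded = quote(query)
request_line = f"GET /lucene/some_index?query={encoded}"
print(request_line)

# What the server recovers depends on the charset used to decode the bytes.
print(unquote(encoded, encoding="utf-8"))       # gâteau
print(unquote(encoded, encoding="iso-8859-1"))  # gÃ¢teau
```

Decoding the same two bytes as ISO-8859-1 yields two separate characters instead of "â", which is the kind of garbled query a server without UTF-8 URI decoding would end up searching for.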