Understanding Full-text Indexing With Verity

This topic covers these issues:

  - Full Text Indexing of XML and HTML

  - Using Full-text Indexing for PASSTHRU Files

Full Text Indexing of XML and HTML

A full-text search option specific to HTML form templates and XML documents has been added to the Verity search engine. The XML filter supports indexing and viewing well-formed XML documents. Stellent Content Server uses the universal filter, which includes the XML filter as a helper filter.

Tech Tip: If you are upgrading from a pre-6.0 release of Stellent Content Server, you must rebuild the index.

style.xml File

By default, the XML filter indexes regions of the document delimited by XML tags as zones, with the zones given the same name as the XML tag. META tags are automatically indexed as fields unless they are in a suppressed region.
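To make the default behavior concrete, the following Python sketch models it with the standard library's xml.etree.ElementTree. Each element becomes a zone that is named after its tag and holds the text between its start and end tags, and elements written as <meta name="..." content="..."/> become fields. This only approximates the documented rules and is not the Verity XML filter. The function names default_zones and meta_fields are hypothetical, the META name/content layout is an assumption borrowed from HTML, and suppressed regions are not modeled here.

```python
import xml.etree.ElementTree as ET


def default_zones(xml_text):
    """Approximate the default filter: one zone per element, named after its tag.

    Hypothetical model, not Verity's implementation. A zone holds all text
    between the element's start and end tags, including nested elements.
    """
    root = ET.fromstring(xml_text)
    zones = {}
    for elem in root.iter():
        # itertext() yields the element's own text plus all descendant text.
        text = " ".join(t.strip() for t in elem.itertext() if t.strip())
        if text:  # in this model, empty elements contribute no zone text
            zones.setdefault(elem.tag, []).append(text)
    return zones


def meta_fields(xml_text):
    """Collect <meta name="..." content="..."/> elements as fields.

    Assumes the HTML-style name/content layout; suppression is not modeled.
    """
    root = ET.fromstring(xml_text)
    return {
        elem.get("name"): elem.get("content", "")
        for elem in root.iter()
        if elem.tag.lower() == "meta" and elem.get("name")
    }


doc = ("<report><title>Q1</title><summary>safety <b>first</b></summary>"
       "<meta name='author' content='Lee'/></report>")
print(default_zones(doc))
print(meta_fields(doc))
```

Note that the outer report zone contains the text of every nested element, which matches the description of a zone as the region delimited by a tag.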

The style.xml file enables administrators to change the default behavior of the indexer for XML documents. Administrators can specify field and zone indexing for regions of the document delimited by XML tags and skip regions of the document delimited by XML tags.

The sample style.xml file contains commented-out examples of these commands.

style.xml Command Syntax

<command attribute="value"/>

style.xml Command Summary

field

Indexes the content between the pair of specified XML tags as a field value. By default, the field name is the same as the xmltag value unless the fieldname attribute specifies otherwise. Attributes: xmltag, fieldname, index

ignore

Skips indexing the specified xmltag itself but still indexes the content between its start and end tags. Attributes: xmltag

preserve

Indexes the specified xmltag as a zone when it is preceded by an ignore xmltag="*" command. Attributes: xmltag

suppress

Suppresses the specified xmltag and every XML tag embedded within it; the tags, their attributes, and their content are not indexed. Attributes: xmltag

style.xml Command Examples

The following command ignores all XML tags in the document, indexing only the content:

<ignore xmltag = "*"/>

The following command skips indexing the specified xmltag but indexes the content between the start and end tags of the specified xmltag:

<ignore xmltag = "section_1"/>

The following command indexes xmltag as a zone if there is also an ignore xmltag = "*" command:

<preserve xmltag = "section_1"/>

The following command suppresses the entire element identified by xmltag. The tag, its attributes, and its content are not indexed:

<suppress xmltag = "section_1"/>

The following command indexes the content between the start and end tags of the specified xmltag as a field, which is given the same name as xmltag:

<field xmltag = "column_1"/>

The following command indexes the content between the start and end tags of the specified xmltag as a field, which is given the name specified in the fieldname attribute:

<field xmltag = "column_2" fieldname = "vdk_field_2"/>

The following command indexes the content between the start and end tags of the specified xmltag as a field, overriding any existing value of the field:

<field xmltag = "column_2" index = "override"/>

Note: Both fieldname and index attributes can be used in a field command.
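The interaction between these commands is easiest to see in a small model. The Python sketch below parses command lines written like the examples above and applies them to a sample document. Zones follow the default and the ignore/preserve rules, suppressed elements vanish entirely, and field commands copy element text into named fields. It is a hypothetical approximation, not the Verity filter. The names StyleRules, load_rules, and index_document are invented for illustration. The sketch also assumes that a field element keeps its default zone treatment, and that without index="override" the first value found for a field is kept. Combinations the command descriptions do not cover (for example, preserve with a tag-specific ignore) are not modeled faithfully.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class StyleRules:
    """Commands collected from style.xml (hypothetical in-memory model)."""
    ignore: set = field(default_factory=set)    # <ignore xmltag=.../>, "*" = all tags
    preserve: set = field(default_factory=set)  # <preserve xmltag=.../>
    suppress: set = field(default_factory=set)  # <suppress xmltag=.../>
    fields: dict = field(default_factory=dict)  # xmltag -> (fieldname, override?)

    def is_zone(self, tag):
        if tag in self.ignore:
            return False
        if "*" in self.ignore:
            return tag in self.preserve  # preserve only matters after ignore "*"
        return True


def load_rules(commands):
    """Parse command strings such as '<ignore xmltag = "*"/>'."""
    rules = StyleRules()
    for cmd in commands:
        el = ET.fromstring(cmd)
        tag = el.attrib["xmltag"]
        if el.tag == "ignore":
            rules.ignore.add(tag)
        elif el.tag == "preserve":
            rules.preserve.add(tag)
        elif el.tag == "suppress":
            rules.suppress.add(tag)
        elif el.tag == "field":
            name = el.attrib.get("fieldname", tag)  # default: same name as xmltag
            rules.fields[tag] = (name, el.attrib.get("index") == "override")
        else:
            raise ValueError(f"unknown command: {el.tag}")
    return rules


def index_document(xml_text, rules):
    """Return (zones, fields, body) under the modeled rules."""
    zones, fields = {}, {}

    def text_of(elem):
        # Text between start and end tags, skipping suppressed descendants.
        # A suppressed child's tail lies outside it, so the tail is kept.
        parts = [elem.text or ""]
        for child in elem:
            if child.tag not in rules.suppress:
                parts.append(text_of(child))
            parts.append(child.tail or "")
        return " ".join(" ".join(parts).split())

    def walk(elem):
        if elem.tag in rules.suppress:
            return  # tag, attributes, and content are all skipped
        text = text_of(elem)
        if rules.is_zone(elem.tag):
            zones.setdefault(elem.tag, []).append(text)
        if elem.tag in rules.fields:
            name, override = rules.fields[elem.tag]
            # Assumption: without index="override" the first value is kept.
            if override or name not in fields:
                fields[name] = text
        for child in elem:
            walk(child)

    root = ET.fromstring(xml_text)
    walk(root)
    body = "" if root.tag in rules.suppress else text_of(root)
    return zones, fields, body


rules = load_rules([
    '<ignore xmltag = "*"/>',
    '<preserve xmltag = "section_1"/>',
    '<suppress xmltag = "section_2"/>',
    '<field xmltag = "column_1"/>',
    '<field xmltag = "column_2" fieldname = "vdk_field_2" index = "override"/>',
])
doc = ("<doc><section_1>intro <note>n</note></section_1>"
       "<section_2>secret</section_2><column_1>A</column_1>"
       "<column_2>B1</column_2><column_2>B2</column_2></doc>")
print(index_document(doc, rules))
```

With these rules, only section_1 survives as a zone, the text of section_2 disappears from the indexed content, and the repeated column_2 element ends up in vdk_field_2 with its last value because of index="override".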

style.ufl File

If administrators define custom fields to be populated through the style.xml file, those fields must also be defined, using the standard syntax, in the style.ufl or style.sfl file.

Using Query Language Examples

You can narrow a search to the content of a specific XML tag (zone) with the IN operator, using either of these forms:

(query) <IN> zone

or

(query) <IN> (zone1, zone2, . . .)

where query represents any query expression. To avoid ambiguity, the query expression must be placed within parentheses. The zone variables represent zone names, which must match the zone names defined in your collection. To search more than one zone, list the zone names separated by commas and enclose the list in parentheses, as shown.

Example

The following example illustrates the proper use of the IN operator.

To search for the term "safety" within the zone named "summary," use the following query expression:

(safety) <IN> summary
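If queries are assembled in application code, a small helper keeps the parenthesization rules straight. The Python function below is a hypothetical sketch; in_query is not part of Stellent or Verity. It wraps the query expression in parentheses, writes a single zone bare and several zones as a parenthesized, comma-separated list, and can optionally check the names against a set of zones you know are defined in the collection. It does not escape or validate the query text itself.

```python
def in_query(query, zones, known_zones=None):
    """Build "(query) <IN> zone" or "(query) <IN> (zone1, zone2, ...)".

    Hypothetical helper; the query text is inserted as-is, without escaping.
    """
    if isinstance(zones, str):
        zones = [zones]
    zones = [z.strip() for z in zones]
    if not zones or not all(zones):
        raise ValueError("at least one non-empty zone name is required")
    if known_zones is not None:
        unknown = [z for z in zones if z not in known_zones]
        if unknown:
            # Zone names must match the zones defined in the collection.
            raise ValueError(f"zones not defined in the collection: {unknown}")
    # The query is always parenthesized to avoid ambiguity.
    target = zones[0] if len(zones) == 1 else "(" + ", ".join(zones) + ")"
    return f"({query}) <IN> {target}"


print(in_query("safety", "summary"))                # (safety) <IN> summary
print(in_query("safety", ["summary", "abstract"]))  # (safety) <IN> (summary, abstract)
```

The first call reproduces the example query above; the second shows the multi-zone form.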

Using Full-text Indexing for PASSTHRU Files

If you set a file format to PASSTHRU (so that files are kept in their native format) and you want full-text indexing, make sure the name of the format contains one of the following strings:

Example

If you want Excel files to remain in their native format and still be full-text indexed, perform these tasks:

  1. Select Allow format override on check in on the System Properties Options Configuration tab.

  2. Define a new format called application/ms-excel.native.

  3. Set the format to PASSTHRU.