How to Review the Purpose of an Index in Oracle
2 Indexing
This chapter is an introduction to Oracle Text indexing. The following topics are covered:
- About Oracle Text Indexes
- Considerations For Indexing
- Index Creation
- Index Maintenance
- Managing DML Operations for a CONTEXT Index
About Oracle Text Indexes
An Oracle Text index is an Oracle domain index. To build your query application, you can create an index of type CONTEXT and query it with the CONTAINS operator.
You create an index from a populated text table. In a query application, the table must contain the text or pointers to where the text is stored. The text is usually a collection of documents, but can also be small text fragments.
For better performance with mixed queries, you can create a CTXCAT index. Use this index type when your application relies heavily on mixed queries to search small documents or descriptive text fragments based on related criteria such as dates or prices. You query this index with the CATSEARCH operator.
To build a document classification application, you create an index of type CTXRULE. With such an index, you can classify plain text, HTML, or XML documents using the MATCHES operator. You store your defining query set in the text table you index.
If you are working with XMLType columns, you can create a CTXXPATH index to speed up queries with ExistsNode().
You create a text index as a type of extensible index to Oracle using standard SQL. This means that an Oracle Text index operates like an Oracle index. It has a name by which it is referenced and can be manipulated with standard SQL statements.
The benefits of creating an Oracle Text index include fast response time for text queries with the CONTAINS, CATSEARCH, and MATCHES Oracle Text operators. These operators query the CONTEXT, CTXCAT, and CTXRULE index types respectively.
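For example, a minimal sketch of a CONTAINS query with a score label against a CONTEXT-indexed column (the docs table and text column are illustrative; index creation itself is shown later in this chapter):

SELECT SCORE(1), id
  FROM docs
 WHERE CONTAINS(text, 'oracle', 1) > 0
 ORDER BY SCORE(1) DESC;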
Structure of the Oracle Text CONTEXT Index
Oracle Text indexes text by converting all words into tokens. The general structure of an Oracle Text CONTEXT index is an inverted index where each token contains the list of documents (rows) that contain that token.
For example, after a single initial indexing operation, the word DOG might have an entry as follows:
DOG DOC1 DOC3 DOC5
This means that the word DOG is contained in the rows that store documents 1, 3, and 5.
For more information, see "Index Optimization" in this chapter.
Merged Word and Theme Index
By default in English and French, Oracle Text indexes theme information with word information. You can query theme information with the ABOUT operator. You can optionally enable and disable theme indexing.
The Oracle Text Indexing Process
This section describes the Oracle Text indexing process. You initiate the indexing process with the CREATE INDEX statement. The goal is to create an Oracle Text index of tokens according to the parameters and preferences you specify.
Figure 2-1 shows the indexing process. This process is a data stream that is acted upon by the different indexing objects. Each object corresponds to an indexing preference type or section group you can specify in the parameter string of CREATE INDEX or ALTER INDEX. The sections that follow describe these objects.
Figure 2-1 Oracle Text indexing process
Datastore Object
The stream starts with the datastore reading in the documents as they are stored in the system according to your datastore preference. For example, if you have defined your datastore as FILE_DATASTORE, the stream starts by reading the files from the operating system. You can also store your documents on the internet or in the Oracle database.
Filter Object
The stream then passes through the filter. What happens here is determined by your FILTER preference. The stream can be acted upon in one of the following ways:
- No filtering takes place. This happens when you specify the NULL_FILTER preference type. Documents that are plain text, HTML, or XML need no filtering.
- Formatted (binary) documents are filtered to marked-up text. This happens when you specify the INSO_FILTER preference type.
- Text is converted from a non-database character set to the database character set. This happens when you specify the CHARSET_FILTER preference type.
Sectioner Object
After being filtered, the marked-up text passes through the sectioner, which separates the stream into text and section information. Section information includes where sections begin and end in the text stream. The type of sections extracted is determined by your section group type.
The section information is passed directly to the indexing engine, which uses it later. The text is passed to the lexer.
Lexer Object
The lexer breaks the text into tokens according to your language. These tokens are usually words. To extract tokens, the lexer uses the parameters defined in your lexer preference. These parameters include the definitions for the characters that separate tokens, such as whitespace, and whether to convert the text to all uppercase or to leave it in mixed case.
When theme indexing is enabled, the lexer analyzes your text to create theme tokens for indexing.
Indexing Engine
The indexing engine creates the inverted index that maps tokens to the documents that contain them. In this phase, Oracle uses the stoplist you specify to exclude stopwords or stopthemes from the index. Oracle also uses the parameters defined in your WORDLIST preference, which tell the system how to create a prefix index or substring index, if enabled.
Partitioned Tables and Indexes
You can create a partitioned CONTEXT index on a partitioned text table. The table must be partitioned by range. Hash, composite, and list partitions are not supported.
You might create a partitioned text table to partition your data by date. For example, if your application maintains a large library of dated news articles, you can partition your data by month or year. Partitioning simplifies the manageability of large databases, since querying, DML, and backup and recovery can act on single partitions.
Querying Partitioned Tables
To query a partitioned table, you use CONTAINS in the SELECT statement no differently than when you query a regular table. You can query the entire table or a single partition. However, if you are using the ORDER BY SCORE clause, Oracle recommends that you query single partitions unless you include a range predicate that limits the query to a single partition.
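For illustration, assuming a range-partitioned table sales_docs with a partition named p_2001 and a CONTEXT-indexed column doc (all names here are hypothetical), a single-partition query might look like the following sketch:

SELECT doc_id, SCORE(1)
  FROM sales_docs PARTITION (p_2001)
 WHERE CONTAINS(doc, 'merger', 1) > 0
 ORDER BY SCORE(1) DESC;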
Creating an Index Online
When it is not practical to lock your base table for indexing because of ongoing updates, you can create your index online with the ONLINE parameter of CREATE INDEX. This way, an application with heavy DML need not stop updating the base table during indexing.
There are short periods, however, when the base table is locked at the beginning and end of the indexing process.
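A minimal sketch of online index creation follows; the docs table is the same illustrative table used elsewhere in this chapter, and the placement of the ONLINE keyword should be checked against the CREATE INDEX syntax for your release:

CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  ONLINE;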
Parallel Indexing
Oracle Text supports parallel indexing with CREATE INDEX.
When you issue a parallel indexing command on a non-partitioned table, Oracle splits the base table into partitions, spawns slave processes, and assigns a different partition to each slave. Each slave indexes the rows in its partition. The method of slicing the base table into partitions is determined by Oracle and is not under your direct control. This is also true for the number of slave processes actually spawned, which depends on machine capabilities, system load, your init.ora settings, and other factors. The actual parallel degree may not match the degree of parallelism requested.
Since indexing is an I/O intensive operation, parallel indexing is most effective in decreasing your indexing time when you have distributed disk access and multiple CPUs. Parallel indexing can only affect the performance of an initial index build with CREATE INDEX. It does not affect DML performance with ALTER INDEX, and has minimal impact on query performance.
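For example, a sketch requesting a parallel degree of 3 for the initial build (as noted above, the actual degree used may be lower; table and column names are illustrative):

CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARALLEL 3;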
Since parallel indexing decreases the initial indexing time, it is useful for:
- data staging, when your product includes an Oracle Text index
- rapid initial startup of applications based on large data collections
- application testing, when you need to test different index parameters and schemas while developing your application
Limitations for Indexing
Columns with Multiple Indexes
A column can have no more than one domain index attached to it, which is in keeping with Oracle standards. However, a single Text index can contain theme information in addition to word information.
Indexing Views
Oracle SQL standards do not support creating indexes on views. Therefore, if you need to index documents whose contents are in different tables, you can create a data storage preference using the USER_DATASTORE object. With this object, you can define a procedure that synthesizes documents from different tables at index time.
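The following is a minimal sketch of a USER_DATASTORE preference. The articles table, its title and body columns, and the my_doc_proc procedure are hypothetical; for brevity the procedure reads a single table, but a real procedure could join several tables to synthesize each virtual document:

CREATE OR REPLACE PROCEDURE my_doc_proc (rid IN ROWID, doc IN OUT NOCOPY CLOB) IS
BEGIN
  -- Build one virtual document per row from two hypothetical columns.
  FOR c IN (SELECT title, body FROM articles WHERE rowid = rid) LOOP
    DBMS_LOB.WRITEAPPEND(doc, LENGTH(c.title || ' '), c.title || ' ');
    DBMS_LOB.WRITEAPPEND(doc, LENGTH(c.body), c.body);
  END LOOP;
END;

begin
  ctx_ddl.create_preference('my_user_ds', 'USER_DATASTORE');
  ctx_ddl.set_attribute('my_user_ds', 'PROCEDURE', 'my_doc_proc');
end;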
Considerations For Indexing
You use the CREATE INDEX statement to create an Oracle Text index. When you create an index and specify no parameter string, an index is created with default parameters.
You can also override the defaults and customize your index to suit your query application. The parameters and preference types you use to customize your index with CREATE INDEX fall into the following general categories.
Type of Index
With Oracle Text, you can create one of four index types with CREATE INDEX. The following table describes each type, its purpose, and what features it supports:
Index Type | Description | Supported Preferences and Parameters | Query Operator | Notes |
---|---|---|---|---|
CONTEXT | Use this index to build a text retrieval application when your text consists of large coherent documents. You can index documents of different formats such as MS Word, HTML, or plain text. With a CONTEXT index, you can customize your index in a variety of ways. This index type requires CTX_DDL.SYNC_INDEX after DML on the base table. | All. Supported parameters include the index partition clause and the format, charset, and language columns. | CONTAINS. The grammar is called the CONTEXT grammar, which supports a rich set of operations. The CTXCAT grammar can be used with query templating. | Supports all document services and query services. Supports indexing of partitioned text tables. |
CTXCAT | Use this index type for better mixed query performance. Typically, with this index type, you index small documents or text fragments. Other columns in the base table, such as item names, prices, and descriptions, can be included in the index to improve mixed query performance. This index type is transactional, automatically updating itself after DML on the base table. No CTX_DDL.SYNC_INDEX is necessary. | Format, charset, and language columns are not supported. Table and index partitioning are not supported. | CATSEARCH. The grammar is called CTXCAT, which supports logical operations, phrase queries, and wildcarding. The CONTEXT grammar can be used with query templating. | The size of a CTXCAT index is related to the total text to be indexed, the number of sub-indexes, and the number of columns that make up the sub-indexes. |
CTXRULE | Use this index type to build a document classification application. You index a set of defining queries (rules) and classify incoming documents against them. | Only a subset of preferences and parameters applies. Section group and wordlist preferences are supported for processing the query set; some query operators are not supported in the query set. Filter, memory, datastore, and populate parameters are not applicable to this index type. | MATCHES | Single documents (plain text, HTML, or XML) can be classified using the MATCHES operator. |
CTXXPATH | Create this index when you need to speed up ExistsNode() queries on an XMLType column. | | Use with ExistsNode() | Can only be created on an XMLType column. See Oracle9i Application Developer's Guide - XML for information. |
Location of Text
Your document text can reside in one of three places: the text table, the file system, or the world-wide web. When you index with CREATE INDEX, you specify the location using the datastore preference. Use the appropriate datastore according to your application.
The following table describes the different ways you can store your text with the datastore preference type.
Datastore Type | Use When |
---|---|
DIRECT_DATASTORE | Data is stored internally in a text column. Each row is indexed as a single document. Your text column can be VARCHAR2, CHAR, CLOB, BLOB, or BFILE. |
MULTI_COLUMN_DATASTORE | Data is stored in a text table in more than one column. Columns are concatenated to create a virtual document, one document per row. |
DETAIL_DATASTORE | Data is stored internally in a text column. The document consists of one or more rows stored in a text column in a detail table, with header information stored in a master table. |
FILE_DATASTORE | Data is stored externally in operating system files. Filenames are stored in the text column, one per row. |
NESTED_DATASTORE | Data is stored in a nested table. |
URL_DATASTORE | Data is stored externally in files located on an intranet or the Internet. Uniform Resource Locators (URLs) are stored in the text column. |
USER_DATASTORE | Documents are synthesized at index time by a user-defined stored procedure. |
Indexing time and document retrieval time are increased when indexing URLs, because the system must retrieve the document over the network.
Document Formats and Filtering
Formatted documents such as Microsoft Word and PDF must be filtered to text to be indexed. The type of filtering the system uses is determined by the FILTER preference type. By default, the system uses the INSO_FILTER filter type, which automatically detects the format of your documents and filters them to text.
Oracle can index most formats. Oracle can also index columns that contain documents with mixed formats.
No Filtering for HTML
If you are indexing HTML or plain text files, do not use the INSO_FILTER type. For best results, use the NULL_FILTER preference type.
Filtering Mixed Formatted Columns
If you have a mixed-format column, such as one that contains Microsoft Word, plain text, and HTML documents, you can bypass filtering for plain text or HTML by including a format column in your text table. In the format column, you tag each row TEXT or BINARY. Rows that are tagged TEXT are not filtered.
For example, you can tag the HTML and plain text rows as TEXT and the Microsoft Word rows as BINARY. You specify the format column in the CREATE INDEX parameter clause.
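A sketch of this setup follows. The mydocs table, its doc, fmt, and doc_type columns, and the tagging criteria are hypothetical; the point is the 'format column' parameter syntax:

ALTER TABLE mydocs ADD (fmt VARCHAR2(10));

UPDATE mydocs SET fmt = 'TEXT'   WHERE doc_type IN ('HTML', 'PLAIN');
UPDATE mydocs SET fmt = 'BINARY' WHERE doc_type = 'WORD';

CREATE INDEX mydocs_idx ON mydocs(doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('format column fmt');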
Custom Filtering
You can create your own custom filter to filter documents for indexing. You can create either an external filter that is executed from the file system or an internal filter as a PL/SQL or Java stored procedure.
For external custom filtering, use the USER_FILTER filter preference type.
For internal filtering, use the PROCEDURE_FILTER filter type.
Bypassing Rows for Indexing
You can bypass rows in your text table that are not to be indexed, such as rows that contain image data. To do so, create a format column in your table and set it to IGNORE. You name the format column in the parameter clause of CREATE INDEX.
Document Character Set
The indexing engine expects filtered text to be in the database character set. When you use the INSO_FILTER filter type, formatted documents are converted to text in the database character set.
If your source is text and your document character set is not the database character set, you can use the INSO_FILTER or CHARSET_FILTER filter type to convert your text for indexing.
Mixed Character Set Columns
If your document set contains documents with different character sets, such as JA16EUC and JA16SJIS, you can index the documents provided you create a charset column. You populate this column with the name of the document character set on a per-row basis. You name the column in the parameter clause of the CREATE INDEX statement.
Document Language
Oracle can index most languages. By default, Oracle assumes the language of the text to index is the language you specify in your database setup.
You use the BASIC_LEXER preference type to index whitespace-delimited languages such as English, French, German, and Spanish. For some of these languages you can enable alternate spelling, composite word indexing, and base-letter conversion.
You can also index Japanese, Chinese, and Korean.
Language Features Outside BASIC_LEXER
With the BASIC_LEXER and the Japanese, Chinese, and Korean lexers, Oracle Text provides a lexing solution for most languages. For other languages, such as Thai and Arabic, you can create your own lexing solution using the user-defined lexer interface. This interface enables you to create a PL/SQL or Java procedure to process your documents during indexing and querying.
You can also use the user-defined lexer to create your own theme lexing solution or linguistic processing engine.
Indexing Multi-language Columns
Oracle can index text columns that contain documents of different languages, such as a column that contains documents written in English, German, and Japanese. To index a multi-language column, you need a language column in your text table. Use the MULTI_LEXER preference type.
You can also use a multi-language stoplist when you index multi-language columns.
Indexing Special Characters
When you use the BASIC_LEXER preference type, you can specify how non-alphanumeric characters such as hyphens and periods are indexed with respect to the tokens that contain them. For example, you can specify that Oracle include or exclude the hyphen character (-) when indexing a word such as web-site.
These characters fall into BASIC_LEXER categories according to the behavior you require during indexing. The way you set the lexer to behave for indexing is the way it behaves for query parsing.
Some of the special characters you can set are as follows:
Printjoins Character
Define a non-alphanumeric character as a printjoin when you want this character to be included in the token during indexing.
For example, if you want your index to include hyphens and underscore characters, define them as printjoins. This means that words such as web-site are indexed as web-site. A query on website does not find web-site.
Skipjoins Character
Define a non-alphanumeric character as a skipjoin when you do not want this character to be indexed with the token that contains it.
For example, with the hyphen (-) character defined as a skipjoin, the word web-site is indexed as website. A query on web-site finds documents containing website and web-site.
Other Characters
Other characters can be specified to control other tokenization behavior, such as token separation (startjoins, endjoins, whitespace), punctuation identification (punctuations), number tokenization (numjoin), and word continuation after line breaks (continuation). These categories of characters have defaults, which you can change.
Case-Sensitive Indexing and Querying
By default, all text tokens are converted to uppercase and then indexed. This results in case-insensitive queries. For example, separate queries on each of the three words cat, CAT, and Cat all return the same documents.
You can change the default and have the index record tokens as they appear in the text. When you create a case-sensitive index, you must specify your queries with exact case to match documents. For example, if a document contains Cat, you must specify your query as Cat to match this document. Specifying cat or CAT does not return the document.
To enable or disable case-sensitive indexing, use the mixed_case attribute of the BASIC_LEXER preference.
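For example, a minimal sketch that turns on mixed-case (case-sensitive) indexing with a custom lexer and names it at index creation, reusing the illustrative mytable example from later in this chapter:

begin
  ctx_ddl.create_preference('my_case_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_case_lexer', 'mixed_case', 'YES');
end;

create index myindex on mytable(docs)
  indextype is ctxsys.context
  parameters ('LEXER my_case_lexer');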
Language Specific Features
You can enable the following language-specific features at index time:
Indexing Themes
For English and French, you can index document theme information. A document theme is a main document concept. Themes can be queried with the ABOUT operator.
You can index theme information in other languages provided you have loaded and compiled a knowledge base for the language.
By default, themes are indexed in English and French. You can enable and disable theme indexing with the index_themes attribute of the BASIC_LEXER preference type.
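For example, a minimal sketch that explicitly enables theme indexing through a custom lexer (setting the attribute to NO would disable it):

begin
  ctx_ddl.create_preference('my_theme_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_theme_lexer', 'index_themes', 'YES');
end;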
Base-Letter Conversion for Characters with Diacritical Marks
Some languages contain characters with diacritical marks such as tildes, umlauts, and accents. When your indexing operation converts words containing diacritical marks to their base-letter form, queries need not contain diacritical marks to score matches. For example, in Spanish with a base-letter index, a query of energía matches energía and energia in the index.
However, with base-letter indexing disabled, a query of energía matches only energía.
You can enable and disable base-letter indexing for your language with the base_letter attribute of the BASIC_LEXER preference type.
Alternate Spelling
Languages such as German, Danish, and Swedish contain words that have more than one accepted spelling. For example, in German, the ä character can be substituted with the ae character pair. The ae form is known as the base-letter form.
By default, Oracle indexes words in their base-letter form for these languages. Query terms are also converted to their base-letter form. The result is that these words can be queried with either spelling.
You can enable and disable alternate spelling for your language using the alternate_spelling attribute in the BASIC_LEXER preference type.
Composite Words
German and Dutch text contain composite words. By default, Oracle creates composite indexes for these languages. The result is that a query on a term returns words that contain the term as a sub-composite.
For example, in German, a query on the term Bahnhof (train station) returns documents that contain Bahnhof or any word containing Bahnhof as a sub-composite, such as Hauptbahnhof, Nordbahnhof, or Ostbahnhof.
You can enable and disable the creation of composite indexes with the composite attribute of the BASIC_LEXER preference.
Korean, Japanese, and Chinese Indexing
You index these languages with specific lexers:
Language | Lexer |
---|---|
Korean | KOREAN_LEXER, KOREAN_MORPH_LEXER |
Japanese | JAPANESE_LEXER, JAPANESE_VGRAM_LEXER |
Chinese | CHINESE_LEXER, CHINESE_VGRAM_LEXER |
The KOREAN_MORPH_LEXER has its own set of attributes to control indexing. Features include composite word indexing.
Fuzzy Matching and Stemming
Fuzzy matching enables you to match similarly spelled words in queries. Stemming enables you to match words with the same linguistic root.
Fuzzy matching and stemming are automatically enabled in your index if Oracle Text supports these features for your language.
Fuzzy matching is enabled with default parameters for its similarity score lower limit and for its maximum number of expanded terms. At index time you can change these default parameters.
To improve the performance of stem queries, you can create a stem index by enabling the index_stems attribute of the BASIC_LEXER.
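A sketch of enabling a stem index and issuing a stem query with the $ operator (the docs table is illustrative):

begin
  ctx_ddl.create_preference('my_stem_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_stem_lexer', 'index_stems', 'ENGLISH');
end;

-- After indexing with this lexer, a stem query expands to word forms such as sing, sang, sung:
SELECT id FROM docs WHERE CONTAINS(text, '$sing') > 0;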
Better Wildcard Query Performance
Wildcard queries enable you to issue left-truncated, right-truncated, and doubly truncated queries, such as %ing, cos%, or %benz%. With normal indexing, these queries can sometimes expand into large word lists, degrading your query performance.
Wildcard queries have better response time when token prefixes and substrings are recorded in the index.
By default, token prefixes and substrings are not recorded in the Oracle Text index. If your query application makes heavy use of wildcard queries, consider indexing token prefixes and substrings. To do so, use the wordlist preference type. The trade-off is a bigger index for improved wildcard searching.
Document Section Searching
For documents that have internal structure, such as HTML and XML, you can define and index document sections. Indexing document sections enables you to narrow the scope of your queries to within pre-defined sections. For example, you can specify a query to find all documents that contain the term dog within a section you define as Headings.
Sections must be defined prior to indexing and specified with the section group preference.
Oracle Text provides section groups with system-defined section definitions for HTML and XML. You can also specify that the system automatically create sections from XML documents during indexing.
Stopwords and Stopthemes
A stopword is a word that is not to be indexed. Usually stopwords are low-information words in a given language, such as this and that in English.
By default, Oracle provides a list of stopwords, called a stoplist, for indexing a given language. You can modify this list or create your own with the CTX_DDL package. You specify the stoplist in the parameter string of CREATE INDEX.
A stoptheme is a word that is prevented from being theme-indexed or prevented from contributing to a theme. You can add stopthemes with the CTX_DDL package.
You can search document themes with the ABOUT operator. You can retrieve document themes programmatically with the CTX_DOC PL/SQL package.
Multi-Language Stoplists
You can also create multi-language stoplists to hold language-specific stopwords. A multi-language stoplist is useful when you use the MULTI_LEXER to index a table that contains documents in different languages, such as English, German, and Japanese.
At indexing time, the language column of each document is examined, and only the stopwords for that language are eliminated. At query time, the session language setting determines the active stopwords, just as it determines the active lexer when using the multi-lexer.
Index Performance
There are several factors that influence indexing performance, including memory allocation, document format, degree of parallelism, and partitioned tables.
Query Performance and Storage of LOB Columns
If your table contains LOB structured columns that are frequently accessed in queries but rarely updated, you can improve query performance by storing these columns out of line.
Index Creation
You can create three types of indexes with Oracle Text: CONTEXT, CTXCAT, and CTXRULE.
Procedure for Creating a CONTEXT Index
By default, the system expects your documents to be stored in a text column. Once this requirement is satisfied, you can create a text index using the CREATE INDEX SQL command as an extensible index of type CONTEXT, without explicitly specifying any preferences. The system automatically detects your language, the datatype of the text column, and the format of your documents, and sets indexing preferences appropriately.
To create an Oracle Text index, do the following:
- Optionally, determine your custom indexing preferences, section groups, or stoplists if you are not using the defaults. The following table describes these indexing classes:

Class | Description |
---|---|
Datastore | How are your documents stored? |
Filter | How can the documents be converted to plain text? |
Lexer | What language is being indexed? |
Wordlist | How should stem and fuzzy queries be expanded? |
Storage | How should the index data be stored? |
Stop List | What words or themes are not to be indexed? |
Section Group | How are document sections defined? |
- Optionally, create your own custom preferences, section groups, or stoplists. See "Creating Preferences" in this chapter.
- Create the Text index with the SQL command CREATE INDEX, naming your index and optionally specifying preferences. See "Creating an Index" in this chapter.
Creating Preferences
You can optionally create your own custom index preferences to override the defaults. Use the preferences to specify index information such as where your files are stored and how to filter your documents. You create the preferences and then set their attributes.
Datastore Examples
The following sections give examples for setting the direct, multi-column, URL, and file datastores.
Specifying DIRECT_DATASTORE
The following example creates a table with a CLOB column to store text data. It then populates two rows with text data and indexes the table using the system-defined preference CTXSYS.DEFAULT_DATASTORE.
create table mytable(id number primary key, docs clob);

insert into mytable values(111555, 'this text will be indexed');
insert into mytable values(111556, 'this is a direct_datastore example');
commit;

create index myindex on mytable(docs)
  indextype is ctxsys.context
  parameters ('DATASTORE CTXSYS.DEFAULT_DATASTORE');
Specifying MULTI_COLUMN_DATASTORE
The following example creates a multi-column datastore preference called my_multi on the three text columns to be concatenated and indexed:

begin
  ctx_ddl.create_preference('my_multi', 'MULTI_COLUMN_DATASTORE');
  ctx_ddl.set_attribute('my_multi', 'columns', 'column1, column2, column3');
end;
Specifying URL Data Storage
This example creates a URL_DATASTORE preference called my_url, for which the http_proxy, no_proxy, and timeout attributes are set. The defaults are used for the attributes that are not set.

begin
  ctx_ddl.create_preference('my_url','URL_DATASTORE');
  ctx_ddl.set_attribute('my_url','HTTP_PROXY','www-proxy.us.oracle.com');
  ctx_ddl.set_attribute('my_url','NO_PROXY','us.oracle.com');
  ctx_ddl.set_attribute('my_url','TIMEOUT','300');
end;
Specifying File Data Storage
The following example creates a data storage preference using FILE_DATASTORE. This tells the system that the files to be indexed are stored in the operating system. The example uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute to the directory /docs.

begin
  ctx_ddl.create_preference('mypref', 'FILE_DATASTORE');
  ctx_ddl.set_attribute('mypref', 'PATH', '/docs');
end;
NULL_FILTER Example: Indexing HTML Documents
If your document set is entirely HTML, Oracle recommends that you use the NULL_FILTER in your filter preference, which does no filtering.
For example, to index an HTML document set, you can specify the system-defined preferences for NULL_FILTER and HTML_SECTION_GROUP as follows:

create index myindex on docs(htmlfile) indextype is ctxsys.context
  parameters('filter ctxsys.null_filter section group ctxsys.html_section_group');
PROCEDURE_FILTER Example
Consider a filter procedure CTXSYS.NORMALIZE that you define with the following signature:

PROCEDURE NORMALIZE(id IN ROWID, charset IN VARCHAR2, input IN CLOB, output IN OUT NOCOPY VARCHAR2);

To use this procedure as your filter, set up your filter preference as follows:

begin
  ctx_ddl.create_preference('myfilt', 'procedure_filter');
  ctx_ddl.set_attribute('myfilt', 'procedure', 'normalize');
  ctx_ddl.set_attribute('myfilt', 'input_type', 'clob');
  ctx_ddl.set_attribute('myfilt', 'output_type', 'varchar2');
  ctx_ddl.set_attribute('myfilt', 'rowid_parameter', 'TRUE');
  ctx_ddl.set_attribute('myfilt', 'charset_parameter', 'TRUE');
end;
BASIC_LEXER Example: Setting Printjoins Characters
Printjoins characters are non-alphanumeric characters that are to be included in index tokens, so that words such as web-site are indexed as web-site.
The following example sets the printjoins characters to be the hyphen and underscore with the BASIC_LEXER:

begin
  ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
  ctx_ddl.set_attribute('mylex', 'printjoins', '_-');
end;

To create the index with the printjoins characters set as above, issue the following statement:

create index myindex on mytable ( docs )
  indextype is ctxsys.context
  parameters ( 'LEXER mylex' );
MULTI_LEXER Example: Indexing a Multi-Language Table
You use the MULTI_LEXER preference type to index a column containing documents in different languages. For example, you can use this preference type when your text column stores documents in English, German, and French.
The first step is to create the multi-language table with a primary key, a text column, and a language column as follows:

create table globaldoc (
  doc_id number primary key,
  lang   varchar2(3),
  text   clob
);

Assume that the table holds mostly English documents, with some German and Japanese documents. To handle the three languages, you must create three sub-lexers, one for English, one for German, and one for Japanese:

ctx_ddl.create_preference('english_lexer','basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.set_attribute('english_lexer','theme_language','english');

ctx_ddl.create_preference('german_lexer','basic_lexer');
ctx_ddl.set_attribute('german_lexer','composite','german');
ctx_ddl.set_attribute('german_lexer','mixed_case','yes');
ctx_ddl.set_attribute('german_lexer','alternate_spelling','german');

ctx_ddl.create_preference('japanese_lexer','japanese_vgram_lexer');
Create the multi-lexer preference:
ctx_ddl.create_preference('global_lexer', 'multi_lexer');
Since the stored documents are mostly English, make the English lexer the default using CTX_DDL.ADD_SUB_LEXER:
ctx_ddl.add_sub_lexer('global_lexer','default','english_lexer');
Now add the German and Japanese lexers for their corresponding languages with the CTX_DDL.ADD_SUB_LEXER procedure. Also assume that the language column is expressed in the standard ISO 639-2 language codes, so add those as alternate values.
ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger'); ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
Now create the index globalx, specifying the multi-lexer preference and the language column in the parameter clause as follows:
create index globalx on globaldoc(text) indextype is ctxsys.context parameters ('lexer global_lexer language column lang');
BASIC_WORDLIST Example: Enabling Substring and Prefix Indexing
The following example sets the wordlist preference for prefix and substring indexing. Having a prefix and substring component in your index improves performance for wildcard queries.
For prefix indexing, the example specifies that Oracle create token prefixes between 3 and 4 characters long:

begin
  ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
  ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH',3);
  ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', 4);
  ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end;
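To use this wordlist when creating the index, and to issue a sample double-truncated wildcard query afterward, a sketch reusing the mytable example from earlier in this chapter:

create index myindex on mytable(docs)
  indextype is ctxsys.context
  parameters ('WORDLIST mywordlist');

SELECT id FROM mytable WHERE CONTAINS(docs, '%benz%') > 0;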
Creating Section Groups for Section Searching
When documents have internal structure, such as in HTML and XML, you can define document sections using embedded tags before you index. This enables you to query within the sections using the WITHIN operator. You define sections as part of a section group.
Example: Creating HTML Sections
The following code defines a section group called htmgroup of type HTML_SECTION_GROUP. It then creates a zone section in htmgroup called heading, identified by the <H1> tag:
begin ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP'); ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1'); end;
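Once documents are indexed with this section group (see "Custom CONTEXT Index Example: Indexing HTML Documents" later in this chapter), a query can be restricted to the heading section with the WITHIN operator. A sketch, assuming the docs table has an id column:

SELECT id FROM docs WHERE CONTAINS(htmlfile, 'dog WITHIN heading') > 0;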
Using Stopwords and Stoplists
A stopword is a word that is not to be indexed. A stopword is usually a low-information word such as this or that in English.
The system supplies a list of stopwords, called a stoplist, for every language. By default during indexing, the system uses the Oracle Text default stoplist for your language.
You can edit the default stoplist CTXSYS.DEFAULT_STOPLIST or create your own with the following PL/SQL procedures:
- CTX_DDL.CREATE_STOPLIST
- CTX_DDL.ADD_STOPWORD
- CTX_DDL.REMOVE_STOPWORD
You specify your custom stoplists in the parameter clause of CREATE INDEX.
You can also dynamically add stopwords after indexing with the ALTER INDEX statement.
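For example, a minimal sketch that builds a custom stoplist and names it at index creation (the stopwords chosen here are illustrative):

begin
  ctx_ddl.create_stoplist('mystop');
  ctx_ddl.add_stopword('mystop', 'because');
  ctx_ddl.add_stopword('mystop', 'nevertheless');
end;

create index myindex on mytable(docs)
  indextype is ctxsys.context
  parameters ('STOPLIST mystop');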
Multi-Language Stoplists
You can create multi-language stoplists to hold language-specific stopwords. A multi-language stoplist is useful when you use the MULTI_LEXER to index a table that contains documents in different languages, such as English, German, and Japanese.
To create a multi-language stoplist, use the CTX_DDL.CREATE_STOPLIST procedure and specify a stoplist type of MULTI_STOPLIST. You add language-specific stopwords with CTX_DDL.ADD_STOPWORD.
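A minimal sketch of a multi-language stoplist, with one illustrative stopword per language:

begin
  ctx_ddl.create_stoplist('global_stop', 'MULTI_STOPLIST');
  ctx_ddl.add_stopword('global_stop', 'the', 'english');
  ctx_ddl.add_stopword('global_stop', 'der', 'german');
end;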
Stopthemes and Stopclasses
In addition to defining your own stopwords, you can define stopthemes, which are themes that are not to be indexed. This feature is available for English only.
You can also specify that numbers are not to be indexed. A class of alphanumeric characters, such as numbers, that is not to be indexed is a stopclass.
You record your own stopwords, stopthemes, and stopclasses by creating a single stoplist, to which you add the stopwords, stopthemes, and stopclasses. You specify the stoplist in the parameter string of CREATE INDEX.
PL/SQL Procedures for Managing Stoplists
You use the following procedures to manage stoplists, stopwords, stopthemes, and stopclasses:
- CTX_DDL.CREATE_STOPLIST
- CTX_DDL.ADD_STOPWORD
- CTX_DDL.ADD_STOPTHEME
- CTX_DDL.ADD_STOPCLASS
- CTX_DDL.REMOVE_STOPWORD
- CTX_DDL.REMOVE_STOPTHEME
- CTX_DDL.REMOVE_STOPCLASS
- CTX_DDL.DROP_STOPLIST
Creating an Index
You create an Oracle Text index as an extensible index using the CREATE INDEX SQL command.
You can create three types of indexes:
- CONTEXT
- CTXCAT
- CTXRULE
Creating a CONTEXT Index
The CONTEXT index type is well suited for indexing large coherent documents such as MS Word, HTML, or plain text. With a CONTEXT index, you can also customize your index in a variety of ways.
The documents must be loaded in a text table.
Default CONTEXT Index Example
The following command creates a default CONTEXT index called myindex on the text column in the docs table:
CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT;
When you use CREATE INDEX without explicitly specifying parameters, the system does the following for all languages by default:
- Assumes that the text to be indexed is stored directly in a text column. The text column can be of type CLOB, BLOB, BFILE, VARCHAR2, or CHAR.
- Detects the column type and uses filtering for binary column types. Most document formats are supported for filtering. If your column is plain text, the system does not use filtering. Note: For document filtering to work correctly in your system, you must ensure that your environment is set up to support the Inso filter. To learn more about configuring your environment to use the Inso filter, see the Oracle Text Reference.
- Assumes the language of the text to index is the language you specify in your database setup.
- Uses the default stoplist for the language you specify in your database setup. Stoplists identify the words that the system ignores during indexing.
- Enables fuzzy and stemming queries for your language, if this feature is available for your language.
You can always change the default indexing behavior by creating your own preferences and specifying these custom preferences in the parameter string of CREATE INDEX.
Custom CONTEXT Index Example: Indexing HTML Documents
To index an HTML document set located by URLs, you can specify the system-defined preference for the NULL_FILTER in the CREATE INDEX statement.
You can also specify your section group htmgroup that uses HTML_SECTION_GROUP and your datastore my_url that uses URL_DATASTORE as follows:
begin
  ctx_ddl.create_preference('my_url','URL_DATASTORE');
  ctx_ddl.set_attribute('my_url','HTTP_PROXY','www-proxy.us.oracle.com');
  ctx_ddl.set_attribute('my_url','NO_PROXY','us.oracle.com');
  ctx_ddl.set_attribute('my_url','TIMEOUT','300');
end;

begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1');
end;
You can then index your documents as follows:
create index myindex on docs(htmlfile) indextype is ctxsys.context parameters('datastore my_url filter ctxsys.null_filter section group htmgroup');
Creating a CTXCAT Index
The CTXCAT index type is well suited for indexing small text fragments and related data. If created correctly, this type of index can give better structured query performance than a CONTEXT index.
CTXCAT Index and DML
A CTXCAT index is transactional. When you perform DML (inserts, updates, and deletes) on the base table, Oracle automatically synchronizes the index. Unlike a CONTEXT index, no CTX_DDL.SYNC_INDEX is necessary.
Note:
Applications that insert without invoking triggers, such as SQL*Loader, will not result in automatic index synchronization as described above.
About CTXCAT Sub-Indexes and Their Costs
A CTXCAT index is comprised of sub-indexes that you define as part of your index set. You create a sub-index on one or more columns to improve mixed query performance.
However, adding sub-indexes to the index set has its costs. The time Oracle takes to create a CTXCAT index depends on its total size, and the total size of a CTXCAT index is directly related to:
- total text to be indexed
- number of sub-indexes in the index set
- number of columns in the base table that make up the sub-indexes
Having many component indexes in your index set also degrades DML performance, since more indexes must be updated.
Because of the added indexing time and disk space costs of creating a CTXCAT index, carefully consider the query performance benefit that each component index gives your application before adding it to your index set.
Creating CTXCAT Sub-indexes
An online auction site that must store item descriptions, prices, and bid-close dates for ordered look-up provides a good example for creating a CTXCAT index.
Figure 2-2 Auction table schema and CTXCAT index
Figure 2-2 shows a table called auction with the following schema:
create table auction(
  item_id     number,
  title       varchar2(100),
  category_id number,
  price       number,
  bid_close   date);
To create your sub-indexes, create an index set to contain them:
begin
  ctx_ddl.create_index_set('auction_iset');
end;
Next, determine the structured queries your application is likely to issue. The CATSEARCH query operator takes a mandatory text clause and an optional structured clause.
In our example, this means all queries include a clause for the title column, which is the text column.
Assume that the structured clauses fall into the following categories:
Structured Clauses | Sub-index Definition to Serve Query | Category |
---|---|---|
'price < 200' 'price = 150' 'order by price' | 'price' | A |
'price = 100 order by bid_close' 'order by price, bid_close' | 'price, bid_close' | B |
Structured Query Clause Category A
The structured query clause contains a predicate on only the price column, as follows:

SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'price < 200') > 0;
SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'price = 150') > 0;
SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'order by price') > 0;

These queries can be served using sub-index B, but for efficiency you can also create a sub-index on price alone, which we call sub-index A:

begin
  ctx_ddl.add_index('auction_iset','price'); /* sub-index A */
end;
Structured Query Clause Category B
The structured query clause includes an equivalence predicate on price ordered by bid_close, and an expression for ordering by price and bid_close in that order:

SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'price = 100 order by bid_close') > 0;
SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'order by price, bid_close') > 0;

These queries can be served with a sub-index defined as follows:

begin
  ctx_ddl.add_index('auction_iset','price, bid_close'); /* sub-index B */
end;
Like a combined b-tree index, the column order you specify with CTX_DDL.ADD_INDEX affects the efficiency and viability of the index scan Oracle uses to serve specific queries. For example, if two structured columns p and q have a b-tree index specified as 'p,q', Oracle cannot scan this index to sort 'order by q,p'.
Creating the CTXCAT Index
The following example combines the examples above and creates the index set preference with the two sub-indexes:

begin
  ctx_ddl.create_index_set('auction_iset');
  ctx_ddl.add_index('auction_iset','price');            /* sub-index A */
  ctx_ddl.add_index('auction_iset','price, bid_close'); /* sub-index B */
end;
Figure 2-2 shows how sub-indexes A and B are created from the auction table. Each sub-index is a b-tree index on the text column and the named structured columns. For example, sub-index A is an index on the title column and the price column.
You create the combined catalog index with CREATE INDEX as follows:
CREATE INDEX auction_titlex ON auction(title) INDEXTYPE IS CTXCAT PARAMETERS ('index set auction_iset');
Creating a CTXRULE Index
You use the CTXRULE index to build a document classification application. You create a table of queries and then index them. With a CTXRULE index, you can use the MATCHES operator to classify single documents.
Create a Table of Queries
The first step is to create a table of queries that define your classifications. We create a table myqueries to hold the category name and query text:

CREATE TABLE myqueries (
  queryid  NUMBER PRIMARY KEY,
  category VARCHAR2(30),
  query    VARCHAR2(2000)
);
Populate the table with the classifications and the queries that define each. For example, consider classifications for the subjects US Politics, Music, and Soccer:

INSERT INTO myqueries VALUES(1, 'US Politics', 'democrat or republican');
INSERT INTO myqueries VALUES(2, 'Music', 'ABOUT(music)');
INSERT INTO myqueries VALUES(3, 'Soccer', 'ABOUT(soccer)');
Using CTX_CLS.TRAIN
You can also generate a table of rules (queries) with the CTX_CLS.TRAIN procedure, which takes a document training set as input.
Create the CTXRULE Index
Use CREATE INDEX to create the CTXRULE index. You can specify lexer, storage, section group, and wordlist parameters if needed (the index name myruleindex and the preference names here are illustrative):

CREATE INDEX myruleindex ON myqueries(query) INDEXTYPE IS CTXRULE
  PARAMETERS('lexer lexer_pref storage storage_pref section group section_pref wordlist wordlist_pref');
Note:
The filter, memory, datastore, stoplist, and [no]populate parameters do not apply to the CTXRULE index type.
Classifying a Document
With a CTXRULE index created on your query set, you can use the MATCHES operator to classify a document.
Assume that incoming documents are stored in the table news:
CREATE TABLE news ( newsid NUMBER, author VARCHAR2(30), source VARCHAR2(30), article CLOB);
You can create a BEFORE INSERT trigger with MATCHES to route each document to another table, news_route, based on its classification:
BEGIN
  -- find matching queries
  FOR c1 IN (SELECT category FROM myqueries
              WHERE MATCHES(query, :new.article) > 0)
  LOOP
    INSERT INTO news_route(newsid, category)
      VALUES (:new.newsid, c1.category);
  END LOOP;
END;
Index Maintenance
This section describes maintaining your index in the event of an error or indexing failure.
Viewing Index Errors
Sometimes an indexing operation might fail or not complete successfully. When the system encounters an error indexing a row, it logs the error in an Oracle Text view.
You can view errors on your own indexes with CTX_USER_INDEX_ERRORS. View errors on all indexes as CTXSYS with CTX_INDEX_ERRORS.
For example, to view the most recent errors on your indexes, you can issue:

SELECT err_timestamp, err_text
  FROM ctx_user_index_errors
 ORDER BY err_timestamp DESC;
To clear the view of errors, you can issue:
DELETE FROM ctx_user_index_errors;
Dropping an Index
You must drop an existing index before you can re-create it with CREATE INDEX.
You drop an index using the DROP INDEX command in SQL.
For example, to drop an index called newsindex, issue the following SQL command:
DROP INDEX newsindex;
If Oracle cannot determine the state of the index, for example as the result of an indexing crash, you cannot drop the index as described above. Instead use:

DROP INDEX newsindex FORCE;
Resuming a Failed Index
You can resume a failed index creation operation using the ALTER INDEX command. You typically resume a failed index after you have investigated and corrected the failure.
Index optimization commits at regular intervals. Therefore, if an optimization operation fails, all completed optimization work has already been saved.
Example: Resuming a Failed Index
The following command resumes the indexing operation on newsindex with 2 megabytes of memory:

ALTER INDEX newsindex REBUILD PARAMETERS('resume memory 2M');
Rebuilding an Index
You can rebuild a valid index using ALTER INDEX. You might rebuild an index when you want to index with a new preference.
Example: Rebuilding an Index
The following command rebuilds the index, replacing the lexer preference with my_lexer:

ALTER INDEX newsindex REBUILD PARAMETERS('replace lexer my_lexer');
Dropping a Preference
You might drop a custom index preference when you no longer need it for indexing.
You drop index preferences with the procedure CTX_DDL.DROP_PREFERENCE.
Dropping a preference does not affect the index created from that preference.
See the Oracle Text Reference to learn more about the syntax of the CTX_DDL.DROP_PREFERENCE procedure.
Example
The following code drops the preference my_lexer:
begin ctx_ddl.drop_preference('my_lexer'); end;
Managing DML Operations for a CONTEXT Index
DML operations on the base table refer to when documents are inserted, updated, or deleted from the base table. This section describes how you can monitor, synchronize, and optimize the Oracle Text CONTEXT index when DML operations occur.
Note:
CTXCAT indexes are transactional and thus updated immediately when there is an update to the base table. Manual synchronization as described in this section is not necessary for a CTXCAT index.
Viewing Pending DML
When documents in the base table are inserted, updated, or deleted, their ROWIDs are held in a DML queue until you synchronize the index. You can view this queue with the CTX_USER_PENDING view.
For example, to view pending DML on all your indexes, issue the following statement:

SELECT pnd_index_name, pnd_rowid,
       to_char(pnd_timestamp, 'dd-mon-yyyy hh24:mi:ss') timestamp
  FROM ctx_user_pending;
This statement gives output in the form:
PND_INDEX_NAME                 PND_ROWID          TIMESTAMP
------------------------------ ------------------ --------------------
MYINDEX                        AAADXnAABAAAS3SAAC 06-oct-1999 15:56:50
Synchronizing the Index
Synchronizing the index involves processing all pending updates, inserts, and deletes to the base table. You can do this in PL/SQL with the CTX_DDL.SYNC_INDEX procedure.
The following example synchronizes the index with two megabytes of memory:

begin
  ctx_ddl.sync_index('myindex', '2M');
end;
Setting Background DML
You can set CTX_DDL.SYNC_INDEX to run automatically at regular intervals using the DBMS_JOB.SUBMIT procedure. Oracle Text includes a SQL script you can use to do this. The location of this script is:
$ORACLE_HOME/ctx/sample/script/drjobdml.sql
To use this script, you must be the index owner and you must have execute privileges on the CTX_DDL package. You must also set the job_queue_processes parameter in your Oracle initialization file.
For example, to set the index synchronization to run every 360 minutes on myindex, you can issue the following in SQL*Plus:
SQL> @drjobdml myindex 360
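If you prefer to schedule synchronization yourself rather than through the script, a sketch using DBMS_JOB.SUBMIT with the same 360-minute interval might look like this (index name and memory setting are illustrative):

DECLARE
  v_job NUMBER;
BEGIN
  dbms_job.submit(
    job       => v_job,
    what      => 'ctx_ddl.sync_index(''myindex'', ''2M'');',
    next_date => SYSDATE,
    interval  => 'SYSDATE + 360/1440');
  COMMIT;
END;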
Index Optimization
Frequent index synchronization can fragment your CONTEXT index. Index fragmentation can adversely affect query response time. You can optimize your CONTEXT index to reduce fragmentation and index size and so improve query performance.
To understand index optimization, you must understand the structure of the index and what happens when it is synchronized.
CONTEXT Index Structure
The CONTEXT index is an inverted index where each word contains the list of documents that contain that word. For example, after a single initial indexing operation, the word DOG might have an entry as follows:
DOG DOC1 DOC3 DOC5
Index Fragmentation
When new documents are added to the base table, the index is synchronized by adding new rows. Thus, if you add a new document (DOC 7) containing the word dog to the base table and synchronize the index, you now have:

DOG DOC1 DOC3 DOC5
DOG DOC7

Subsequent DML will also create new rows:

DOG DOC1 DOC3 DOC5
DOG DOC7
DOG DOC9
DOG DOC11
Adding new documents and synchronizing the index causes index fragmentation. In particular, background DML, which synchronizes the index frequently, generally produces more fragmentation than synchronizing in batch.
Less frequent batch processing results in longer document lists, reducing the number of rows in the index and hence reducing fragmentation.
You can reduce index fragmentation by optimizing the index in either FULL or FAST mode with CTX_DDL.OPTIMIZE_INDEX.
Document Invalidation and Garbage Collection
When documents are removed from the base table, Oracle Text marks the document as removed but does not immediately alter the index.
Because the old data takes up space and can cause extra overhead at query time, you must remove the old data from the index by optimizing it in FULL mode. This is called garbage collection. Optimizing in FULL mode for garbage collection is necessary when you have frequent updates or deletes to the base table.
Single Token Optimization
In addition to optimizing the entire index, you can optimize single tokens. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced.
For example, you can specify that only the token DOG be optimized in the index, if you know that this token is updated and queried frequently.
An optimized token can improve query response time for that token.
To optimize an index in token mode, you use CTX_DDL.OPTIMIZE_INDEX.
Viewing Index Fragmentation and Garbage Data
With the CTX_REPORT.INDEX_STATS procedure, you can create a statistical report on your index. The report includes information on optimal row fragmentation, a list of the most fragmented tokens, and the amount of garbage data in your index. Although this report might take a long time to run for large indexes, it can help you decide whether to optimize your index.
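A sketch of generating the report into a CLOB and printing its first part, assuming the procedure form that fills an IN OUT CLOB (index name is illustrative):

SET SERVEROUTPUT ON

DECLARE
  v_report CLOB := NULL;
BEGIN
  ctx_report.index_stats('myindex', v_report);
  dbms_output.put_line(dbms_lob.substr(v_report, 4000, 1));
  dbms_lob.freetemporary(v_report);
END;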
Examples: Optimizing the Index
To optimize an index, Oracle recommends that you use CTX_DDL.OPTIMIZE_INDEX.
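The following sketches show the common modes; the FULL and FAST keywords and the token parameter follow the CTX_DDL.OPTIMIZE_INDEX signature described in the Oracle Text Reference (the index name is illustrative):

-- FULL mode: defragments the index and performs garbage collection.
begin
  ctx_ddl.optimize_index('myindex', 'FULL');
end;

-- FAST mode: defragments only; no garbage collection.
begin
  ctx_ddl.optimize_index('myindex', 'FAST');
end;

-- TOKEN mode: optimize only the token DOG.
begin
  ctx_ddl.optimize_index('myindex', 'TOKEN', token => 'DOG');
end;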
Source: https://docs.oracle.com/cd/B10501_01/text.920/a96517/ind.htm