g-Solutions

Wildcard Searching with Oracle Text

 

A number of the queries in the application have a LIKE condition like the WHERE clause. It has been observed these search contains a double wildcard character (such as *name*). This causes severe performance issues in the application.

  

SELECT /* ORDERED */ TEXT FROM WHERE COL like '%013%'

 
 

Searching with a prefixed or post wildcard character (name or name*) causes major performance issues. This is more of an issue if the name is common. The majority of the text searches that caused system issues have at least one wildcard character "*". The root cause of the performance issues is that a FULL table or index scan is needed in most LIKE searches to find the appropriate records.

 
 

The recommendation is to use Oracle Text. Oracle Text allows these types of the queries to use special indexes.

 The application converts the "*" character to a "%" at runtime. It then executes a LIKE SQL. The following is the converted SQL for a "TODO*" search.
 
Trailing Position
 
select * from TABLE
where COLUMN like 'TODO%';
 
 
 

A query with a LIKE condition that compares a string with a trailing wildcard can make use of an index with INDEX RANGE SCAN access. This is efficient for answer sets that are a small percentage (< 1%) of the queried table and is not considered problematic. Anything greater than 1% will cause performance issues.

 
Center Position
 
select * from TABLE
where COLUMN like '%TODO%';
 
 

 A query with a LIKE condition that compares a string with a wildcard at both ends cannot make efficient use of an index, and will do either a FULL TABLE SCAN or a FULL INDEX SCAN. If a FULL TABLE SCAN is used, every block in the table is read whether it contains data or not.

 
Leading Position
 

A query with a "LIKE" condition compared to a string with a leading wildcard also uses a FULL TABLE SCAN or FULL INDEX SCAN.

 
select * from TABLE
where COLUMN like '%TODO';
 
 
 

An Oracle Text index can be added to a column to dramatically improve the application's wild card searching performance.

 

CREATE INDEX SCHEMA.INDEX_NAME

 on SCHEMA.TABLE ( COLUMN )

INDEXTYPE IS CTXSYS.TYPE;

 
 

Note that the query is more efficient when fewer rows are returned (resource consumption is static for an INDEX FULL SCAN). Also, special operators like "near" can be used for more complex conditions.

 

Oracle Text indexes can also perform case insensitive searches. Thus this eliminates the need to convert each row's column data to upper or lower case.

 
select * from TABLE
where UPPER(COLUMN) like '%TODO'
 
 
 
Where to Implement
 

Making use of the text option in a query requires a change in the syntax of the WHERE clause. The application can use a CONTAINS or CATSEARCH function rather than a LIKE condition with wildcards in a SELECT statement.

 

Sample CATSEARCH

 
select * from TABLE
where CATSEARCH(COLUMN, '%TODO' , null) > 0
 
 

These changes would need to be introduced into the application code. The potential benefit of the Oracle Text option in terms of performance would probably be greater for tables with columns using CLOB and VARCHAR2 data types with larger volumes of text data.

 
 

Oracle Text Index

Oracle Text supports indexes of several types, using CREATE INDEX.

 

The CTXCAT index type can be used for the needed text columns.

This index type is transactional, automatically updating itself after DML modifications on the base table. No scheduled synchronization jobs are needed (CTX_DDL.SYNC_INDEX). The index data will be automatically updated.

 

A benefit of automatic updates is that there will be no delay between the times the data is inserted/updated until it is available for searching. Also, there will be no need for additional jobs to be run nightly and or over the weekend to maintain the indexed data.

 

Inserts and updates of this type of data may have a longer execution (secs), but if it is assumed that the type of data changes very infrequently, then this should be acceptable. This index is larger and takes longer to build than a CONTEXT index.

 

With normal indexing, wildcard queries can sometimes expand into large word lists, degrading your query performance. Wildcard queries have better response time when token prefixes and substrings are recorded in the index. By default, token prefixes and substrings are not recorded in the Oracle Text index. Indexing tokens with prefixes and substrings will be needed for wildcard searches.

 

ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');

ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');

ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');

 

CREATE INDEX name ON AUCTION(column) INDEXTYPE IS CTXSYS.CTXCAT PARAMETERS (' WORDLIST 'mywordlist'');

 

Note that a bigger index is needed to improve wildcard searching.

 
 
Pros
 
Cons