CONTAINS operator on WORD-INDEX using full stop '.' finds more results than expected

Information

Title: CONTAINS operator on WORD-INDEX using full stop '.' finds more results than expected
URL Name: CONTAINS-operator-on-WORD-INDEX-using-full-stop-finds-more-results-than-expected
Article Number: 000110737
Environment:
Product: OpenEdge
Version: 11.x
OS: All supported platforms
Question/Problem Description

A customer assumed that the CONTAINS operator treats both the period (.) and the asterisk (*) as wildcards, and reported unexpected results when the search string contains multiple periods (full stops). If the period matched exactly one character, only the first record below would be found, so why do both records match the word index?

/* Temp-table with a word index on the character field veld */
DEFINE TEMP-TABLE ttest NO-UNDO
    FIELD veld AS CHARACTER
    INDEX keyveld IS WORD-INDEX veld.

CREATE ttest.
ttest.veld = 'Ruiter.J.de'.

CREATE ttest.
ttest.veld = 'Ruiter.A.H.J.de'.

/* Both records are displayed, not just the first one */
FOR EACH ttest WHERE ttest.veld CONTAINS 'Ruiter.j.de':U:
    DISPLAY ttest.veld FORMAT "X(20)".
END.


 

Cause
This is expected behavior. The period (full stop, '.') is not a wildcard character for the CONTAINS operator. In the default word-break rules, this character is defined with the following attribute:

'.', BEFORE_DIGIT, /* part of a word only if followed by a digit */

This means that the period is an ordinary character that is part of a word only if it is followed by a digit; otherwise it is a word delimiter. In the example code the '.' is never followed by a digit, so it is never part of a word and always acts as a word delimiter. The search string 'Ruiter.j.de' is therefore treated as the three words 'Ruiter', 'j', and 'de'. Because all three words appear in the veld field of both temp-table records, both records are displayed.
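As a minimal sketch of this rule, the query below spells out the three words with the CONTAINS AND operator (&). Assuming the default word-break rules are in effect for the temp-table, it is expected to select the same two records as the 'Ruiter.j.de' search string, because that string is broken into the same words. The table and field names are reused from the example above.

```abl
/* Sketch only: assumes the default word-break rules apply to this temp-table */
DEFINE TEMP-TABLE ttest NO-UNDO
    FIELD veld AS CHARACTER
    INDEX keyveld IS WORD-INDEX veld.

CREATE ttest.
ttest.veld = "Ruiter.J.de".

CREATE ttest.
ttest.veld = "Ruiter.A.H.J.de".

/* Explicit form of the search: every record that holds the words
   Ruiter AND j AND de qualifies, whatever their order or position */
FOR EACH ttest WHERE ttest.veld CONTAINS "Ruiter & j & de":
    DISPLAY ttest.veld FORMAT "X(20)".
END.
```

Writing the search this way makes it visible that the periods add no positional or length constraint to the query.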

To illustrate further, in the following code the last record is not selected by the FOR EACH statement, because its second '.' is immediately followed by a digit and is therefore part of the word 'J.3':
 
DEF TEMP-TABLE ttest NO-UNDO
    FIELD veld AS CHAR 
    INDEX keyveld IS WORD-INDEX veld .

CREATE ttest.
ttest.veld = "Ruiter.J". /* 2 words */

CREATE ttest.
ttest.veld = "Ruiter A H J de". /* 5 words */

CREATE ttest.
ttest.veld = "Ruiter.A.H.J.de". /* 5 words */

CREATE ttest.
ttest.veld = "Ruiter.J.de". /* 3 words */

CREATE ttest.
ttest.veld = "Ruiter.J.". /* 2 words */

CREATE ttest.
ttest.veld = "Ruiter.J.3". /* 2 words - Ruiter & J.3 */

FOR EACH ttest WHERE ttest.veld CONTAINS "Ruiter.j":U:
    DISPLAY ttest.veld FORMAT "X(20)".
END.
Resolution
The period (.) character is not a wildcard.  It is a word delimiter unless it is followed immediately by a digit, in which case it becomes part of a word.  
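The single-character behavior the customer expected belongs to the MATCHES operator, not CONTAINS: in a MATCHES pattern a period stands for any one character and an asterisk for any run of characters. As a hedged sketch (not part of the original test case), the query below reuses the ttest temp-table from the examples above and adds a MATCHES filter to the word-index search, so that only values whose whole string fits the pattern remain.

```abl
/* Sketch only: combining CONTAINS with MATCHES is an illustration,
   not a workaround recommended by the original article */
DEFINE TEMP-TABLE ttest NO-UNDO
    FIELD veld AS CHARACTER
    INDEX keyveld IS WORD-INDEX veld.

CREATE ttest.
ttest.veld = "Ruiter.J.de".

CREATE ttest.
ttest.veld = "Ruiter.A.H.J.de".

/* CONTAINS selects records that hold the words Ruiter, j and de.
   MATCHES then keeps only values that fit the whole pattern, where
   each "." stands for exactly one character. */
FOR EACH ttest
    WHERE ttest.veld CONTAINS "Ruiter.j.de"
      AND ttest.veld MATCHES "Ruiter.J.de":
    DISPLAY ttest.veld FORMAT "X(20)".
END.
```

With these two records, only 'Ruiter.J.de' should be displayed, because 'Ruiter.A.H.J.de' is longer than the pattern allows. Note that the periods in the MATCHES pattern are wildcards, so a value such as 'Ruiter-J-de' would also fit the pattern itself.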
Last Modified Date: 4/9/2018 11:00 AM