Knowledge Foundation – Search Tuning
In my last post I talked about the common issues in implementing a knowledgebase and how some of these pitfalls can be avoided using Oracle B2C Service Knowledge Management. This article will be looking at some of the ways that words can influence how the knowledgebase can be used, by both your staff and your customers.
Words, script, text, data, info, term
The words and the language used in your answers are an important part of the knowledgebase. They are just as important as the content.
As I mentioned in my last post, a common problem with knowledgebases is that content writers use internal or industry terms - which are not the same as the terms that are used by their customers.
As well as the range of analytics included in Oracle B2C Service, there are multiple methods that can be used to enhance the search.
Keywords
In the Answer workspace, the Content Manager can add a list of keywords - see the highlighted section in the screenshot above. These words are given the greatest weight when returning the search results.
Typical keywords are synonyms and abbreviations of the salient point of the knowledge article, but they can also include common misspellings, which could otherwise distort the returned content.
For example, if you have an Answer for the question “How do I flag a technical issue?” keywords could include:
problem, support, help, fault, glitch, doesn’t work
When the end user searches for “My flux capacitor is faulty” they will be presented with the technical issue Answer.
Keywords are also matched on the stem of the word, so you do not need to add multiple keywords with different suffixes. In the case above, the word ‘fault’ covers:
fault, faulty, faults
Keywords are also limited to 25 characters, so it would be worthless using ‘floccinaucinihilipilification’ as a keyword.
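As a rough illustration of the two points above, a keyword list could be checked before it is pasted into the Answer workspace. This is a minimal sketch in Python and not part of Oracle B2C Service: the function names are hypothetical, and naive_stem is a deliberately crude stand-in for whatever stemming the product actually applies.

```python
MAX_KEYWORD_LENGTH = 25  # the limit quoted in this article


def split_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string into trimmed, non-empty keywords."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def too_long(keywords: list[str]) -> list[str]:
    """Return the keywords that exceed the character limit."""
    return [k for k in keywords if len(k) > MAX_KEYWORD_LENGTH]


def naive_stem(word: str) -> str:
    """Crude suffix stripping for illustration only (an assumption, not the
    product's stemmer): drop one trailing 's' or 'y' from longer words."""
    w = word.lower()
    for suffix in ("s", "y"):
        if w.endswith(suffix) and len(w) > 4:
            return w[:-1]
    return w


def redundant(keywords: list[str]) -> list[str]:
    """Return keywords whose stem has already been seen earlier in the list."""
    seen: set[str] = set()
    dupes: list[str] = []
    for k in keywords:
        stem = naive_stem(k)
        if stem in seen:
            dupes.append(k)
        else:
            seen.add(stem)
    return dupes


if __name__ == "__main__":
    kws = split_keywords("problem, support, help, fault, faulty, faults")
    print("Too long:", too_long(kws))
    print("Redundant:", redundant(kws))
```

Run against the example list, a check like this would flag ‘faulty’ and ‘faults’ as unnecessary once ‘fault’ is present.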
Thesaurus Words
The thesaurus.txt file is a text file containing a list of synonyms, which means that you do not have to add as many keywords to your answers. This file applies to your entire knowledgebase.
The format of the thesaurus file is:
Searched Word, Alternative 1, Alternative 2, etc.
Each entry must be on its own line, comma delimited, free of other punctuation and entered in block capitals.
Continuing with the Technical Issues Answer above you can add an entry in the thesaurus file for ‘Fault’:
FAULT, ISSUE, PROBLEM, ERROR, BROKEN
This will return all answers in the system that contain the words fault, faults, faulty, issue, issues, problem, problems, error, errors, broken, etc.
However, thesaurus entries do have some limitations:
- If the user searches using the word ‘Faulty’, the synonym list is not applied, because it is defined for the word ‘fault’ and not for this specific word. The search will only return answers containing ‘fault, faulty, faults’.
- No synonyms have been defined for the word PROBLEM, so if the user searches using this word, only Answers containing ‘problem’ and its stems are returned. You would need to add another line in the thesaurus.txt file:
PROBLEM, FAULT, ISSUE, ERROR, BROKEN
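Because each line only works for its first word, it can be handy to check which alternatives still lack a line of their own. The following is a minimal sketch in Python, assuming the file format described above. The helper names (parse_wordlist, missing_reverse_entries, symmetric_lines) are hypothetical and are not part of Oracle B2C Service.

```python
def parse_wordlist(text: str) -> dict[str, list[str]]:
    """Map the first word on each line to the alternatives that follow it."""
    entries: dict[str, list[str]] = {}
    for line in text.splitlines():
        terms = [t.strip().upper() for t in line.split(",") if t.strip()]
        if terms:
            entries[terms[0]] = terms[1:]
    return entries


def missing_reverse_entries(entries: dict[str, list[str]]) -> list[str]:
    """List alternatives that do not start a line of their own."""
    missing: list[str] = []
    for alternatives in entries.values():
        for alt in alternatives:
            if alt not in entries and alt not in missing:
                missing.append(alt)
    return missing


def symmetric_lines(group: list[str]) -> list[str]:
    """Build one line per word, so every word in the group is a searched word."""
    return [
        ", ".join([head] + [t for t in group if t != head])
        for head in group
    ]


if __name__ == "__main__":
    entries = parse_wordlist("FAULT, ISSUE, PROBLEM, ERROR, BROKEN")
    print("No line of their own:", missing_reverse_entries(entries))
    for line in symmetric_lines(["FAULT", "ISSUE", "PROBLEM", "ERROR", "BROKEN"]):
        print(line)
```

Whether you want every word in a group to act as a searched word is a judgement call: fully symmetric entries widen the results for every term, which may not always be what you want.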
Alias Words
Like the thesaurus, these are words which can be specific to your organisation or have multiple variants.
As with the thesaurus, the format of the alias file is:
Searched Word, Alternative 1, Alternative 2, etc.
Again, each entry must be on its own line, comma delimited, free of other punctuation and entered in block capitals.
Using the example from my earlier post with mobile phone terms, we can use the word NETWORK as an alias for the telephony terms:
NETWORK, GSM900, GSM1800, GSM1900, EDGE, 3G, LTE, LTE-A, LTE-U, 4G, 5G
This would mean that searching for network issues would return answers under GSM900, 5G, etc.
The Alias file is also used where different answers can contain different spellings of the same word:
E-MAIL, EMAIL
EMAIL, E-MAIL
The alias file would pull all answers for emails, irrespective of whether ‘e-mail’ or ‘email’ was used in the answer or the search term.
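To picture how these entries behave, here is a minimal sketch in Python of expanding a single search term using the alias lines shown above. It only models the idea described in this post. The names load_aliases and expand are hypothetical, and the real search engine may well work differently.

```python
ALIAS_FILE = """\
NETWORK, GSM900, GSM1800, GSM1900, EDGE, 3G, LTE, LTE-A, LTE-U, 4G, 5G
E-MAIL, EMAIL
EMAIL, E-MAIL
"""


def load_aliases(text: str) -> dict[str, set[str]]:
    """Map each searched word to the set of alternatives on its line."""
    aliases: dict[str, set[str]] = {}
    for line in text.splitlines():
        head, *alternatives = [t.strip() for t in line.split(",")]
        if head:
            aliases[head] = set(alternatives)
    return aliases


def expand(term: str, aliases: dict[str, set[str]]) -> set[str]:
    """Return the term plus any alternatives defined for it."""
    key = term.upper()
    return {key} | aliases.get(key, set())


if __name__ == "__main__":
    aliases = load_aliases(ALIAS_FILE)
    print(sorted(expand("network", aliases)))  # NETWORK plus every telephony term
    print(sorted(expand("email", aliases)))    # both spellings
    print(sorted(expand("5g", aliases)))       # no line starts with 5G, so only 5G
```

The last call shows why both E-MAIL and EMAIL need their own line: a term is only expanded when it appears as the first word of an entry.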
Word Exclusions
The exclude_answers.txt file prevents the system from searching on words that are common in a lot of your Answers and would otherwise dilute the search.
The format of this file is one word per row, entered in block capitals.
In our earlier example search, “My flux capacitor is faulty”, the words ‘My’ and ‘Is’ are excluded from the search as they are common words.
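The effect on a search can be sketched in a few lines of Python. This is an illustration only: the two excluded words are the ones named in this post, the function names are hypothetical, and the real file would normally hold many more words.

```python
EXCLUSION_FILE = """\
MY
IS
"""


def load_exclusions(text: str) -> set[str]:
    """Read one excluded word per row, ignoring blank rows."""
    return {row.strip().upper() for row in text.splitlines() if row.strip()}


def searchable_terms(query: str, excluded: set[str]) -> list[str]:
    """Drop excluded words from a query, keeping the remaining word order."""
    words = [w.strip(".,?!").upper() for w in query.split()]
    return [w for w in words if w and w not in excluded]


if __name__ == "__main__":
    excluded = load_exclusions(EXCLUSION_FILE)
    print(searchable_terms("My flux capacitor is faulty", excluded))
```

Only FLUX, CAPACITOR and FAULTY are left to search on, which is where the keywords and wordlists described earlier come into play.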
The Answer Itself
In addition to the keywords and the wordlists, the answer itself is included in the search, with weighting applied in the following order:
- Keywords (Highest)
- Summary Field
- Question Field
- Answer Field (Lowest)
The more times words used in the search appear within the components of the answer (as shown above), the higher it will be placed in the search results.
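The idea of weighted fields can be sketched as a simple scoring function in Python. Treat this purely as an illustration: this post only gives the order of the fields, so the numeric weights (4, 3, 2, 1), the Answer class and the score function are all assumptions, not the formula Oracle B2C Service uses.

```python
import re
from dataclasses import dataclass

# Assumed weights: only the ordering (keywords highest, answer lowest)
# comes from the article; the numbers are made up for illustration.
FIELD_WEIGHTS = {"keywords": 4, "summary": 3, "question": 2, "answer": 1}


@dataclass
class Answer:
    keywords: str
    summary: str
    question: str
    answer: str


def score(ans: Answer, terms: list[str]) -> int:
    """Sum weighted counts of each search term across the four fields."""
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"[a-z0-9']+", getattr(ans, field).lower())
        total += weight * sum(words.count(t.lower()) for t in terms)
    return total


if __name__ == "__main__":
    technical = Answer(
        keywords="fault, problem",
        summary="Report a fault",
        question="How do I flag a technical issue?",
        answer="Contact support about the fault.",
    )
    billing = Answer(
        keywords="",
        summary="Billing dates",
        question="When am I billed?",
        answer="A fault in billing is rare.",
    )
    for a in sorted([billing, technical], key=lambda a: score(a, ["fault"]), reverse=True):
        print(score(a, ["fault"]), a.summary)
```

With these assumed weights, an answer that carries the search term as a keyword and in its summary outranks one that only mentions it in passing in the answer text.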
Conclusion
This blog has shown how Answer keywords are best used, and how the thesaurus, aliases, and exclusion files can be used across your entire knowledgebase, helping your customers and staff to find what they need. If you’d like more detail then please get in touch with us.