Translating Manual Scorecards into AI-Driven Auto Scorecards: Expert Advice and Tips

6 min read
August 1, 2024 at 11:56 AM

When evaluating your contact center agents' performance, you have to walk a fine line between judging whether someone has done something well and remaining objective. This can lead to supervisor bias (we tend to judge the people we like less harshly) or to evaluation fatigue that skews objectivity after scoring dozens of calls.

The beauty of AI-driven Quality Management is that it never has a bias, never gets tired, and can score all your calls automatically, making it extremely scalable. However, it cannot answer subjective and qualitative questions as accurately as a human can, and it has no insight beyond the information it is given. That is why AI-driven scorecards break actions down into measurable steps, making evaluations standardized and unbiased.

To achieve highly accurate results, you need to carefully translate the manual evaluation questionnaire into an AI-based scorecard. This transition involves several strategic adjustments to ensure objectivity, precision, and effective automation.

Here are key pieces of expert advice to guide this process:

1. Quantify and Standardize Evaluation Criteria

Manual scorecards often contain qualitative and subjective assessments. To translate them into an AI-based scorecard, you need to break your evaluation into quantifiable and standardized steps.

Focus on specificity over generality by replacing general questions (e.g., "Did the agent introduce themselves properly?") with specific, measurable actions (e.g., "Did the agent state their first and last name?" and "Did the agent mention the company’s name?").

Also, frame questions to yield binary responses wherever possible to eliminate ambiguity. Often, this means turning a How question (e.g., "How effectively did the agent restate the issue to confirm understanding?") into a question that can be answered with Yes or No ("Did the agent restate the caller's issue to confirm understanding?").
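
To make this concrete, here is a minimal Python sketch of what a binary scorecard check could look like. The `ask_ai` helper is a hypothetical stand-in for whatever LLM client your platform uses; it is not a MiaRec API.

```python
# A minimal sketch of a binary scorecard check, assuming a hypothetical
# `ask_ai` helper that sends a prompt to your LLM of choice and returns
# its raw text reply (this is NOT a MiaRec API).

def ask_ai(prompt: str) -> str:
    """Placeholder: call your LLM provider here and return its reply."""
    raise NotImplementedError

def score_binary(transcript: str, question: str) -> bool:
    """Ask one yes/no scorecard question about a call transcript."""
    prompt = (
        "You are evaluating a contact center call transcript.\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\n"
        "Answer with exactly one word: Yes or No."
    )
    return ask_ai(prompt).strip().lower().startswith("yes")

# Instead of "How effectively did the agent restate the issue?", ask:
# score_binary(transcript,
#              "Did the agent restate the caller's issue to confirm understanding?")
```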

2. Rewrite Questions for Clarity and Precision

As anyone who has ever worked with ChatGPT, Gemini, or another Generative AI tool knows, how you formulate your questions and requests matters. Any prompt needs to be written clearly and precisely, and the same applies here. Ensure your AI scorecard questions are clear and precise by removing any potential for misinterpretation. Let's have a look at these two questions:

  • "Did the agent confirm the reason for calling?"
  • "Did the agent confirm the reason why the customer was calling?"

The second prompt is better because it is more specific and leaves less room for interpretation. "Why the customer was calling" focuses on the customer's intent, directing the AI to look for the interaction where the agent directly addresses the customer's particular issue or need. "The reason for calling," on the other hand, is more general and can be interpreted in various ways, potentially leading the AI to confusion or misinterpretation.

3. Ask One Question At A Time

Manual evaluation questionnaires often include checklist-type questions, like "Did the agent verify the customer's date of birth, street address, account number, and reason for calling?" While the question itself is clear and objective, AI will often produce inconsistent results with questions like these because it has to verify four different things at once, and the agent should fail if even one of them is missing.

In reality, however, the AI will often respond with something like "Yes, the agent verified the customer's date of birth but did not verify their street address, account number, or reason for calling" or "No, the agent did not verify the customer's date of birth or street address. However, the agent did verify the customer's account number and reason for calling," and then score the question inconsistently, sometimes awarding points and producing a false positive.

Questions like these should be broken down into separate questions for the AI. For example:

  • Question 1: Did the agent verify the customer's date of birth?
  • Question 2: Did the agent verify the customer's street address?
  • Question 3: Did the agent verify the customer's account number?
  • Question 4: Did the agent verify the reason why the customer was calling in?

This way, you guarantee that the AI checks every point on your checklist and significantly increase the probability that you get a correct result.
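
Continuing the hypothetical sketch from above, splitting the compound question into four independent checks might look like this (reusing the illustrative `score_binary` helper):

```python
# Sketch: one binary check per checklist item, then an explicit aggregate.
# Reuses the hypothetical `score_binary` helper from the earlier sketch.

VERIFICATION_QUESTIONS = [
    "Did the agent verify the customer's date of birth?",
    "Did the agent verify the customer's street address?",
    "Did the agent verify the customer's account number?",
    "Did the agent verify the reason why the customer was calling in?",
]

def score_verification(transcript: str) -> dict:
    """Run each verification check separately and aggregate the results."""
    results = {q: score_binary(transcript, q) for q in VERIFICATION_QUESTIONS}
    results["all_verified"] = all(results.values())
    return results
```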

4. Show the AI What "Good" Looks Like

Artificial Intelligence can identify a specific dog breed because it has seen and classified millions of pictures of dogs. Similarly, AI needs clear indicators to evaluate performance. By adding specific examples of phrases or keywords that align with desired behaviors, you can help the AI make the right decision on whether your agent meets your quality criteria or not.

For example, the question "Did the agent greet the customer in a professional and friendly manner?" is highly subjective. Instead, provide the AI with phrases associated with specific actions. Here is an example: "Did the agent use phrases indicating politeness or positive engagement, such as 'please,' 'thank you,' 'glad to help,' 'happy to assist,' etc.?"

Another example is whether or not the agent expressed empathy. By adding examples such as "I’m sorry to hear that happened to you" or "I can see why this situation is frustrating for you" to your prompt, you provide clear decision guidelines. 
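
As an illustration, baking those example phrases directly into the question might look like the sketch below; this is a hypothetical format, not MiaRec's actual prompt schema.

```python
# Sketch: anchor a subjective "empathy" question with concrete example
# phrases so the AI has explicit decision guidelines.

EMPATHY_PHRASES = [
    "I'm sorry to hear that happened to you",
    "I can see why this situation is frustrating for you",
]

def empathy_question() -> str:
    """Build an empathy question that includes example phrases."""
    examples = "; ".join(f'"{p}"' for p in EMPATHY_PHRASES)
    return (
        "Did the agent express empathy toward the customer, for example with "
        f"phrases similar to: {examples}?"
    )
```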

5. Incorporate Context and Process Details

Many scorecards include questions like "Did the agent follow the processes provided?" The AI scorecard can only answer this accurately if it has all the information. It cannot make that decision based on the call recording transcript alone; it needs to know what the specific processes relevant to the evaluation look like.

If possible, describe the step in an example or description section to give the AI the required context, leading to more accurate and relevant assessments.

Image: Screenshot of the MiaRec Form Designer. This shows how you can easily create a scorecard question and provide the AI with additional information for context in the description.
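
For illustration, supplying the process as part of the prompt could look like the sketch below. The refund process shown is invented for the example, and the prompt format is hypothetical rather than MiaRec's actual schema.

```python
# Sketch: give the AI the process it is being asked to verify, since the
# transcript alone does not say what "the process" is. The process text
# here is a made-up example.

REFUND_PROCESS = """\
1. Confirm the order number.
2. Confirm the refund amount with the customer.
3. State the expected processing time (5-7 business days).
"""

def process_question(transcript: str) -> str:
    """Build a prompt that includes the process the agent must follow."""
    return (
        "You are evaluating a contact center call transcript.\n"
        f"Transcript:\n{transcript}\n\n"
        "The agent was required to follow this refund process:\n"
        f"{REFUND_PROCESS}\n"
        "Did the agent complete every step of this process? "
        "Answer with exactly one word: Yes or No."
    )
```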

6. Allow for Contextual Flexibility

Not all customer conversations follow an exact script; in fact, most don't. Some calls are disconnected, interrupted, or based on a misunderstanding. To accurately evaluate all relevant customer interactions, we need to account for questions that simply do not apply, so they don't produce false positives or negatives. In other words, we need to include N/A options where necessary.
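
Extending the earlier hypothetical sketch, allowing a third "N/A" outcome might look like this (again reusing the illustrative `ask_ai` helper):

```python
# Sketch: allow a third "N/A" outcome so questions that don't apply
# (e.g., a dropped call) don't register as failures. Builds on the
# hypothetical `ask_ai` helper from the first sketch.

def score_with_na(transcript: str, question: str) -> str:
    """Ask a scorecard question that may legitimately not apply."""
    prompt = (
        "You are evaluating a contact center call transcript.\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\n"
        "Answer with exactly one word: Yes, No, or N/A. Answer N/A if the "
        "question does not apply to this call (for example, the call was "
        "disconnected before this step could occur)."
    )
    answer = ask_ai(prompt).strip().lower()
    if answer.startswith("yes"):
        return "Yes"
    if answer.startswith("no"):
        return "No"
    return "N/A"
```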

7. Optimize and Customize Prompts

Great prompts are rarely right on the first iteration; it usually takes some optimizing and customizing to get them exactly right. Here is where a Prompt Designer is incredibly useful. A Prompt Designer allows you to write and test a prompt in your live environment without impacting your reporting or analytics. It is like a sandbox environment for trying out the effectiveness of your prompts. The neat thing is that you can apply all the tips above and run one prompt, tweak it a bit, run it again, and see how the results vary. This will yield much more accurate results tailored to your organization's context.
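
The iterate-and-compare idea behind the Prompt Designer can be sketched as a tiny test harness: run each prompt variant against a few transcripts with known human verdicts and see which one agrees most often. The sample data below is made up, and `score_binary` is the hypothetical helper from the first sketch.

```python
# Sketch: compare prompt variants against a few transcripts with known
# human verdicts before promoting one. All sample data is illustrative.

SAMPLES = [
    ("...transcript where the agent confirms the issue...", True),
    ("...transcript where the agent never confirms it...", False),
]

VARIANTS = [
    "Did the agent confirm the reason for calling?",
    "Did the agent confirm the reason why the customer was calling?",
]

for question in VARIANTS:
    hits = sum(score_binary(t, question) == label for t, label in SAMPLES)
    print(f"{question!r}: {hits}/{len(SAMPLES)} match the human verdicts")
```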

In the video below, I walk you through how to optimize and customize your prompts within MiaRec so you get the most accurate and consistent results:

Video: Walkthrough of optimizing and customizing prompts in the MiaRec Prompt Designer.

8. Regularly Review and Update

Just as your manual scorecards should be reviewed and updated regularly, so should your AI-based ones. Based on feedback and evolving AI capabilities, continuously improve your AI prompts and processes. Our tip: Start with one section, refine it, and measure effectiveness before moving on to the next.
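
One simple way to measure effectiveness while refining section by section is to track how often the AI's answers agree with a sample of human evaluations, then revisit the weakest section first. A hypothetical sketch with illustrative data:

```python
# Sketch: agreement between AI answers and human evaluations per scorecard
# section; the section with the lowest agreement is refined first.
# All data shown is illustrative.

def agreement_rate(ai: list[str], human: list[str]) -> float:
    """Fraction of questions where the AI matched the human evaluator."""
    return sum(a == h for a, h in zip(ai, human)) / len(human)

sections = {
    "Greeting":     agreement_rate(["Yes", "Yes", "No"], ["Yes", "No", "No"]),
    "Verification": agreement_rate(["Yes", "N/A", "No"], ["Yes", "N/A", "No"]),
}
weakest = min(sections, key=sections.get)
print(f"Refine '{weakest}' next ({sections[weakest]:.0%} agreement)")
```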

Conclusion

Although translating manual evaluation questionnaires into AI-based scorecards is straightforward, it does require a systematic approach to ensure accuracy and objectivity. You can create effective automated scorecards by quantifying actions, defining specific keywords, rewriting questions for clarity, and leveraging AI insights. At the same time, regular reviews and updates ensure continuous improvement and alignment with evolving AI capabilities.

Stay tuned for one of our upcoming articles covering how to run scorecards conditionally against calls based on specific metadata. For example, if you have just created a sales scorecard, you can run it against only your sales calls longer than two minutes. This way, you optimize your entire QA process end-to-end.

If you are currently transitioning your manual evaluation questionnaires into AI-based scorecards, download the comprehensive guide below. It will take you through each section of a scorecard and show you by example how to translate it easily and efficiently. 
