Call Scoring: Manual vs Keyword-Based vs Generative AI-based

12 min read
October 7, 2023 at 4:43 PM

"Did the agent say their name?", "Did they get consent for recording?", "Did the agent resolve the matter on the first call?" and many other questions are standard call scoring questions that contact center supervisors try to answer every day.

With hundreds or even thousands of calls to get through, making call scoring for quality assurance as efficient as possible is key. With advances in generative AI and Voice Analytics, automated call scoring is quickly becoming the preferred method for many contact center managers. But there is still a time and place for manual call scoring. In fact, the best contact center managers use all three methods in conjunction, with each playing to its strengths.

In this blog post, we'll explore the three main call scoring methods for quality assurance (manual, keyword-based, and Generative AI-based automatic call scoring), weigh their pros and cons, and share best practices for each approach so that you can make an informed decision about which one works best for your organization's needs.


Manual Call Scoring for Quality Assurance

Manual call scoring is the most basic form of quality assurance and refers to the process of evaluating customer service calls by listening to call recordings, assessing the performance of the agent, and assigning scores based on predetermined criteria.

It involves human judgment and expertise and does not require any Voice Analytics or AI-based tools. Manual call scoring requires trained personnel familiar with the criteria used for evaluation, such as customer satisfaction, politeness, problem resolution accuracy, etc.

Use Cases:

  • Suitable for organizations with a low volume of calls to review.
  • Used for specific call scenarios or issues when a detailed, human touch is required (these can be identified by Auto QA).

Benefits & Challenges Of Manual Call Scoring

There are two distinct benefits of manual call scoring compared to more automated methods. First, direct human judgment provides higher levels of accuracy for the specific calls reviewed. Second, there is no reliance on technology, which means there's no risk of technical errors in the assessment.

But manual call scoring also has its disadvantages. It is time-consuming, as each recording must be listened to carefully before it can be assessed. Additionally, there is potential bias when assessing calls due to personal preferences or other factors influencing an evaluator’s opinion about a particular interaction. Lastly, scaling up this process can be difficult if a company has many customers or large volumes of calls every day.

Using MiaRec Agent Evaluation To Support Your Manual Call Scoring

While it could be done on a clipboard or in an Excel Sheet in its simplest form, MiaRec's Agent Evaluation functionality, which is part of the Quality Management module of MiaRec's Conversational Intelligence Platform, allows contact center managers to evaluate call recordings using customizable forms. These forms are based on predetermined criteria and integrated into the call recording detail screen.

The supervisor is supported by various software features, e.g., they can speed up the call while listening. After the supervisor evaluates the call, MiaRec will automatically calculate the score and create an evaluation report for this call, including a score expressed in percentages and color-coded reasoning for the scores. This allows you to instantly understand which sections had problems and which were done well. In addition, you can track and report on your agents' and teams' performance over time.


Screenshot of a customizable MiaRec Agent Evaluation form filled out

In summary, manual call scoring for quality assurance is very time-consuming and prone to human error, making it necessary to explore automated solutions. However, manual call scoring won't disappear. The idea of Auto QA is to score or analyze calls en masse, freeing up humans to focus on more detailed reviews on specific calls.

Keyword-Based Call Scoring for Quality Assurance

Keyword-based call scoring for quality assurance is a process that uses artificial intelligence (AI) algorithms to analyze calls for specific keywords or phrases based on preset syntax expressions. 

Use Cases:

  • Identifying simple call scenarios that can easily be recognized based on trigger words, e.g., whether a voicemail system answered the call
  • Evaluating calls for criteria that require the use of specific phrases or words, such as reading compliance statements and script training


Screenshot of a side-by-side view of the customizable MiaRec Agent Evaluation form with outcomes and the transcript with sentiment and topic analysis


Benefits & Challenges Of Keyword-Based Call Scoring For Quality Assurance

Keyword-based call scoring offers many benefits over manual evaluations as it can evaluate 100% of your calls effectively for simple use cases where specific keywords are indicative of the call's content. This is especially useful to check if compliance statements have been read or call scripts are adhered to.

For example, to answer the question "Did the agent notify the caller that the call is being recorded?", the system would search the following key phrases in a call transcript: "This call may be monitored", "This call is being recorded", "You are on a recorded line", etc. Such a list of all variants of key phrases can be long.

More advanced systems, like MiaRec, support advanced query expressions, like "call NEAR (recorded OR monitored)", which can match many variants of the key phrase at once.
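To make the difference between a plain phrase list and a proximity expression more concrete, here is a minimal Python sketch. It is not MiaRec's expression engine or syntax: the `RECORDING_PHRASES` list, the helper names, and the five-word window are assumptions chosen purely for illustration.

```python
import re

# Hypothetical phrase list for the question
# "Did the agent notify the caller that the call is being recorded?"
RECORDING_PHRASES = [
    "this call may be monitored",
    "this call is being recorded",
    "you are on a recorded line",
]

def normalize(text):
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())

def matches_any_phrase(transcript, phrases):
    """Return True if any listed phrase appears verbatim in the transcript."""
    cleaned = " ".join(normalize(transcript).split())
    return any(phrase in cleaned for phrase in phrases)

def near(transcript, anchor, alternatives, window=5):
    """Rough stand-in for `anchor NEAR (a OR b)`.

    Returns True if the word `anchor` occurs within `window` words
    of any word in `alternatives`. The window size is an assumption.
    """
    words = normalize(transcript).split()
    anchor_positions = [i for i, w in enumerate(words) if w == anchor]
    alt_positions = [i for i, w in enumerate(words) if w in alternatives]
    return any(abs(a - b) <= window
               for a in anchor_positions for b in alt_positions)

# A wording that is not in the phrase list but is still a valid notice.
transcript = "Before we start, please note that our call today will be recorded."
phrase_hit = matches_any_phrase(transcript, RECORDING_PHRASES)
near_hit = near(transcript, "call", {"recorded", "monitored"})
```

In this sketch the exact phrase list misses the reworded notice, while the proximity check still catches it, which is the reason a single expression can stand in for a long list of variants.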

Despite its many advantages over traditional methods, there are some potential challenges associated with automated call scoring systems that should be taken into consideration before implementing one in your organization.

  • Human language is complex, ambiguous, and context-dependent. The same meaning can be conveyed through a variety of phrases and word choices. The keyword-based auto-evaluation requires constant maintenance to ensure it encompasses all potential key phrases and keywords.
  • Keyword-based matching technology views call transcripts through a rather narrow lens. Its comprehension is confined to a single sentence or, at most, a couple of sentences. This means it falls short in answering questions such as "Was the agent friendly and professional?", which demand an understanding of the entire conversation.
  • The accuracy of keyword-based evaluation depends significantly on the accuracy of the transcript. Even minor transcription errors can cause inaccurate scoring, thereby undermining the reliability of the assessment.
  • Incorrect setup or data entry could lead to errors in the system's evaluations, which could have negative impacts on overall quality assurance efforts if not addressed promptly and correctly.
  • Depending on how they are designed and implemented, these systems may also contain certain biases that could affect the accuracy of their results if not adequately monitored by management teams regularly.
  • Automated call scoring requires an investment in Voice Analytics software, such as MiaRec, as well as highly accurate Speech-to-Text transcription.

Therefore, it is important to ensure proper setup and monitoring of automated call scoring systems to avoid any potential issues arising from incorrect implementation or bias.

Most of these shortcomings can be resolved by using Generative AI-based auto evaluation, discussed below.

Despite the noted drawbacks, the keyword-based evaluation performs effectively in scenarios where agents adhere to a script and calls exhibit a high degree of uniformity.

Using MiaRec Auto Scorecard For Automated Quality Assurance 

One example of keyword-based call scoring is MiaRec's Auto Score Card, an AI-driven automatic call scoring feature powered by MiaRec's Voice Analytics module within its Conversational Intelligence platform.

It is very easy and quick to set up, although it does require some work and diligence to define the keywords and phrases using our well-documented expression syntax:


Screenshot of how to set up the Auto Score Card within the MiaRec Form Designer.

Based on the predefined criteria, MiaRec will look for keywords and phrases associated with the call scoring criteria. For example, one criterion is thanking the caller for calling today. As you can see from the screenshot below, the agent used the prescribed script and got a good score for that segment. Each criterion can be weighted, allowing you to emphasize more important aspects over less important ones.


Screenshot of the MiaRec Auto Score Card with custom script adherence configurations showing the evaluation results
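As a rough illustration of how weighted criteria can roll up into a single percentage, consider the Python sketch below. The criterion names, the weights, and the `weighted_score` function are hypothetical; the formula MiaRec actually uses may differ.

```python
def weighted_score(results, weights):
    """Return the percentage of weighted points earned.

    results: criterion name -> True (passed) or False (failed)
    weights: criterion name -> relative importance (non-negative number)
    """
    total = sum(weights[name] for name in results)
    if total == 0:
        raise ValueError("at least one criterion must have a non-zero weight")
    earned = sum(weights[name] for name, passed in results.items() if passed)
    return round(100 * earned / total, 1)

# Hypothetical criteria and weights, for illustration only.
weights = {"greeting": 1, "recording_notice": 3, "thanked_caller": 1}
results = {"greeting": True, "recording_notice": True, "thanked_caller": False}
score = weighted_score(results, weights)  # 4 of 5 weighted points -> 80.0
```

Because the recording notice carries three times the weight of the other criteria, missing it would cost the agent far more than forgetting to thank the caller.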

MiaRec's Auto Score Card is highly effective when used with sentiment scores. The screenshot below depicts a different example where the agent had to place the customer on hold. The positive language used is reflected in a good agent sentiment score. 


Screenshot of the Voice Analytics tab of a call record with positive agent sentiment due to the positive language used

In summary, keyword-based call scoring may still be useful in some scenarios, but it has limitations due to the unstructured nature of conversations.

Generative AI-Based Call Scoring for Quality Assurance (Auto QA)

The third method of Auto QA employs Generative AI, particularly large language models, to analyze telephone conversations. Large language models demonstrate superior comprehension skills, enabling the system to assess the call in its entirety. This recent advancement in technology marks a significant step in automated quality assurance and analysis of customer interactions.

Use Cases:

  • Analyzing long and complex conversations where context is crucial.
  • Answering questions that require understanding the entire conversation, such as gauging customer satisfaction or agent performance.

A key difference between Generative AI-based and keyword-based auto evaluation is the following:

Instead of scanning for pre-defined keywords or keyphrases within a call transcript, the Generative AI model is provided with a transcript alongside a list of scoring questions phrased naturally. For example, a question might be, "Based on the provided call transcript, did the agent ask for the name of the caller at the beginning of the conversation?". The AI is capable of answering this question based on the essence of the conversation rather than relying on the presence of specific words or phrases. In some cases, the AI can even respond with "Not applicable, the caller mentioned their name at the outset of the conversation, saying 'Hi, this is David Schmidt.'"
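The idea of pairing a transcript with naturally phrased questions can be sketched in a few lines of Python. This is an assumption-laden illustration, not MiaRec's implementation: the prompt wording, the `Yes / No / Not applicable` answer format, and both function names are hypothetical, and the call to a language model is deliberately left out.

```python
def build_scoring_prompt(transcript, questions):
    """Assemble one prompt containing the transcript and numbered questions."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "You are reviewing a contact center call.\n"
        "Answer each question with Yes, No, or Not applicable, "
        "followed by ' - ' and a one-sentence reason.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Questions:\n{numbered}\n"
    )

def parse_answers(reply, question_count):
    """Parse lines like '1. Yes - reason' into {1: ('Yes', 'reason')}."""
    answers = {}
    for line in reply.splitlines():
        number, _, rest = line.strip().partition(". ")
        # Skip anything that is not a numbered answer to a known question.
        if not number.isdigit() or not 1 <= int(number) <= question_count:
            continue
        verdict, _, reason = rest.partition(" - ")
        answers[int(number)] = (verdict.strip(), reason.strip())
    return answers

questions = [
    "Did the agent ask for the name of the caller at the beginning "
    "of the conversation?",
]
prompt = build_scoring_prompt("Caller: Hi, this is David Schmidt.", questions)

# A reply in the assumed format, written by hand here instead of by a model.
sample_reply = "1. Not applicable - The caller gave their name at the outset."
answers = parse_answers(sample_reply, len(questions))
```

Note that changing what gets scored only means editing the list of questions; there are no keyword lists or query expressions to maintain.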

Benefits & Challenges Of Generative AI-Based Call Scoring (Auto QA) For Quality Assurance

Automatic call scoring offers many benefits: 

  • Can analyze the entire context of a conversation, not just specific keywords or phrases.
  • Configuration is easier as it uses natural language without the need for complex programming or keyword configurations.
  • Can answer more complex questions about a call, like whether an issue was resolved or if an agent introduced themselves properly.
  • Improved visibility into your direct customer interactions and the capability to score calls efficiently and at scale. 
  • Automatic call scoring reduces costs associated with manual evaluation and reduces the need for staff members and resources dedicated to manual QA workflows.
  • With the exception of some possible lag caused by the speech-to-text transcription process, which can take a few minutes, scores are generated within seconds. 
  • Automated systems are typically more accurate than manual methods due to their reliance on algorithms rather than human judgment, which may be biased or subjective at times.
  • Auto QA not only saves time and resources but also generally makes the supervisor's job much more enjoyable, because there is no need for an individual person or team to tediously listen through each conversation.

There are a few challenges to consider with Auto QA. For example, like keyword-based QA, it requires an investment in Voice Analytics software, such as MiaRec, as well as highly accurate Speech-to-Text transcription. 

Using MiaRec's Auto QA For Automated Quality Assurance 

Imagine being able to write an easy prompt using natural language to get an assessment of whether the issue has been resolved on the first call or if the agent was professional and courteous. That's where MiaRec Auto QA comes in.

MiaRec's Auto QA, which takes advantage of Generative AI, gives companies the opportunity to capture the maximum benefit from their customer interactions. With the addition of Sentiment Analysis and Topical Analysis, Auto QA enhances the quantity and quality of insights that can be gained from conversations, providing a more thorough view of customer service agent performance and customer encounters.


In summary, generative AI is the future of QA in MiaRec. It comes ready to use and is equipped with basic questionnaires/scorecards to get you started. Because it is based on generative AI, it is much easier to customize. 


How They Work Together

While the historical progression of the three methods represents a drastic functional advancement at each step, these tools are most powerful when they are used in a complementary way.

While AI, especially with advancements in Generative AI and natural language processing, showcases promising strides in accurately evaluating agent performance, the human touch embodies an irreplaceable nuance and understanding of contextual subtleties. AI can evaluate 100% of interactions in a contact center, which can significantly optimize the evaluation process. However, the empathetic understanding, experiential judgement, and adaptive feedback that human evaluators offer are crucial for the holistic development of agents.

Moreover, humans can discern the emotional tone and underlying customer sentiments in a way that AI might not fully grasp. The integration of AI could augment the evaluation process, making it more data-driven and timely, yet the comprehensive insight provided by human evaluators remains pivotal.

AI tools, especially generative AI, can pre-score calls, allowing human evaluators to focus on the calls that need the most attention. MiaRec has also put feedback loops in place, allowing users to provide feedback on the AI's assessments, which can be used to improve the system's accuracy.
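One simple way to picture this pre-scoring workflow is a triage step that hands the lowest-scoring calls to human evaluators first. The Python sketch below is illustrative only: the 70% threshold, the review limit, and the call IDs are assumptions, not MiaRec settings.

```python
def select_for_human_review(auto_scores, threshold=70.0, limit=10):
    """Pick the lowest-scoring calls below `threshold`, worst first.

    auto_scores: call ID -> automatic score as a percentage (0 to 100)
    """
    flagged = [(score, call_id) for call_id, score in auto_scores.items()
               if score < threshold]
    flagged.sort()  # ascending by score, so the weakest calls come first
    return [call_id for _, call_id in flagged[:limit]]

# Hypothetical pre-scored calls.
auto_scores = {"call-1": 92.0, "call-2": 55.0, "call-3": 68.5, "call-4": 40.0}
review_queue = select_for_human_review(auto_scores, limit=2)
```

With these example scores, the queue contains `call-4` followed by `call-2`, so a supervisor with time for only two reviews spends it on the weakest interactions.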


In conclusion, when it comes to manual vs. keyword-based vs. automatic call scoring for quality assurance, there are pros and cons to each approach.

Manual call scoring is time-consuming but sometimes provides more accurate results as the agent's performance can be evaluated in detail. While keyword-based call scoring offers a more efficient way of assessing agents and is able to provide detailed feedback on their performance, it requires experience working with syntax expressions and is limited to certain scenarios. Generative AI-based call scoring is the most advanced and efficient option, requiring minimal effort from managers and agents. This allows contact centers to evaluate agent performance at scale.

MiaRec offers a range of QA tools, from manual to AI-driven, each with its strengths and use cases. However, rather than choosing one approach, the vision is to have these tools work in harmony, leveraging the speed and breadth of AI with the depth and nuance of human judgment.

Call Scoring for Quality Assurance FAQs

How Is Quality Assurance Measured In A Contact Center?

Quality assurance in a call center is measured by analyzing customer interactions to identify areas of improvement. This can be done through automated quality management systems, which record calls and use voice analytics to score the conversation using pre-defined criteria.

These systems also provide feedback on agent performance, allowing managers to monitor and improve the overall quality of service provided by their contact center. Additionally, compliance officers can use these recordings for audit purposes, and customer service teams can review them to ensure that agents are following company policies and procedures.

What Makes A Great Quality Assurance Scorecard?

Creating a robust call scorecard for agent evaluation in a contact center is essential for maintaining a high level of customer satisfaction and agent performance. Here are several key elements that contribute to an effective call scorecard:

  1. Clear Objectives: Have well-defined objectives for what you want to measure and improve, be it customer satisfaction, resolution rate, adherence to protocols, or other operational metrics.
  2. Behavioral Indicators: Include behavioral metrics such as tone, courtesy, empathy, and the ability to understand and address customer needs effectively.
  3. Consistency: Ensure that evaluation criteria are consistent across all agents to maintain fairness and transparency in the evaluation process.
  4. Feedback Mechanism: Include a structured feedback mechanism to provide agents with constructive feedback, aiming to promote continuous improvement and development.
  5. Accessibility: Make sure the scorecard is easily accessible and understandable to all agents so they know what is expected of them and how they are being evaluated.
  6. Technology Utilization: Employ technology such as call recording, analytics, and AI to provide a comprehensive view of agent performance, while also saving time and resources in the evaluation process.
  7. Customization: Tailor the scorecard to reflect the unique needs and objectives of your contact center, keeping in mind the industry standards, customer expectations, and company goals.
  8. Continuous Improvement: Regularly review and update the scorecard based on feedback from agents, supervisors, and changing business needs to ensure it remains relevant and effective.

What Percentage Of Calls Should Be Scored For Quality Assurance?

The exact percentage of calls that should be evaluated depends on the size and scope of your contact center, as well as the nature of customer interactions. Generally speaking, a good starting point is to have at least 5 calls reviewed for quality assurance purposes for each agent weekly. This can help ensure that customer service standards are being met and that any issues are identified quickly so they can be addressed promptly.

However, evaluating 100% of your calls gives you a complete and, therefore, much more accurate picture. Additionally, having automated Voice Analytics tools in place can help identify trends or conversation patterns that may require further investigation or additional training for agents.
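To see how much of the total call volume a "5 calls per agent per week" rule actually covers, a quick back-of-the-envelope sketch in Python can help. The agent names and call counts are made up, and both helper functions are hypothetical.

```python
import random

def sample_calls_per_agent(calls_by_agent, per_agent=5, seed=None):
    """Randomly pick up to `per_agent` calls for each agent."""
    rng = random.Random(seed)
    return {
        agent: rng.sample(calls, min(per_agent, len(calls)))
        for agent, calls in calls_by_agent.items()
    }

def coverage_percent(sample, calls_by_agent):
    """Share of all calls that ended up in the review sample."""
    total = sum(len(calls) for calls in calls_by_agent.values())
    reviewed = sum(len(calls) for calls in sample.values())
    return round(100 * reviewed / total, 1) if total else 0.0

# Hypothetical week: one busy agent and one part-time agent.
calls_by_agent = {
    "alice": [f"alice-{n}" for n in range(100)],
    "bob": [f"bob-{n}" for n in range(3)],
}
sample = sample_calls_per_agent(calls_by_agent, per_agent=5, seed=42)
coverage = coverage_percent(sample, calls_by_agent)  # 8 of 103 calls
```

In this made-up week only 8 of 103 calls are reviewed, or about 7.8%, which illustrates the gap between sampled manual review and scoring every call.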

