
Agentforce Testing Center (Beta)

By Claire Adams

The Agentforce Testing Center (Beta) offers new testing features that replace the legacy testing approach.

In this post, we’ll explore the latest enhancements to Agentforce Testing Center (Beta), including custom evaluation criteria, inline editing, AI output visibility, and new ways to monitor agent quality over time.

Note: Salesforce provides the following statement: 'Agentforce Testing Center in Agentforce Studio is a pilot or beta service that is subject to the Beta Services Terms at Agreements - Salesforce.com or a written Unified Pilot Agreement if executed by Customer, and the Non-GA Credit Consumption, Non-GA Gen AI, and the Non-GA Open AI LLM Provider terms in the Product Terms Directory. Use of this pilot or beta service is at the Customer's sole discretion.'

The new features available are:

  • Define your own Evaluation Criteria.
  • Make changes to tests with inline editing.
  • View agent inputs and outputs.
  • Assess the health of AI Agents.
  • Clone Test Suites.

Let's look at each one in turn. To access these features, open the App Launcher and search for Agentforce Studio. If Agentforce Studio is not available as an option, you will need to log a support case.

Define Evaluation Criteria

For Salesforce help regarding this feature, the following link provides further information.

In summary, you can now define your own evaluations, which helps ensure that your AI outputs reflect your brand, meet quality standards and convey the correct point of view.

Default Evaluations include:

  • Response Evaluation - Evaluates if the agent's response matches the desired response.
  • Subagent Evaluation - Evaluates the agent's ability to select the correct subagent.
  • Actions Evaluation - Evaluates the agent's ability to select the correct actions.

Response Quality Evaluation:

  • Completeness - Evaluates if the response includes all necessary information.
  • Coherence - Evaluates if the response is easy to read and free of grammatical errors.
  • Conciseness - Evaluates if the response is short but accurate.
  • Latency - Measures the length of time, in milliseconds, that the agent takes to generate a response.
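
To make the Latency measure concrete, here is a minimal Python sketch of timing a single agent call in milliseconds. It is an illustration only, not how Testing Center measures latency internally: `measure_latency_ms` and `fake_agent` are hypothetical names, and the "agent" is a stand-in function rather than a real Agentforce call.

```python
import time
from typing import Callable, Tuple


def measure_latency_ms(agent: Callable[[str], str], utterance: str) -> Tuple[str, float]:
    """Call the agent once and return (response, elapsed time in milliseconds)."""
    start = time.perf_counter()
    response = agent(utterance)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return response, elapsed_ms


def fake_agent(utterance: str) -> str:
    # Hypothetical stand-in for a real agent call; sleeps to simulate work.
    time.sleep(0.05)
    return f"Echo: {utterance}"


response, latency_ms = measure_latency_ms(fake_agent, "Where is my order?")
print(f"{response!r} took {latency_ms:.0f} ms")
```

The sketch uses `time.perf_counter` because it is a monotonic, high-resolution clock, which makes it better suited to measuring short durations than wall-clock time.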

You can also set up your own custom Evaluation. Here's how it works:

  1. Create a new Test Suite.
  2. Enter the Basic Information, Test Conditions and Test Data.
  3. Within Evaluations, select the 'Add Custom' button.
  4. Select the LLM Judge. With an LLM judge (or LLM-as-judge), one large language model (LLM) evaluates the outputs of another.
  5. Give the Evaluation a name and then click 'Next'.
  6. On the next step you can select an existing evaluation prompt, or create a new one.
  7. You can then set the threshold from 0 to 5. Any score greater than or equal to this threshold will be marked as PASS during testing.
  8. Select Save.
Agentforce Testing Centre - Evaluations
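
The threshold rule from the steps above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Salesforce's implementation: `evaluate`, `EvaluationResult` and `toy_judge` are hypothetical names, the judge is a simple keyword check standing in for a real LLM judge, and the sketch assumes scores are numeric values on the 0 to 5 scale.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluationResult:
    name: str
    score: float
    threshold: float
    status: str  # "PASS" or "FAIL"


def evaluate(name: str, response: str, judge: Callable[[str], float],
             threshold: float) -> EvaluationResult:
    """Score a response with the judge and apply the threshold rule."""
    if not 0 <= threshold <= 5:
        raise ValueError("threshold must be between 0 and 5")
    score = judge(response)
    # A score greater than or equal to the threshold is marked as PASS.
    status = "PASS" if score >= threshold else "FAIL"
    return EvaluationResult(name, score, threshold, status)


def toy_judge(response: str) -> float:
    # Hypothetical stand-in for an LLM judge: a real judge would be another
    # model scoring the response against an evaluation prompt.
    return 4.0 if "thank you" in response.lower() else 2.0


result = evaluate("Brand voice", "Thank you for getting in touch.", toy_judge, threshold=4)
print(result.status)  # prints PASS, because the score of 4.0 meets the threshold of 4
```

Note that a score exactly equal to the threshold passes, so raising the threshold by one point can flip borderline test cases from PASS to FAIL.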

Inline Editing

For Salesforce help regarding this feature, the following link provides further information.

Previously, to edit a test suite you had to download the CSV file, edit it and then re-upload it. Now, you can make changes in real time using inline editing.

The following provides an example of how you can do this:

  1. Navigate to Agentforce Studio (from App Launcher, search for Agentforce Studio).
  2. Select a Test Suite.
  3. Double-click within a cell.
  4. Make your amendments.
  5. Click away from the cell. The changes will be saved.
Agentforce Testing Centre - Inline Editing

View AI Agent Inputs and Outputs

For Salesforce help regarding this feature, the following link provides further information.

One of the biggest challenges when testing AI agents is understanding how an output was generated. Agentforce Testing Center now provides visibility into both the agent inputs and outputs, making it easier to troubleshoot unexpected responses and validate behaviour.

The following provides an example of how to do this:

  1. Navigate to Agentforce Studio (from App Launcher, search for Agentforce Studio).
  2. Select a Test Suite.
  3. Within a Test Case, select the 'Agent Response' column.
  4. The Output Preview shows how the agent arrived at the test result.

Assess the health of AI Agents

For Salesforce help regarding this feature, the following link provides further information.

You can compare the results of your tests over time to check that your AI agents continue to provide the same level of quality. When a Test Suite is run, its Run History is available through a link in a separate tab.

The following provides an example of how to do this:

  1. Navigate to Agentforce Studio.
  2. Select a Test Suite and then select to Run the test.
  3. Once complete, select the Run History tab.

Clicking on the link provides a summary of the tests executed, the passes and failures, and the reasons for the results. These can be compared with future test runs.

Agentforce Testing Centre - Run History
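
As a rough illustration of comparing runs over time, the Python sketch below takes two run summaries and reports the pass rate of each, plus any test cases that passed previously but now fail. The data shape, the test case IDs and the function names (`pass_rate`, `regressions`) are all hypothetical; the sketch assumes you have noted each test case's PASS or FAIL status from the Run History yourself.

```python
from typing import Dict, List


def pass_rate(results: Dict[str, str]) -> float:
    """Return the fraction of test cases marked PASS (0.0 for an empty run)."""
    if not results:
        return 0.0
    passed = sum(1 for status in results.values() if status == "PASS")
    return passed / len(results)


def regressions(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return test cases that passed in the previous run but fail in the current one."""
    return sorted(
        test_case
        for test_case, status in current.items()
        if status == "FAIL" and previous.get(test_case) == "PASS"
    )


# Hypothetical results from two runs of the same Test Suite.
run_1 = {"TC-1": "PASS", "TC-2": "PASS", "TC-3": "FAIL"}
run_2 = {"TC-1": "PASS", "TC-2": "FAIL", "TC-3": "PASS"}

print(f"Run 1 pass rate: {pass_rate(run_1):.0%}")
print(f"Run 2 pass rate: {pass_rate(run_2):.0%}")
print(f"Regressions: {regressions(run_1, run_2)}")
```

In this example both runs have the same pass rate, yet one test case has regressed. That is why it is worth comparing individual test cases between runs and not only the overall totals.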

Clone Test Suite

Within the Test Suite, select the dropdown at the top right and then clone the Test Suite.

Further Resources

The following provides links to further Salesforce resources that discuss how to create Test Cases and Test Suites in the Agentforce Testing Center (Beta).

Final Thoughts

The latest enhancements to Agentforce Testing Center (Beta) provide Salesforce teams with more control, visibility, and flexibility when testing AI agents.

Features such as custom evaluation criteria, inline editing, and historical run comparisons make it significantly easier to validate AI behaviour and maintain response quality over time.

As Agentforce continues to evolve, these testing capabilities will become increasingly important for organisations looking to deploy reliable and scalable AI experiences within Salesforce.
