AI in Software Testing: Why Thinking Still Matters More Than Tools

Every QA team has that one person now. In today’s AI-assisted testing landscape, it’s the one who pastes a feature description into ChatGPT, copies the output into Jira, and calls it done. Their test coverage looks great on paper. Their sprint metrics are green. And yet, the payment flow broke in production because nobody thought about what “test the checkout” actually means when your app supports three payment providers, two currencies, and proration logic that kicks in mid-billing-cycle.

AI is a genuinely powerful tool for testers. But powerful tools misused just break things faster — and in QA, that means bugs in production.

According to the World Quality Report 2025–2026, published by Sogeti, Capgemini, and OpenText, the industry’s largest annual study on quality engineering, the adoption of Gen AI in testing has accelerated sharply, yet most organisations still lack a structured approach to using it effectively. That gap is where the problems live. This guide is about bridging it.

1. How to Write Better AI Prompts for Test Case Generation

AI doesn’t know your system. It knows patterns from everything it was trained on. Ask it something generic, and you get something generic back.

Bad prompt:

“Write test cases for a notifications feature.”

You’ll get the standard five: notification appears, notification is dismissed, notification redirects correctly, unread count updates, and maybe one about permission denied. Technically not wrong. Almost certainly incomplete.

Better prompt:

“Write test cases for an in-app notification system. Users receive notifications for: comment replies, status changes on test cases, and billing events. Notifications are real-time via WebSocket and are also sent as emails. Users can configure per-notification preferences. The app has both free and paid tiers. Paid users get email digests, free users don’t. Include edge cases for: WebSocket disconnection, notification preferences not saved, duplicate notifications, and notification behaviour when a user is logged in on two devices.”

The output is now actually useful. It accounts for the real states your system can be in, not just the happy path.

Think of AI like a smart new joiner on the team. The more context you give in the brief, the less time you spend fixing their work.
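
One way to make that brief repeatable is to keep the context in a small structure and render the prompt from it. The sketch below is illustrative only: the FeatureBrief class, its field names, and the build_prompt helper are hypothetical and not part of any tool.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureBrief:
    """Context a tester would otherwise have to remember to type each time."""
    feature: str
    triggers: list[str] = field(default_factory=list)
    delivery: list[str] = field(default_factory=list)
    tier_rules: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)


def build_prompt(brief: FeatureBrief) -> str:
    """Render a test-case prompt, including only the sections that were filled in."""
    lines = [f"Write test cases for {brief.feature}."]
    if brief.triggers:
        lines.append("Users receive notifications for: " + ", ".join(brief.triggers) + ".")
    if brief.delivery:
        lines.append("Delivery channels: " + ", ".join(brief.delivery) + ".")
    if brief.tier_rules:
        lines.append("Tier rules: " + "; ".join(brief.tier_rules) + ".")
    if brief.edge_cases:
        lines.append("Include edge cases for: " + ", ".join(brief.edge_cases) + ".")
    return "\n".join(lines)


brief = FeatureBrief(
    feature="an in-app notification system",
    triggers=["comment replies", "status changes on test cases", "billing events"],
    delivery=["real-time via WebSocket", "email"],
    tier_rules=["paid users get email digests", "free users do not"],
    edge_cases=["WebSocket disconnection", "duplicate notifications"],
)
print(build_prompt(brief))
```

An empty field simply drops its sentence, which makes it obvious at a glance which parts of the brief you have not thought about yet.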

2. Use AI to Speed Up the Boring Stuff, Not Replace Thinking

AI is excellent at generating boilerplate. Test case scaffolding, repetitive boundary data, and API request bodies can all be drafted in seconds. The thinking part, deciding which edge cases matter for this specific feature, is still yours.

Live example:

You’re testing a file upload feature. Users can upload CSV reports up to 10MB. Generating 15 different test files by hand takes a good chunk of your morning.

Prompt AI:

“Generate 12 CSV test files for a report upload feature. Include: valid file under 10MB, file exactly at 10MB, file at 10.1MB, empty file, file with headers only and no data rows, file with missing required column ‘test_case_id’, file with duplicate row IDs, file with special characters in text fields, file with UTF-8 encoding, file with Windows line endings (CRLF), file with a mix of valid and invalid rows, and a file where numeric fields contain text values.”

Done in under a minute. You still need to verify the files match your actual validation logic, but the groundwork is laid. The time you saved goes into actually running the tests and observing behaviour, which is where testers add real value.
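
The size-boundary files are the ones a chat assistant cannot hand you directly, so they are worth scripting. The sketch below is a minimal example under stated assumptions: "10MB" is taken to mean 10 * 1024 * 1024 bytes, and the column names other than test_case_id are invented for illustration. Check both against your real validator.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumption: the limit is 10 MiB; adjust to your validator
HEADER = ["test_case_id", "title", "status"]  # "title" and "status" are hypothetical columns


def csv_of_size(target_bytes: int) -> bytes:
    """Return a well-formed CSV whose UTF-8 size is exactly target_bytes."""
    header = ",".join(HEADER) + "\n"
    filler_len = len("TC_0000000,padding row,pass\n")
    min_last = len("TC_0000000,,pass\n")  # final row with an empty title
    if target_bytes < len(header) + min_last:
        raise ValueError("target too small for a header plus one row")
    parts = [header]
    size = len(header)
    i = 0
    # Add fixed-size rows while leaving room for one final, padded row.
    while target_bytes - size >= filler_len + min_last:
        parts.append(f"TC_{i:07d},padding row,pass\n")
        size += filler_len
        i += 1
    # Pad the last row's title so the file lands exactly on the target size.
    pad = target_bytes - size - min_last
    parts.append(f"TC_{i:07d},{'x' * pad},pass\n")
    return "".join(parts).encode("utf-8")  # content is ASCII, so chars == bytes


def boundary_files() -> dict[str, bytes]:
    """A few of the files from the prompt above, keyed by file name."""
    return {
        "exactly_at_limit.csv": csv_of_size(MAX_BYTES),
        "just_over_limit.csv": csv_of_size(int(10.1 * 1024 * 1024)),
        "empty.csv": b"",
        "headers_only.csv": (",".join(HEADER) + "\n").encode("utf-8"),
        "crlf_line_endings.csv": b"test_case_id,title,status\r\nTC_0000001,login,pass\r\n",
    }


for name, data in boundary_files().items():
    print(f"{name}: {len(data)} bytes")
```

To write the files to disk, pathlib.Path(name).write_bytes(data) inside the loop is enough. Generating the boundary sizes in code means the "exactly at the limit" file really is exactly at the limit, which is hard to guarantee by hand.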

3. Watch Out for AI Hallucinations (This One Burns Teams)

This doesn’t get talked about enough in the QA community. AI confidently produces test assertions that are wrong. Not slightly off, fundamentally wrong. And because the output looks clean and professional, it slips through.

A real scenario:

You’re testing a subscription upgrade flow. A user moves from Free to Pro mid-cycle. You ask AI to help write assertions for the API response.

AI generates:

“Assert that the response returns HTTP 200 with { “status”: “upgraded”, “effective_date”: “<today>” }”

Your actual API returns HTTP 202 (accepted, async processing), and effective_date is the next billing cycle start date, not today. Run the test against a mock built from the same guess and it passes every single time, even though the assertions were never validating the right thing.

AI doesn’t know your API contracts. It guesses based on common patterns it has seen before. This is especially dangerous in API testing, where wrong assertions create false confidence.

The fix is simple: Always cross-reference AI-generated assertions against your actual API documentation, OpenAPI specs, or a quick Postman run. Never assume the response structure it generates is correct for your system.
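
Part of that cross-check can be automated. The sketch below compares the status code in an AI-drafted assertion with the responses documented for the operation. The spec fragment is hypothetical: it only mimics the shape of an OpenAPI "paths" section, and the /subscriptions/upgrade path is made up for this example. Range keys such as "2XX" are ignored here for brevity.

```python
# Hypothetical fragment in the shape of an OpenAPI "paths" section.
SPEC = {
    "/subscriptions/upgrade": {
        "post": {
            "responses": {
                "202": {"description": "Upgrade accepted, processed asynchronously"},
                "402": {"description": "Payment required"},
            }
        }
    }
}


def undocumented_status(spec: dict, path: str, method: str, expected_status: int) -> bool:
    """Return True when the asserted status code is not documented for the operation."""
    responses = spec[path][method.lower()]["responses"]
    return str(expected_status) not in responses and "default" not in responses


# What the AI drafted: it guessed 200 because that is the most common pattern.
ai_drafted = {"path": "/subscriptions/upgrade", "method": "POST", "status": 200}

if undocumented_status(SPEC, ai_drafted["path"], ai_drafted["method"], ai_drafted["status"]):
    print("Assertion expects a status the spec does not document; check before automating.")
```

A check like this only catches status-code mismatches. Field semantics, such as what effective_date actually means, still need a human reading the contract.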

4. Using AI to Write Better Bug Reports

This is one of the most practical and underused applications of AI in QA. Most bug reports written in a hurry are vague, missing reproduction steps, or lack the context that developers actually need to investigate.

Take a rough note and let AI structure it into something that saves everyone time.

Before (what you typed at 5 PM on a Friday):

“Export button broken on the reports page, keeps loading, checked in Chrome.”

Prompt:

“Turn this into a structured bug report: Export button on reports page just spins and never downloads anything. Tested in Chrome. Steps: go to Reports > click Export CSV > button shows loading spinner > nothing downloads after 2 minutes. Expected: CSV file downloads. Actual: spinner keeps going, no download, no error message shown.”

After (AI output you edit and file):

Title: Export CSV button on Reports page hangs indefinitely, no file downloaded, and no error shown

Environment: Chrome 123, staging environment

Steps to Reproduce:

1. Navigate to the Reports page
2. Click the “Export CSV” button
3. Observe the loading spinner

Expected Result: CSV file downloads within a few seconds

Actual Result: Loading spinner continues indefinitely. No file is downloaded. No error message is displayed to the user.

Severity: Medium. Core reporting feature is blocked for all users.

The developer gets context they can actually use. You filed the bug in two minutes instead of ten. That’s the right use of AI.

5. Generating Gherkin Test Scenarios with AI

A lot of agile QA teams write acceptance tests in Gherkin (Given/When/Then format). AI is very good at this, but again, only if you tell it what your feature actually does.

Prompt:

“Write Gherkin scenarios for a team member invitation feature. A workspace admin can invite users by email. Invited users receive an email with an accept link valid for 48 hours. If the user already has an account, they are added directly. If they don’t, they go through a signup flow first. An admin can also resend or revoke pending invitations.”

AI Output (that you then review and adjust):

Scenario: Admin invites a new user successfully
  Given I am logged in as a workspace admin
  When I enter "newuser@example.com" in the invite field and click Send
  Then the user should receive an invitation email
  And the invitation status should show "Pending"

Scenario: Invited user with existing account accepts invitation
  Given a user with "existing@example.com" already has an account
  When they click the invite link in their email
  Then they should be added to the workspace without going through the signup

Scenario: Invitation link expires after 48 hours
  Given an invitation was sent 49 hours ago
  When the invited user clicks the link
  Then they should see an "Invitation expired" message
  And they should be offered an option to request a new invite

Scenario: Admin revokes a pending invitation
  Given there is a pending invitation for "pending@example.com"
  When the admin clicks Revoke
  Then the invitation should be removed from the pending list
  And the invite link should no longer be valid
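
Note that the expiry scenario checks 49 hours, a full hour past the limit. The values right around 48 hours are where off-by-one bugs hide, and they are worth adding during review. The sketch below is illustrative: is_expired is a hypothetical helper, and treating exactly 48 hours as still valid is an assumption that your product owner needs to confirm.

```python
from datetime import datetime, timedelta, timezone

INVITE_TTL = timedelta(hours=48)


def is_expired(sent_at: datetime, now: datetime) -> bool:
    """Assumption: a link is still valid at exactly 48 hours and expires after that."""
    return now - sent_at > INVITE_TTL


sent = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
checks = {
    "47h59m": sent + timedelta(hours=47, minutes=59),
    "48h00m": sent + timedelta(hours=48),
    "48h00m01s": sent + timedelta(hours=48, seconds=1),
    "49h00m": sent + timedelta(hours=49),
}
for label, now in checks.items():
    print(label, "expired" if is_expired(sent, now) else "valid")
```

Each of the first three rows is a candidate Gherkin scenario that the AI output above did not include.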

6. AI Tools Worth Knowing for Software Testers

Not all AI tools are the same. Here’s a practical breakdown of what actually works for different QA tasks:

Tool | Best For in QA | Watch Out For
ChatGPT / GPT-4o | Test case drafting, edge case brainstorming, bug report structuring | Generic output without detailed context
GitHub Copilot | AI pair programmer for writing, explaining, and refactoring all types of code, including test automation scripts in Playwright, Cypress, and Selenium | Outdated API usage in generated scripts
Claude | Analysing long PRDs, contracts, and large test plans | Less precise for code-specific tasks
Gemini | Google Workspace integration, quick lookups | Responses can be inconsistent with long context windows; less reliable for complex multi-step coding tasks compared to Copilot
Mabl / Testim | AI-powered end-to-end test maintenance | Needs initial setup investment
Applitools | Visual regression testing with AI comparison | Overkill for non-UI-heavy apps

7. AI in CI/CD Pipelines — Where It’s Actually Useful Now

AI is starting to show up not just in how testers write tests, but in how pipelines run and report them.

A few things are happening right now that are worth knowing:

  • Flaky test detection — Tools like Buildkite and some GitHub Actions integrations can now flag tests that pass and fail non-deterministically, cluster them, and suggest whether they’re environment issues or race conditions. This used to be a manual investigation.
  • Failure triage — Instead of parsing 300 lines of a failed build log, AI can summarise what broke, what the likely cause is, and which recent commits are connected. Teams using this are cutting investigation time significantly.
  • AI-assisted test selection — Rather than running the full regression suite on every PR, some pipelines now use AI to predict which tests are most relevant to the changed code. Faster feedback loops, same coverage confidence.
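
The core signal behind flaky test detection is simple enough to sketch without any AI: a test that both passed and failed on the same commit did not change because of the code. The example below is a minimal illustration with made-up run history. It is not how Buildkite or any specific tool implements the feature.

```python
from collections import defaultdict


def find_flaky(runs):
    """Flag tests that both passed and failed on the same commit.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return sorted({test for (test, _), seen in outcomes.items() if seen == {True, False}})


history = [
    ("test_checkout", "abc123", True),
    ("test_checkout", "abc123", False),  # same code, different result
    ("test_login", "abc123", True),
    ("test_login", "def456", False),     # failed on a different commit: possibly a real regression
]
print(find_flaky(history))  # ['test_checkout']
```

The tools add value on top of this by clustering failures and suggesting causes, but knowing the underlying signal helps you judge whether their output makes sense.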

You don’t need to implement all of this tomorrow. But knowing it exists means you can have the right conversation with your engineering team when sprint planning comes around.

8. The 5 Most Common Mistakes Testers Make with AI

If you’re going to share one section from this blog, share this one.

  1. Copying AI output without reading it. The output looks structured and professional, so it feels done. It’s not. Read every line before it goes into Jira or your test suite.
  2. Using generic prompts for specific systems. “Write test cases for the dashboard” is not a prompt. It’s a request for guesswork. Describe your system, your users, and your edge cases.
  3. Trusting AI with business logic. Prorated billing, custom permission models, multi-tenant data isolation — AI doesn’t know your rules. It’ll generate tests based on the most common version of similar features. Your feature might not be the most common version.
  4. Using AI to skip exploratory testing. AI generates test cases based on requirements. It can’t catch the behaviour that only shows up when you actually use the product. Exploratory testing is irreplaceable.
  5. Not iterating on the prompt. If the first output isn’t great, don’t give up — improve the context. Add constraints, add examples, tell it what you didn’t want. The second and third prompts are almost always better than the first.

9. What You Should Never Hand to AI

Some things should stay in the hands of the tester, full stop:

  • Go/no-go decisions before a release — AI can summarise test results. It cannot make a judgment call about business risk.
  • Exploratory testing — This requires curiosity, intuition, and knowing your users. No prompt captures that.
  • Security testing strategy — AI knows OWASP Top 10. It doesn’t know your threat model, your infrastructure, or which data is most sensitive in your specific context.
  • Risk-based test prioritisation — What to test most critically before a release is a judgement that requires understanding business impact, user behaviour, and technical complexity together. That’s a human job.

The Bottom Line

With AI in software testing, an average tester becomes faster at being average, while a sharp tester becomes significantly more productive. The difference comes down to whether you’re using AI to amplify your thinking or replace it.

The testers who look foolish with AI are the ones who copy-paste outputs without reading them, generate test cases without knowing their own system, and treat AI confidence as a substitute for correctness.

The testers who come out ahead are the ones who know exactly what question to ask, verify what comes back, and use the time saved to do the real work — understanding the system, talking to developers, and catching the classes of bugs that never show up in any template.

AI doesn’t know your app. You do. Keep it that way.

Tools Worth Bookmarking

  • ChatGPT — Test case drafting, bug report writing, Gherkin generation
  • GitHub Copilot — Automation scripting in Playwright / Cypress
  • Claude by Anthropic — Analysing large PRDs and test plans
  • Applitools — AI-powered visual regression testing
  • Mabl — End-to-end test automation with AI maintenance

Conclusion:

  • Always give AI context about your system, not just the feature name
  • Use it for speed on repetitive work: test data, bug report structuring, Gherkin scenarios
  • Never trust AI-generated assertions without checking against your actual API
  • AI hallucinates confidently — verify before you file or automate
  • Keep exploratory testing, risk decisions, and go/no-go calls as human responsibilities

The real value of AI in software testing comes from how you use it — the best testers use AI to think more, not less.

Zero Code, Zero Headache – How to do Manual Testing with Playwright MCP?

Manual Testing with Playwright MCP – Have you ever felt that a simple manual test should be less manual?

For years, quality assurance relied on pure human effort to explore, click, and record. But what if you could perform structured manual and exploratory testing, generate detailed reports, and even create test cases, all inside your Integrated Development Environment (IDE), using zero code?

I’ll tell you this: there’s a tool that can help us perform manual testing in a much more structured and easy way inside the IDE: Playwright MCP. 

Section 1: End the Manual Grind – Welcome to AI-Augmented QA 

The core idea is to pair a powerful AI assistant (like GitHub Copilot) with a tool that can control a real browser (Playwright MCP). This simple setup is done in only a few minutes. 

The Essential Setup for Manual Testing with Playwright MCP: Detailed Steps

  • For this setup, you will integrate Playwright MCP as a tool that your AI agent can call directly from VS Code. 

1. Prerequisites (The Basics) 

  • VS Code installed on your system.
  • Node.js (LTS version recommended) installed on your machine. 

2. Installing GitHub Copilot (The AI Client) 

  • Open Extensions: In VS Code, navigate to the Extensions view (Ctrl+Shift+X or Cmd+Shift+X). 
  • Search and Install: Search for “GitHub Copilot” and “GitHub Copilot Chat” and install both extensions. 
[Image: Manual testing Copilot]
  • Authentication: Follow the prompts to sign in with your GitHub account and activate your Copilot subscription. 
    • GitHub Copilot is an AI-powered code assistant that acts almost like an AI pair programmer.

After successful installation and authentication, you should see something like the image below.

[Image: GitHub Copilot]

3. Installing the Playwright MCP Server (The Browser Tool) 

Playwright MCP (Model Context Protocol): This is the bridge that provides browser automation capabilities, enabling the AI to interact with the web page. 

  • The most direct way to install the server and configure the agent is via the official GitHub page:
  • Navigate to the Source: Open your browser and go to the official Playwright MCP Server GitHub page (https://github.com/microsoft/playwright-mcp).
  • The One-Click Install: On the GitHub page, look for the Install Server VSCode button.
[Image: Playwright MCP Setup]
  • Launch VS Code: Clicking this button will prompt you to open Visual Studio Code.
[Image: VS Code pop-up]
  • Final Step: Inside VS Code, select the “Install server” option from the prompt to automatically add the MCP entry to your settings.
[Image: MCP setup final step]
  • To verify successful installation and configuration, follow these steps:
    • Click on the “Configure Tool” icon.
[Image: Playwright Configuration]
  • After clicking on the “Configure Tool” icon, you see the tools provided by Playwright MCP, as shown in the image below.
[Image: Playwright tool]
[Image: Settings Icon]
  • After clicking on the “Settings” icon, you see the “Configuration (JSON)” file of Playwright MCP, from which you can start, stop, and restart the server. The configuration looks like this:
{
    "servers": { 
        "playwright": { 
            "command": "npx", 
            "args": [ 
                "@playwright/mcp@latest" 
            ], 
            "type": "stdio" 
        } 
    }, 
    "inputs": [] 
} 

1. Start Playwright MCP Server:

[Image: Playwright MCP Server]

Once the Playwright MCP Server is installed, configured, and started, you will see output like the image below.

[Image: Playwright MCP Server]

2. Stop and Restart Server

[Image: Playwright MCP Start Stop Restart Server]

This complete setup allows the Playwright MCP Server to act as the bridge, providing browser automation capabilities and enabling the GitHub Copilot Agent to interact with the web page using natural language. 

Section 2: Phase 1: Intelligent Exploration and Reporting 

The first, most crucial step is to let the AI agent, powered by the Playwright MCP, perform the exploratory testing and generate the foundational report. This immediately reduces the tester’s documentation effort. 

Instead of manually performing steps, you simply give the AI Agent your test objective in natural language. 

The Exploration Workflow: 

  1. Exploration Execution: The AI uses discrete Playwright MCP tools (like browser_navigate, browser_type, and browser_click) to perform each action in a real browser session.
  2. Report Generation: Immediately following execution, the AI generates an Exploratory Testing Report. The report is based on the exploration and summarizes the steps taken, observations, and any issues found.

Our focus is simple: Using Playwright MCP, we reduce the repetitive tasks of a Manual Tester by automating the recording and execution of manual steps. 

Execution Showcase: Exploration to Report 

Input (The Prompt File for Exploration) 

This prompt directs the AI to execute the manual steps and generate the initial report. 

Prompt for Exploratory Testing

Exploratory Testing: (Use Playwright MCP) 

Navigate to https://www.demoblaze.com/. Use Playwright MCP Compulsory for Exploring the Module <Module Name> and generate the Exploratory Testing Report in a .md file in the Manual Testing/Documentation Directory.

Output (The Generated Exploration Report) 
The AI generates a structured report summarizing the execution. 

[Image: Exploratory Testing Report]

Live Browser Snapshot from Playwright MCP Execution

[Image: Live Browser]

Section 3: Phase 2: Design, Plan, Execution, Defect Tracking 

Once the initial Exploration Report is generated, QA teams move to design specific, reusable assets based on these findings. 

1. Test Case Design (on basis of Exploration Report) 

The Exploration Report provides the evidence needed to design formal Test Cases. The report’s observations are used to create the Expected Results column in your CSV or Test Management Tool. 

  • The focus is now on designing reusable test cases, which can be stored in CSV format.
  • These manually designed test cases form the core of your execution plan.
  • Provide the Exploratory Report as a reference when designing the test cases.
  • Drag and drop the Exploratory Report file into the chat as context, as shown in the images below.
[Image: Drag File]
[Image: Dropped File]

Input (Test Case Design Prompt)

This prompt instructs the AI to generate test cases from the Exploratory Report, following the template below.

Role: Act as a QA Engineer. 
Based on the Exploratory Report, generate the test cases in the format of the Test Case Design Template below.
======================================= 
🧪 TEST CASE DESIGN TEMPLATE For CSV File 
======================================= 
Test Case ID – Unique identifier for the test case (e.g., TC_001) 
Test Case Title / Name – Short descriptive name of what is being tested 
Preconditions / Setup – Any conditions that must be met before test execution 
Test Data – Input values or data required for the test 
Test Steps – Detailed step-by-step instructions on how to perform the test 
Expected Result – What should happen after executing the steps 
Actual Result – What happened (filled after execution) 
Status – Pass / Fail / Blocked (result of the execution) 
Priority – Importance of the test case (High / Medium / Low) 
Severity – Impact level if the test fails (Critical / Major / Minor) 
Test Type – (Optional) e.g., Functional, UI, Negative, Regression, etc. 
Execution Date – (Optional) When the test was executed 
Executed By – (Optional) Name of the tester 
Remarks / Comments – Any additional information, observations, or bugs found 

Output (The Generated Test cases) 

The AI generates structured test cases. 

[Image: Test Case Design]
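
If you want to check the generated file against the template, or create a seed file by hand, the template maps directly onto CSV columns. The sketch below is a minimal example using Python's standard csv module. The shortened column names and the sample test case are illustrative and were not produced by the AI run described here.

```python
import csv
import io

COLUMNS = [
    "Test Case ID", "Test Case Title", "Preconditions", "Test Data", "Test Steps",
    "Expected Result", "Actual Result", "Status", "Priority", "Severity",
    "Test Type", "Execution Date", "Executed By", "Remarks",
]


def to_csv(test_cases: list[dict]) -> str:
    """Write test cases to CSV text; fields missing from a case are left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(test_cases)
    return buf.getvalue()


# Hypothetical test case, invented for this example.
example = {
    "Test Case ID": "TC_001",
    "Test Case Title": "Add a product to the cart",
    "Test Steps": "1. Open the home page\n2. Open a product\n3. Click Add to cart",
    "Expected Result": "Product appears in the cart",
    "Priority": "High",
    "Severity": "Major",
}
print(to_csv([example]))
```

Multi-line steps are quoted automatically by the csv module, so the file still opens correctly in a spreadsheet. Actual Result and Status stay empty until execution.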

2. Test Plan Creation 

  • The created test cases are organized into a formal Test Plan document, detailing the scope, environment, and execution schedule. 

Input (Test Plan Prompt)

This prompt instructs the AI to generate a formal test plan for the designed test cases.

Role: Act as a QA Engineer.
- Use clear, professional language. 
- Include examples where relevant. 
- Keep the structure organized for documentation. 
- Format can be plain text or Markdown. 
- Assume the project is a web application with multiple modules. 
Generate the Test Plan as <Module Name>.txt in the Manual Testing/Documentation directory.
Instructions for AI: 
- Generate a complete Test Plan for a software project For Our Test Cases 
- Include the following sections: 
  1. Test Plan ID 
  2. Project Name 
  3. Module/Feature Overview 
  4. Test Plan Description 
  5. Test Strategy (Manual, Automation, Tools) 
  6. Test Objectives 
  7. Test Deliverables 
  8. Testing Schedule / Milestones 
  9. Test Environment 
  10. Roles & Responsibilities 
  11. Risk & Mitigation 
  12. Entry and Exit Criteria 
  13. Test Case Design Approach 
  14. Metrics / Reporting 
  15. Approvals 

Output (The Generated Test plan) 

The AI generates a structured test plan for the designed test cases.

[Image: Test Plan]

3. Test Cases Execution 

This is where the Playwright MCP delivers the most power: executing the formal test cases designed in the previous step. 

  • Instead of manually clicking through the steps defined in the Test Plan, the tester uses the AI agent to execute the written test case (e.g., loaded from the CSV) in the browser. 
  • The Playwright MCP ensures the execution of those test cases is fast, documented, and accurate. 
  • Any failures lead to immediate artifact generation (e.g., defect reports). 

Input (Targeted Execution Prompt) 

This prompt instructs the AI to execute the attached test cases in the browser and generate a test execution report.

Use Playwright MCP to Navigate “https://www.demoblaze.com/” and Execute Test Cases attached in context and Generate Test Execution Report.

Before sending the prompt, drag and drop the test case file into the chat as a reference, as shown in the image below.

[Image: Test case file]

Live Browser Snapshot from Playwright MCP Execution

[Image: Nokia Execution]

Output (The Generated Test Execution report) 

The AI generates a structured test execution report for the designed test cases.

[Image: Test Execution Report]
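
It is worth cross-checking the totals in the generated report against the test case file yourself rather than trusting the summary. The sketch below is a minimal example that counts the Status column of a CSV in the template format. The sample rows are invented, and treating a blank status as "Not Run" is an assumption.

```python
import csv
import io
from collections import Counter


def summarise(csv_text: str) -> dict:
    """Count test cases by Status and compute the pass rate of executed cases."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(
        (row.get("Status") or "").strip().title() or "Not Run" for row in rows
    )
    executed = len(rows) - counts["Not Run"]
    pass_rate = round(100 * counts["Pass"] / executed, 1) if executed else 0.0
    return {"total": len(rows), "counts": dict(counts), "pass_rate": pass_rate}


# Invented sample in the template's format (only two columns shown).
sample = "Test Case ID,Status\nTC_001,Pass\nTC_002,Fail\nTC_003,pass\nTC_004,\n"
print(summarise(sample))
```

If the numbers here disagree with the AI-written report, trust the file and re-run the affected cases.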

4. Defect Reporting and Tracking  

If a Test Case execution fails, the tester immediately leverages the AI Agent and Playwright MCP to generate a detailed defect report, which is a key task in manual testing. 

Execution Showcase: Formal Test Case Run (with Defect Reporting) 

We will now execute a Test Case step, intentionally simulating a failure to demonstrate the automated defect reporting capability. 

Input (Targeted Execution Prompt for Failure) 

This prompt asks the AI to execute a check and explicitly requests a defect report and a screenshot if the assertion fails. 

Refer to the test cases provided in the Context and Use Playwright MCP to execute the test, and if there is any defect, then generate a detailed defect Report. Additionally, I would like a screenshot of the defect for evidence.
[Image: Playwright MCP to Execute the test]

Output (The Generated Defect report and Screenshots as Evidence) 

The AI generates a structured defect report for the failed test case, with screenshots as evidence.

[Image: Playwright Defect Report]
[Image: Playwright MCP output file evidence]

Conclusion: Your Role is Evolving, Not Ending 

Manual Testing with Playwright MCP is not about replacing the manual tester; it’s about augmenting their capabilities. It enables a smooth, documented, and low-code way to perform high-quality exploratory testing with automated execution. 

  • Focus on Logic: Spend less time on repetitive clicks and more time on complex scenario design. 
  • Execute Instantly: Use natural language prompts to execute tests in the browser. 
  • Generate Instant Reports: Create structured exploratory test reports from your execution sessions. 
  • Future-Proof Your Skills: Learn to transition seamlessly to an AI-augmented testing workflow. 

It’s time to move beyond the traditional—set up your Playwright MCP today and start testing with the power of an AI-pair tester!