Every QA team has that one person now: the one who pastes a feature description into ChatGPT, copies the output into Jira, and calls it done. Their test coverage looks great on paper. Their sprint metrics are green. And yet the payment flow broke in production, because nobody thought about what “test the checkout” actually means when your app supports three payment providers, two currencies, and proration logic that kicks in mid-billing-cycle.
AI is a genuinely powerful tool for testers. But powerful tools misused just break things faster — and in QA, that means bugs in production.
According to the World Quality Report 2025–2026, published by Sogeti, Capgemini, and OpenText, the industry’s largest annual study on quality engineering, the adoption of Gen AI in testing has accelerated sharply, yet most organisations still lack a structured approach to using it effectively. That gap is where the problems live. This guide is about bridging it.
1. How to Write Better AI Prompts for Test Case Generation

AI doesn’t know your system. It knows patterns from everything it was trained on. Ask it something generic, and you get something generic back.
Bad prompt:
“Write test cases for a notifications feature.”
You’ll get the standard five: notification appears, notification is dismissed, notification redirects correctly, unread count updates, and maybe one about permission denied. Technically not wrong. Almost certainly incomplete.
Better prompt:
“Write test cases for an in-app notification system. Users receive notifications for: comment replies, status changes on test cases, and billing events. Notifications are real-time via WebSocket and are also sent as emails. Users can configure per-notification preferences. The app has both free and paid tiers. Paid users get email digests, free users don’t. Include edge cases for: WebSocket disconnection, notification preferences not saved, duplicate notifications, and notification behaviour when a user is logged in on two devices.”
The output is now actually useful. It accounts for the real states your system can be in, not just the happy path.
Think of AI like a smart new joiner on the team. The more context you give in the brief, the less time you spend fixing their work.
2. Use AI to Speed Up the Boring Stuff, Not Replace Thinking
AI is excellent at generating boilerplate. Test case scaffolding, repetitive boundary data, API request bodies: all of it can be drafted in seconds. The thinking part, working out which edge cases matter for this specific feature, is still yours.
Live example:
You’re testing a file upload feature. Users can upload CSV reports up to 10MB. Generating a dozen test files by hand takes a good chunk of your morning.
Prompt AI:
“Generate 12 CSV test files for a report upload feature. Include: valid file under 10MB, file exactly at 10MB, file at 10.1MB, empty file, file with headers only and no data rows, file with missing required column ‘test_case_id’, file with duplicate row IDs, file with special characters in text fields, file with UTF-8 encoding, file with Windows line endings (CRLF), file with a mix of valid and invalid rows, and a file where numeric fields contain text values.”
Done in under a minute. You still need to verify the files match your actual validation logic, but the groundwork is laid. The time you saved goes into actually running the tests and observing behaviour, which is where testers add real value.
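If you’d rather script the fixtures than prompt for them, the same list is only a few lines of code. Below is a minimal sketch in TypeScript on Node, assuming a hypothetical report schema of test_case_id, title, and status; swap in whatever headers and rows your real validation logic expects.

```typescript
// generate-csv-fixtures.ts - scripted CSV fixture generation (sketch).
// Assumes a hypothetical report schema: test_case_id,title,status.
import { writeFileSync } from "node:fs";

const HEADER = "test_case_id,title,status";

// Headers only, no data rows.
writeFileSync("headers_only.csv", `${HEADER}\n`);

// Missing the required 'test_case_id' column.
writeFileSync("missing_column.csv", "title,status\nLogin test,passed\n");

// Duplicate row IDs.
writeFileSync(
  "duplicate_ids.csv",
  [HEADER, "TC-1,Login,passed", "TC-1,Logout,failed"].join("\n") + "\n"
);

// A file just over the 10MB limit, padded with valid-looking rows.
const row = "TC-2,Padding row,passed\n";
const limit = 10 * 1024 * 1024;
writeFileSync(
  "oversize.csv",
  `${HEADER}\n` + row.repeat(Math.ceil((limit + 1) / row.length))
);
```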
3. Watch Out for AI Hallucinations (This One Burns Teams)

This doesn’t get talked about enough in the QA community. AI confidently produces test assertions that are wrong. Not slightly off, fundamentally wrong. And because the output looks clean and professional, it slips through.
A real scenario:
You’re testing a subscription upgrade flow. A user moves from Free to Pro mid-cycle. You ask AI to help write assertions for the API response.
AI generates:
“Assert that the response returns HTTP 200 with `{ "status": "upgraded", "effective_date": "<today>" }`”
Your actual API returns HTTP 202 (accepted, async processing), and effective_date is the next billing cycle start date, not today. Pair that assertion with an AI-generated mock of the same imagined response, and the test passes every single time without ever validating the real contract.
AI doesn’t know your API contracts. It guesses based on common patterns it has seen before. This is especially dangerous in API testing, where wrong assertions create false confidence.
The fix is simple: Always cross-reference AI-generated assertions against your actual API documentation, OpenAPI specs, or a quick Postman run. Never assume the response structure it generates is correct for your system.
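In practice, that verification can live in the test itself. Here is a minimal Playwright API-test sketch; the endpoint, response fields, and fixture date are hypothetical stand-ins for whatever your documented contract actually says.

```typescript
// upgrade.spec.ts - assertions cross-checked against the documented
// contract, not the AI's guess. Endpoint, fields, and the fixture
// date are hypothetical; baseURL is assumed set in playwright.config.
import { test, expect } from "@playwright/test";

test("free-to-pro upgrade is accepted for async processing", async ({ request }) => {
  const res = await request.post("/api/v1/subscriptions/upgrade", {
    data: { plan: "pro" },
  });

  // The AI guessed HTTP 200 with effective_date set to today.
  // The documented contract says HTTP 202 (async processing) with
  // effective_date at the next billing cycle start.
  expect(res.status()).toBe(202);

  const body = await res.json();
  expect(body.status).toBe("upgrade_pending");
  // Expected date comes from a known fixture account, never from "today".
  expect(body.effective_date).toBe("2025-07-01");
});
```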
4. Using AI to Write Better Bug Reports
This is one of the most practical and underused applications of AI in QA. Most bug reports written in a hurry are vague, missing reproduction steps, or lack the context that developers actually need to investigate.
Take a rough note and let AI structure it into something that saves everyone time.
Before (what you typed at 5 PM on a Friday):
“Export button broken on the reports page, keeps loading, checked in Chrome.”
Prompt:
“Turn this into a structured bug report: Export button on reports page just spins and never downloads anything. Tested in Chrome. Steps: go to Reports > click Export CSV > button shows loading spinner > nothing downloads after 2 minutes. Expected: CSV file downloads. Actual: spinner keeps going, no download, no error message shown.”
After (AI output you edit and file):
Title: Export CSV button on Reports page hangs indefinitely, no file downloaded, and no error shown
Environment: Chrome 123, staging environment
Steps to Reproduce:
1. Navigate to the Reports page
2. Click the “Export CSV” button
3. Observe the loading spinner
Expected Result: CSV file downloads within a few seconds
Actual Result: Loading spinner continues indefinitely. No file is downloaded. No error message is displayed to the user.
Severity: Medium (core reporting feature blocked for all users)
The developer gets context they can actually use. You filed the bug in two minutes instead of ten. That’s the right use of AI.
5. Generating Gherkin Test Scenarios with AI
A lot of agile QA teams write acceptance tests in Gherkin (Given/When/Then format). AI is very good at this, but again, only if you tell it what your feature actually does.
Prompt:
“Write Gherkin scenarios for a team member invitation feature. A workspace admin can invite users by email. Invited users receive an email with an accept link valid for 48 hours. If the user already has an account, they are added directly. If they don’t, they go through a signup flow first. An admin can also resend or revoke pending invitations.”
AI Output (that you then review and adjust):
Scenario: Admin invites a new user successfully
Given I am logged in as a workspace admin
When I enter "newuser@example.com" in the invite field and click Send
Then the user should receive an invitation email
And the invitation status should show "Pending"

Scenario: Invited user with existing account accepts invitation
Given a user with "existing@example.com" already has an account
When they click the invite link in their email
Then they should be added to the workspace without going through the signup flow

Scenario: Invitation link expires after 48 hours
Given an invitation was sent 49 hours ago
When the invited user clicks the link
Then they should see an "Invitation expired" message
And they should be offered an option to request a new invite

Scenario: Admin revokes a pending invitation
Given there is a pending invitation for "pending@example.com"
When the admin clicks Revoke
Then the invitation should be removed from the pending list
And the invite link should no longer be valid
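A scenario is only half the work; each step still needs a definition before it runs. Here is a minimal sketch using @cucumber/cucumber with Playwright, binding three steps from the first scenario. The selectors, test ID, and loginAs helper are hypothetical placeholders for your own page objects, and the email-delivery step is omitted because it needs a mail-capture tool.

```typescript
// invitation.steps.ts - wiring the invitation scenario to code (sketch).
// Assumes a custom Cucumber World that exposes a Playwright page and a
// login helper; selectors and test IDs below are hypothetical.
import { Given, When, Then } from "@cucumber/cucumber";
import { expect, type Page } from "@playwright/test";

interface InviteWorld {
  page: Page;
  loginAs(email: string): Promise<void>;
}

Given("I am logged in as a workspace admin", async function (this: InviteWorld) {
  await this.loginAs("admin@example.com");
});

When(
  "I enter {string} in the invite field and click Send",
  async function (this: InviteWorld, email: string) {
    await this.page.getByLabel("Invite by email").fill(email);
    await this.page.getByRole("button", { name: "Send" }).click();
  }
);

Then(
  "the invitation status should show {string}",
  async function (this: InviteWorld, status: string) {
    await expect(this.page.getByTestId("invite-status")).toHaveText(status);
  }
);
```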
6. AI Tools Worth Knowing for Software Testers
Not all AI tools are the same. Here’s a practical breakdown of what actually works for different QA tasks:
| Tool | Best For in QA | Watch Out For |
|---|---|---|
| ChatGPT / GPT-4o | Test case drafting, edge case brainstorming, bug report structuring | Generic output without detailed context |
| GitHub Copilot | Writing, explaining, and refactoring test automation code in Playwright, Cypress, and Selenium | Outdated API usage in generated scripts |
| Claude | Analysing long PRDs, contracts, and large test plans | Less precise for code-specific tasks |
| Gemini | Google Workspace integration, quick lookups | Responses can be inconsistent with long context windows; less reliable for complex multi-step coding tasks compared to Copilot |
| Mabl / Testim | AI-powered end-to-end test maintenance | Needs initial setup investment |
| Applitools | Visual regression testing with AI comparison | Overkill for non-UI-heavy apps |
7. AI in CI/CD Pipelines — Where It’s Actually Useful Now

AI is starting to show up not just in how testers write tests, but in how pipelines run and report them.
A few things are happening right now that are worth knowing:
- Flaky test detection — Tools like Buildkite and some GitHub Actions integrations can now flag tests that pass and fail non-deterministically, cluster them, and suggest whether they’re environment issues or race conditions. This used to be a manual investigation.
- Failure triage — Instead of parsing 300 lines of a failed build log, AI can summarise what broke, what the likely cause is, and which recent commits are connected. Teams using this are cutting investigation time significantly.
- AI-assisted test selection — Rather than running the full regression suite on every PR, some pipelines now use AI to predict which tests are most relevant to the changed code (a simplified sketch of the idea follows below). Faster feedback loops, same coverage confidence.
You don’t need to implement all of this tomorrow. But knowing it exists means you can have the right conversation with your engineering team when sprint planning comes around.
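To make test selection concrete, here is a deliberately simple, non-AI sketch of the underlying idea in TypeScript: map changed source paths to spec directories and run only those. The AREA_MAP and every path in it are hypothetical; AI-assisted tools replace this hand-written table with predictions learned from historical failure data.

```typescript
// select-tests.ts - change-based test selection (non-AI sketch).
// AREA_MAP is a hypothetical hand-written mapping; AI-assisted tools
// learn the file-to-test relationship from history instead.
import { execSync } from "node:child_process";

const AREA_MAP: Record<string, string> = {
  "src/billing/": "tests/billing/",
  "src/notifications/": "tests/notifications/",
  "src/reports/": "tests/reports/",
};

// Files changed on this branch relative to main.
const changed = execSync("git diff --name-only origin/main...HEAD")
  .toString()
  .trim()
  .split("\n");

const specDirs = new Set<string>();
for (const file of changed) {
  for (const [srcPrefix, specDir] of Object.entries(AREA_MAP)) {
    if (file.startsWith(srcPrefix)) specDirs.add(specDir);
  }
}

// Unmapped changes fall back to the full suite; never silently skip coverage.
const target = specDirs.size > 0 ? [...specDirs].join(" ") : "tests/";
execSync(`npx playwright test ${target}`, { stdio: "inherit" });
```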
8. The 5 Most Common Mistakes Testers Make with AI
If you’re going to share one section from this blog, share this one.
- Copying AI output without reading it. The output looks structured and professional, so it feels done. It’s not. Read every line before it goes into Jira or your test suite.
- Using generic prompts for specific systems. “Write test cases for the dashboard” is not a prompt. It’s a request for guesswork. Describe your system, your users, and your edge cases.
- Trusting AI with business logic. Prorated billing, custom permission models, multi-tenant data isolation — AI doesn’t know your rules. It’ll generate tests based on the most common version of similar features. Your feature might not be the most common version.
- Using AI to skip exploratory testing. AI generates test cases based on requirements. It can’t catch the behaviour that only shows up when you actually use the product. Exploratory testing is irreplaceable.
- Not iterating on the prompt. If the first output isn’t great, don’t give up — improve the context. Add constraints, add examples, tell it what you didn’t want. The second and third prompts are almost always better than the first.
9. What You Should Never Hand to AI
Some things should stay in the hands of the tester, full stop:
- Go/no-go decisions before a release — AI can summarise test results. It cannot make a judgement call about business risk.
- Exploratory testing — This requires curiosity, intuition, and knowing your users. No prompt captures that.
- Security testing strategy — AI knows OWASP Top 10. It doesn’t know your threat model, your infrastructure, or which data is most sensitive in your specific context.
- Risk-based test prioritisation — What to test most critically before a release is a judgement that requires understanding business impact, user behaviour, and technical complexity together. That’s a human job.
The Bottom Line
With AI in software testing, an average tester just gets faster at being average, while a sharp tester becomes significantly more productive. The difference comes down to whether you’re using AI to amplify your thinking or to replace it.
The testers who look foolish with AI are the ones who copy-paste outputs without reading them, generate test cases without knowing their own system, and treat AI confidence as a substitute for correctness.
The testers who come out ahead are the ones who know exactly what question to ask, verify what comes back, and use the time saved to do the real work — understanding the system, talking to developers, and catching the classes of bugs that never show up in any template.
AI doesn’t know your app. You do. Keep it that way.
Tools Worth Bookmarking
- ChatGPT — Test case drafting, bug report writing, Gherkin generation
- GitHub Copilot — Automation scripting in Playwright / Cypress
- Claude by Anthropic — Analysing large PRDs and test plans
- Applitools — AI-powered visual regression testing
- Mabl — End-to-end test automation with AI maintenance
Conclusion
- Always give AI context about your system, not just the feature name
- Use it for speed on repetitive work: test data, bug report structuring, Gherkin scenarios
- Never trust AI-generated assertions without checking against your actual API
- AI hallucinates confidently — verify before you file or automate
- Keep exploratory testing, risk decisions, and go/no-go calls as human responsibilities
The real value of AI in software testing comes from how you use it — the best testers use AI to think more, not less.