Fundamentals about AI

Part 1

Understand the "Brain" before you give it orders.

What is AI and how does it "think"? (Probabilistic Reasoning)

To truly understand Artificial Intelligence (AI), you need to look under the hood. It doesn't "think" like a human or a traditional script - it calculates.

A Large Language Model (LLM) is a type of AI trained on vast amounts of text (and, in multimodal models, images, audio, and video) to understand, summarize, and generate human-like content.

  • Large: LLMs are trained on enormous datasets, including much of the public internet, books, and public code repositories such as those on GitHub. A model has billions of "parameters" (numeric weights that play a role loosely similar to the connections between neurons in a brain).
  • Language: LLMs capture the structure, tone, and grammar of human and programming languages. They don't process math or logic like a calculator; they process the relationships between words and code.
  • Model: It is a complex mathematical map (a neural network) that predicts what should come next in a sequence.

*How it works: It uses a "Transformer" architecture to process data. It doesn't "know" things like a human; it predicts the most likely next word in a sequence based on billions of patterns it learned during training. An LLM is a prediction engine, not a database. It doesn't "look up" facts; it generates them based on patterns.*

Before using AI, we must understand that it is Probabilistic, not Deterministic.

Probabilistic vs. Deterministic: Standard code is deterministic (Input A always yields Output B). LLMs are probabilistic (Input A yields the most likely Output B, but it can vary).

In traditional software, we are used to deterministic systems: if I click 'Delete', the record is gone. Every time.

AI is Probabilistic. It doesn't "know" facts; it calculates the statistical probability of what should come next.

  • Analogy: Think of AI as a world-class "Auto-complete." It has read almost everything on the internet and uses that experience to guess the most logical next word (or piece of code).

It predicts the most likely next step based on patterns it learned during training.
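
The difference between the two behaviors can be sketched in a few lines of JavaScript. This is a toy illustration only: the candidate words and their probabilities are invented, and `pickMostLikely` and `sampleNextWord` are hypothetical helpers, not part of any real model or library.

```javascript
// Toy next-word distribution for the prompt "Write a test for ...".
// The words and probabilities are made up for illustration.
const nextWordProbabilities = [
  { word: "login", probability: 0.6 },
  { word: "checkout", probability: 0.3 },
  { word: "search", probability: 0.1 },
];

// Deterministic: always return the single most likely word.
function pickMostLikely(distribution) {
  return distribution.reduce((best, entry) =>
    entry.probability > best.probability ? entry : best
  ).word;
}

// Probabilistic: draw a word at random, weighted by its probability.
// `random` is any function returning a number in [0, 1), e.g. Math.random.
function sampleNextWord(distribution, random = Math.random) {
  const roll = random();
  let cumulative = 0;
  for (const entry of distribution) {
    cumulative += entry.probability;
    if (roll < cumulative) return entry.word;
  }
  // Fallback for floating-point rounding at the very end of the range.
  return distribution[distribution.length - 1].word;
}

console.log(pickMostLikely(nextWordProbabilities)); // same answer on every run
console.log(sampleNextWord(nextWordProbabilities)); // may differ between runs
```

Run the script several times: the first line never changes, while the second one occasionally prints a less likely word. That variation is what "probabilistic" means in practice.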

Examples of AI "Thinking":

  1. The "Next Word" Guess: If you ask an AI to "Write a test for login," it doesn't "know" what a login is. It treats the prompt as a sequence of tokens and calculates which words most often follow similar requests in its training data, such as "username," "password," or "Sign in."
  2. Pattern Completion: Imagine a broken automation script. The AI looks at the surrounding code, identifies the pattern (e.g., Page Object Model), and "fills in the blanks" with code that statistically fits that structure.
  3. Contextual Weighting: AI assigns "attention" to certain words. In the prompt "Test the checkout page but ignore the payment gateway," the AI puts high mathematical weight on "checkout" and "ignore" (to filter out noise in the instructions), which makes the output likely to focus on the cart and shipping rather than on payment.
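
As a rough sketch of what "mathematical weight" means, the snippet below turns a set of relevance scores into weights with the softmax function, which is the normalization step attention uses. The scores are invented for illustration; a real model computes them between every pair of tokens from learned vectors, so treat this as a simplified picture rather than how any specific model works.

```javascript
// Softmax: turns arbitrary scores into positive weights that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((score) => Math.exp(score - max));
  const total = exps.reduce((sum, value) => sum + value, 0);
  return exps.map((value) => value / total);
}

// Hypothetical relevance scores for some words of the prompt.
const words = ["Test", "checkout", "ignore", "payment", "gateway"];
const scores = [0.5, 2.0, 1.8, 1.0, 1.0];
const weights = softmax(scores);

// Higher score -> larger share of the total "attention".
words.forEach((word, i) => console.log(word, weights[i].toFixed(2)));
```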

Major AI Solutions (LLMs)

  • ChatGPT (OpenAI)
  • Gemini (Google)
  • Claude (Anthropic)
  • DeepSeek (DeepSeek)
  • Mistral (Mistral AI)
  • Llama (Meta)

Each AI provider offers different model levels (e.g., Flash vs. Pro, or Turbo vs. Reasoning). These levels vary in resource use, speed, and logic depth.

What Can AI Be Used For?

AI is a versatile tool that can be applied to almost any task involving information processing, logic, or creativity. Its primary uses fall into these four categories:

1. Learning & Knowledge Synthesis

  • Topic Summarization: Breaking down complex technical papers, long meeting transcripts, or dense documentation into key bullet points.
  • Simplified Explanations: Using the "Explain Like I'm Five" (ELI5) method to understand new concepts, frameworks, or business domains.
  • Skill Acquisition: Acting as a 24/7 tutor for learning new languages, coding syntax, or professional methodologies.

2. Content Generation & Refinement

  • Drafting & Outlining: Creating initial versions of emails, reports, technical specifications, etc.
  • Tone & Style Adjustment: Rewriting existing text to be more professional, more persuasive, or more concise.
  • Creative Brainstorming: Generating ideas or alternative solutions to a problem.

3. Technical & Logical Tasks

  • Coding & Scripting: Writing, debugging, and explaining code.
  • Data Transformation: Converting messy data into structured formats (like turning a text list into a CSV or JSON table).
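  • Mathematical Reasoning: Working through equations or statistical analysis step by step. Because LLMs predict text rather than calculate, results are most reliable when the tool can run code or use a calculator, and they should always be checked.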

4. Analysis & Problem Solving

  • Pattern Recognition: Identifying trends in user behavior, financial data, or system logs that a human might miss.
  • Scenario Simulation: Asking "What if?" questions to see how a change in one variable might affect an entire project or system.
  • Comparison & Evaluation: Weighing the pros and cons of different tools, strategies, or architectural choices based on provided criteria.

Tokens: The Currency of AI

AI doesn't read "words." It reads Tokens.

  • What is a Token? Models split text into pieces; common words may be one token, rare or long words split into multiple tokens.
  • Approximate rule of thumb: In English, 1 token ≈ 4 characters or ~0.75 words. Exact counts vary by model and language.
  • Cost: Most vendors bill per token, with prices quoted per 1,000 or per 1 million tokens. Prompts (input) and outputs both count, often at different rates.
  • Speed and latency: More tokens usually mean slower responses and higher compute load.
  • Context: Models have a "Context Window" limit (its size depends on the model), which is the maximum number of tokens they can take into account at once, including the conversation history and the new response. Exceed it, and content must be truncated or summarized, potentially reducing accuracy.

You can use the OpenAI Tokenizer to visualize exactly how text is broken down into chunks.
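
For quick budgeting, the rule of thumb can be turned into a small estimator. This is a sketch, not a real tokenizer: `estimateTokens` and `estimateCost` are hypothetical helpers, the 4-characters-per-token ratio only roughly holds for English, and the price passed in is a placeholder you would replace with your vendor's actual rate.

```javascript
// Rough estimate only: about 4 characters per token for English text.
// Real tokenizers differ by model and language.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// pricePer1kTokens is a placeholder rate; check your vendor's price list.
// Input and output tokens are both counted.
function estimateCost(inputText, expectedOutputTokens, pricePer1kTokens) {
  const totalTokens = estimateTokens(inputText) + expectedOutputTokens;
  return (totalTokens / 1000) * pricePer1kTokens;
}

const prompt = "Write 5 manual test cases for a login page.";
console.log("Estimated prompt tokens:", estimateTokens(prompt));
console.log("Estimated cost:", estimateCost(prompt, 500, 0.002));
```

For exact counts, use the tokenizer that belongs to the model you are calling; the estimate above is only good enough to spot prompts that are obviously too long.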

AI Glitching (Hallucinating)

Imagine you are writing a text message and your phone's auto-complete suggests the next word. An LLM is just a giant version of that.

  • Probability over Fact: If the AI has seen 1,000 login pages that have a "Sign in with Facebook" button, it will likely tell you to test that button - even if our app doesn't have one. It’s not "lying"; it’s just picking the most likely word.
  • Pleasing the User: AI is designed to be helpful. If you ask a vague question, it would rather "hallucinate" an answer than say "I don't have enough information."

How to Spot a Glitch

  • The Documentation Sync: Always keep your PRD (Product Requirement Document) in another window. If the AI suggests a step that isn't in the requirements, it’s a hallucination.
  • The "Wait, How?" Test: If an AI says "Analyze the user's facial expression via webcam," ask yourself: Does our app even have access to the camera? If the answer is no, the AI is glitching.
  • Negative Consistency: Ask the AI: "Does this feature support [Feature X]?" then immediately ask "Explain why [Feature X] is NOT supported." If it agrees with you both times, it's just following your lead, not the facts.
  • "Critique the Output": Ask, "What assumptions did you make in these test cases that weren't in my original prompt?"

How to Handle a Glitch

If the AI gives you a bad test case:

  1. Don't try to "fix" its bad answer in the same chat; the "wrong" information is now in its short-term memory (context window). Start a new chat instead.
  2. Provide "Grounding": Copy-paste the actual requirement into the chat and say: "Based ONLY on this text, rewrite the test cases."
  3. Correct it firmly: Say, "You suggested a Facebook login, but we do not use that. Only use Email/Password."
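
If you build prompts in a script rather than by hand, grounding can be made a habit with a small helper. The sketch below is an assumption about how you might structure it: `buildGroundedPrompt` is a hypothetical function, and the extra "say so instead of guessing" line is a suggested addition, not a guaranteed fix.

```javascript
// Builds a prompt that restricts the model to the pasted requirement text.
function buildGroundedPrompt(requirementText, task) {
  return [
    "Based ONLY on the requirement below, " + task,
    "If something is not stated in the requirement, say so instead of guessing.",
    "",
    "Requirement:",
    requirementText.trim(),
  ].join("\n");
}

const requirement = "Users log in with Email/Password only.";
console.log(buildGroundedPrompt(requirement, "rewrite the test cases."));
```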

The "Clean Slate" Rule: When to Start a New Chat

One of the most common mistakes is using one single chat window for an entire day’s work. Treat every new task like a new investigation.

1. Why is a New Chat Necessary?

As we already know, AI models have a "Context Window" (short-term memory). Everything you’ve said in the current chat influences the next answer.

  • Context Contamination: If you spent the morning testing the "Login Page" and then start asking about the "Checkout Page" in the same chat, the AI might accidentally include login-related steps in your checkout tests.
  • Error Persistence: If the AI "hallucinates" (glitches) once in a chat, it is highly likely to repeat that mistake or stay "stubborn" about it because it's trying to be consistent with its previous (wrong) messages.
  • Token Bloat: A long chat uses more tokens. This makes the AI slower and more likely to "forget" the very first (and most important) instructions you gave it.
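
Why do early instructions get "forgotten"? One simple way a tool can keep a long chat inside the context window is to drop the oldest messages first. The sketch below assumes exactly that strategy; real products may truncate, summarize, or manage history differently, and `roughTokenCount` and `fitToContextWindow` are hypothetical helpers.

```javascript
// Rough token estimate (about 4 characters per token); real tokenizers differ.
const roughTokenCount = (text) => Math.ceil(text.length / 4);

// Keep the most recent messages that fit into the token budget.
// Older messages are dropped first.
function fitToContextWindow(messages, maxTokens) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = roughTokenCount(messages[i]);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const history = [
  "You are a QA engineer. Only Email/Password login exists.", // first instruction
  "Write test cases for the login page.",
  "Now add negative test cases.",
];

// With a small budget, the first (and most important) instruction is the one that is lost.
console.log(fitToContextWindow(history, 20));
```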

2. When to "Hit the Reset Button"

Start a New Chat if:

  • You change features: You're done with "Profile Settings" and moving to "Payment Gateway."
  • The AI gets "Stuck": It keeps giving you the same wrong answer even after you’ve corrected it.
  • The Chat is too long: If you’ve been scrolling for more than 3-4 screens, the "memory" is getting crowded.
  • You change roles: You were asking for "Manual Test Cases" and now you want "Bug Report Summaries."

3. The "Clean Slate" Protocol

Before starting a new chat for a specific task, follow these three steps:

  1. Close the old chat: This prevents you from accidentally clicking it later.
  2. State the Role again: Even if you told it "You are a QA/Developer" in the last chat, it doesn't remember. Every new chat is a brand-new person.
  3. Re-paste the "Source of Truth": Provide the fresh requirement document for this specific feature.

Summary

"Think of a chat window like a test environment. If the environment gets messy with old data and 'junk' from previous tests, your results won't be clean. New Task = New Chat."

Checklist for Working with AI:

Based on the article's principles, here is a checklist for using AI:

  • Identify the Goal: Is this a new feature or a different role (e.g., switching from writing test cases to a bug report)? If yes, start a new chat.
  • Define the Role: Explicitly tell the AI its role at the start of every new session (e.g., "Act as a Manual QA/Developer").
  • Provide the Source of Truth: Have I pasted the specific requirements or documentation for only this feature?
  • Apply Constraints: Have I told the AI what not to include to prevent it from guessing? (e.g., "Do not include social logins").
  • Check for Consistency: If the output seems suspicious, have I tried running the same prompt in a second chat or a different AI model (like Gemini vs. ChatGPT) to verify?
  • Verify the Output: Have I manually checked that the AI's steps match the actual application fields and logic, rather than "hallucinated" features?

Security When Working with AI

When using AI, security is critical because a single "copy-paste" of sensitive data can lead to company-wide breaches or NDA violations. Furthermore, giving AI "agentic" powers over your own PC creates a direct risk to your local data; an unreviewed automated command could accidentally delete your projects, corrupt system files, or expose your private folders.

  • Set Clear Boundaries (Sandboxing): If you grant an AI permission to modify files or run code, limit its access to a specific, isolated folder rather than your entire computer. This ensures the AI cannot accidentally change or delete important system files or personal data on your hard drive.
  • Review Before Execution: Treat all AI-generated actions—like creating test scripts, modifying documents, or running terminal commands—as drafts that require your approval. Never allow an AI to execute commands automatically without a human first verifying that the action is safe and logical.
  • The Risk of Data Leakage (Anonymization): Any sensitive information you paste into a prompt (like private API keys, real customer names, or passwords) may be stored by the provider and, depending on the provider and your account settings, used to train future versions of the model. This means your "private" data could potentially resurface in a response to another user later. Remove or anonymize such data before pasting.
  • Isolate Access Areas: When using AI tools that connect to your workspace (like Slack, Jira, or GitHub), ensure you only grant "read" permissions where possible. Avoid giving AI "delete" or "admin" privileges to prevent it from interacting with your private data or external networks.
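
Anonymization can be partly automated before text ever reaches a prompt. The following is a minimal sketch under stated assumptions: the patterns in `REDACTION_RULES` are illustrative and will miss many kinds of sensitive data, and `redact` is a hypothetical helper. It supplements a manual review; it does not replace one.

```javascript
// Illustrative patterns only; extend them for the data your team handles.
const REDACTION_RULES = [
  { label: "[EMAIL]", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "[API_KEY]", pattern: /\b(?:api[_-]?key|token|secret)\s*[:=]\s*\S+/gi },
  { label: "[PASSWORD]", pattern: /\bpassword\s*[:=]\s*\S+/gi },
];

// Replaces every match of every rule with its placeholder label.
function redact(text) {
  return REDACTION_RULES.reduce(
    (result, rule) => result.replace(rule.pattern, rule.label),
    text
  );
}

console.log(redact("Login fails for jane.doe@example.com with password=hunter2"));
```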

Interactive for QAs

Since QAs focus on logic, steps, and user experience rather than code, this exercise uses something you'll actually do daily: writing test cases.

The goal here is to show you that if you give a vague instruction, the AI "rolls the dice." Because it is probabilistic, it will invent different steps, different fields, and different priorities every time you ask.


Exercise: The "Same Prompt, Different Reality" Test

As a QA, your "source of truth" must be the requirements, not the AI’s imagination. To see why, try this experiment:

Step 1: The First Draft

Open a brand-new chat with the AI and send this exact, vague prompt:

"Write 5 manual test cases for a login page."

  • Observation: Look at the steps. Did it assume there is a "Remember Me" checkbox? Did it suggest using a phone number or an email?

Step 2: The Second Draft (The "Twin" Chat)

Open a completely new chat (do not use the same window) and send the exact same prompt again:

"Write 5 manual test cases for a login page."

Step 3: QA Comparison

Compare the two outputs side-by-side. You will likely notice:

  • Feature Drift: Chat A might include a test for "Social Login (Google/Facebook)," while Chat B ignores it completely.
  • Naming Inconsistency: One calls the field "Username," the other calls it "Email Address."
  • Priority Shift: One might mark "Incorrect Password" as High priority, while the other marks it as Medium.

Why this is a "Red Flag"

If the AI provides different results for the same prompt, it means the AI is guessing.

In manual testing, "guessing" leads to:

  1. Missing Bugs: You might test the "Forgot Password" link because Chat A suggested it, but forget to test "Account Locking" because Chat B (and your current session) didn't mention it.
  2. Inconsistent Results: If you and another trainee both use AI to test the same feature with vague prompts, you will end up with two completely different sets of tests.
  3. Hallucinated Requirements: You might spend 20 minutes looking for a "Show Password" eye icon that doesn't actually exist in our app, just because the AI thought it was a "probable" feature.

🛠️ How to Fix It: The "Standardization" Prompt

To stop the AI from rolling dice, you must ground it. A well-crafted prompt acts as a precise blueprint, ensuring the AI delivers a relevant and high-quality response rather than a generic or hallucinated one:

*Role: Act as a Senior QA Mentor and Coach.*

*Task: I am a student learning to write functional test cases. I am going to write 5 test cases for a login form with these requirements: [Username, Password, Login Button].*

*My Goal: I want to draft these cases myself, but I need you to set the "Standard of Excellence" first so I can follow best practices.*

Please provide me with:

  1. *The Professional Standard: List 3-4 "Best Practices" for writing high-quality test cases (e.g., how to write "Expected Results" clearly or the importance of atomicity).*
  2. *Boundary Reminders: Briefly explain the concepts of Positive vs. Negative testing and Boundary Value Analysis so I can apply them to my scenarios.*
  3. *The Template: Provide a clean, empty Markdown table with headers for ID, Description, Preconditions, Steps, Expected Result, Priority.*
  4. *The Challenge: Give me one "Pro-Tip" on how to catch a bug that a beginner might miss on a simple login form (without giving away a specific test case).*

*Next Step: Once you provide this guidance, I will reply with my 5 drafted test cases for you to critique.*

Exercise: Cross-Model Verification

Using different AI engines - ChatGPT and Gemini - allows you to highlight logical gaps that one model might miss.

Why is this necessary?

Each AI "thinks" differently based on its training. If you feed the same specification to both, they will find different types of errors.

  • ChatGPT (OpenAI): Generally excellent at following complex, multi-step instructions and identifying business-logic gaps.
  • Gemini (Google): Exceptional at handling very long documents (high context window) and finding small contradictions hidden deep within large text files.

How to Perform the Test (Algorithm for the Trainee):

  1. Preparation: Open two separate tabs: one for ChatGPT and one for Gemini.
  2. The Prompt: Use the exact same prompt for both:

Act as an expert QA Mentor. I am a beginner Manual QA learning how to analyze requirements. I will provide a requirement below.

Please perform the following steps:

  • Identify 2-3 potential logical 'traps' or ambiguities in this text.
  • Suggest 3 critical test cases (1 Positive, 1 Negative, 1 Edge Case).
  • For each test case, explain WHY it is important to test this (e.g., what is the risk if we don't?).
  • Keep your explanations simple and focused on the user's perspective.

Requirements

The e-commerce site shall display a 'Countdown Timer' on the product page for all 'Flash Sale' items. The timer must show the remaining time in HH:MM:SS format. If the timer reaches 00:00:00, the 'Add to Cart' button must be automatically disabled and replaced with a 'Sale Ended' label. The discount should only apply if the user clicks 'Checkout' before the timer hits zero. If a user is already on the checkout page when the timer expires, they should still get the discount as a reward for being fast.

  3. The Comparison:
  • The "Shared" Issues: If both models point out the same problem, it is a high-priority bug in the requirements.
  • The "Unique" Finds: If Gemini finds a date-format conflict and ChatGPT finds something else, you have successfully doubled your testing coverage.
  • Spotting Hallucinations: If ChatGPT claims there is an error that doesn't exist, but Gemini confirms the text is fine, you can manually verify who is right, effectively using one AI to "fact-check" the other.
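
Once you have read both answers and given each finding a short label, sorting them into "shared" and "unique" is a simple set comparison. The sketch below assumes you assign the labels yourself; `compareFindings` and the example labels are hypothetical and do not come from any model's actual output.

```javascript
// Splits two lists of finding labels into shared and model-specific groups.
function compareFindings(findingsA, findingsB) {
  const setA = new Set(findingsA);
  const setB = new Set(findingsB);
  return {
    shared: findingsA.filter((finding) => setB.has(finding)),
    onlyA: findingsA.filter((finding) => !setB.has(finding)),
    onlyB: findingsB.filter((finding) => !setA.has(finding)),
  };
}

// Hypothetical labels written down by the trainee after reading each answer.
const chatGptFindings = ["discount-rule-conflict", "timezone-not-specified"];
const geminiFindings = ["discount-rule-conflict", "timer-source-not-specified"];

console.log(compareFindings(chatGptFindings, geminiFindings));
```

Findings in `shared` are the ones to raise first; findings in `onlyA` or `onlyB` are the ones to verify manually against the requirement text.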

Interactive for Developers

Since developers focus on implementation, architecture, and code quality, these exercises simulate something you will do daily: generating, reviewing, and refining code with AI.

The goal is to show that if you give vague instructions, the AI will generate different implementations, patterns, and assumptions every time. Because AI is probabilistic, it fills in missing technical details based on patterns it has seen before.


Exercise: The "Same Prompt, Different Implementation" Test

As a developer, your source of truth must be the project architecture and requirements, not the AI’s assumptions. To see why, try this experiment:

Step 1: The First Draft

Open a brand-new chat with the AI and send this exact, simple prompt:

Write a JavaScript function that validates an email address.

Observation:

Look carefully at the generated code. Did it use a regex? Did it rely on browser validation? Did it allow subdomains or "+" aliases? Did it return a boolean or an error message?


Step 2: The Second Draft (The "Twin" Chat)

Open a completely new chat (do not use the same window) and send the exact same prompt again:

Write a JavaScript function that validates an email address.


Step 3: Developer Comparison

Compare the two outputs side-by-side. You will likely notice:

Implementation Drift:

One version may use a simple regex, while another uses a long RFC-style regex.

Return Type Differences:

One function returns true/false, while another returns an object or error message.

Edge-Case Handling:

One implementation may accept addresses with a "+" alias or a subdomain, while another rejects them.

Approach Differences:

One solution may rely on built-in browser validation, while another builds its own validation logic.
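
To make the drift concrete, here are two plausible answers to the same vague prompt. Both are hypothetical examples written for this article, not captured AI output, and neither is a complete validator. They agree on the simple cases and silently disagree on the "+" alias.

```javascript
// Version A: a loose check of the kind often seen in quick examples.
function isValidEmailSimple(email) {
  return /^\S+@\S+\.\S+$/.test(email);
}

// Version B: a stricter character whitelist that (perhaps unintentionally) rejects "+".
function isValidEmailStrict(email) {
  return /^[A-Za-z0-9._-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/.test(email);
}

const samples = [
  "user@example.com",
  "user+qa@example.com",
  "user@mail.example.com",
  "user@example",
];

// Print both verdicts side by side to spot disagreements.
for (const sample of samples) {
  console.log(sample, isValidEmailSimple(sample), isValidEmailStrict(sample));
}
```

If half the team pastes version A and the other half version B, the same address can pass on one screen and fail on another.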


Why This Is a Red Flag

If the AI produces different implementations for the same prompt, it means the AI is filling in missing technical decisions on its own.

In real development, this leads to:

Architecture Drift:

Different developers may introduce different patterns for solving the same problem.

Hidden Bugs:

One implementation might fail edge cases that another handles correctly.

Security Risks:

AI sometimes suggests outdated or unsafe validation logic.

Maintenance Problems:

Multiple implementations of the same logic make systems harder to maintain.


🛠️ How to Fix It: The "Technical Constraints" Prompt

To stop the AI from improvising, you must define strict engineering constraints.

A well-structured prompt acts like a technical specification that limits guessing.

Example prompt:

Role: Act as a Senior JavaScript Engineer and Code Reviewer.

Task: Generate a production-ready email validation utility.

Requirements:

  • Language: TypeScript
  • Return type must be boolean
  • No external libraries
  • Must support subdomains and "+" aliases
  • Must reject spaces and invalid domain formats

Output format:

  1. Function implementation
  2. Explanation of validation logic
  3. Example edge-case tests

By defining constraints, you make the AI behave more like a coding assistant than a guessing engine.
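
For reference, this is roughly what an answer that respects those constraints could look like. It is a sketch, not the one correct solution: it is written in plain JavaScript with a JSDoc type comment instead of TypeScript to match the other code in this article, `isValidEmail` is a hypothetical name, and the rules are deliberately simpler than the full email standards.

```javascript
/**
 * Returns true when `email` passes the constraints listed in the prompt.
 * @param {string} email
 * @returns {boolean}
 */
function isValidEmail(email) {
  // Reject non-strings and anything containing whitespace.
  if (typeof email !== "string" || /\s/.test(email)) return false;

  // Exactly one "@" separating local part and domain.
  const parts = email.split("@");
  if (parts.length !== 2) return false;
  const [local, domain] = parts;

  // Local part: letters, digits, and a few common symbols, including "+".
  if (!/^[A-Za-z0-9._%+-]+$/.test(local)) return false;

  // Domain: dot-separated labels (subdomains allowed). Each label must start
  // and end with a letter or digit; the last label needs at least 2 letters.
  const labels = domain.split(".");
  if (labels.length < 2) return false;
  const isLabelOk = (label) => /^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$/.test(label);
  if (!labels.every(isLabelOk)) return false;
  return /^[A-Za-z]{2,}$/.test(labels[labels.length - 1]);
}

// Example edge-case checks.
console.log(isValidEmail("user+qa@mail.example.com")); // true
console.log(isValidEmail("user name@example.com"));    // false
console.log(isValidEmail("user@example..com"));        // false
```

Because the prompt fixed the return type and the edge cases, two separate chats are far more likely to produce code that behaves the same way on these inputs.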


Exercise: The Refactor vs Rewrite Test

Another common problem: a developer asks the AI to refactor code, and the AI rewrites the logic entirely.

Step 1: The Original Code

Open a new chat and paste the following function:

    function sumPrices(items) {
      let total = 0;
      for (let i = 0; i < items.length; i++) {
        total += items[i].price;
      }
      return total;
    }

Then ask:

Refactor this code to modern JavaScript.

Observation:

The AI may rewrite the function using reduce, destructuring, or arrow functions. Sometimes it will change more than expected.


Step 2: The Controlled Refactor

Now repeat the exercise with stricter instructions:

Refactor this function while following these rules:

  • Do NOT change the function signature
  • Preserve the existing logic
  • Only improve readability
  • Do NOT introduce new dependencies


Step 3: Developer Comparison

Compare the two outputs.

You will notice:

Scope Control:

The constrained prompt produces smaller, safer changes.

Logic Stability:

The original behavior remains intact.

Predictability:

The AI stops introducing unnecessary structural changes.
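
The difference is easiest to see when the original and two possible outcomes sit next to each other. The two refactored versions below are hypothetical examples written for this article, not captured AI output; the "rewritten" one shows how a harmless-looking fallback can change behavior.

```javascript
// Original function from Step 1.
function sumPrices(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total += items[i].price;
  }
  return total;
}

// Controlled refactor: same signature and, for ordinary arrays, the same
// logic. Only the loop is easier to read.
function sumPricesRefactored(items) {
  let total = 0;
  for (const item of items) {
    total += item.price;
  }
  return total;
}

// Uncontrolled rewrite: looks modern, but the "|| 0" fallback silently
// changes the result when a price is missing.
const sumPricesRewritten = (items) =>
  items.reduce((total, { price }) => total + (price || 0), 0);

const cart = [{ price: 10 }, { price: 5 }];
const cartWithGap = [{ price: 10 }, {}]; // second item has no price

console.log(sumPrices(cart), sumPricesRefactored(cart), sumPricesRewritten(cart));
console.log(
  sumPrices(cartWithGap),
  sumPricesRefactored(cartWithGap),
  sumPricesRewritten(cartWithGap)
);
```

On the normal cart all three agree. On the cart with a missing price, the original and the controlled refactor both return NaN, while the rewrite returns a number. Whether that is a fix or a hidden bug depends on what the rest of the system expects, which is exactly why the change should be a decision and not a side effect.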


Why This Matters

In real projects, uncontrolled AI refactoring can:

  • Introduce new bugs
  • Break existing integrations
  • Change performance characteristics
  • Create inconsistencies in code style


Developer Pro-Tip: Use AI as a Code Reviewer

One of the most effective ways developers can use AI is simulating a senior code review.

Prompt example:

Act as a Senior Software Engineer performing a strict code review.

Analyze the following code and identify:

  • potential bugs
  • performance issues
  • security risks
  • unclear naming
  • missing edge cases

Explain each issue and suggest improvements.

This often helps detect problems such as:

  • missing null checks
  • inefficient loops
  • unsafe input handling
  • unhandled edge cases
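
As an illustration, here is how a price-summing function might look after acting on that kind of review feedback. `sumPricesChecked` is a hypothetical name, and throwing on bad input is an assumption; skipping invalid items or returning an error value are equally valid choices that depend on your project.

```javascript
// Defensive sum of item prices after a review flagged missing checks.
// Assumption: invalid input should fail loudly instead of producing NaN.
function sumPricesChecked(items) {
  if (!Array.isArray(items)) {
    throw new TypeError("items must be an array");
  }
  let total = 0;
  for (const item of items) {
    // Covers null/undefined items and missing or non-numeric prices.
    if (item == null || typeof item.price !== "number" || Number.isNaN(item.price)) {
      throw new TypeError("every item needs a numeric price");
    }
    total += item.price;
  }
  return total;
}

console.log(sumPricesChecked([{ price: 10 }, { price: 5 }]));
```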