5.4 — Penetration Testing
Learning Objectives
- ✓ Explain CIS 16.13 requirements for authenticated and unauthenticated penetration testing.
- ✓ Differentiate between black box, gray box, and white box penetration testing and select the appropriate approach.
- ✓ Execute a structured penetration test following the OWASP PTES seven-phase methodology.
- ✓ Produce professional penetration test reports with technical findings, business impact, and remediation guidance.
- ✓ Evaluate AI-powered penetration testing tools and understand their capabilities and limitations.
- ✓ Design a bug bounty program that complements formal penetration testing.
1. CIS Control 16.13 — Conduct Application Penetration Testing
CIS Safeguard 16.13 is an Implementation Group 3 (IG3) control:
“Conduct application-level penetration testing. Testing should include both authenticated and unauthenticated testing, be focused on business logic vulnerabilities, and include manual testing by skilled testers, with critical applications prioritized.”
This control contains several specific mandates worth unpacking:
Authenticated AND unauthenticated: Testing only from the outside (unauthenticated) misses the vast majority of the attack surface. Most vulnerabilities are exploitable by authenticated users — privilege escalation, IDOR, business logic bypass, horizontal access to other users' data. Testing must include both perspectives.
Business logic vulnerabilities: Automated scanners find injection, XSS, and misconfiguration. They do not find “user can approve their own expense report,” “coupon code can be applied after checkout,” or “negative quantity order results in a refund.” Business logic testing requires human understanding of the application’s purpose.
Manual testing by skilled testers: Automated tools are necessary but insufficient. Skilled penetration testers find vulnerabilities that no tool can — creative attack chains, context-dependent logic flaws, and novel exploitation techniques.
Critical applications prioritized: Not all applications need the same testing depth. A public-facing financial application gets comprehensive annual testing. An internal wiki gets a lighter assessment. Risk-based prioritization ensures resources are allocated where they have the greatest impact.
2. Types of Penetration Testing
2.1 Black Box Testing
Knowledge: No prior knowledge of the application. No source code, no architecture docs, no credentials.
Simulates: External attacker with no insider knowledge.
Strengths:
- Most realistic simulation of an external threat actor.
- Tests the application’s externally visible attack surface.
- Tests information disclosure (what can an attacker learn from error messages, headers, and publicly accessible resources?).
Weaknesses:
- Least thorough. The tester spends significant time on reconnaissance that could be skipped with documentation.
- Misses vulnerabilities behind authentication barriers (unless separately tested).
- Inefficient use of expensive tester time.
Best for: Compliance requirements that specify “external penetration test,” initial assessment of unknown applications, simulating realistic threat scenarios.
2.2 Gray Box Testing
Knowledge: Partial knowledge — typically valid credentials for one or more roles, architecture documentation, and API specifications.
Simulates: Insider threat, compromised user account, or attacker who has gained initial access.
Strengths:
- Most realistic for authorized testing. Most attackers have some information (from OSINT, phishing, or purchased credentials).
- Tests the full attack surface: unauthenticated endpoints AND authenticated functionality.
- Efficient use of tester time — less time on reconnaissance, more time on exploitation.
Weaknesses:
- Requires coordination to provide appropriate access and documentation.
- Findings may be influenced by the provided information (tester follows the documentation rather than discovering actual behavior).
Best for: Most penetration tests. Gray box provides the optimal balance of realism, thoroughness, and efficiency.
2.3 White Box Testing
Knowledge: Full knowledge — source code, architecture documents, deployment configurations, database schemas, credentials for all roles.
Simulates: Malicious insider with full system access, or comprehensive security audit.
Strengths:
- Most thorough. Every attack surface is known and can be tested.
- Testers can identify vulnerabilities in code paths that may be difficult to reach through black box approaches.
- Combines the depth of code review with the proof-of-exploitability of penetration testing.
Weaknesses:
- Least realistic. Attackers rarely have source code access (though supply chain attacks change this).
- Most expensive. Reviewing source code and testing the application takes more time.
- Can overwhelm testers with information, leading to unfocused testing.
Best for: High-security applications (financial, healthcare, government), applications processing highly sensitive data, applications with complex business logic.
3. OWASP PTES — Seven Phases
The Penetration Testing Execution Standard provides a structured methodology. Following it ensures consistent, comprehensive, and professional testing.
Phase 1: Pre-Engagement Interactions
Everything that happens before testing begins. This phase prevents legal issues, scope disputes, and miscommunication.
Deliverables:
- Scope document: Exactly what is in scope (URLs, IP ranges, API endpoints, mobile apps) and what is out of scope (third-party services, production databases, other customers’ data).
- Rules of engagement: What testing techniques are permitted (social engineering? DoS testing? Physical access?), testing windows (business hours only? Weekends?), notification requirements.
- Authorization letter: Written, signed authorization from the asset owner granting permission to test. Without this, penetration testing is a crime in most jurisdictions.
- Communication plan: Primary contacts on both sides, escalation procedures, emergency stop protocol.
- Data handling requirements: How will sensitive data discovered during testing be handled? Encrypted storage, secure deletion after engagement, access limited to the testing team.
Phase 2: Intelligence Gathering
Systematic collection of information about the target. For gray/white box tests, this supplements provided documentation with independently discovered information.
Passive reconnaissance (no direct interaction with the target):
- DNS records (subdomains, mail servers, TXT records)
- WHOIS data (registration, hosting, contacts)
- Certificate Transparency logs (all TLS certificates ever issued for the domain)
- Search engine cache (indexed pages, cached credentials, exposed documents)
- Public code repositories (GitHub, GitLab — accidentally committed secrets, API keys, internal URLs)
- Social media and job postings (technology stack, internal tools, team structure)
Active reconnaissance (direct interaction with the target):
- Port scanning and service enumeration (Nmap)
- Web application fingerprinting (technology stack, frameworks, versions)
- Directory and file enumeration (Gobuster, Feroxbuster)
- API endpoint discovery (OpenAPI/Swagger docs, WADL, fuzzing)
- Authentication mechanism identification
Phase 3: Threat Modeling
Based on intelligence gathered, identify the most likely and impactful attack scenarios.
- Map the application’s trust boundaries (where does unauthenticated become authenticated? Where does user become admin?).
- Identify high-value targets (payment processing, PII storage, authentication systems, admin panels).
- Determine attack vectors (public API, file upload, WebSocket, email processing).
- Prioritize testing effort based on risk (likelihood × impact).
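The prioritization step can be sketched as a simple scoring pass. This is a minimal illustration; the scenario names and the 1–5 likelihood/impact scores below are invented for the example, not taken from any real engagement:

```python
# Rank candidate attack scenarios by risk = likelihood x impact.
# Scenario names and scores are illustrative assumptions.

def prioritize(scenarios):
    """Sort scenarios by likelihood * impact, highest risk first."""
    return sorted(scenarios, key=lambda s: s["likelihood"] * s["impact"], reverse=True)

scenarios = [
    {"name": "IDOR on payment API",       "likelihood": 4, "impact": 5},  # risk 20
    {"name": "Stored XSS in comments",    "likelihood": 3, "impact": 3},  # risk 9
    {"name": "Default admin credentials", "likelihood": 2, "impact": 5},  # risk 10
]

for s in prioritize(scenarios):
    print(s["name"], s["likelihood"] * s["impact"])
```

The highest-scoring scenario gets tester time first; low-scoring items may be deferred to automated coverage.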
Phase 4: Vulnerability Analysis
Systematic identification of vulnerabilities through both automated scanning and manual analysis.
- Run automated scanners (OWASP ZAP, Burp Suite, Nuclei) against in-scope targets.
- Manually review scanner results for false positives.
- Manually test for vulnerabilities that scanners miss (business logic, race conditions, authorization flaws).
- Map discovered vulnerabilities to potential exploitation paths.
Phase 5: Exploitation
Attempt to exploit discovered vulnerabilities to demonstrate real-world impact.
Key principles:
- Proof over theory: A vulnerability is not just a theoretical risk — demonstrate the actual impact. “SQL injection exists” is less compelling than “SQL injection allows extraction of all user credentials.”
- Minimal impact: Demonstrate the vulnerability without causing damage. Extract one record to prove access, not the entire database. Demonstrate privilege escalation, don’t delete the admin account.
- Chain vulnerabilities: Individual low-severity findings may combine into critical attack chains. An information disclosure + IDOR + missing rate limiting might chain into an account takeover.
- Document everything: Every exploitation attempt (successful or not) is documented with screenshots, request/response pairs, and timestamps. This is your evidence.
Phase 6: Post-Exploitation
After gaining initial access, determine what additional access and data can be reached.
- Lateral movement: Can compromised credentials or access be used to reach other systems?
- Privilege escalation: Can a regular user account be escalated to admin?
- Data access: What sensitive data can be accessed from the compromised position?
- Persistence: Could an attacker maintain access after the initial vulnerability is patched?
- Impact assessment: What is the realistic business impact of this compromise?
Phase 7: Reporting
The report is the primary deliverable. A vulnerability found but poorly reported is a vulnerability not fixed.
4. Web Application Penetration Testing Methodology
4.1 Reconnaissance and Information Gathering
Beyond the general intelligence gathering in Phase 2, web application testing requires application-specific reconnaissance:
- Map all endpoints (pages, APIs, WebSockets, GraphQL queries).
- Identify all input vectors (forms, URL parameters, headers, cookies, file uploads).
- Determine technology stack (server, framework, database, CDN, WAF).
- Review client-side code (JavaScript, comments, hidden fields, hard-coded values).
- Identify all user roles and privilege levels.
4.2 Authentication Testing
| Test | What to Check |
|---|---|
| Credential stuffing | Does the app rate-limit login attempts? Detect credential reuse? |
| Brute force | Account lockout after failed attempts? CAPTCHA? Progressive delays? |
| MFA bypass | Can MFA be skipped? Is the MFA token predictable? Can it be replayed? |
| Password reset | Token predictability, token expiration, account enumeration via reset |
| Session fixation | Can session ID be set before authentication? |
| Default credentials | Admin accounts with default passwords, API keys in documentation |
| Token security | JWT algorithm confusion, none algorithm, key disclosure |
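The token-security row can be partly automated. The sketch below decodes a JWT header (without verifying the signature) and flags the classic algorithm issues; the token is hand-built for illustration, and the flag wording is an assumption of this example, not a tool's actual output:

```python
import base64
import json

def jwt_header(token: str) -> dict:
    """Decode the (unverified) header segment of a JWT."""
    seg = token.split(".")[0]
    seg += "=" * (-len(seg) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(seg))

def algorithm_flags(token: str) -> list:
    """Flag header algorithms worth testing during an engagement."""
    alg = jwt_header(token).get("alg", "")
    flags = []
    if alg.lower() == "none":
        flags.append("alg=none: token is unsigned; try submitting it without a signature")
    if alg.startswith("HS"):
        flags.append(f"{alg}: test RS->HS key confusion using the server's public key")
    return flags

# Hand-built unsigned token with header {"alg": "none"} (illustrative only)
header = base64.urlsafe_b64encode(b'{"alg":"none","typ":"JWT"}').decode().rstrip("=")
payload = base64.urlsafe_b64encode(b'{"role":"admin"}').decode().rstrip("=")
token = f"{header}.{payload}."

print(algorithm_flags(token))
```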
4.3 Authorization Testing
Authorization bugs are among the most impactful and most commonly missed by automated tools.
IDOR (Insecure Direct Object Reference): Change /api/user/123/profile to /api/user/456/profile. Can User 123 see User 456’s data?
BOLA (Broken Object-Level Authorization): Same concept as IDOR, formalized in the OWASP API Security Top 10. Every API endpoint that accepts a resource identifier must validate that the authenticated user has authorization to access that specific resource.
BFLA (Broken Function-Level Authorization): Regular user calls admin API endpoints. /api/admin/users/delete — does the API check that the caller is actually an admin?
Privilege escalation: Modify role in JWT token, change role=user to role=admin in request parameter, access admin panel URL directly.
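IDOR/BOLA testing is naturally expressed as a differential check: request the same resource with the owner's credentials and with an unrelated user's credentials, then compare. In this sketch, `fetch(resource_id, token)` is an assumed helper (e.g., a thin wrapper around an HTTP GET with a Bearer token); the two handlers simulate a vulnerable and a fixed endpoint:

```python
# Differential IDOR/BOLA check against an assumed fetch(resource_id, token) helper.

def idor_suspected(fetch, resource_id, owner_token, other_token) -> bool:
    """True if a non-owner receives the owner's data for the same resource."""
    owner_status, owner_body = fetch(resource_id, owner_token)
    other_status, other_body = fetch(resource_id, other_token)
    return owner_status == 200 and other_status == 200 and other_body == owner_body

# Simulated vulnerable endpoint: never checks which user the token belongs to.
def vulnerable_fetch(resource_id, token):
    return 200, {"account_id": resource_id, "owner": "user_b@example.com"}

# Simulated fixed endpoint: only the owner's token gets the data.
def fixed_fetch(resource_id, token):
    if token != "owner-token":
        return 403, {"error": "forbidden"}
    return 200, {"account_id": resource_id, "owner": "user_b@example.com"}

print(idor_suspected(vulnerable_fetch, "98765", "owner-token", "attacker-token"))  # True
print(idor_suspected(fixed_fetch, "98765", "owner-token", "attacker-token"))       # False
```

In a real engagement the same comparison is run across every endpoint that accepts a resource identifier, for every role pair.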
4.4 Session Management Testing
- Session ID entropy (is it predictable?)
- Session timeout (does it expire after inactivity?)
- Session invalidation (does logout actually destroy the session?)
- Cookie attributes (HttpOnly, Secure, SameSite)
- Concurrent session handling (can the same account have unlimited active sessions?)
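The cookie-attribute checks above can be scripted with a small parser over a `Set-Cookie` header value. This is a sketch; a real test would read the header from the application's login response:

```python
def cookie_issues(set_cookie: str) -> list:
    """Report missing security attributes on a Set-Cookie header value."""
    attrs = {part.strip().split("=")[0].lower() for part in set_cookie.split(";")[1:]}
    issues = []
    if "httponly" not in attrs:
        issues.append("missing HttpOnly (cookie readable by JavaScript)")
    if "secure" not in attrs:
        issues.append("missing Secure (cookie sent over plain HTTP)")
    if "samesite" not in attrs:
        issues.append("missing SameSite (CSRF exposure)")
    return issues

print(cookie_issues("session=abc123; Path=/"))                           # all three missing
print(cookie_issues("session=abc123; Secure; HttpOnly; SameSite=Lax"))   # []
```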
4.5 Input Validation Testing
| Attack Class | Test Approach |
|---|---|
| SQL Injection | Parameterized? Try `' OR 1=1--`, time-based blind, error-based |
| XSS (Reflected) | Input reflected in response? Try `<script>`, event handlers, encoding bypass |
| XSS (Stored) | Input stored and displayed to other users? Same payloads in persistent contexts |
| SSRF | Can you make the server request internal URLs? `http://169.254.169.254/` |
| Command Injection | Input reaches system commands? Try `; id`, `` `id` ``, `$(id)` |
| Template Injection | Input rendered in templates? Try `{{7*7}}`, `${7*7}`, `<%= 7*7 %>` |
| XXE | XML input accepted? Try external entity definitions |
| Path Traversal | File paths in parameters? Try `../../etc/passwd` |
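Taking template injection from the table as an example, the probe logic is: submit arithmetic payloads and check whether the engine evaluated them. In this sketch, `render` stands in for whatever returns the application's response body for a given input, and the vulnerable sink is simulated rather than a real template engine:

```python
# Template-injection probe: did the engine evaluate the arithmetic payload?
PAYLOADS = {"{{7*7}}": "49", "${7*7}": "49", "<%= 7*7 %>": "49"}

def ssti_hits(render) -> list:
    """Return the payloads whose evaluated result appears in the response."""
    return [payload for payload, marker in PAYLOADS.items() if marker in render(payload)]

# Simulated vulnerable sink: evaluates {{ ... }} expressions (stand-in for a
# template engine rendering user input as part of the template).
def vulnerable_render(user_input):
    if user_input.startswith("{{") and user_input.endswith("}}"):
        return "Hello " + str(eval(user_input[2:-2]))
    return "Hello " + user_input

# Simulated safe sink: input is treated as data, never evaluated.
def safe_render(user_input):
    return "Hello " + user_input

print(ssti_hits(vulnerable_render))  # ['{{7*7}}']
print(ssti_hits(safe_render))        # []
```

A payload returning `49` where `7*7` was submitted is strong evidence the input reached a template evaluation context.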
4.6 Business Logic Testing
Business logic vulnerabilities are the highest-impact findings because they represent flaws in the application’s core purpose, and they cannot be caught by automated tools.
Common patterns:
- Workflow bypass: Skip steps in a multi-step process (e.g., skip payment step in checkout, skip approval step in workflow).
- Race conditions: Submit two requests simultaneously to exploit time-of-check-to-time-of-use gaps (double-spend, inventory oversell, duplicate reward claims).
- Price manipulation: Modify price in client-side request, apply discount codes multiple times, use negative quantities.
- State manipulation: Change order status directly via API, manipulate workflow state to re-enter a completed step.
- Limit bypass: Exceed rate limits by varying request parameters, bypass withdrawal limits by splitting into multiple transactions.
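The race-condition pattern above can be demonstrated with concurrent requests. This sketch simulates a server-side "check then act" coupon redemption with no locking; the artificial `sleep` stands in for the processing delay between the check and the write. More than one success proves the time-of-check-to-time-of-use gap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

class Coupon:
    """Simulated vulnerable server-side handler: check-then-act, no locking."""
    def __init__(self):
        self.redeemed = False

    def redeem(self) -> bool:
        if not self.redeemed:     # check
            time.sleep(0.05)      # window between check and act
            self.redeemed = True  # act
            return True
        return False

def hammer(action, n=10) -> int:
    """Submit n concurrent requests; return how many succeeded."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return sum(pool.map(lambda _: action(), range(n)))

wins = hammer(Coupon().redeem)
print(wins)  # a correct implementation would allow exactly 1
```

Against a live target the same approach sends parallel HTTP requests; tools like Burp's Repeater (send group in parallel) exist for exactly this test.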
4.7 API-Specific Testing
- Mass assignment: send extra fields in the request body that map to internal model attributes (e.g., `{ "name": "User", "role": "admin" }`).
- Excessive data exposure: API returns more data than the client displays (e.g., full user object including hashed password).
- Resource enumeration: sequential IDs enabling enumeration of all resources.
- Rate limiting: API endpoints without rate limiting enabling denial of service or brute force.
- Versioning: old API versions with known vulnerabilities still accessible.
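The mass-assignment check is another differential test: submit an extra privileged field and see whether it sticks on the created object. Here `create_user` stands in for a POST to a registration endpoint, and both handlers are simulated for illustration:

```python
# Mass-assignment probe against an assumed create_user(body) helper.

def mass_assignment(create_user) -> bool:
    """True if an injected 'role' field is accepted by the endpoint."""
    created = create_user({"name": "tester", "role": "admin"})
    return created.get("role") == "admin"

# Simulated vulnerable handler: binds every field in the body to the model.
def vulnerable_create(body):
    user = {"name": None, "role": "user"}
    user.update(body)  # no allow-list filtering
    return user

# Simulated fixed handler: copies only allow-listed fields.
def safe_create(body):
    return {"name": body.get("name"), "role": "user"}

print(mass_assignment(vulnerable_create), mass_assignment(safe_create))  # True False
```

The fix shown in `safe_create` — an explicit allow-list of bindable fields — is the standard remediation.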
4.8 File Upload Testing
- Can executable files be uploaded (PHP, JSP, ASPX)?
- Can the upload path be manipulated to overwrite existing files?
- Are file type checks based on extension only (bypassable) or content-type analysis?
- Are uploaded files served from the same domain (enabling XSS via SVG/HTML uploads)?
- Is there a file size limit (preventing denial of service via large uploads)?
4.9 Error Handling and Information Disclosure
- Stack traces exposed in error responses.
- Database error messages revealing table/column names.
- Version information in HTTP headers (Server, X-Powered-By).
- Debug endpoints accessible in production.
- Source code comments visible in client-side code.
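The version-disclosure checks are easy to script against captured response headers. A sketch, with the usual offending header names; the sample headers are invented:

```python
def disclosure_findings(headers: dict) -> list:
    """Flag response headers that leak server or framework version details."""
    leaky = {"server", "x-powered-by", "x-aspnet-version", "x-generator"}
    return [
        f"{name}: {value}"
        for name, value in headers.items()
        if name.lower() in leaky and any(ch.isdigit() for ch in value)
    ]

captured = {
    "Server": "Apache/2.4.41 (Ubuntu)",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html",
}
print(disclosure_findings(captured))
```

A bare product name (e.g., `Server: cloudfront`) is low risk; a specific version string lets an attacker look up known CVEs for that exact release.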
5. Scope and Rules of Engagement
5.1 Scope Definition
A clear scope prevents legal issues and focuses testing effort.
```
## In Scope
- Web application: https://app.example.com
- API endpoints: https://api.example.com/v2/*
- Mobile API: https://mobile-api.example.com
- Authentication: SSO provider integration testing
- User roles: standard user, premium user, admin

## Out of Scope
- Infrastructure (servers, networks, cloud accounts)
- Third-party integrations (payment gateway, email provider)
- Denial of service testing
- Social engineering
- Physical access testing
- Other customers' data or environments
```
5.2 Testing Windows
- Specify when testing is permitted (e.g., weekdays 8 AM - 6 PM EST, or 24/7).
- Identify blackout periods (month-end processing, peak traffic periods, scheduled maintenance).
- Define notification requirements (alert SOC before testing begins each day).
5.3 Emergency Procedures
- Immediate stop protocol: How to halt testing immediately if something goes wrong.
- Escalation contacts: Who to call if the tester discovers active compromise, critical data exposure, or causes unintended impact.
- Data breach protocol: If the tester discovers evidence of an existing breach (not from their testing), who to notify and how.
5.4 Legal Authorization
Penetration testing without written authorization is unauthorized access — a criminal offense in most jurisdictions (Computer Fraud and Abuse Act in the US, Computer Misuse Act in the UK, equivalent laws globally).
Authorization must be:
- Written and signed by an individual with authority to authorize testing of the systems.
- Specific about what systems, techniques, and timeframes are authorized.
- Current (not expired, not from a previous engagement).
- Available during testing (the tester must be able to produce authorization on demand if questioned).
6. Reporting
6.1 Executive Summary
One page. Non-technical. Written for CISOs, business executives, and board members.
- Overall risk rating (Critical / High / Medium / Low)
- Number of findings by severity
- Top 3 findings with business impact (not technical details)
- Comparison to previous assessment (are things getting better or worse?)
- Key recommendation (one sentence: the single most important thing to fix)
6.2 Technical Findings
Each finding documented with:
## Finding: Broken Object-Level Authorization in Account API
**Severity**: Critical (CVSS 9.1)
**Business Impact**: Any authenticated user can access any other user's financial data
**CWE**: CWE-639 (Authorization Bypass Through User-Controlled Key)
### Description
The `/api/v2/accounts/{account_id}/transactions` endpoint does not validate
that the authenticated user owns the requested account. By changing the
account_id parameter, an attacker can retrieve transaction history for any
account in the system.
### Evidence
**Request:**
```
GET /api/v2/accounts/98765/transactions HTTP/2
Host: api.example.com
Authorization: Bearer eyJ...[User A's token]
```
**Response:** (200 OK — User B's transactions returned)
```
{
  "account_id": "98765",
  "owner": "user_b@example.com",
  "transactions": [
    {"date": "2026-03-01", "amount": -2500.00, "description": "Wire transfer"},
    ...
  ]
}
```
### Reproduction Steps
1. Authenticate as User A (any standard user account)
2. Navigate to account transactions: /api/v2/accounts/{own_account_id}/transactions
3. Change the account_id to any other valid account ID
4. Observe that the API returns the other account's transactions
### Remediation
Implement server-side authorization check: verify that the authenticated
user owns the requested account before returning data. Example:
```
if request.user.id != account.owner_id:
    return 403 Forbidden
```
### References
- OWASP API Security Top 10: API1 — Broken Object-Level Authorization
- CWE-639: Authorization Bypass Through User-Controlled Key
6.3 Severity Ratings
Use CVSS for technical severity AND business impact for risk context:
| Technical Severity (CVSS) | Business Impact | Risk Rating |
|---|---|---|
| Critical (9.0-10.0) | Revenue/data loss | Critical |
| High (7.0-8.9) | Compliance risk | High |
| Medium (4.0-6.9) | Operational impact | Medium |
| Low (0.1-3.9) | Minimal impact | Low |
| Informational | Best practice gap | Info |
6.4 Remediation Recommendations
Each finding includes specific, actionable remediation guidance. Not “fix the vulnerability” — specific code patterns, configuration changes, or architectural modifications.
6.5 Retest Verification
After remediation, the tester re-executes the same exploitation steps to verify the fix. The retest report documents:
- Original finding reference
- Remediation implemented
- Retest results (fixed / partially fixed / not fixed)
- Evidence of fix (request/response showing the vulnerability is no longer exploitable)
7. AI-Powered Penetration Testing (2025-2026)
7.1 The Shift from Automation to Autonomy
Traditional automated scanning tools follow predetermined scripts: crawl, inject payloads from a list, check responses for patterns. They are fast but stupid.
Agentic AI penetration testing represents a qualitative shift. AI agents:
- Generate payloads based on the target’s technology and observed behavior (not from a static list).
- Send payloads to the target and observe the response.
- Analyze responses to determine whether exploitation succeeded or what information was leaked.
- Refine the approach based on what was learned — adapting payloads, trying alternative attack paths, chaining findings.
- Retry with new strategies when initial approaches fail.
This is the difference between a script kiddie and a skilled attacker: the ability to think, adapt, and persist.
7.2 Current AI Pen Testing Tools
PentAGI: Fully autonomous AI agent framework for penetration testing. Orchestrates multiple specialized agents (reconnaissance agent, exploitation agent, reporting agent) that collaborate on complex engagements. Can plan and execute multi-step attack chains without human intervention.
Zen-AI-Pentest: Open-source AI-powered penetration testing framework. Provides a structured approach to AI-assisted testing with human oversight at each stage. Designed for integration into existing security testing workflows.
XBOW: Automated vulnerability discovery and exploitation platform. Specializes in finding and exploiting web application vulnerabilities. Demonstrated capability to find and exploit real-world zero-day vulnerabilities in bug bounty programs.
Penligent: Agentic red teaming platform. Goes beyond single-vulnerability discovery to simulate realistic attack campaigns — the AI plans and executes multi-step attack scenarios mimicking sophisticated threat actors.
7.3 Industry Trajectory
Industry analysts predict that by 2027, manual penetration testing will become a boutique service reserved for niche problems that require deep human expertise — complex business logic, novel technology stacks, and adversarial AI testing. An estimated 99% of vulnerability assessments will be conducted by agentic AI systems.
This does not mean human penetration testers become obsolete. It means they shift from “finding SQL injection” (which AI does faster and more consistently) to “understanding the business impact of a complex attack chain” and “validating AI findings” and “testing business logic that requires domain expertise.”
7.4 AI Limitations in Penetration Testing
Business logic testing: AI can find technical vulnerabilities mechanically. It cannot understand that “approving your own purchase order violates segregation of duties” or “transferring funds between your own accounts to generate loyalty points is fraud.” Business logic testing requires understanding business context, organizational controls, and human intent.
Context-specific vulnerabilities: Every organization has unique systems, integrations, and workflows. AI may miss vulnerabilities that arise from the specific way an organization has configured, customized, or integrated standard components.
Ethical and legal considerations: An autonomous AI agent exploiting vulnerabilities raises significant ethical and legal questions. What if the AI causes unintended damage? What if it accesses data outside the scope? What if it discovers and processes PII while testing? Human oversight is essential for scope compliance, data handling, and damage prevention.
Human oversight is mandatory: Even the most advanced AI pen testing tools require human oversight for:
- Scope definition and enforcement
- Authorization verification
- Impact assessment of exploitation
- Business context for finding severity
- Legal compliance during testing
- Final report review and quality assurance
8. Bug Bounty Programs
Bug bounty programs leverage the global security research community to find vulnerabilities continuously, complementing periodic penetration tests.
8.1 Program Structure
| Component | Description |
|---|---|
| Scope | What systems/endpoints are in scope for researchers |
| Rules | Permitted testing techniques, prohibited actions, disclosure rules |
| Rewards | Payment tiers by vulnerability severity |
| Safe harbor | Legal protection for researchers acting in good faith |
| Response SLA | Time to acknowledge, triage, and resolve submissions |
| Disclosure policy | Timeline for public disclosure after fix (typically 90 days) |
8.2 Platform Selection
| Platform | Strengths |
|---|---|
| HackerOne | Largest community, strong triage support, enterprise features |
| Bugcrowd | Curated researcher matching, good for private programs |
| Intigriti | Strong in European market, GDPR-aware platform |
| Self-hosted | Full control, no platform fees, requires significant internal resources |
8.3 Reward Tiers
| Severity | Typical Range | Example |
|---|---|---|
| Critical | $5,000 – $50,000+ | RCE, authentication bypass, mass data access |
| High | $2,000 – $10,000 | IDOR with sensitive data, stored XSS |
| Medium | $500 – $2,000 | CSRF, reflected XSS, information disclosure |
| Low | $100 – $500 | Missing headers, minor info leak, debug info |
8.4 Responsible Disclosure
- Researcher reports vulnerability through the program.
- Organization acknowledges within 1 business day.
- Organization triages and confirms within 5 business days.
- Organization remediates within agreed timeframe (30-90 days depending on severity).
- Researcher may publicly disclose after the fix is deployed (coordinated disclosure).
9. Frequency and Prioritization
9.1 Testing Frequency
| Application Risk Level | Formal Pen Test | Automated Testing | Bug Bounty |
|---|---|---|---|
| Critical (external, financial, PII) | Annually + after major changes | Continuous (CI/CD) | Continuous |
| High (external, moderate data) | Annually | Every release | Consider |
| Medium (internal, limited data) | Every 2 years | Quarterly | Not typical |
| Low (internal, no sensitive data) | On significant change | Annually | Not typical |
9.2 Trigger-Based Testing
Beyond scheduled testing, penetration tests should be triggered by:
- Major architecture changes (new microservice, new integration, new authentication provider)
- Technology stack changes (new framework, new database, new cloud provider)
- Significant new features (payment processing, user file upload, API opening to partners)
- Post-incident (after a security incident, test the remediation and related systems)
- Merger/acquisition (assess security of acquired applications before integration)
10. Key Takeaways
- CIS 16.13 requires manual testing by skilled testers. Automated tools are necessary but insufficient. Business logic vulnerabilities require human understanding.
- Gray box is the default choice. It provides the best balance of thoroughness and efficiency for most penetration tests.
- Follow PTES methodology. Structured testing ensures consistency, completeness, and professional reporting.
- Authorization testing is where the highest-impact bugs hide. IDOR, BOLA, BFLA, and privilege escalation are the most common critical findings in modern web applications.
- AI is transforming penetration testing from a periodic event to a continuous capability. Agentic AI tools can run autonomously, but human oversight remains mandatory for scope compliance, business context, and ethical considerations.
- The report is the deliverable. A vulnerability found but poorly reported is a vulnerability not fixed. Invest in clear, actionable reporting with evidence and specific remediation guidance.
- Bug bounty programs complement but do not replace formal penetration testing. Bounty programs provide continuous coverage; formal tests provide structured, comprehensive assessment.
Review Questions
1. A developer asks why gray box testing is preferred over black box. Explain the advantages using specific examples of vulnerability types that gray box testing catches more efficiently.
2. During a penetration test, you discover evidence of an active breach by a third party (not from your testing). What is your immediate response, and what protocol governs this situation?
3. Design the scope and rules of engagement for a penetration test of a healthcare web application that processes patient records. Include in-scope and out-of-scope elements, permitted techniques, and data handling requirements.
4. An AI penetration testing tool reports it has found and exploited a critical vulnerability, but the finding is outside the agreed scope. What are the legal, ethical, and technical implications?
5. Your organization receives a bug bounty submission for a critical IDOR vulnerability. Describe the end-to-end process from receipt to resolution, including triage, remediation, verification, and researcher communication.
References
- CIS Controls v8, Safeguard 16.13 — Conduct Application Penetration Testing
- OWASP Penetration Testing Execution Standard (PTES)
- OWASP Web Security Testing Guide v4.2
- OWASP API Security Top 10 (2023)
- NIST SP 800-115 — Technical Guide to Information Security Testing and Assessment
- PentAGI Project — https://github.com/vxcontrol/pentagi
- XBOW — Automated Vulnerability Discovery
- Penligent — Agentic Red Teaming
- HackerOne — https://www.hackerone.com
- Bugcrowd — https://www.bugcrowd.com
- CVSS v4.0 Specification — https://www.first.org/cvss/v4.0/specification-document
Study Guide
Key Takeaways
- CIS 16.13 requires manual testing by skilled testers — Automated scanners cannot find business logic flaws like “user can approve their own expense report.”
- Gray box is the default choice — Provides optimal balance of thoroughness and efficiency; tests both authenticated and unauthenticated surfaces.
- Written authorization is mandatory — Pen testing without it is a criminal offense under CFAA and equivalent global laws.
- OWASP PTES has seven phases — Pre-Engagement, Intelligence Gathering, Threat Modeling, Vulnerability Analysis, Exploitation, Post-Exploitation, Reporting.
- Authorization bugs are highest-impact findings — BOLA, BFLA, IDOR, and privilege escalation are most common critical findings in modern web apps.
- AI is shifting pen testing from periodic to continuous — Agentic AI generates context-aware payloads and chains findings adaptively, but cannot understand business logic.
- Bug bounty complements but does not replace formal testing — Continuous coverage from bounties plus structured assessment from formal tests.
Important Definitions
| Term | Definition |
|---|---|
| BOLA | Broken Object-Level Authorization — API fails to validate user owns the requested resource |
| BFLA | Broken Function-Level Authorization — regular user can call admin-only API endpoints |
| IDOR | Insecure Direct Object Reference — changing resource IDs in requests accesses other users’ data |
| PTES | Penetration Testing Execution Standard — OWASP’s 7-phase structured methodology |
| Black Box | No prior knowledge testing; simulates external attacker |
| Gray Box | Partial knowledge (credentials, docs, API specs); optimal for most tests |
| White Box | Full knowledge (source code, schemas, all credentials); most thorough |
| Agentic AI Pen Testing | AI agents that generate payloads, analyze responses, and chain findings adaptively |
| Safe Harbor | Legal protection for bug bounty researchers acting in good faith |
| Rules of Engagement | Permitted testing techniques, windows, and notification requirements |
Quick Reference
- Testing Frequency: Critical apps = annually + after major changes + continuous bug bounty; High = annually; Medium = every 2 years
- Bug Bounty Rewards: Critical $5K-$50K+, High $2K-$10K, Medium $500-$2K, Low $100-$500
- Report Structure: Executive Summary (1 page) + Technical Findings (CVSS + evidence + remediation) + Severity Ratings
- CVSS Ranges: Critical 9.0-10.0, High 7.0-8.9, Medium 4.0-6.9, Low 0.1-3.9
- Common Pitfalls: Testing only unauthenticated surface, skipping business logic testing, no written authorization, poorly documented findings, relying solely on automated tools
Review Questions
- Explain why gray box testing is preferred over black box with specific examples of vulnerability types caught more efficiently.
- During a pen test you discover an active breach by a third party — what is your immediate response and which protocol governs this situation?
- Design scope and rules of engagement for a pen test of a healthcare application processing patient records.
- An AI pen testing tool finds and exploits a vulnerability outside the agreed scope — what are the legal, ethical, and technical implications?
- Describe the end-to-end process for handling a critical IDOR bug bounty submission from receipt to resolution.