5.4 — Penetration Testing
Learning Objectives
- ✓ Explain CIS 16.13 requirements for authenticated and unauthenticated penetration testing.
- ✓ Differentiate between black box, gray box, and white box penetration testing and select the appropriate approach.
- ✓ Execute a structured penetration test following the OWASP PTES seven-phase methodology.
- ✓ Produce professional penetration test reports with technical findings, business impact, and remediation guidance.
- ✓ Evaluate AI-powered penetration testing tools and understand their capabilities and limitations.
- ✓ Design a bug bounty program that complements formal penetration testing.
1. CIS Control 16.13 — Conduct Application Penetration Testing
CIS Safeguard 16.13 is an Implementation Group 3 (IG3) control:
“Conduct application-level penetration testing. Testing should include both authenticated and unauthenticated testing, be focused on business logic vulnerabilities, and include manual testing by skilled testers, with critical applications prioritized.”
This control contains several specific mandates worth unpacking:
Authenticated AND unauthenticated: Testing only from the outside (unauthenticated) misses the vast majority of the attack surface. Most vulnerabilities are exploitable by authenticated users — privilege escalation, IDOR, business logic bypass, horizontal access to other users' data. Testing must include both perspectives.
Business logic vulnerabilities: Automated scanners find injection, XSS, and misconfiguration. They do not find “user can approve their own expense report,” “coupon code can be applied after checkout,” or “negative quantity order results in a refund.” Business logic testing requires human understanding of the application’s purpose.
Manual testing by skilled testers: Automated tools are necessary but insufficient. Skilled penetration testers find vulnerabilities that no tool can — creative attack chains, context-dependent logic flaws, and novel exploitation techniques.
Critical applications prioritized: Not all applications need the same testing depth. A public-facing financial application gets comprehensive annual testing. An internal wiki gets a lighter assessment. Risk-based prioritization ensures resources are allocated where they have the greatest impact.
2. Types of Penetration Testing
2.1 Black Box Testing
Knowledge: No prior knowledge of the application. No source code, no architecture docs, no credentials.
Simulates: External attacker with no insider knowledge.
Strengths:
- Most realistic simulation of an external threat actor.
- Tests the application’s externally visible attack surface.
- Tests information disclosure (what can an attacker learn from error messages, headers, and publicly accessible resources?).
Weaknesses:
- Least thorough. The tester spends significant time on reconnaissance that could be skipped with documentation.
- Misses vulnerabilities behind authentication barriers (unless separately tested).
- Inefficient use of expensive tester time.
Best for: Compliance requirements that specify “external penetration test,” initial assessment of unknown applications, simulating realistic threat scenarios.
2.2 Gray Box Testing
Knowledge: Partial knowledge — typically valid credentials for one or more roles, architecture documentation, and API specifications.
Simulates: Insider threat, compromised user account, or attacker who has gained initial access.
Strengths:
- Most realistic for authorized testing. Most attackers have some information (from OSINT, phishing, or purchased credentials).
- Tests the full attack surface: unauthenticated endpoints AND authenticated functionality.
- Efficient use of tester time — less time on reconnaissance, more time on exploitation.
Weaknesses:
- Requires coordination to provide appropriate access and documentation.
- Findings may be influenced by the provided information (tester follows the documentation rather than discovering actual behavior).
Best for: Most penetration tests. Gray box provides the optimal balance of realism, thoroughness, and efficiency.
2.3 White Box Testing
Knowledge: Full knowledge — source code, architecture documents, deployment configurations, database schemas, credentials for all roles.
Simulates: Malicious insider with full system access, or comprehensive security audit.
Strengths:
- Most thorough. Every attack surface is known and can be tested.
- Testers can identify vulnerabilities in code paths that may be difficult to reach through black box approaches.
- Combines the depth of code review with the proof-of-exploitability of penetration testing.
Weaknesses:
- Least realistic. Attackers rarely have source code access (though supply chain attacks change this).
- Most expensive. Reviewing source code and testing the application takes more time.
- Can overwhelm testers with information, leading to unfocused testing.
Best for: High-security applications (financial, healthcare, government), applications processing highly sensitive data, applications with complex business logic.
3. OWASP PTES — Seven Phases
The Penetration Testing Execution Standard provides a structured methodology. Following it ensures consistent, comprehensive, and professional testing.
Phase 1: Pre-Engagement Interactions
Everything that happens before testing begins. This phase prevents legal issues, scope disputes, and miscommunication.
Deliverables:
- Scope document: Exactly what is in scope (URLs, IP ranges, API endpoints, mobile apps) and what is out of scope (third-party services, production databases, other customers’ data).
- Rules of engagement: What testing techniques are permitted (social engineering? DoS testing? Physical access?), testing windows (business hours only? Weekends?), notification requirements.
- Authorization letter: Written, signed authorization from the asset owner granting permission to test. Without this, penetration testing is a crime in most jurisdictions.
- Communication plan: Primary contacts on both sides, escalation procedures, emergency stop protocol.
- Data handling requirements: How will sensitive data discovered during testing be handled? Encrypted storage, secure deletion after engagement, access limited to the testing team.
Phase 2: Intelligence Gathering
Systematic collection of information about the target. For gray/white box tests, this supplements provided documentation with independently discovered information.
Passive reconnaissance (no direct interaction with the target):
- DNS records (subdomains, mail servers, TXT records)
- WHOIS data (registration, hosting, contacts)
- Certificate Transparency logs (all TLS certificates ever issued for the domain)
- Search engine cache (indexed pages, cached credentials, exposed documents)
- Public code repositories (GitHub, GitLab — accidentally committed secrets, API keys, internal URLs)
- Social media and job postings (technology stack, internal tools, team structure)
Active reconnaissance (direct interaction with the target):
- Port scanning and service enumeration (Nmap)
- Web application fingerprinting (technology stack, frameworks, versions)
- Directory and file enumeration (Gobuster, Feroxbuster)
- API endpoint discovery (OpenAPI/Swagger docs, WADL, fuzzing)
- Authentication mechanism identification
Phase 3: Threat Modeling
Based on intelligence gathered, identify the most likely and impactful attack scenarios.
- Map the application’s trust boundaries (where does unauthenticated become authenticated? Where does user become admin?).
- Identify high-value targets (payment processing, PII storage, authentication systems, admin panels).
- Determine attack vectors (public API, file upload, WebSocket, email processing).
- Prioritize testing effort based on risk (likelihood × impact).
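The prioritization step can be sketched as a simple scoring pass. This is a minimal illustration; the scenario names and the 1–5 likelihood/impact scores below are invented for the example, not taken from any real engagement:

```python
# Rank candidate attack scenarios by risk = likelihood x impact.
# Scenario names and scores are illustrative assumptions.

def prioritize(scenarios):
    """Sort scenarios by likelihood * impact, highest risk first."""
    return sorted(scenarios, key=lambda s: s["likelihood"] * s["impact"], reverse=True)

scenarios = [
    {"name": "IDOR on payment API",       "likelihood": 4, "impact": 5},  # risk 20
    {"name": "Stored XSS in comments",    "likelihood": 3, "impact": 3},  # risk 9
    {"name": "Default admin credentials", "likelihood": 2, "impact": 5},  # risk 10
]

for s in prioritize(scenarios):
    print(s["name"], s["likelihood"] * s["impact"])
```

The highest-scoring scenario gets tester time first; low-scoring items may be deferred to automated coverage.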
Phase 4: Vulnerability Analysis
Systematic identification of vulnerabilities through both automated scanning and manual analysis.
- Run automated scanners (OWASP ZAP, Burp Suite, Nuclei) against in-scope targets.
- Manually review scanner results for false positives.
- Manually test for vulnerabilities that scanners miss (business logic, race conditions, authorization flaws).
- Map discovered vulnerabilities to potential exploitation paths.
Phase 5: Exploitation
Attempt to exploit discovered vulnerabilities to demonstrate real-world impact.
Key principles:
- Proof over theory: A vulnerability is not just a theoretical risk — demonstrate the actual impact. “SQL injection exists” is less compelling than “SQL injection allows extraction of all user credentials.”
- Minimal impact: Demonstrate the vulnerability without causing damage. Extract one record to prove access, not the entire database. Demonstrate privilege escalation, don’t delete the admin account.
- Chain vulnerabilities: Individual low-severity findings may combine into critical attack chains. An information disclosure + IDOR + missing rate limiting might chain into an account takeover.
- Document everything: Every exploitation attempt (successful or not) is documented with screenshots, request/response pairs, and timestamps. This is your evidence.
Phase 6: Post-Exploitation
After gaining initial access, determine what additional access and data can be reached.
- Lateral movement: Can compromised credentials or access be used to reach other systems?
- Privilege escalation: Can a regular user account be escalated to admin?
- Data access: What sensitive data can be accessed from the compromised position?
- Persistence: Could an attacker maintain access after the initial vulnerability is patched?
- Impact assessment: What is the realistic business impact of this compromise?
Phase 7: Reporting
The report is the primary deliverable. A vulnerability found but poorly reported is a vulnerability not fixed.
4. Web Application Penetration Testing Methodology
4.1 Reconnaissance and Information Gathering
Beyond the general intelligence gathering in Phase 2, web application testing requires application-specific reconnaissance:
- Map all endpoints (pages, APIs, WebSockets, GraphQL queries).
- Identify all input vectors (forms, URL parameters, headers, cookies, file uploads).
- Determine technology stack (server, framework, database, CDN, WAF).
- Review client-side code (JavaScript, comments, hidden fields, hard-coded values).
- Identify all user roles and privilege levels.
4.2 Authentication Testing
| Test | What to Check |
|---|---|
| Credential stuffing | Does the app rate-limit login attempts? Detect credential reuse? |
| Brute force | Account lockout after failed attempts? CAPTCHA? Progressive delays? |
| MFA bypass | Can MFA be skipped? Is the MFA token predictable? Can it be replayed? |
| Password reset | Token predictability, token expiration, account enumeration via reset |
| Session fixation | Can session ID be set before authentication? |
| Default credentials | Admin accounts with default passwords, API keys in documentation |
| Token security | JWT algorithm confusion, none algorithm, key disclosure |
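The token-security row can be partly automated. The sketch below decodes a JWT header (without verifying the signature) and flags the classic algorithm issues; the token is hand-built for illustration, and the flag wording is an assumption of this example, not a tool's actual output:

```python
import base64
import json

def jwt_header(token: str) -> dict:
    """Decode the (unverified) header segment of a JWT."""
    seg = token.split(".")[0]
    seg += "=" * (-len(seg) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(seg))

def algorithm_flags(token: str) -> list:
    """Flag header algorithms worth testing during an engagement."""
    alg = jwt_header(token).get("alg", "")
    flags = []
    if alg.lower() == "none":
        flags.append("alg=none: token is unsigned; try submitting it without a signature")
    if alg.startswith("HS"):
        flags.append(f"{alg}: test RS->HS key confusion using the server's public key")
    return flags

# Hand-built unsigned token with header {"alg": "none"} (illustrative only)
header = base64.urlsafe_b64encode(b'{"alg":"none","typ":"JWT"}').decode().rstrip("=")
payload = base64.urlsafe_b64encode(b'{"role":"admin"}').decode().rstrip("=")
token = f"{header}.{payload}."

print(algorithm_flags(token))
```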
4.3 Authorization Testing
Authorization bugs are among the most impactful and most commonly missed by automated tools.
IDOR (Insecure Direct Object Reference): Change /api/user/123/profile to /api/user/456/profile. Can User 123 see User 456’s data?
BOLA (Broken Object-Level Authorization): Same concept as IDOR, formalized in the OWASP API Security Top 10. Every API endpoint that accepts a resource identifier must validate that the authenticated user has authorization to access that specific resource.
BFLA (Broken Function-Level Authorization): Regular user calls admin API endpoints. /api/admin/users/delete — does the API check that the caller is actually an admin?
Privilege escalation: Modify role in JWT token, change role=user to role=admin in request parameter, access admin panel URL directly.
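IDOR/BOLA testing is naturally expressed as a differential check: request the same resource with the owner's credentials and with an unrelated user's credentials, then compare. In this sketch, `fetch(resource_id, token)` is an assumed helper (e.g., a thin wrapper around an HTTP GET with a Bearer token); the two handlers simulate a vulnerable and a fixed endpoint:

```python
# Differential IDOR/BOLA check against an assumed fetch(resource_id, token) helper.

def idor_suspected(fetch, resource_id, owner_token, other_token) -> bool:
    """True if a non-owner receives the owner's data for the same resource."""
    owner_status, owner_body = fetch(resource_id, owner_token)
    other_status, other_body = fetch(resource_id, other_token)
    return owner_status == 200 and other_status == 200 and other_body == owner_body

# Simulated vulnerable endpoint: never checks which user the token belongs to.
def vulnerable_fetch(resource_id, token):
    return 200, {"account_id": resource_id, "owner": "user_b@example.com"}

# Simulated fixed endpoint: only the owner's token gets the data.
def fixed_fetch(resource_id, token):
    if token != "owner-token":
        return 403, {"error": "forbidden"}
    return 200, {"account_id": resource_id, "owner": "user_b@example.com"}

print(idor_suspected(vulnerable_fetch, "98765", "owner-token", "attacker-token"))  # True
print(idor_suspected(fixed_fetch, "98765", "owner-token", "attacker-token"))       # False
```

In a real engagement the same comparison is run across every endpoint that accepts a resource identifier, for every role pair.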
4.4 Session Management Testing
- Session ID entropy (is it predictable?)
- Session timeout (does it expire after inactivity?)
- Session invalidation (does logout actually destroy the session?)
- Cookie attributes (HttpOnly, Secure, SameSite)
- Concurrent session handling (can the same account have unlimited active sessions?)
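The cookie-attribute checks above can be scripted with a small parser over a `Set-Cookie` header value. This is a sketch; a real test would read the header from the application's login response:

```python
def cookie_issues(set_cookie: str) -> list:
    """Report missing security attributes on a Set-Cookie header value."""
    attrs = {part.strip().split("=")[0].lower() for part in set_cookie.split(";")[1:]}
    issues = []
    if "httponly" not in attrs:
        issues.append("missing HttpOnly (cookie readable by JavaScript)")
    if "secure" not in attrs:
        issues.append("missing Secure (cookie sent over plain HTTP)")
    if "samesite" not in attrs:
        issues.append("missing SameSite (CSRF exposure)")
    return issues

print(cookie_issues("session=abc123; Path=/"))                           # all three missing
print(cookie_issues("session=abc123; Secure; HttpOnly; SameSite=Lax"))   # []
```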
4.5 Input Validation Testing
| Attack Class | Test Approach |
|---|---|
| SQL Injection | Parameterized? Try `' OR 1=1--`, time-based blind, error-based |
| XSS (Reflected) | Input reflected in response? Try `<script>`, event handlers, encoding bypass |
| XSS (Stored) | Input stored and displayed to other users? Same payloads in persistent contexts |
| SSRF | Can you make the server request internal URLs? `http://169.254.169.254/` |
| Command Injection | Input reaches system commands? Try `; id`, `` `id` ``, `$(id)` |
| Template Injection | Input rendered in templates? Try `{{7*7}}`, `${7*7}`, `<%= 7*7 %>` |
| XXE | XML input accepted? Try external entity definitions |
| Path Traversal | File paths in parameters? Try `../../etc/passwd` |
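Taking template injection from the table as an example, the probe logic is: submit arithmetic payloads and check whether the engine evaluated them. In this sketch, `render` stands in for whatever returns the application's response body for a given input, and the vulnerable sink is simulated rather than a real template engine:

```python
# Template-injection probe: did the engine evaluate the arithmetic payload?
PAYLOADS = {"{{7*7}}": "49", "${7*7}": "49", "<%= 7*7 %>": "49"}

def ssti_hits(render) -> list:
    """Return the payloads whose evaluated result appears in the response."""
    return [payload for payload, marker in PAYLOADS.items() if marker in render(payload)]

# Simulated vulnerable sink: evaluates {{ ... }} expressions (stand-in for a
# template engine rendering user input as part of the template).
def vulnerable_render(user_input):
    if user_input.startswith("{{") and user_input.endswith("}}"):
        return "Hello " + str(eval(user_input[2:-2]))
    return "Hello " + user_input

# Simulated safe sink: input is treated as data, never evaluated.
def safe_render(user_input):
    return "Hello " + user_input

print(ssti_hits(vulnerable_render))  # ['{{7*7}}']
print(ssti_hits(safe_render))        # []
```

A payload returning `49` where `7*7` was submitted is strong evidence the input reached a template evaluation context.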
4.6 Business Logic Testing
Business logic vulnerabilities are the highest-impact findings because they represent flaws in the application’s core purpose, and they cannot be caught by automated tools.
Common patterns:
- Workflow bypass: Skip steps in a multi-step process (e.g., skip payment step in checkout, skip approval step in workflow).
- Race conditions: Submit two requests simultaneously to exploit time-of-check-to-time-of-use gaps (double-spend, inventory oversell, duplicate reward claims).
- Price manipulation: Modify price in client-side request, apply discount codes multiple times, use negative quantities.
- State manipulation: Change order status directly via API, manipulate workflow state to re-enter a completed step.
- Limit bypass: Exceed rate limits by varying request parameters, bypass withdrawal limits by splitting into multiple transactions.
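The race-condition pattern above can be demonstrated with concurrent requests. This sketch simulates a server-side "check then act" coupon redemption with no locking; the artificial `sleep` stands in for the processing delay between the check and the write. More than one success proves the time-of-check-to-time-of-use gap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

class Coupon:
    """Simulated vulnerable server-side handler: check-then-act, no locking."""
    def __init__(self):
        self.redeemed = False

    def redeem(self) -> bool:
        if not self.redeemed:     # check
            time.sleep(0.05)      # window between check and act
            self.redeemed = True  # act
            return True
        return False

def hammer(action, n=10) -> int:
    """Submit n concurrent requests; return how many succeeded."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return sum(pool.map(lambda _: action(), range(n)))

wins = hammer(Coupon().redeem)
print(wins)  # a correct implementation would allow exactly 1
```

Against a live target the same approach sends parallel HTTP requests; tools like Burp's Repeater (send group in parallel) exist for exactly this test.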
4.7 API-Specific Testing
- Mass assignment: send extra fields in the request body that map to internal model attributes (e.g., `{ "name": "User", "role": "admin" }`).
- Excessive data exposure: API returns more data than the client displays (e.g., full user object including hashed password).
- Resource enumeration: sequential IDs enabling enumeration of all resources.
- Rate limiting: API endpoints without rate limiting enabling denial of service or brute force.
- Versioning: old API versions with known vulnerabilities still accessible.
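The mass-assignment check is another differential test: submit an extra privileged field and see whether it sticks on the created object. Here `create_user` stands in for a POST to a registration endpoint, and both handlers are simulated for illustration:

```python
# Mass-assignment probe against an assumed create_user(body) helper.

def mass_assignment(create_user) -> bool:
    """True if an injected 'role' field is accepted by the endpoint."""
    created = create_user({"name": "tester", "role": "admin"})
    return created.get("role") == "admin"

# Simulated vulnerable handler: binds every field in the body to the model.
def vulnerable_create(body):
    user = {"name": None, "role": "user"}
    user.update(body)  # no allow-list filtering
    return user

# Simulated fixed handler: copies only allow-listed fields.
def safe_create(body):
    return {"name": body.get("name"), "role": "user"}

print(mass_assignment(vulnerable_create), mass_assignment(safe_create))  # True False
```

The fix shown in `safe_create` — an explicit allow-list of bindable fields — is the standard remediation.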
4.8 File Upload Testing
- Can executable files be uploaded (PHP, JSP, ASPX)?
- Can the upload path be manipulated to overwrite existing files?
- Are file type checks based on extension only (bypassable) or content-type analysis?
- Are uploaded files served from the same domain (enabling XSS via SVG/HTML uploads)?
- Is there a file size limit (preventing denial of service via large uploads)?
4.9 Error Handling and Information Disclosure
- Stack traces exposed in error responses.
- Database error messages revealing table/column names.
- Version information in HTTP headers (Server, X-Powered-By).
- Debug endpoints accessible in production.
- Source code comments visible in client-side code.
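The version-disclosure checks are easy to script against captured response headers. A sketch, with the usual offending header names; the sample headers are invented:

```python
def disclosure_findings(headers: dict) -> list:
    """Flag response headers that leak server or framework version details."""
    leaky = {"server", "x-powered-by", "x-aspnet-version", "x-generator"}
    return [
        f"{name}: {value}"
        for name, value in headers.items()
        if name.lower() in leaky and any(ch.isdigit() for ch in value)
    ]

captured = {
    "Server": "Apache/2.4.41 (Ubuntu)",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html",
}
print(disclosure_findings(captured))
```

A bare product name (e.g., `Server: cloudfront`) is low risk; a specific version string lets an attacker look up known CVEs for that exact release.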
5. Scope and Rules of Engagement
5.1 Scope Definition
A clear scope prevents legal issues and focuses testing effort.
```
## In Scope
- Web application: https://app.example.com
- API endpoints: https://api.example.com/v2/*
- Mobile API: https://mobile-api.example.com
- Authentication: SSO provider integration testing
- User roles: standard user, premium user, admin

## Out of Scope
- Infrastructure (servers, networks, cloud accounts)
- Third-party integrations (payment gateway, email provider)
- Denial of service testing
- Social engineering
- Physical access testing
- Other customers' data or environments
```
5.2 Testing Windows
- Specify when testing is permitted (e.g., weekdays 8 AM - 6 PM EST, or 24/7).
- Identify blackout periods (month-end processing, peak traffic periods, scheduled maintenance).
- Define notification requirements (alert SOC before testing begins each day).
5.3 Emergency Procedures
- Immediate stop protocol: How to halt testing immediately if something goes wrong.
- Escalation contacts: Who to call if the tester discovers active compromise, critical data exposure, or causes unintended impact.
- Data breach protocol: If the tester discovers evidence of an existing breach (not from their testing), who to notify and how.
5.4 Legal Authorization
Penetration testing without written authorization is unauthorized access — a criminal offense in most jurisdictions (Computer Fraud and Abuse Act in the US, Computer Misuse Act in the UK, equivalent laws globally).
Authorization must be:
- Written and signed by an individual with authority to authorize testing of the systems.
- Specific about what systems, techniques, and timeframes are authorized.
- Current (not expired, not from a previous engagement).
- Available during testing (the tester must be able to produce authorization on demand if questioned).
6. Reporting
6.1 Executive Summary
One page. Non-technical. Written for CISOs, business executives, and board members.
- Overall risk rating (Critical / High / Medium / Low)
- Number of findings by severity
- Top 3 findings with business impact (not technical details)
- Comparison to previous assessment (are things getting better or worse?)
- Key recommendation (one sentence: the single most important thing to fix)
6.2 Technical Findings
Each finding documented with:
## Finding: Broken Object-Level Authorization in Account API
**Severity**: Critical (CVSS 9.1)
**Business Impact**: Any authenticated user can access any other user's financial data
**CWE**: CWE-639 (Authorization Bypass Through User-Controlled Key)
### Description
The `/api/v2/accounts/{account_id}/transactions` endpoint does not validate
that the authenticated user owns the requested account. By changing the
account_id parameter, an attacker can retrieve transaction history for any
account in the system.
### Evidence
**Request:**
```
GET /api/v2/accounts/98765/transactions HTTP/2
Host: api.example.com
Authorization: Bearer eyJ...[User A's token]
```
**Response:** (200 OK — User B's transactions returned)
```
{
  "account_id": "98765",
  "owner": "user_b@example.com",
  "transactions": [
    {"date": "2026-03-01", "amount": -2500.00, "description": "Wire transfer"},
    ...
  ]
}
```
### Reproduction Steps
1. Authenticate as User A (any standard user account)
2. Navigate to account transactions: /api/v2/accounts/{own_account_id}/transactions
3. Change the account_id to any other valid account ID
4. Observe that the API returns the other account's transactions
### Remediation
Implement server-side authorization check: verify that the authenticated
user owns the requested account before returning data. Example:
```
if request.user.id != account.owner_id:
    return 403 Forbidden
```
### References
- OWASP API Security Top 10: API1 — Broken Object-Level Authorization
- CWE-639: Authorization Bypass Through User-Controlled Key
6.3 Severity Ratings
Use CVSS for technical severity AND business impact for risk context:
| Technical Severity (CVSS) | Business Impact | Risk Rating |
|---|---|---|
| Critical (9.0-10.0) | Revenue/data loss | Critical |
| High (7.0-8.9) | Compliance risk | High |
| Medium (4.0-6.9) | Operational impact | Medium |
| Low (0.1-3.9) | Minimal impact | Low |
| Informational | Best practice gap | Info |
6.4 Remediation Recommendations
Each finding includes specific, actionable remediation guidance. Not “fix the vulnerability” — specific code patterns, configuration changes, or architectural modifications.
6.5 Retest Verification
After remediation, the tester re-executes the same exploitation steps to verify the fix. The retest report documents:
- Original finding reference
- Remediation implemented
- Retest results (fixed / partially fixed / not fixed)
- Evidence of fix (request/response showing the vulnerability is no longer exploitable)
7. AI-Powered Penetration Testing (2025-2026)
7.1 The Shift from Automation to Autonomy
Traditional automated scanning tools follow predetermined scripts: crawl, inject payloads from a list, check responses for patterns. They are fast but stupid.
Agentic AI penetration testing represents a qualitative shift. AI agents:
- Generate payloads based on the target’s technology and observed behavior (not from a static list).
- Send payloads to the target and observe the response.
- Analyze responses to determine whether exploitation succeeded or what information was leaked.
- Refine the approach based on what was learned — adapting payloads, trying alternative attack paths, chaining findings.
- Retry with new strategies when initial approaches fail.
This is the difference between a script kiddie and a skilled attacker: the ability to think, adapt, and persist.
7.2 Current AI Pen Testing Tools
PentAGI: Fully autonomous AI agent framework for penetration testing. Orchestrates multiple specialized agents (reconnaissance agent, exploitation agent, reporting agent) that collaborate on complex engagements. Can plan and execute multi-step attack chains without human intervention.
Zen-AI-Pentest: Open-source AI-powered penetration testing framework. Provides a structured approach to AI-assisted testing with human oversight at each stage. Designed for integration into existing security testing workflows.
XBOW: Automated vulnerability discovery and exploitation platform. Specializes in finding and exploiting web application vulnerabilities. Demonstrated capability to find and exploit real-world zero-day vulnerabilities in bug bounty programs.
Penligent: Agentic red teaming platform. Goes beyond single-vulnerability discovery to simulate realistic attack campaigns — the AI plans and executes multi-step attack scenarios mimicking sophisticated threat actors.
7.3 Industry Trajectory
Industry analysts predict that by 2027, manual penetration testing will become a boutique service reserved for niche problems that require deep human expertise — complex business logic, novel technology stacks, and adversarial AI testing. An estimated 99% of vulnerability assessments will be conducted by agentic AI systems.
This does not mean human penetration testers become obsolete. It means they shift from “finding SQL injection” (which AI does faster and more consistently) to “understanding the business impact of a complex attack chain” and “validating AI findings” and “testing business logic that requires domain expertise.”
7.4 AI Limitations in Penetration Testing
Business logic testing: AI can find technical vulnerabilities mechanically. It cannot understand that “approving your own purchase order violates segregation of duties” or “transferring funds between your own accounts to generate loyalty points is fraud.” Business logic testing requires understanding business context, organizational controls, and human intent.
Context-specific vulnerabilities: Every organization has unique systems, integrations, and workflows. AI may miss vulnerabilities that arise from the specific way an organization has configured, customized, or integrated standard components.
Ethical and legal considerations: An autonomous AI agent exploiting vulnerabilities raises significant ethical and legal questions. What if the AI causes unintended damage? What if it accesses data outside the scope? What if it discovers and processes PII while testing? Human oversight is essential for scope compliance, data handling, and damage prevention.
Human oversight is mandatory: Even the most advanced AI pen testing tools require human oversight for:
- Scope definition and enforcement
- Authorization verification
- Impact assessment of exploitation
- Business context for finding severity
- Legal compliance during testing
- Final report review and quality assurance
8. Bug Bounty Programs
Bug bounty programs leverage the global security research community to find vulnerabilities continuously, complementing periodic penetration tests.
8.1 Program Structure
| Component | Description |
|---|---|
| Scope | What systems/endpoints are in scope for researchers |
| Rules | Permitted testing techniques, prohibited actions, disclosure rules |
| Rewards | Payment tiers by vulnerability severity |
| Safe harbor | Legal protection for researchers acting in good faith |
| Response SLA | Time to acknowledge, triage, and resolve submissions |
| Disclosure policy | Timeline for public disclosure after fix (typically 90 days) |
8.2 Platform Selection
| Platform | Strengths |
|---|---|
| HackerOne | Largest community, strong triage support, enterprise features |
| Bugcrowd | Curated researcher matching, good for private programs |
| Intigriti | Strong in European market, GDPR-aware platform |
| Self-hosted | Full control, no platform fees, requires significant internal resources |
8.3 Reward Tiers
| Severity | Typical Range | Example |
|---|---|---|
| Critical | $5,000 – $50,000+ | RCE, authentication bypass, mass data access |
| High | $2,000 – $10,000 | IDOR with sensitive data, stored XSS |
| Medium | $500 – $2,000 | CSRF, reflected XSS, information disclosure |
| Low | $100 – $500 | Missing headers, minor info leak, debug info |
8.4 Responsible Disclosure
- Researcher reports vulnerability through the program.
- Organization acknowledges within 1 business day.
- Organization triages and confirms within 5 business days.
- Organization remediates within agreed timeframe (30-90 days depending on severity).
- Researcher may publicly disclose after the fix is deployed (coordinated disclosure).
9. Frequency and Prioritization
9.1 Testing Frequency
| Application Risk Level | Formal Pen Test | Automated Testing | Bug Bounty |
|---|---|---|---|
| Critical (external, financial, PII) | Annually + after major changes | Continuous (CI/CD) | Continuous |
| High (external, moderate data) | Annually | Every release | Consider |
| Medium (internal, limited data) | Every 2 years | Quarterly | Not typical |
| Low (internal, no sensitive data) | On significant change | Annually | Not typical |
9.2 Trigger-Based Testing
Beyond scheduled testing, penetration tests should be triggered by:
- Major architecture changes (new microservice, new integration, new authentication provider)
- Technology stack changes (new framework, new database, new cloud provider)
- Significant new features (payment processing, user file upload, API opening to partners)
- Post-incident (after a security incident, test the remediation and related systems)
- Merger/acquisition (assess security of acquired applications before integration)
10. Key Takeaways
- CIS 16.13 requires manual testing by skilled testers. Automated tools are necessary but insufficient. Business logic vulnerabilities require human understanding.
- Gray box is the default choice. It provides the best balance of thoroughness and efficiency for most penetration tests.
- Follow PTES methodology. Structured testing ensures consistency, completeness, and professional reporting.
- Authorization testing is where the highest-impact bugs hide. IDOR, BOLA, BFLA, and privilege escalation are the most common critical findings in modern web applications.
- AI is transforming penetration testing from a periodic event to a continuous capability. Agentic AI tools can run autonomously, but human oversight remains mandatory for scope compliance, business context, and ethical considerations.
- The report is the deliverable. A vulnerability found but poorly reported is a vulnerability not fixed. Invest in clear, actionable reporting with evidence and specific remediation guidance.
- Bug bounty programs complement but do not replace formal penetration testing. Bounty programs provide continuous coverage; formal tests provide structured, comprehensive assessment.
Review Questions
1. A developer asks why gray box testing is preferred over black box. Explain the advantages using specific examples of vulnerability types that gray box testing catches more efficiently.
2. During a penetration test, you discover evidence of an active breach by a third party (not from your testing). What is your immediate response, and what protocol governs this situation?
3. Design the scope and rules of engagement for a penetration test of a healthcare web application that processes patient records. Include in-scope and out-of-scope elements, permitted techniques, and data handling requirements.
4. An AI penetration testing tool reports it has found and exploited a critical vulnerability, but the finding is outside the agreed scope. What are the legal, ethical, and technical implications?
5. Your organization receives a bug bounty submission for a critical IDOR vulnerability. Describe the end-to-end process from receipt to resolution, including triage, remediation, verification, and researcher communication.
References
- CIS Controls v8, Safeguard 16.13 — Conduct Application Penetration Testing
- OWASP Penetration Testing Execution Standard (PTES)
- OWASP Web Security Testing Guide v4.2
- OWASP API Security Top 10 (2023)
- NIST SP 800-115 — Technical Guide to Information Security Testing and Assessment
- PentAGI Project — https://github.com/vxcontrol/pentagi
- XBOW — Automated Vulnerability Discovery
- Penligent — Agentic Red Teaming
- HackerOne — https://www.hackerone.com
- Bugcrowd — https://www.bugcrowd.com
- CVSS v4.0 Specification — https://www.first.org/cvss/v4.0/specification-document
Study Guide
Key Takeaways
- CIS 16.13 requires manual testing by skilled testers — Automated scanners cannot find business logic flaws like “user can approve their own expense report.”
- Gray box is the default choice — Provides optimal balance of thoroughness and efficiency; tests both authenticated and unauthenticated surfaces.
- Written authorization is mandatory — Pen testing without it is a criminal offense under CFAA and equivalent global laws.
- OWASP PTES has seven phases — Pre-Engagement, Intelligence Gathering, Threat Modeling, Vulnerability Analysis, Exploitation, Post-Exploitation, Reporting.
- Authorization bugs are highest-impact findings — BOLA, BFLA, IDOR, and privilege escalation are most common critical findings in modern web apps.
- AI is shifting pen testing from periodic to continuous — Agentic AI generates context-aware payloads and chains findings adaptively, but cannot understand business logic.
- Bug bounty complements but does not replace formal testing — Continuous coverage from bounties plus structured assessment from formal tests.
Important Definitions
| Term | Definition |
|---|---|
| BOLA | Broken Object-Level Authorization — API fails to validate user owns the requested resource |
| BFLA | Broken Function-Level Authorization — regular user can call admin-only API endpoints |
| IDOR | Insecure Direct Object Reference — changing resource IDs in requests accesses other users’ data |
| PTES | Penetration Testing Execution Standard — OWASP’s 7-phase structured methodology |
| Black Box | No prior knowledge testing; simulates external attacker |
| Gray Box | Partial knowledge (credentials, docs, API specs); optimal for most tests |
| White Box | Full knowledge (source code, schemas, all credentials); most thorough |
| Agentic AI Pen Testing | AI agents that generate payloads, analyze responses, and chain findings adaptively |
| Safe Harbor | Legal protection for bug bounty researchers acting in good faith |
| Rules of Engagement | Permitted testing techniques, windows, and notification requirements |
Quick Reference
- Testing Frequency: Critical apps = annually + after major changes + continuous bug bounty; High = annually; Medium = every 2 years
- Bug Bounty Rewards: Critical $5K-$50K+, High $2K-$10K, Medium $500-$2K, Low $100-$500
- Report Structure: Executive Summary (1 page) + Technical Findings (CVSS + evidence + remediation) + Severity Ratings
- CVSS Ranges: Critical 9.0-10.0, High 7.0-8.9, Medium 4.0-6.9, Low 0.1-3.9
- Common Pitfalls: Testing only unauthenticated surface, skipping business logic testing, no written authorization, poorly documented findings, relying solely on automated tools
Review Questions
- Explain why gray box testing is preferred over black box with specific examples of vulnerability types caught more efficiently.
- During a pen test you discover an active breach by a third party — what is your immediate response and which protocol governs this situation?
- Design scope and rules of engagement for a pen test of a healthcare application processing patient records.
- An AI pen testing tool finds and exploits a vulnerability outside the agreed scope — what are the legal, ethical, and technical implications?
- Describe the end-to-end process for handling a critical IDOR bug bounty submission from receipt to resolution.