7.4 — Security Logging & Monitoring
Learning Objectives
- ✓ Design an application security logging strategy that captures necessary security events while protecting sensitive data.
- ✓ Implement structured logging with standard fields, appropriate levels, and tamper-evident storage.
- ✓ Integrate application logs with SIEM platforms for correlation, alerting, and investigation.
- ✓ Design security monitoring and alerting that detects real threats without creating alert fatigue.
- ✓ Implement AI-specific audit logging to track AI tool usage, AI-generated code provenance, and AI-attributable defects.
- ✓ Meet compliance requirements for security logging across PCI-DSS, HIPAA, SOC 2, and FedRAMP frameworks.
1. Why Application Security Logging Matters
Security logging is the immune system’s memory. Without it, every attack is a zero-day — not because the vulnerability is unknown, but because the organization has no ability to detect exploitation, reconstruct what happened, or learn from the event.
Detection: Security logs are the primary data source for identifying attacks in progress. A well-instrumented application can detect brute force attacks, injection attempts, privilege escalation, data exfiltration, and unauthorized access through its own logs, often before network-level monitoring tools see anything.
Investigation: When an incident occurs, application logs provide the evidence needed to understand what happened: who did what, when, how, and to what data. Without application-level logging, incident responders can see network traffic but cannot understand application-level semantics.
Compliance: Every major regulatory framework requires security logging. PCI-DSS Requirement 10, HIPAA Security Rule 164.312(b), SOC 2 CC7.2, and FedRAMP AU controls all mandate logging of security-relevant events.
Forensics: In the event of litigation, regulatory investigation, or law enforcement involvement, application logs serve as forensic evidence. Logs that are structured, timestamped, and tamper-evident carry significantly more evidentiary weight than ad-hoc log files.
CISA Secure by Design Goal 7 — Evidence of Intrusions: CISA’s Secure by Design initiative (Goal 7) requires software manufacturers to provide logs sufficient to detect and investigate intrusions. Specifically:
- Provide audit logs at no extra charge (not as a premium tier).
- Include sufficient detail to detect security-relevant events.
- Support integration with enterprise security tools.
- Retain logs for a meaningful period.
This is a direct rebuke to vendors who charge extra for security logging or provide only basic access logs while hiding detailed audit logs behind expensive license tiers.
2. What to Log: Security Events
The following categories of events must be logged for security purposes. This is not optional instrumentation for debugging — this is mandatory security telemetry.
Authentication Events
| Event | Details to Log | Why |
|---|---|---|
| Successful login | User ID, timestamp, source IP, authentication method (password, SSO, MFA), user agent | Baseline for anomaly detection; evidence of account access |
| Failed login | Attempted user ID, timestamp, source IP, failure reason (bad password, account locked, MFA failed) | Brute force detection, credential stuffing detection |
| Account lockout | User ID, timestamp, lockout trigger (failed attempts count), source IP | Attack detection, false positive analysis |
| Password change | User ID, timestamp, change method (self-service, admin, reset) | Account takeover detection |
| MFA events | User ID, timestamp, MFA method, success/failure, MFA enrollment/unenrollment | MFA bypass detection, enrollment anomalies |
| Token issuance | User ID, timestamp, token type (JWT, session, API key), expiration, scope | Token theft detection, scope analysis |
| Token revocation | User ID, timestamp, token ID, revocation reason | Incident response verification |
| Session creation/termination | User ID, session ID, timestamp, duration, termination reason (logout, timeout, forced) | Session hijacking detection |
Authorization Events
| Event | Details to Log | Why |
|---|---|---|
| Access granted | User ID, resource, action, timestamp, authorization rule applied | Access pattern baseline, compliance audit |
| Access denied | User ID, resource, action, timestamp, denial reason | Privilege escalation attempts, misconfiguration detection |
| Privilege change | User ID, old role/permissions, new role/permissions, changed by whom, timestamp | Unauthorized privilege escalation detection |
| Role assignment/removal | User ID, role, assigned/removed by whom, timestamp | Access governance audit |
| Permission delegation | Delegator, delegate, permissions, expiration, timestamp | Delegation abuse detection |
Data Access Events
| Event | Details to Log | Why |
|---|---|---|
| Sensitive data read | User ID, data type (e.g., PII, financial), record identifiers, timestamp, access method | Data exfiltration detection, compliance audit |
| Sensitive data create/update | User ID, data type, record identifiers, timestamp, fields modified (not values) | Data integrity audit |
| Sensitive data delete | User ID, data type, record identifiers, timestamp, soft/hard delete | Data destruction audit |
| Bulk data export | User ID, data type, record count, export format, destination, timestamp | Mass data exfiltration detection |
| Data sharing/transfer | User ID, recipient, data type, record count, transfer method, timestamp | Unauthorized sharing detection |
Administrative Events
| Event | Details to Log | Why |
|---|---|---|
| Configuration change | User ID, setting name, old value, new value, timestamp | Unauthorized configuration change detection |
| User management | Admin user ID, target user ID, action (create, disable, delete), timestamp | Account management audit |
| Feature flag change | User ID, flag name, old value, new value, timestamp | Security control change detection |
| API key management | User ID, key ID (not the key itself), action (create, rotate, revoke), scope, timestamp | Key management audit |
| Deployment event | Deployer, artifact version, environment, timestamp, deployment method | Change tracking, rollback reference |
Security-Relevant Business Events
Application-specific events that have security implications:
- Financial transactions above a threshold.
- Approval workflow actions (approve, reject, delegate).
- Data classification changes.
- Consent management actions (grant, revoke).
- Report generation for sensitive data.
- External integrations activated or modified.
API Access Events
| Event | Details to Log | Why |
|---|---|---|
| API request | Timestamp, client ID, endpoint, HTTP method, source IP, user agent, request ID | API abuse detection, rate limit monitoring |
| API response | Timestamp, request ID, status code, response time | Error pattern detection, performance anomalies |
| Rate limit events | Client ID, endpoint, limit, current count, action taken (throttle, block) | DDoS/abuse detection |
| API authentication failure | Client ID, endpoint, failure reason, source IP | Credential compromise detection |
Error Events
| Event | Details to Log | Why |
|---|---|---|
| Unhandled exception | Timestamp, exception type, stack trace (sanitized), request context | Security control failure detection |
| Security control failure | Timestamp, control name, failure mode, affected request | Defense degradation detection |
| Input validation failure | Timestamp, endpoint, validation rule, input type (not the raw input value, which may contain attack payloads or PII) | Attack pattern detection |
| Dependency failure | Timestamp, dependency name, failure type, impact | Supply chain compromise indicator |
3. What NOT to Log
Logging too much sensitive data creates its own security risk. Log files become high-value targets if they contain credentials, payment data, or PII. The following must never appear in logs:
Absolute Prohibitions
Passwords: Never log passwords, whether plaintext, hashed, or encrypted. Not in authentication logs, not in debug logs, not in error logs. If a password appears in a stack trace, sanitize the stack trace before logging.
Session tokens, API keys, and secrets: Never log authentication tokens, session identifiers, API keys, bearer tokens, or any other secret material. Log a reference identifier (e.g., last 4 characters of a token hash) if you need to correlate events to a specific token.
Credit card numbers: PCI-DSS explicitly prohibits logging full primary account numbers (PANs). At most, log the first 6 and last 4 digits (BIN and last 4).
Social Security Numbers: Never log SSNs. If you need to reference a specific individual in a log, use an internal user ID.
PHI/ePHI: HIPAA restricts logging of protected health information. Log the minimum necessary for security purposes — typically a patient ID or encounter ID, never diagnosis, treatment, or clinical details.
Full PII beyond necessity: Log the minimum PII necessary for identification. A user ID or email address for audit trail purposes is acceptable. Logging full names, addresses, phone numbers, dates of birth, or biometric data is not necessary for security logging and creates risk.
Encryption keys: Never log encryption keys, key material, or initialization vectors. If you need to reference a key, log a key identifier.
Sanitization Practices
- Implement a logging sanitizer that automatically detects and redacts patterns matching credit card numbers, SSNs, email addresses (where not needed), and common secret formats.
- Use structured logging with typed fields so sanitization can be applied by field type rather than pattern matching.
- Review log output as part of code review, specifically checking for sensitive data leakage.
- Periodically audit log stores for sensitive data that may have been logged inadvertently.
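The sanitization practices above can be sketched as a minimal field-aware sanitizer. The patterns, field names, and masking rules here are illustrative assumptions, not a complete redaction policy; a production sanitizer needs a broader, tested pattern set.

```python
import re

# Illustrative redaction patterns -- a real sanitizer needs a wider set
# (JWTs, IBANs, cloud provider key formats, etc.).
PATTERNS = {
    "card": re.compile(r"\b(\d{6})\d{6,9}(\d{4})\b"),   # PAN: keep BIN + last 4
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "secret": re.compile(r"(?i)\b(password|token|api[_-]?key)\b\s*[=:]\s*\S+"),
}

# Fields that are always dropped, regardless of content.
DENYLISTED_FIELDS = {"password", "authorization", "set_cookie", "api_key"}

def sanitize_value(value: str) -> str:
    value = PATTERNS["card"].sub(r"\1******\2", value)   # mask middle digits
    value = PATTERNS["ssn"].sub("[REDACTED-SSN]", value)
    value = PATTERNS["secret"].sub(r"\1=[REDACTED]", value)
    return value

def sanitize_event(event: dict) -> dict:
    """Apply denylist-by-field first (structured logging makes this cheap),
    then pattern-based redaction on the remaining string values."""
    clean = {}
    for field, value in event.items():
        if field.lower() in DENYLISTED_FIELDS:
            continue                        # drop the field entirely
        clean[field] = sanitize_value(value) if isinstance(value, str) else value
    return clean
```

Note the order: typed-field denylisting runs before pattern matching, which is why structured logging makes sanitization more reliable than regex-only scrubbing of free text.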
4. Log Format and Structure
Structured Logging
JSON is the preferred log format. Plain text logs (unstructured) require complex regex parsing for each log source, break when formats change, and resist automated analysis. JSON logs are machine-parseable, self-describing, and extensible.
Example: Unstructured (bad):
2026-03-19 14:32:15 INFO UserController - User john.doe@example.com logged in from 10.0.1.50 using SSO
Example: Structured (good):
{
  "timestamp": "2026-03-19T14:32:15.123Z",
  "level": "INFO",
  "event_type": "authentication.login.success",
  "service": "user-service",
  "version": "2.3.1",
  "environment": "production",
  "trace_id": "abc123def456",
  "actor": {
    "user_id": "usr_9f8e7d6c",
    "type": "human"
  },
  "action": "login",
  "outcome": "success",
  "details": {
    "auth_method": "sso",
    "mfa_used": true,
    "source_ip": "10.0.1.50",
    "user_agent": "Mozilla/5.0..."
  }
}
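As a sketch, an event like the one above could be emitted from Python with a small helper that fills in the standard fields. The field names follow the example; the service metadata values and the helper itself are illustrative assumptions, not a prescribed API.

```python
import json
import sys
import uuid
from datetime import datetime, timezone

SERVICE_METADATA = {              # assumed values for illustration
    "service": "user-service",
    "version": "2.3.1",
    "environment": "production",
}

def security_event(event_type, actor, action, outcome, **details):
    """Build one structured security log entry with the standard fields."""
    return {
        "timestamp": datetime.now(timezone.utc)
                     .isoformat(timespec="milliseconds")
                     .replace("+00:00", "Z"),
        "level": "INFO",
        "event_type": event_type,
        **SERVICE_METADATA,
        "trace_id": uuid.uuid4().hex,
        "actor": actor,
        "action": action,
        "outcome": outcome,
        "details": details,
    }

def emit(event: dict) -> None:
    # One JSON object per line ("JSON Lines") keeps shipping agents happy.
    sys.stdout.write(json.dumps(event) + "\n")

emit(security_event(
    "authentication.login.success",
    actor={"user_id": "usr_9f8e7d6c", "type": "human"},
    action="login",
    outcome="success",
    auth_method="sso", mfa_used=True, source_ip="10.0.1.50",
))
```

In practice the structured logging libraries listed later (structlog, Pino, Serilog, and so on) provide this event-building and context-binding for you; the sketch only shows the shape of the output.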
Standard Fields
Every security log event should include these fields:
| Field | Format | Description |
|---|---|---|
| timestamp | ISO 8601 UTC (e.g., 2026-03-19T14:32:15.123Z) | When the event occurred. Always UTC to avoid timezone confusion during cross-system correlation. |
| level | Enum: ERROR, WARN, INFO, DEBUG | Severity of the log entry. Security events should be at INFO minimum. |
| event_type | Dotted notation (e.g., authentication.login.failure) | Categorized event type for filtering and alerting. |
| service | String | The service that generated the event. |
| version | SemVer | The version of the service. Critical for correlating events with deployments. |
| environment | Enum: production, staging, development | The environment. |
| trace_id | String (UUID or distributed trace ID) | Correlation ID linking this event to a request trace. |
| actor | Object: { user_id, type } | Who performed the action. Type is human, service, system, or ai. |
| action | String | What was done. |
| target | Object: { type, id } | What was acted upon. |
| outcome | Enum: success, failure, error | Result of the action. |
| source_ip | String | IP address of the requestor. |
| details | Object | Event-specific additional details. |
Log Levels for Security Events
| Level | Use For Security | Example |
|---|---|---|
| ERROR | Security control failure, unhandled exception with security implication | WAF bypass detected, encryption service unavailable |
| WARN | Suspicious but not confirmed malicious activity | Multiple failed login attempts, unusual API call pattern |
| INFO | Normal security events that need to be recorded | Successful login, access granted, configuration change |
| DEBUG | Detailed security context for investigation | Full request/response details (sanitized), detailed authorization decision chain |
Security events must be logged at INFO level minimum. Do not log security events at DEBUG level because DEBUG logging is typically disabled in production. If you log a security event at DEBUG, you lose visibility in production — precisely when you need it most.
Immutable Logging
Logs must be tamper-evident to have forensic value. An attacker who compromises a system will attempt to modify or delete logs to cover their tracks.
Append-only storage: Use log storage that supports append-only writes. Once written, log entries cannot be modified or deleted by the application that created them.
Separate log infrastructure: Send logs to a separate system from the application. The application should not have delete or modify access to its own logs.
Cryptographic integrity: Consider log integrity verification using hash chains (each log entry includes a hash of the previous entry) or digital signatures.
Retention locks: Cloud providers offer immutable storage with retention locks (AWS S3 Object Lock, Azure Blob Storage immutability policies). Configure these for log buckets.
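The hash-chain idea mentioned above can be sketched in a few lines: each entry stores a hash over the previous entry's hash plus its own content, so modifying any entry breaks verification from that point on. This is illustrative only; a production design would also sign the chain head and anchor it in external immutable storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # seed hash for the first entry

def chain_entry(prev_hash: str, event: dict) -> dict:
    """Create a log entry whose hash covers the previous hash + this event."""
    payload = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    return {"prev_hash": prev_hash, "event": event, "hash": entry_hash}

def append(log: list, event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append(chain_entry(prev, event))

def verify(log: list) -> bool:
    """Recompute the chain; any edited or deleted entry breaks it."""
    prev = GENESIS
    for entry in log:
        expected = hashlib.sha256(
            (prev + json.dumps(entry["event"], sort_keys=True)).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False  # tampering detected at this entry
        prev = entry["hash"]
    return True
```

The same property is what retention-locked object storage gives you at the infrastructure level; the hash chain adds cryptographic evidence that survives even if the storage layer is misconfigured.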
5. Log Aggregation and SIEM Integration
Individual application logs are useful for debugging. Aggregated, correlated logs across all applications and infrastructure are essential for security.
Centralized Log Collection
All application logs must flow to a centralized platform where they can be searched, correlated, and analyzed.
Platform options:
| Platform | Type | Strengths |
|---|---|---|
| Elastic (ELK/EFK) | Open-source / commercial | Flexible, powerful search, large community, self-hosted option |
| Splunk | Commercial | Enterprise standard, powerful SPL query language, extensive app ecosystem |
| Datadog | SaaS | Integrated APM + logs + security, easy setup, good for cloud-native |
| Sumo Logic | SaaS | Cloud-native, ML-powered analytics, compliance dashboards |
| Grafana Loki | Open-source | Cost-effective, label-based indexing, excellent Grafana integration |
| CrowdStrike LogScale (Humio) | Commercial | Real-time streaming, minimal indexing, excellent for high-volume |
| Microsoft Sentinel | SaaS | Native Azure integration, SOAR capabilities, AI-powered detection |
| Google Chronicle | SaaS | Massive scale, 12-month hot retention, YARA-L detection rules |
Log Shipping Methods
Agent-based: Deploy a log shipping agent on each host (Filebeat, Fluentd, Fluent Bit, Vector, Datadog Agent). The agent reads log files, applies parsing and filtering, and ships to the central platform.
Advantages: Works with any log format, handles backpressure and buffering, supports log rotation. Disadvantages: Requires agent deployment and management, consumes host resources.
Direct API: Application sends logs directly to the central platform’s API (e.g., Datadog’s log intake API, Splunk’s HEC).
Advantages: No agent to manage, immediate delivery, structured data preserved. Disadvantages: Application must handle delivery failures, buffering, and retries. Tight coupling to the logging platform.
Syslog (RFC 5424): Traditional syslog forwarding, either via UDP (unreliable) or TCP/TLS (reliable).
Advantages: Universal support, no vendor lock-in, works with legacy systems. Disadvantages: Limited structure, message size limits, requires parsing at the destination.
Cloud-native: Use cloud provider log services as the initial collection point (AWS CloudWatch Logs, Azure Monitor Logs, GCP Cloud Logging), then forward to the central SIEM.
Advantages: Native integration, minimal configuration, handles scale automatically. Disadvantages: Cloud-vendor-specific, additional cost for forwarding, potential latency.
Retention Policies
Log retention must balance compliance requirements, investigation needs, and storage costs:
| Requirement Source | Minimum Retention |
|---|---|
| PCI-DSS (Requirement 10.7) | 12 months, with 3 months immediately accessible |
| HIPAA | 6 years for audit logs |
| SOX | 7 years |
| SOC 2 | 1 year (typical) |
| FedRAMP | 1 year minimum, 3 years for AU-11 |
| GDPR | As long as necessary for the purpose (minimize) |
| General recommendation | 90 days hot (searchable), 1 year warm (retrievable), 7 years cold (archived) |
Log Correlation and Enrichment
Raw logs become actionable intelligence through correlation and enrichment:
Correlation: Linking events across multiple sources using shared identifiers (trace ID, user ID, session ID, source IP). A failed login attempt on the authentication service, followed by a successful login from a different IP, followed by an unusual data export — each event is unremarkable alone, but correlated they indicate account compromise.
Enrichment: Adding context to log events from external sources:
- GeoIP: Resolve IP addresses to geographic locations.
- Threat intelligence: Flag known-malicious IPs, domains, or user agents.
- Asset inventory: Add asset criticality and ownership information.
- User directory: Add user role, department, and last-known-good access patterns.
- Vulnerability data: Correlate with known vulnerabilities in the affected service.
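The account-compromise pattern described above can be sketched as a simple correlation rule over events sharing a user_id. The event shape follows the structured format from Section 4; the failure threshold and the exact rule are illustrative assumptions that a real SIEM rule would refine.

```python
from collections import defaultdict

def detect_compromise(events, failure_threshold=5):
    """Flag users with a burst of failed logins followed by a success
    from an IP never seen among the failures -- a takeover indicator."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["actor"]["user_id"]].append(e)

    suspicious = []
    for user_id, seq in by_user.items():
        seq.sort(key=lambda e: e["timestamp"])
        failures = [e for e in seq
                    if e["event_type"] == "authentication.login.failure"]
        successes = [e for e in seq
                     if e["event_type"] == "authentication.login.success"]
        if len(failures) < failure_threshold or not successes:
            continue
        failure_ips = {e["details"]["source_ip"] for e in failures}
        last_failure = failures[-1]["timestamp"]
        # Success after the failure burst, from a previously unseen IP.
        if any(s["timestamp"] > last_failure
               and s["details"]["source_ip"] not in failure_ips
               for s in successes):
            suspicious.append(user_id)
    return suspicious
```

Each event in isolation would not trigger an alert; the correlation across the shared user_id is what surfaces the pattern.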
6. Security Monitoring and Alerting
Logs without monitoring are just expensive storage. Security monitoring transforms log data into actionable detection and response.
Real-Time Alerting on Critical Security Events
Configure alerts for events that require immediate human attention:
Critical alerts (page the on-call):
- Multiple failed logins followed by a success (credential compromise indicator).
- Admin account login from an unusual location or at an unusual time.
- Bulk data export or access pattern anomaly.
- Security control failure (WAF down, encryption service unavailable).
- Privilege escalation: user gains admin role.
- Known attack signature detected (SQL injection, SSRF, path traversal).
High alerts (notify during business hours):
- New API key created for a service account.
- Configuration change to security-relevant settings.
- Failed access to sensitive resources above threshold.
- Unusual deployment activity (deployment outside normal windows).
- Certificate or secret approaching expiration.
Informational (dashboard only):
- Login/logout activity summary.
- API usage trends.
- Error rate trends.
- Dependency health status.
Anomaly Detection
Rule-based alerting catches known attack patterns. Anomaly detection catches unknown patterns by establishing behavioral baselines and alerting on deviations.
Behavioral baselines:
- Normal API call volume per user/service over time.
- Normal data access patterns (which users access which data, when, how much).
- Normal authentication patterns (login times, locations, devices).
- Normal deployment patterns (who deploys, when, how often).
Deviation detection:
- Statistical: Alert when a metric exceeds N standard deviations from the baseline.
- ML-based: Machine learning models trained on historical behavior that score each event’s anomaly probability.
- Peer comparison: Alert when a user’s behavior differs significantly from their peer group.
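A minimal version of the statistical approach: keep a rolling baseline of a metric and flag any new observation more than N standard deviations from it. The window size, warm-up length, and threshold here are assumptions to tune per metric.

```python
from collections import deque
from statistics import mean, stdev

class AnomalyDetector:
    """Rolling-window z-score detector for a single metric stream."""

    def __init__(self, window=100, threshold_sigma=3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold_sigma

    def observe(self, value: float) -> bool:
        """Record a value; return True if it deviates from the baseline."""
        anomalous = False
        if len(self.history) >= 10:          # need a minimal baseline first
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
        self.history.append(value)
        return anomalous
```

Real AIOps platforms layer seasonality handling and ML scoring on top of this, but the core idea, baseline plus deviation threshold, is the same.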
Alert Fatigue Management
Alert fatigue is the single biggest threat to security monitoring effectiveness. When analysts receive too many alerts, they start ignoring all of them — including the real ones.
Tuning:
- Regularly review alert volumes and false positive rates.
- Suppress or reduce severity of alerts with consistently high false positive rates.
- Adjust thresholds based on actual observed behavior, not vendor defaults.
- Remove duplicate alerts that fire from multiple systems for the same event.
Prioritization:
- Use risk-based priority: asset criticality × alert confidence × threat relevance.
- Auto-close alerts that correlate with known benign activity (e.g., scheduled jobs, authorized pen tests).
- Group related alerts into a single incident rather than alerting on each component separately.
Progressive alerting:
- First occurrence: log only.
- Repeated occurrence within window: create a ticket.
- Sustained occurrence: page the on-call.
This prevents single anomalous events from generating noise while catching persistent issues.
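The progressive escalation above can be sketched as a counter over a sliding time window. The window length and thresholds are illustrative; tune them per alert type.

```python
import time

class ProgressiveAlerter:
    """Escalate log -> ticket -> page as occurrences repeat within a window."""

    def __init__(self, window_seconds=3600, ticket_at=3, page_at=10):
        self.window = window_seconds
        self.ticket_at = ticket_at
        self.page_at = page_at
        self.occurrences = {}   # alert key -> list of timestamps

    def record(self, key: str, now=None) -> str:
        now = time.time() if now is None else now
        # Keep only occurrences still inside the window.
        hits = [t for t in self.occurrences.get(key, []) if now - t < self.window]
        hits.append(now)
        self.occurrences[key] = hits
        if len(hits) >= self.page_at:
            return "page"       # sustained: page the on-call
        if len(hits) >= self.ticket_at:
            return "ticket"     # repeated: create a ticket
        return "log"            # first occurrences: log only
```

The key would typically combine the event type and the affected entity (e.g., "authentication.login.failure:usr_9f8e7d6c") so escalation tracks a specific pattern rather than global volume.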
7. Application-Level Monitoring
Runtime Application Self-Protection (RASP)
RASP instruments the application runtime to detect and block attacks from inside the application:
- Intercepts function calls related to common vulnerability types (SQL execution, file system access, command execution, deserialization).
- Validates inputs at the point of use, not just at the perimeter.
- Can operate in monitoring mode (detect and log) or blocking mode (detect, log, and block).
- Provides application-level context that network-based tools cannot see.
Tools: Contrast Security, Sqreen (now Datadog Application Security), Hdiv Security, OpenRASP.
Consideration: RASP adds runtime overhead (typically 2-5% latency). Evaluate the performance impact for latency-sensitive applications. RASP is most valuable for legacy applications that are difficult to patch or for applications with a high attack surface.
Application Performance Monitoring (APM) with Security Context
Modern APM tools provide distributed tracing, error tracking, and performance monitoring. When integrated with security context, APM becomes a security tool:
- Distributed tracing: Follow a request through every service it touches. When a security event occurs, the trace shows the complete request path, making it possible to identify the entry point, propagation path, and impact scope.
- Error tracking with security classification: Classify errors as security-relevant or operational. A NullPointerException is operational. An InvalidCipherTextException may be security-relevant.
- Performance anomaly detection: Sudden performance changes can indicate security issues. A database query that suddenly takes 10x longer may indicate a time-based SQL injection attack. An API endpoint that suddenly receives 100x normal traffic may be under attack.
Synthetic Monitoring for Security Controls
Use synthetic monitoring to verify that security controls are functioning:
- Authentication endpoint monitoring: Regularly verify that login pages respond correctly, MFA is enforced, and account lockout works.
- Authorization boundary testing: Synthetic requests that verify access controls are enforced (attempt to access resource A with credentials for resource B, expect 403).
- Security header verification: Monitor response headers to ensure Content-Security-Policy, Strict-Transport-Security, X-Frame-Options, and other security headers are present.
- Certificate monitoring: Verify TLS certificate validity and warn before expiration.
- WAF health monitoring: Send known-bad requests and verify they are blocked.
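The header verification check can be sketched as a pure function over a response's headers, which makes it easy to test; which headers are required, and what values they must contain, are policy assumptions.

```python
# Headers this check treats as required -- adjust to your policy.
REQUIRED_SECURITY_HEADERS = {
    "Content-Security-Policy": None,            # must be present, any value
    "Strict-Transport-Security": "max-age=",    # value must contain this
    "X-Frame-Options": None,
    "X-Content-Type-Options": "nosniff",
}

def check_security_headers(headers: dict) -> list:
    """Return a list of findings; an empty list means the response passed."""
    findings = []
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name, must_contain in REQUIRED_SECURITY_HEADERS.items():
        value = lowered.get(name.lower())
        if value is None:
            findings.append(f"missing header: {name}")
        elif must_contain and must_contain not in value:
            findings.append(f"weak header: {name}={value}")
    return findings
```

A real synthetic monitor would run this on a schedule against production endpoints (fetching headers with an HTTP client) and emit a security control failure event whenever the findings list is non-empty.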
8. AI Audit Logging Requirements
The integration of AI tools into the development lifecycle creates new logging requirements. Organizations must track AI usage, AI-generated code provenance, and AI-related security events.
What to Log for AI
AI code generation events:
{
  "timestamp": "2026-03-19T14:32:15.123Z",
  "event_type": "ai.code_generation.completion",
  "actor": {
    "user_id": "usr_9f8e7d6c",
    "type": "human"
  },
  "ai_tool": {
    "name": "github-copilot",
    "version": "1.234.0",
    "model": "gpt-4o-2025-05-13"
  },
  "action": "code_suggestion",
  "outcome": "accepted",
  "details": {
    "language": "python",
    "file_path": "src/api/auth.py",
    "lines_generated": 15,
    "suggestion_id": "sug_abc123",
    "confidence_score": 0.87,
    "security_sensitive_area": true,
    "review_status": "pending"
  }
}
AI suggestion acceptance tracking:
- Track the acceptance rate of AI suggestions by developer, team, language, and code area.
- Track whether accepted suggestions are later modified during code review.
- Track whether accepted suggestions introduce defects (security or functional).
AI tool usage patterns:
- Which AI tools are being used.
- How frequently each tool is used.
- What types of tasks each tool is used for.
- Usage patterns that deviate from approved tool policies.
Shadow AI detection:
- Network-level monitoring for connections to known AI service endpoints that are not on the approved list.
- DNS monitoring for AI service domain lookups.
- Proxy/CASB logs for AI service API calls.
- DLP monitoring for code or data being submitted to unapproved AI services.
AI tool API calls and data submitted:
- Log the API calls made to AI services (endpoint, method, request size).
- Log metadata about the data submitted (data type, classification, size) without logging the data itself.
- Log the AI service’s response metadata (response size, latency, error codes).
- Alert on submissions that may contain sensitive data (PII patterns, secret patterns, proprietary code markers).
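A pre-submission guard like the one described can be sketched with the same pattern-matching idea used for log sanitization. The patterns below are illustrative assumptions; real DLP tooling uses far richer detectors (entropy analysis, verified secret formats, document fingerprinting).

```python
import re

# Illustrative detectors for data that should never leave the boundary.
SENSITIVE_PATTERNS = {
    "possible_pan": re.compile(r"\b\d{13,16}\b"),
    "possible_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_submission(text: str) -> list:
    """Return the names of sensitive patterns found in an outbound payload."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]

def guard_ai_submission(text: str) -> bool:
    """Allow the call only if the payload looks clean; otherwise the caller
    should block the request and log an AI-submission-blocked event."""
    return not scan_submission(text)
```

The scan result (pattern names, never the matched content) is exactly the metadata that belongs in the audit log entry for the blocked submission.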
AI-attributable defects post-deployment:
- Track which production defects were introduced in AI-generated code.
- Compare defect rates between AI-generated and human-written code.
- Track defect types to identify patterns (e.g., “AI consistently generates insecure deserialization patterns”).
- Feed this data back into AI tool configuration and developer training.
AI for Monitoring: AIOps
AI is not just something to monitor — it is increasingly the tool doing the monitoring.
ML-based anomaly detection (e.g., Datadog Watchdog): AIOps platforms use machine learning to:
- Automatically detect anomalies in metrics, logs, and traces.
- Correlate anomalies across services to identify root causes.
- Predict future issues based on trends (e.g., “disk will be full in 3 days at current growth rate”).
- Reduce alert noise by clustering related alerts and suppressing known benign anomalies.
AI-powered log analysis:
- Natural language query interfaces (“show me all failed login attempts from external IPs in the last 24 hours”).
- Automatic pattern extraction from unstructured log data.
- Summarization of long log sequences into human-readable narratives.
- Identification of rare or novel log entries that may indicate security events.
AI-powered threat detection:
- User and Entity Behavior Analytics (UEBA): ML models that establish normal behavior patterns for each user and entity, alerting on deviations.
- Attack chain detection: AI that identifies multi-step attacks across multiple log sources.
- Threat hunting assistance: AI that suggests investigation queries based on observed indicators.
Predictive alerting:
- Predict security incidents before they occur based on leading indicators.
- Identify conditions that historically preceded breaches (e.g., unusual account creation patterns often precede insider threats).
- Forecast vulnerability exploitation likelihood based on environmental factors.
9. Compliance Requirements
PCI-DSS (Requirement 10)
PCI-DSS Requirement 10 specifies detailed logging requirements for cardholder data environments:
- 10.2: Implement automated audit trails for all system components to reconstruct the following events:
- 10.2.1: Log all individual user access to cardholder data.
- 10.2.2: Log all actions taken by any individual with root or administrative privileges.
- 10.2.3: Log access to all audit trails.
- 10.2.4: Log invalid logical access attempts.
- 10.2.5: Log use of and changes to identification and authentication mechanisms.
- 10.2.6: Log initialization, stopping, or pausing of audit logs.
- 10.2.7: Log creation and deletion of system-level objects.
- 10.3: Record specific details for each auditable event (user ID, type, date/time, success/failure, origination, identity/name of affected data/component).
- 10.5: Secure audit trails so they cannot be altered.
- 10.7: Retain audit trail history for at least one year, with a minimum of three months immediately available.
HIPAA (Security Rule)
- 164.312(b): Implement hardware, software, and procedural mechanisms that record and examine activity in information systems that contain or use ePHI.
- 164.308(a)(1)(ii)(D): Implement procedures to regularly review records of information system activity (audit logs, access reports, security incident tracking reports).
- Minimum necessary principle: Log only the minimum ePHI necessary for security purposes.
SOC 2 (Trust Services Criteria)
- CC7.2: The entity monitors system components and the operation of those components for anomalies that are indicative of malicious acts, natural disasters, and errors affecting the entity’s ability to meet its objectives; anomalies are analyzed to determine whether they represent security events.
- CC7.3: The entity evaluates security events to determine whether they could or have resulted in a failure of the entity to meet its objectives (security incidents) and, if so, takes actions to prevent or address such failures.
- Logging must demonstrate continuous monitoring and detection capability.
FedRAMP
- AU-2: Auditable Events: Define events to be audited and the content of audit records.
- AU-3: Content of Audit Records: Ensure records contain sufficient information.
- AU-6: Audit Review, Analysis, and Reporting: Review and analyze system audit records regularly.
- AU-9: Protection of Audit Information: Protect audit information and tools from unauthorized access, modification, and deletion.
- AU-11: Audit Record Retention: Retain audit records for a defined period.
- AU-12: Audit Generation: Generate audit records for defined auditable events.
- Continuous monitoring requirement: FedRAMP requires ongoing monitoring with monthly reporting.
10. Implementation Guide
Logging Library Selection
Choose a structured logging library for each language in your stack:
| Language | Library | Features |
|---|---|---|
| Java | Logback + SLF4J + Logstash Encoder | Structured JSON output, MDC for context propagation |
| Python | structlog or python-json-logger | Native structured logging, context binding |
| Node.js | Pino or Winston | High-performance JSON logging, child loggers for context |
| Go | Zap or Zerolog | Zero-allocation structured logging |
| .NET | Serilog | Structured logging with typed properties |
| Ruby | Semantic Logger | Structured, high-performance, built-in sanitization |
Implementation Checklist
Phase 1: Foundation (Week 1-2)
- Select and configure structured logging library for each service.
- Define standard field schema (timestamp, event_type, actor, action, target, outcome).
- Implement log sanitization to prevent sensitive data leakage.
- Configure authentication event logging (login, logout, failure, lockout, MFA).
- Configure authorization event logging (access granted, denied, privilege changes).
Phase 2: Coverage (Week 3-4)
- Implement data access event logging for sensitive data operations.
- Implement administrative event logging (config changes, user management).
- Implement API access logging (requests, responses, rate limit events).
- Implement error event logging with security classification.
- Configure log shipping to centralized platform.
Phase 3: Monitoring (Week 5-6)
- Create alerting rules for critical security events.
- Configure anomaly detection baselines.
- Build security monitoring dashboards.
- Implement synthetic monitoring for security controls.
- Configure alert routing and escalation.
Phase 4: AI and Compliance (Week 7-8)
- Implement AI tool usage logging.
- Configure shadow AI detection monitoring.
- Verify compliance coverage (PCI-DSS, HIPAA, SOC 2, FedRAMP as applicable).
- Configure log retention policies and immutable storage.
- Document the logging architecture for audit purposes.
11. OWASP Logging Cheat Sheet
The OWASP Logging Cheat Sheet provides a comprehensive reference for application security logging. Key recommendations:
- Use a common logging framework across all applications in the organization.
- Define log levels consistently and enforce them through code review.
- Include sufficient context in each log entry to support investigation without requiring access to the application.
- Protect log data with appropriate access controls, encryption in transit and at rest, and integrity verification.
- Do not log sensitive data (passwords, tokens, PII beyond necessity).
- Log security events at the appropriate level (not DEBUG).
- Ensure logging does not introduce vulnerabilities (log injection, CRLF injection, XXE in log parsing).
- Test logging as part of the application’s test suite — verify that security events generate the expected log entries.
- Monitor logs in near-real-time for security events.
- Review and update logging regularly as the application evolves.
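The "test your logging" recommendation above can be implemented with a capturing handler in the test suite. A minimal sketch; the `login` function is a hypothetical stand-in for your application's real authentication path:

```python
import logging

captured = []

class CaptureHandler(logging.Handler):
    """Collect emitted records so tests can assert on security events."""
    def emit(self, record):
        captured.append(record)

security_log = logging.getLogger("security")
security_log.addHandler(CaptureHandler())
security_log.setLevel(logging.INFO)

def login(username: str, password: str) -> bool:
    """Hypothetical login that must log a security event on failure."""
    ok = False  # stand-in for a real credential check
    if not ok:
        security_log.warning("login failed",
                             extra={"event_type": "authn.login_failure"})
    return ok

# Test: a failed login must produce exactly one authn.login_failure event.
login("alice", "wrong-password")
events = [r for r in captured
          if getattr(r, "event_type", "") == "authn.login_failure"]
assert len(events) == 1
```

Running assertions like this in CI is what catches logging that breaks silently, for example when a refactor removes a log call nobody notices is gone.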
Log Injection Prevention
A commonly overlooked vulnerability: if user-supplied data is included in log entries without sanitization, attackers can inject malicious content into logs.
Risks:
- Log forging: Attacker injects fake log entries that look legitimate, potentially covering tracks or framing other users.
- CRLF injection: Attacker injects newline characters to create new log entries.
- Log parser exploitation: If logs are processed by tools that interpret special characters (e.g., ANSI escape codes, XML/HTML parsers), injected content can exploit those parsers.
Mitigations:
- Use structured logging (JSON) — the logging framework handles escaping.
- If using plain text logs, sanitize user input before including it in log entries (encode or remove newlines, tabs, and control characters).
- Never allow user input to control the log level or event type.
- Validate log entry content length to prevent log flooding.
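For the plain-text case, the mitigations above reduce to stripping control characters and capping length before any user-supplied value reaches a log line. A minimal sketch; the length cap is an assumed value:

```python
import re

# Matches ASCII control characters: CR/LF, tab, ANSI escape (0x1b), DEL, etc.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
MAX_FIELD_LEN = 256  # assumed cap to limit log flooding

def sanitize_for_log(value: str) -> str:
    """Neutralize CRLF/control-character injection and truncate before logging."""
    cleaned = _CONTROL_CHARS.sub(" ", value)
    return cleaned[:MAX_FIELD_LEN]

# A forged-entry attempt collapses onto one line instead of spawning a fake record.
print(sanitize_for_log("admin\r\n2025-01-01 INFO login user=victim outcome=success"))
```

With structured JSON logging this step is unnecessary for the injection risks themselves (the serializer escapes control characters), but the length cap remains useful in either format.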
Key Takeaways
- Application security logging is mandatory infrastructure, not optional instrumentation. Without it, detection, investigation, compliance, and forensics all fail.
- Log what matters, protect what is sensitive. Log security events comprehensively while strictly excluding passwords, tokens, PII, and secrets.
- Structure enables automation. JSON-structured logs with standard fields enable automated parsing, correlation, alerting, and investigation at scale.
- Monitoring without tuning creates alert fatigue. Alert fatigue is the most common failure mode of security monitoring. Invest in tuning, prioritization, and progressive alerting.
- AI audit logging is a new requirement. Track AI tool usage, AI-generated code provenance, shadow AI, and AI-attributable defects; these are the newest dimension of application security logging.
- Logs must be immutable and retained per compliance requirements. Append-only storage, separate infrastructure, and defined retention policies are non-negotiable.
- Test your logging. Verify that security events generate expected log entries as part of your test suite. Logging that breaks silently is worse than no logging.
Practical Exercise
Exercise 1: Logging Design Review
Review the logging in one of your team’s services. For each category in Section 2, determine whether the events are logged, whether they include sufficient detail, and whether any sensitive data is being logged that should not be. Create a gap analysis document and prioritize the missing logging by security value.
Exercise 2: Alert Fatigue Assessment
Audit your current security alerts for the past 30 days. For each alert rule:
- How many times did it fire?
- What percentage were true positives?
- What was the average response time?
- Was the alert actionable (did the responder know what to do)?
Identify the three noisiest alerts and propose tuning strategies.
Exercise 3: AI Logging Design
Design the logging schema for AI tool usage in your organization. Define the events to log, the fields for each event, the retention policy, and three monitoring rules that would detect concerning AI usage patterns.
References
- CIS Controls v8, Safeguard 16.1: Establish and Maintain a Secure Application Development Process
- CISA Secure by Design Goal 7: Evidence of Intrusions
- OWASP Logging Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html
- OWASP Application Logging Vocabulary: https://owasp.org/www-project-application-logging-vocabulary/
- NIST SP 800-92: Guide to Computer Security Log Management
- PCI-DSS v4.0, Requirement 10: Log and Monitor All Access to System Components and Cardholder Data
- HIPAA Security Rule, 164.312(b): Audit Controls
- RFC 5424: The Syslog Protocol
Study Guide
Key Takeaways
- Security events at INFO level minimum — DEBUG is typically disabled in production; logging security events at DEBUG loses visibility when you need it most.
- Never log passwords, tokens, or PII beyond necessity — Log files become high-value targets if they contain credentials or sensitive data.
- JSON structured logging enables automation — Machine-parseable, self-describing, extensible; prevents log injection through framework-handled escaping.
- Alert fatigue is the biggest monitoring threat — Analysts receiving too many alerts start ignoring all of them including real threats.
- Progressive alerting reduces noise — First occurrence: log only; repeated: create ticket; sustained: page on-call.
- AI audit logging is a new requirement — Track AI tool usage, code generation events, acceptance rates, and AI-attributable defects.
- CISA Secure by Design Goal 7 — Software manufacturers must provide intrusion-detection logs at no extra charge.
Important Definitions
| Term | Definition |
|---|---|
| RASP | Runtime Application Self-Protection — instruments application runtime to detect/block attacks; 2-5% latency overhead |
| Alert Fatigue | Analysts ignoring all alerts including real threats due to excessive alert volume |
| Progressive Alerting | Graduated response: log only -> ticket -> page, based on occurrence frequency |
| Log Injection | Attacker injecting malicious content into log entries through unsanitized user input |
| Immutable Logging | Append-only storage preventing modification or deletion of log entries |
| Structured Logging | JSON-format logs with typed fields enabling automated parsing and correlation |
| UEBA | User and Entity Behavior Analytics — ML models detecting deviations from normal behavior |
| Shadow AI Detection | Network/DNS/DLP monitoring for connections to unapproved AI service endpoints |
| Trace ID | Correlation identifier linking events across multiple services in a request path |
| CSAF | Common Security Advisory Framework — OASIS standard for machine-readable security advisories |
Quick Reference
- Retention Requirements: PCI-DSS 1 year (3 months hot), HIPAA 6 years, SOX 7 years, SOC 2 1 year, FedRAMP 1-3 years
- Standard Log Fields: timestamp (ISO 8601 UTC), level, event_type, service, version, environment, trace_id, actor, action, outcome, source_ip
- Never Log: Passwords, session tokens, API keys, credit card numbers, SSNs, PHI details, encryption keys
- SIEM Platforms: Elastic, Splunk, Datadog, Sumo Logic, Grafana Loki, CrowdStrike LogScale, Sentinel, Chronicle
- Common Pitfalls: Security events at DEBUG level, logging sensitive data, unstructured plain text logs, no alert tuning, missing AI audit logging, no log integrity verification
Review Questions
- Why must security events be logged at INFO minimum and not DEBUG, and what happens in production when DEBUG is disabled?
- Design a structured JSON logging schema for authentication events including all fields necessary for correlation and investigation.
- Your monitoring has 200 alerts per day with 85% false positives — describe a strategy to reduce alert fatigue while maintaining detection.
- What AI audit logging events should be captured, and how do these logs enable tracking AI-attributable defect rates?
- Explain log injection, provide an example attack, and describe how structured JSON logging prevents it.