Security operations centers (SOCs) are overwhelmed with alerts. Automation is no longer optional; it's essential. This lesson covers security automation, orchestration, and SOAR (Security Orchestration, Automation, and Response) platforms.
Typical SOC challenges:
- 10,000+ alerts per day
- 99% are false positives or low priority
- Analysts spend 80% of time on repetitive tasks
- Average time to investigate: 3-5 hours
- Critical threats get lost in noise
- Analyst burnout and turnover

What automation solves:
- Triage alerts automatically
- Enrich with threat intelligence
- Execute standard response actions
- Document everything
- Free analysts for complex investigations
Cost savings:
Manual investigation: 3 hours × $50/hour = $150
Automated triage: 5 minutes × $50/hour = $4.17
Savings per alert: $145.83
With 1,000 alerts/day:
Annual savings: $145.83 × 1,000 × 365 ≈ $53.2 million
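The arithmetic above can be reproduced with a short script. This is only a sketch of the calculation: the function name `annual_savings` is hypothetical, and the hourly rate, handling times, and alert volume are the lesson's illustrative figures rather than measurements.

```python
def annual_savings(manual_hours, automated_minutes, hourly_rate, alerts_per_day):
    """Return (savings per alert, savings per year) in dollars."""
    manual_cost = manual_hours * hourly_rate
    automated_cost = (automated_minutes / 60) * hourly_rate
    per_alert = manual_cost - automated_cost
    return per_alert, per_alert * alerts_per_day * 365


# Figures taken from the example above
per_alert, per_year = annual_savings(
    manual_hours=3, automated_minutes=5, hourly_rate=50, alerts_per_day=1000
)
print(f"Savings per alert: ${per_alert:,.2f}")
print(f"Annual savings: ${per_year:,.0f}")
```

The model assumes every alert would otherwise receive a full manual investigation, so treat the result as an upper bound when estimating savings for your own SOC.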
Efficiency gains:
- Respond in seconds vs. hours
- Consistent response quality
- 24/7 operation
- Scale without hiring

Alert enrichment:
- IP reputation lookup
- Domain age check
- VirusTotal scan
- WHOIS lookup
- Geolocation
- Historical activity

Containment actions:
- Block IP at firewall
- Disable user account
- Isolate infected host
- Quarantine email
- Reset password

Investigation automation:
- Query logs automatically
- Correlate events
- Extract IOCs
- Search threat intel
- Generate timeline

Response orchestration:
- Create ticket
- Send notifications
- Execute playbooks
- Update case management
- Generate reports
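The "Extract IOCs" step listed under investigation automation can be sketched with a few regular expressions. This is a minimal illustration, not a complete extractor: the patterns and the `extract_iocs` name are assumptions, and the IPv4 pattern accepts out-of-range octets, so matches are candidates that still need validation.

```python
import re

# Simplified patterns; real extractors handle defanged indicators, IPv6, etc.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_iocs(text):
    """Return de-duplicated, sorted candidate IOCs found in free text."""
    return {
        "ips": sorted(set(IPV4_RE.findall(text))),
        "sha256": sorted(set(SHA256_RE.findall(text))),
        "urls": sorted(set(URL_RE.findall(text))),
    }


log_line = (
    "Blocked connection from 198.51.100.42 to "
    "http://suspicious-domain.com/payload and again from 198.51.100.42"
)
print(extract_iocs(log_line))
```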
Level 4: Full Automation (autonomous response)
↑
Level 3: Orchestration (multi-tool workflows)
↑
Level 2: Scripting (single-tool automation)
↑
Level 1: Manual (analyst does everything)
Start at Level 2, progress to Level 4.
Example: Automated alert enrichment
#!/usr/bin/env python3
import json
from datetime import datetime

import requests
import whois  # provided by the python-whois package


class AlertEnricher:
    def __init__(self, vt_api_key):
        self.vt_api_key = vt_api_key

    def check_ip_reputation(self, ip):
        """Check IP against VirusTotal"""
        url = f"https://www.virustotal.com/api/v3/ip_addresses/{ip}"
        headers = {"x-apikey": self.vt_api_key}
        try:
            response = requests.get(url, headers=headers, timeout=10)
        except requests.RequestException:
            return None  # Treat network errors as "no data"
        if response.status_code == 200:
            data = response.json()
            stats = data['data']['attributes']['last_analysis_stats']
            return {
                'malicious': stats.get('malicious', 0),
                'suspicious': stats.get('suspicious', 0),
                'total_engines': sum(stats.values()),
                'reputation': 'malicious' if stats.get('malicious', 0) > 0 else 'clean'
            }
        return None

    def get_domain_info(self, domain):
        """Get WHOIS and age info"""
        try:
            w = whois.whois(domain)
        except Exception:
            return None
        created = w.creation_date
        # Some registries return several creation dates; use the first one
        if isinstance(created, list):
            created = created[0] if created else None
        age_days = None
        if isinstance(created, datetime):
            # Match the timezone awareness of the WHOIS timestamp
            age_days = (datetime.now(created.tzinfo) - created).days
        return {
            'registrar': w.registrar,
            'creation_date': str(created),
            'expiration_date': str(w.expiration_date),
            'age_days': age_days
        }

    def enrich_alert(self, alert):
        """Enrich alert with contextual information"""
        enriched = alert.copy()
        # Check source IP
        if 'src_ip' in alert:
            enriched['src_ip_reputation'] = self.check_ip_reputation(alert['src_ip'])
        # Check destination IP
        if 'dst_ip' in alert:
            enriched['dst_ip_reputation'] = self.check_ip_reputation(alert['dst_ip'])
        # Check domain
        if 'domain' in alert:
            enriched['domain_info'] = self.get_domain_info(alert['domain'])
        # Calculate risk score
        enriched['risk_score'] = self.calculate_risk_score(enriched)
        return enriched

    def calculate_risk_score(self, alert):
        """Calculate risk score based on enrichment data"""
        score = 0
        # IP reputation (lookups may have returned None)
        if (alert.get('src_ip_reputation') or {}).get('malicious', 0) > 0:
            score += 50
        # New domain
        domain_age = (alert.get('domain_info') or {}).get('age_days')
        if domain_age is not None and domain_age < 30:
            score += 30
        # Severity
        if alert.get('severity') == 'critical':
            score += 20
        return min(score, 100)  # Cap at 100


# Usage
enricher = AlertEnricher(vt_api_key='your_api_key')
alert = {
    'id': 'ALT-12345',
    'severity': 'high',
    'src_ip': '198.51.100.42',
    'dst_ip': '203.0.113.10',
    'domain': 'suspicious-domain.com'
}
enriched_alert = enricher.enrich_alert(alert)
print(json.dumps(enriched_alert, indent=2))
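Once an alert carries a `risk_score`, triage can route it automatically. The sketch below shows one way to do that; the `route_alert` name, the queue labels, and the thresholds of 70 and 30 are assumptions for illustration and should be tuned against your own alert data.

```python
def route_alert(enriched_alert, escalate_at=70, review_at=30):
    """Map a risk score (0-100) to a triage queue. Thresholds are assumed."""
    score = enriched_alert.get("risk_score", 0)
    if score >= escalate_at:
        return "escalate"
    if score >= review_at:
        return "analyst_review"
    return "auto_close"


# A malicious source IP (50) plus a newly registered domain (30) scores 80
print(route_alert({"id": "ALT-12345", "risk_score": 80}))
```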
Example: Auto-block malicious IPs
#!/usr/bin/env python3
import ipaddress
import logging
import subprocess
from datetime import datetime


class AutoResponder:
    def __init__(self, log_file='auto_response.log'):
        logging.basicConfig(
            filename=log_file,
            level=logging.INFO,
            format='%(asctime)s - %(message)s'
        )
        self.logger = logging.getLogger(__name__)

    def block_ip(self, ip, reason):
        """Block IP at firewall"""
        try:
            # Validate first so only a real IP address reaches the command line
            ip = str(ipaddress.ip_address(ip))
            # Using iptables (argument list avoids shell-splitting surprises)
            cmd = ['sudo', 'iptables', '-I', 'INPUT', '1', '-s', ip, '-j', 'DROP']
            subprocess.run(cmd, check=True)
            self.logger.info(f"BLOCKED IP: {ip} - Reason: {reason}")
            # Add to blocklist file
            with open('/etc/security/blocklist.txt', 'a') as f:
                f.write(f"{ip}\t{datetime.now()}\t{reason}\n")
            return True
        except Exception as e:
            self.logger.error(f"Failed to block {ip}: {e}")
            return False

    def disable_user_account(self, username, reason):
        """Disable compromised user account"""
        try:
            # Linux: lock the account password
            subprocess.run(['sudo', 'passwd', '-l', username], check=True)
            self.logger.info(f"DISABLED ACCOUNT: {username} - Reason: {reason}")
            # Send notification
            self.send_notification(
                f"User account {username} has been disabled due to: {reason}"
            )
            return True
        except Exception as e:
            self.logger.error(f"Failed to disable {username}: {e}")
            return False

    def isolate_host(self, hostname, reason):
        """Isolate compromised host"""
        try:
            # This would integrate with your network management
            # Example: modify firewall rules, change VLAN, etc.
            self.logger.info(f"ISOLATED HOST: {hostname} - Reason: {reason}")
            # Create incident ticket
            self.create_ticket(
                title=f"Host Isolation: {hostname}",
                description=f"Automatically isolated due to: {reason}",
                priority='high'
            )
            return True
        except Exception as e:
            self.logger.error(f"Failed to isolate {hostname}: {e}")
            return False

    def send_notification(self, message):
        """Send notification to security team"""
        # Integrate with Slack, email, SMS, etc.
        print(f"NOTIFICATION: {message}")

    def create_ticket(self, title, description, priority):
        """Create incident ticket"""
        # Integrate with JIRA, ServiceNow, etc.
        print(f"TICKET CREATED: {title} - {priority}")


# Usage
responder = AutoResponder()
# Example values; in practice these come from your detection pipeline
source_ip = '198.51.100.42'
failed_login_count = 15
# Automated response to brute force
if failed_login_count > 10:
    responder.block_ip(source_ip, "Brute force attempt")
    responder.create_ticket(
        title=f"Brute force from {source_ip}",
        description=f"Blocked after {failed_login_count} failed attempts",
        priority='medium'
    )
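Before an automated block like the one above runs, it is worth checking that the target is not an internal or business-critical address. The following is a sketch of such a guard using the standard `ipaddress` module; the `is_safe_to_block` name, the allowlist contents, and the internal ranges are assumptions to adapt to your network.

```python
import ipaddress

# Hypothetical allowlist: addresses automation must never block
ALLOWLIST = {"203.0.113.10"}

# Assumed internal ranges (RFC 1918); replace with your own address plan
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_safe_to_block(ip, allowlist=ALLOWLIST):
    """Return True only for valid, external, non-allowlisted addresses."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # Not an IP address at all: refuse to act on it
    if addr.is_loopback or any(addr in net for net in INTERNAL_NETWORKS):
        return False
    return str(addr) not in allowlist


for candidate in ["198.51.100.42", "10.1.2.3", "203.0.113.10"]:
    print(candidate, is_safe_to_block(candidate))
```

Anything the guard rejects can be routed to an analyst instead of being dropped silently.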
SOAR: Security Orchestration, Automation, and Response
Components:
1. Orchestration - Connect multiple tools
2. Automation - Execute actions automatically
3. Response - Standardized playbooks
4. Case Management - Track investigations
5. Threat Intelligence - Integrate feeds

Popular SOAR platforms:
- Splunk SOAR (formerly Phantom)
- IBM Resilient
- Palo Alto Cortex XSOAR (formerly Demisto)
- Swimlane
- TheHive Project (open source)
Playbooks define automated response workflows.
Phishing investigation playbook:
1. Receive phishing alert
   ↓
2. Extract email metadata
   - Sender, subject, attachments
   - URLs, headers
   ↓
3. Enrich indicators
   - Check URLs against threat intel
   - Scan attachments with sandbox
   - Look up sender reputation
   ↓
4. Assess risk
   If malicious:
   - Quarantine email across organization
   - Block sender
   - Add IOCs to blocklist
   If suspicious:
   - Flag for analyst review
   - Request user confirmation
   If benign:
   - Release from quarantine
   - Close case
   ↓
5. Document findings
   - Update case
   - Generate report
   - Close ticket
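The "Assess risk" branch of the phishing playbook above maps naturally onto a lookup table. This is a minimal sketch: the action names are hypothetical labels for the playbook steps, not calls into any particular SOAR product.

```python
# Verdict -> ordered actions, mirroring step 4 of the playbook
PHISHING_ACTIONS = {
    "malicious": ["quarantine_email", "block_sender", "add_iocs_to_blocklist"],
    "suspicious": ["flag_for_analyst_review", "request_user_confirmation"],
    "benign": ["release_from_quarantine", "close_case"],
}


def phishing_actions(verdict):
    """Return the actions for a verdict; unknown verdicts go to a human."""
    return PHISHING_ACTIONS.get(verdict, ["flag_for_analyst_review"])


print(phishing_actions("malicious"))
```

Defaulting unknown verdicts to analyst review keeps an unexpected enrichment result from triggering an automated action.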
Malware detection playbook:
1. Malware detected on endpoint
   ↓
2. Gather context
   - Process tree
   - Network connections
   - File hashes
   ↓
3. Containment
   - Isolate host from network
   - Kill malicious process
   - Block C2 domain
   ↓
4. Analysis
   - Submit sample to sandbox
   - Check threat intelligence
   - Identify IOCs
   ↓
5. Hunt for additional infections
   - Search for IOCs across environment
   - Check other endpoints
   ↓
6. Remediation
   - Clean or reimage infected hosts
   - Block identified IOCs
   - Update signatures
   ↓
7. Recovery
   - Restore from backup if needed
   - Verify system clean
   - Return to production
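A playbook like the one above is an ordered list of steps sharing one context. The sketch below shows a bare-bones runner that stops at the first failing step so later stages never act on incomplete data; `run_playbook` and the two placeholder steps are hypothetical and stand in for real tool integrations.

```python
def run_playbook(steps, context):
    """Run (name, function) steps in order; stop at the first failure."""
    completed = []
    for name, step in steps:
        try:
            step(context)
        except Exception as exc:
            return {
                "status": "failed",
                "failed_step": name,
                "error": str(exc),
                "completed": completed,
            }
        completed.append(name)
    return {"status": "ok", "completed": completed}


# Placeholder steps; real ones would call EDR, firewall, and sandbox APIs
def gather_context(ctx):
    ctx["file_hashes"] = []


def containment(ctx):
    ctx["isolated"] = True


steps = [("gather_context", gather_context), ("containment", containment)]
print(run_playbook(steps, {"host": "workstation-01"}))
```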
Common integrations:
SIEM: Splunk, Elastic, QRadar, ArcSight
EDR: CrowdStrike, Carbon Black, SentinelOne, Microsoft Defender
Firewall: Palo Alto, Fortinet, Cisco, pfSense
Threat Intelligence: VirusTotal, AlienVault OTX, MISP, ThreatConnect
Ticketing: ServiceNow, JIRA, Remedy, Zendesk
Communication: Slack, Microsoft Teams, Email, SMS
TheHive and Cortex: an open-source SOAR platform
Architecture:
TheHive (Case Management)
↓
Cortex (Analyzers & Responders)
↓
External Services (VirusTotal, MISP, etc.)
Creating a custom Cortex analyzer:
#!/usr/bin/env python3
from cortexutils.analyzer import Analyzer


class CustomIPAnalyzer(Analyzer):
    def __init__(self):
        Analyzer.__init__(self)
        # Read the API key from the analyzer config; error out if it is missing
        self.api_key = self.get_param('config.api_key', None, 'API key is missing')

    def summary(self, raw):
        """Generate summary for TheHive"""
        taxonomies = []
        level = "info"
        namespace = "CustomAnalyzer"
        predicate = "Reputation"
        if raw.get('reputation') == 'malicious':
            level = "malicious"
            value = "Malicious"
        else:
            value = "Clean"
        taxonomies.append(self.build_taxonomy(level, namespace, predicate, value))
        return {"taxonomies": taxonomies}

    def run(self):
        """Main analysis logic"""
        if self.data_type == 'ip':
            ip = self.get_data()
            # Your analysis logic here
            result = {
                'ip': ip,
                'reputation': self.check_reputation(ip),
                'threat_score': self.calculate_threat_score(ip),
                'sources': self.query_threat_feeds(ip)
            }
            self.report(result)
        else:
            self.error('Invalid data type')

    # The three helpers below are stubs to be filled in with real lookups
    def check_reputation(self, ip):
        # Implementation
        pass

    def calculate_threat_score(self, ip):
        # Implementation
        pass

    def query_threat_feeds(self, ip):
        # Implementation
        pass


if __name__ == '__main__':
    CustomIPAnalyzer().run()
Ansible playbook for incident response:
---
- name: Incident Response Playbook
  hosts: affected_hosts
  become: yes
  tasks:
    - name: Isolate host from network
      # Caution: if Ansible connects over eth0, this also cuts the SSH session
      # and the remaining tasks will fail. Use a separate management interface.
      command: ifconfig eth0 down
      register: isolation_result

    - name: Kill malicious process
      shell: pkill -9 -f "{{ malicious_process }}"
      ignore_errors: yes

    - name: Collect evidence
      block:
        - name: Dump memory
          # Placeholder command; substitute your memory acquisition tool
          command: lime-dump /tmp/memory.dump

        - name: Collect logs
          archive:
            path:
              - /var/log/syslog
              - /var/log/auth.log
            dest: /tmp/logs.tar.gz

        - name: List processes
          shell: ps aux > /tmp/processes.txt

        - name: List network connections
          shell: netstat -tulpn > /tmp/connections.txt

    - name: Transfer evidence
      fetch:
        src: "{{ item }}"
        dest: /evidence/{{ inventory_hostname }}/
        flat: yes
      with_items:
        - /tmp/memory.dump
        - /tmp/logs.tar.gz
        - /tmp/processes.txt
        - /tmp/connections.txt

    - name: Clean malware
      file:
        path: "{{ item }}"
        state: absent
      with_items:
        - /tmp/malware.exe
        - /etc/cron.d/malicious

    - name: Harden system
      include_tasks: hardening.yml

    - name: Send notification
      slack:
        token: "{{ slack_token }}"
        msg: "Host {{ inventory_hostname }} has been cleaned and hardened"
        channel: '#security-ops'

- name: Update firewall rules
  hosts: firewall
  become: yes  # iptables requires root
  tasks:
    - name: Block malicious IPs
      command: >
        iptables -I INPUT 1 -s {{ item }} -j DROP
      with_items: "{{ malicious_ips }}"
Detection metrics:
- Mean Time to Detect (MTTD)
- Alert volume
- False positive rate
- Detection accuracy

Response metrics:
- Mean Time to Respond (MTTR)
- Mean Time to Contain (MTTC)
- Mean Time to Recover
- Automation rate (% automated)

Efficiency metrics:
- Alerts triaged automatically
- Time saved per alert
- Cost per investigation
- Analyst productivity
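Time-based metrics such as MTTD and MTTR are averages over incident timestamps. The sketch below assumes each incident record carries `occurred`, `detected`, and `resolved` fields (hypothetical names), measures MTTR from detection to resolution, and uses made-up sample incidents.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_field, end_field):
    """Mean elapsed minutes between two timestamp fields across incidents."""
    deltas = [
        (i[end_field] - i[start_field]).total_seconds() / 60
        for i in incidents
        if i.get(start_field) and i.get(end_field)
    ]
    return mean(deltas) if deltas else None


# Sample data for illustration only
incidents = [
    {"occurred": datetime(2024, 1, 1, 9, 0),
     "detected": datetime(2024, 1, 1, 9, 10),
     "resolved": datetime(2024, 1, 1, 9, 50)},
    {"occurred": datetime(2024, 1, 2, 14, 0),
     "detected": datetime(2024, 1, 2, 14, 14),
     "resolved": datetime(2024, 1, 2, 15, 4)},
]

mttd = mean_minutes(incidents, "occurred", "detected")
mttr = mean_minutes(incidents, "detected", "resolved")
print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")
```

Incidents missing either timestamp are skipped, so open incidents do not distort the averages.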
Example dashboard:
Security Operations Dashboard
Alerts Today: 8,432
  Automated: 7,891 (94%)
  Analyst Review: 541 (6%)
Current Incidents: 12
  Critical: 2
  High: 5
  Medium: 5
MTTR: 45 minutes (target: < 60 min)
MTTD: 12 minutes (target: < 15 min)
Top Alert Sources:
  1. Failed Login Attempts (3,221)
  2. Malware Detection (2,104)
  3. Port Scans (1,876)
  4. Data Exfiltration (892)
  5. Privilege Escalation (339)
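The percentages on a dashboard like this follow directly from two counters. A small sketch, with `automation_summary` as a hypothetical helper name:

```python
def automation_summary(total_alerts, automated):
    """Split an alert count into automated vs. analyst-reviewed shares."""
    if total_alerts <= 0:
        return {"automated_pct": 0, "analyst_pct": 0, "analyst_review": 0}
    manual = total_alerts - automated
    return {
        "automated_pct": round(100 * automated / total_alerts),
        "analyst_pct": round(100 * manual / total_alerts),
        "analyst_review": manual,
    }


# Figures from the example dashboard
print(automation_summary(8432, 7891))
# → {'automated_pct': 94, 'analyst_pct': 6, 'analyst_review': 541}
```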
Start small:
- Pick high-volume, low-complexity tasks
- Automate alert enrichment first
- Gradually increase automation

Test thoroughly:
- Test in lab environment
- Peer review automation logic
- Have rollback plan
- Document everything

Human in the loop:
- Require approval for critical actions
- Analyst can override automation
- Review automated decisions regularly

Continuous improvement:
- Track automation effectiveness
- Adjust based on feedback
- Update playbooks regularly
- Share lessons learned
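The "human in the loop" practice above can be enforced in code with an approval gate. In this sketch, which actions count as critical is an assumed policy, and `execute_action` is a hypothetical wrapper around whatever function performs the action.

```python
# Assumed policy: these actions need a named approver before they run
CRITICAL_ACTIONS = {"disable_user_account", "isolate_host"}


def execute_action(action, target, run, approved_by=None):
    """Run `run(target)` unless the action is critical and unapproved."""
    if action in CRITICAL_ACTIONS and not approved_by:
        return {"status": "pending_approval", "action": action, "target": target}
    run(target)
    return {
        "status": "executed",
        "action": action,
        "target": target,
        "approved_by": approved_by,
    }


def fake_isolate(host):
    print(f"(would isolate {host})")


print(execute_action("isolate_host", "workstation-01", fake_isolate))
print(execute_action("isolate_host", "workstation-01", fake_isolate,
                     approved_by="analyst.on.call"))
```

Recording the approver in the result also gives the regular review of automated decisions something to audit.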
Over-automation:
- Automating complex decisions too early
- No human oversight on critical actions
- Brittle automation that breaks often

Under-documentation:
- No runbooks for automation
- Undocumented integrations
- No change management

Ignoring false positives:
- Automation amplifies bad detection logic
- Creates alert fatigue
- Wastes resources

Poor error handling:
- Automation fails silently
- No alerts on automation failures
- No fallback to manual process
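One way to avoid the error-handling pitfalls above is to wrap every automated action so that failures are logged and the alert falls back to a manual queue. This is a sketch under simple assumptions: `run_with_fallback` is a hypothetical helper, the manual queue is a plain list, and the retry count is arbitrary.

```python
import logging

logger = logging.getLogger("soc.automation")


def run_with_fallback(action, alert, manual_queue, retries=2):
    """Try an automated action; on repeated failure, hand off to analysts."""
    for attempt in range(1, retries + 1):
        try:
            return action(alert)
        except Exception:
            # Log with traceback so the failure is never silent
            logger.exception(
                "Action failed (attempt %d/%d) for alert %s",
                attempt, retries, alert.get("id"),
            )
    manual_queue.append(alert)  # Fallback to the manual process
    return None


def flaky_enrichment(alert):
    raise TimeoutError("threat intel service unreachable")


queue = []
run_with_fallback(flaky_enrichment, {"id": "ALT-12345"}, queue)
print(f"Alerts awaiting manual handling: {len(queue)}")
```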
Automation benefits:
- Faster response times
- Consistent quality
- Free analysts for complex work
- Scale without adding headcount
- Reduce human error

Start with:
- Alert enrichment
- Simple containment actions
- Reporting automation
- Progress to complex playbooks

Success factors:
- Executive support
- Cross-team collaboration
- Proper tooling
- Training and documentation
- Continuous improvement

Remember:
- Automation is a journey, not a destination
- Start small, grow gradually
- Measure everything
- Keep a human in the loop
- Continuously improve
Security automation transforms SOC operations from reactive firefighting to proactive threat hunting. Done right, it makes teams more effective, analysts happier, and organizations more secure.