curl --request GET \
--url https://api.example.com/api/v1/postmortems/{name}
{
"apiVersion": "v1",
"kind": "PostMortem",
"metadata": {
"name": "PM-20260318-001",
"incident": "INC-20260318-003",
"createdAt": "2026-03-18T23:00:00Z",
"author": "ai-generated",
"reviewedBy": "carlos.silva@empresa.com"
},
"report": {
"title": "Recurring OOMKill in checkout-service",
"severity": "critical",
"duration": "45m",
"detectedAt": "2026-03-18T22:00:00Z",
"resolvedAt": "2026-03-18T22:45:00Z",
"summary": "The checkout-service suffered multiple OOMKills over 45 minutes due to a memory leak in the payment webhook handler. The incident affected 3.2% of checkout requests and was resolved automatically by the operator after approval.",
"impact": {
"usersAffected": 1250,
"requestsAffected": "3.2%",
"revenueImpact": "estimated R$ 15,000",
"slosBreached": ["slo-checkout-error-rate"]
},
"rootCause": {
"description": "Memory leak in the payment webhook handler. The handler did not close the HTTP response body when validation failed, causing buffers to accumulate on the heap.",
"category": "code-bug",
"confidence": 0.91
},
"timeline": [
{
"time": "2026-03-18T22:00:00Z",
"event": "First OOMKill detected on pod checkout-service-5f8d7c6b4-k9m2n"
},
{
"time": "2026-03-18T22:02:00Z",
"event": "Decision engine starts automated analysis"
},
{
"time": "2026-03-18T22:03:30Z",
"event": "Root cause identified: memory leak in the webhook handler"
},
{
"time": "2026-03-18T22:05:00Z",
"event": "Runbook 'oomkill-standard' selected, approval requested"
},
{
"time": "2026-03-18T22:08:00Z",
"event": "Approval granted by carlos.silva@empresa.com"
},
{
"time": "2026-03-18T22:10:00Z",
"event": "Memory limit raised to 1Gi, pods restarting"
},
{
"time": "2026-03-18T22:45:00Z",
"event": "Incident resolved: pods stable for 30 minutes"
}
],
"remediationApplied": {
"runbook": "runbook-oomkill-standard",
"steps": [
"Memory diagnosis via kubectl top",
"Patch memory limit from 512Mi to 1Gi",
"Stability verification for 30 minutes"
],
"result": "success"
},
"actionItems": [
{
"priority": "P1",
"description": "Fix the memory leak in the webhook handler: close the response body on all error paths",
"assignee": "backend-team",
"status": "open",
"dueDate": "2026-03-22"
},
{
"priority": "P2",
"description": "Add heap usage metrics to checkout-service for early detection",
"assignee": "platform-team",
"status": "open",
"dueDate": "2026-03-25"
},
{
"priority": "P3",
"description": "Review memory limits for all services in the production namespace",
"assignee": "sre-team",
"status": "open",
"dueDate": "2026-03-30"
}
],
"lessonsLearned": [
"Add automated linting to verify that response bodies are closed in HTTP handlers",
"Memory limits should be reviewed quarterly based on real usage metrics",
"The OOMKill runbook worked well, but approval took 3 minutes; consider auto-approve for riskScore < 50"
],
"metricSnapshots": [
{"name": "cpu_usage", "value": "0.45", "timestamp": "2026-03-18T21:55:00Z", "phase": "before"},
{"name": "memory_usage", "value": "498000000", "timestamp": "2026-03-18T22:00:00Z", "phase": "during"},
{"name": "memory_usage", "value": "312000000", "timestamp": "2026-03-18T22:30:00Z", "phase": "after"}
],
"blastRadius": [
{"resource": {"kind": "Service", "name": "checkout-svc", "namespace": "production"}, "impact": "5xx responses during pod restarts", "severity": "high"}
],
"gitCorrelation": {
"commitSHA": "a1b2c3d4",
"commitMessage": "feat: add webhook handler for payment notifications",
"author": "dev@empresa.com",
"timestamp": "2026-03-18T19:30:00Z",
"confidence": 0.82,
"filesChanged": ["internal/webhook/handler.go", "internal/webhook/handler_test.go"]
},
"trending": {
"occurrenceCount": 3,
"windowDays": 30,
"relatedPostMortems": ["PM-20260305-001", "PM-20260312-002"],
"pattern": "Recurring oom_kill on Deployment/checkout-service (3 occurrences in 30 days)"
},
"gitOpsContext": "Helm release 'checkout' chart=checkout version=2.3.1 status=deployed revision=42",
"logAnalysisSummary": "1 Go panic stack trace; 12 critical error patterns (resource/connectivity); Primary exception: panic: runtime error: invalid memory address",
"cascadeChain": ["production/checkout-service(root_cause)", "production/api-gateway(victim)"],
"feedback": {
"overrideRootCause": "",
"remediationAccuracy": 4,
"comments": "Good analysis. Consider suggesting AdjustResources before restart next time.",
"providedBy": "sre@company.com",
"providedAt": "2026-03-19T09:00:00Z"
}
}
}
Returns the complete postmortem for an incident, including root cause analysis, timeline, and action items.
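As a rough illustration of how a client might consume this endpoint, here is a minimal Python sketch. It assumes the response has the shape shown in the example (`metadata`, `report.detectedAt`, `report.resolvedAt`, `report.actionItems`). The function names, the optional bearer-token header, and the `BASE_URL` constant are hypothetical and not part of the documented API; adjust them to your deployment.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime
from typing import Optional

# Placeholder host taken from the curl example; replace with your own.
BASE_URL = "https://api.example.com/api/v1"


def postmortem_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL, percent-encoding the {name} path parameter."""
    return f"{base_url}/postmortems/{urllib.parse.quote(name, safe='')}"


def fetch_postmortem(name: str, token: Optional[str] = None) -> dict:
    """Fetch one postmortem. The Authorization header is an assumption."""
    request = urllib.request.Request(postmortem_url(name), method="GET")
    request.add_header("Accept", "application/json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def parse_time(value: str) -> datetime:
    # Normalize the trailing "Z" so fromisoformat() also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(postmortem: dict) -> dict:
    """Derive time-to-resolve and the open action items from a response body."""
    report = postmortem["report"]
    elapsed = parse_time(report["resolvedAt"]) - parse_time(report["detectedAt"])
    open_items = [
        item for item in report.get("actionItems", [])
        if item.get("status") == "open"
    ]
    # Priorities in the example ("P1", "P2", "P3") sort correctly as strings.
    open_items.sort(key=lambda item: item["priority"])
    return {
        "name": postmortem["metadata"]["name"],
        "minutes_to_resolve": int(elapsed.total_seconds() // 60),
        "open_action_items": [
            (item["priority"], item["assignee"], item["dueDate"])
            for item in open_items
        ],
    }
```

Calling `summarize(fetch_postmortem("PM-20260318-001"))` would return the postmortem name, the minutes between `detectedAt` and `resolvedAt`, and the open action items ordered by priority. `summarize` is kept separate from the network call so it can be run against a saved response body.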