Get Postmortem
curl --request GET \
--url http://{host}:{port}/{basePath}/postmortems/{name} \
--header 'Authorization: Bearer <token>'

{
"apiVersion": "v1",
"kind": "PostMortem",
"metadata": {
"name": "PM-20260318-001",
"incident": "INC-20260318-003",
"createdAt": "2026-03-18T23:00:00Z",
"author": "ai-generated",
"reviewedBy": "carlos.silva@empresa.com"
},
"report": {
"title": "Recurring OOMKill in checkout-service",
"severity": "critical",
"duration": "45m",
"detectedAt": "2026-03-18T22:00:00Z",
"resolvedAt": "2026-03-18T22:45:00Z",
"summary": "The checkout-service suffered multiple OOMKills over 45 minutes due to a memory leak in the payment webhook handler. The incident affected 3.2% of checkout requests and was resolved automatically by the operator after approval.",
"impact": {
"usersAffected": 1250,
"requestsAffected": "3.2%",
"revenueImpact": "estimated R$ 15,000",
"slosBreached": ["slo-checkout-error-rate"]
},
"rootCause": {
"description": "Memory leak in the payment webhook handler. The handler did not close the HTTP response body when validation failed, causing buffers to accumulate on the heap.",
"category": "code-bug",
"confidence": 0.91
},
"timeline": [
{
"time": "2026-03-18T22:00:00Z",
"event": "First OOMKill detected on pod checkout-service-5f8d7c6b4-k9m2n"
},
{
"time": "2026-03-18T22:02:00Z",
"event": "Decision engine starts automatic analysis"
},
{
"time": "2026-03-18T22:03:30Z",
"event": "Root cause identified: memory leak in the webhook handler"
},
{
"time": "2026-03-18T22:05:00Z",
"event": "Runbook 'oomkill-standard' selected, approval requested"
},
{
"time": "2026-03-18T22:08:00Z",
"event": "Approval granted by carlos.silva@empresa.com"
},
{
"time": "2026-03-18T22:10:00Z",
"event": "Memory limit raised to 1Gi, pods restarting"
},
{
"time": "2026-03-18T22:45:00Z",
"event": "Incident resolved: pods stable for 30 minutes"
}
],
"remediationApplied": {
"runbook": "runbook-oomkill-standard",
"steps": [
"Memory diagnosis via kubectl top",
"Patch memory limit from 512Mi to 1Gi",
"Stability check for 30 minutes"
],
"result": "success"
},
"actionItems": [
{
"priority": "P1",
"description": "Fix the memory leak in the webhook handler: close the response body on all error paths",
"assignee": "backend-team",
"status": "open",
"dueDate": "2026-03-22"
},
{
"priority": "P2",
"description": "Add heap usage metrics to checkout-service for early detection",
"assignee": "platform-team",
"status": "open",
"dueDate": "2026-03-25"
},
{
"priority": "P3",
"description": "Review memory limits for all services in the production namespace",
"assignee": "sre-team",
"status": "open",
"dueDate": "2026-03-30"
}
],
"lessonsLearned": [
"Add automated linting to verify that response bodies are closed in HTTP handlers",
"Memory limits should be reviewed quarterly based on real usage metrics",
"The OOMKill runbook worked well, but approval took 3 minutes; consider auto-approve for riskScore < 50"
],
"metricSnapshots": [
{"name": "cpu_usage", "value": "0.45", "timestamp": "2026-03-18T21:55:00Z", "phase": "before"},
{"name": "memory_usage", "value": "498000000", "timestamp": "2026-03-18T22:00:00Z", "phase": "during"},
{"name": "memory_usage", "value": "312000000", "timestamp": "2026-03-18T22:30:00Z", "phase": "after"}
],
"blastRadius": [
{"resource": {"kind": "Service", "name": "checkout-svc", "namespace": "production"}, "impact": "5xx responses during pod restarts", "severity": "high"}
],
"gitCorrelation": {
"commitSHA": "a1b2c3d4",
"commitMessage": "feat: add webhook handler for payment notifications",
"author": "dev@empresa.com",
"timestamp": "2026-03-18T19:30:00Z",
"confidence": 0.82,
"filesChanged": ["internal/webhook/handler.go", "internal/webhook/handler_test.go"]
},
"trending": {
"occurrenceCount": 3,
"windowDays": 30,
"relatedPostMortems": ["PM-20260305-001", "PM-20260312-002"],
"pattern": "Recurring oom_kill on Deployment/checkout-service (3 occurrences in 30 days)"
},
"gitOpsContext": "Helm release 'checkout' chart=checkout version=2.3.1 status=deployed revision=42",
"logAnalysisSummary": "1 Go panic stack trace; 12 critical error patterns (resource/connectivity); Primary exception: panic: runtime error: invalid memory address",
"cascadeChain": ["production/checkout-service(root_cause)", "production/api-gateway(victim)"],
"feedback": {
"overrideRootCause": "",
"remediationAccuracy": 4,
"comments": "Good analysis. Consider suggesting AdjustResources before restart next time.",
"providedBy": "sre@company.com",
"providedAt": "2026-03-19T09:00:00Z"
}
}
}
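
As a rough illustration of consuming this response, the sketch below computes the incident duration from `detectedAt` and `resolvedAt` and lists the open action items by priority. It is not part of the operator's API: the helper names (`parse_utc`, `incident_minutes`, `open_action_items`) are hypothetical, and `SAMPLE` is a trimmed copy of the example above that keeps only the fields being read.

```python
import json
from datetime import datetime

# Trimmed copy of the example response, keeping only the fields used below.
SAMPLE = """
{
  "metadata": {"name": "PM-20260318-001"},
  "report": {
    "detectedAt": "2026-03-18T22:00:00Z",
    "resolvedAt": "2026-03-18T22:45:00Z",
    "actionItems": [
      {"priority": "P1", "assignee": "backend-team", "status": "open", "dueDate": "2026-03-22"},
      {"priority": "P2", "assignee": "platform-team", "status": "open", "dueDate": "2026-03-25"},
      {"priority": "P3", "assignee": "sre-team", "status": "open", "dueDate": "2026-03-30"}
    ]
  }
}
"""


def parse_utc(ts: str) -> datetime:
    # The example timestamps end in "Z"; swap it for an explicit offset so
    # datetime.fromisoformat also accepts them on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def incident_minutes(report: dict) -> float:
    # Minutes elapsed between detection and resolution.
    delta = parse_utc(report["resolvedAt"]) - parse_utc(report["detectedAt"])
    return delta.total_seconds() / 60


def open_action_items(report: dict) -> list[dict]:
    # Keep only items still marked "open".
    items = [i for i in report.get("actionItems", []) if i.get("status") == "open"]
    # "P1" < "P2" < "P3" compares correctly as plain strings for single digits.
    return sorted(items, key=lambda i: i["priority"])


report = json.loads(SAMPLE)["report"]
print(incident_minutes(report))
print([i["assignee"] for i in open_action_items(report)])
```

The computed duration can be compared against the `duration` field ("45m" in the example) as a quick consistency check on a report.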
PostMortems
Get PostMortem
Returns the complete postmortem for an incident, including root cause analysis, timeline, and action items.
GET /postmortems/{name}
Authorizations
Bearer token issued by the operator. Format: Authorization: Bearer <token>.
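
For readers calling the endpoint from code rather than curl, here is a minimal sketch that builds the same GET request with Python's standard library. The function name `build_postmortem_request` and the host, port, base path, and token values are placeholders chosen for illustration; only the `/postmortems/{name}` path and the `Authorization: Bearer <token>` header format come from this page.

```python
from urllib.parse import quote
from urllib.request import Request


def build_postmortem_request(base_url: str, name: str, token: str) -> Request:
    """Build (but do not send) a GET request for /postmortems/{name}."""
    # Percent-encode the name so it is safe as a single path segment.
    url = f"{base_url.rstrip('/')}/postmortems/{quote(name, safe='')}"
    return Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical values: substitute your own host, port, base path, and token.
req = build_postmortem_request(
    "http://localhost:8080/api/v1", "PM-20260318-001", "example-token"
)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`; error handling for non-2xx responses is left out of this sketch.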
Path parameters
name: Unique postmortem name.
Example: "PM-20260318-001"