Skip to content

log 500 signoz errors and reduce timerange query#2469

Merged
shagun-singh-inkeep merged 5 commits intomainfrom
TI-25
Mar 2, 2026
Merged

log 500 signoz errors and reduce timerange query#2469
shagun-singh-inkeep merged 5 commits intomainfrom
TI-25

Conversation

@shagun-singh-inkeep
Copy link
Copy Markdown
Collaborator

No description provided.

@vercel
Copy link
Copy Markdown

vercel Bot commented Mar 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Mar 2, 2026 9:13pm
agents-docs Ready Ready Preview, Comment Mar 2, 2026 9:13pm
agents-manage-ui Ready Ready Preview, Comment Mar 2, 2026 9:13pm

Request Review

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog Bot commented Mar 2, 2026

Narrows SigNoz trace queries to the conversation's actual time window (with 3-hour padding) instead of the previous 180-day lookback, reducing query load and eliminating 500 errors from overly broad time ranges.

  • agents-manage-ui/src/lib/api/signoz-conversation-time-range.ts — New shared helper getConversationTimeRange that fetches conversation bounds from the API and computes a tight [createdAt - 3h, updatedAt + 3h] window, falling back to the 180-day default when bounds are unavailable
  • agents-api/src/domains/manage/routes/conversations.ts — Adds a GET /{id}/bounds endpoint returning conversation createdAt/updatedAt timestamps
  • agents-api/src/domains/manage/routes/signoz.ts — Enriches SigNoz error logs with error.response?.data from Axios for easier debugging of upstream failures
  • agents-manage-ui/src/app/api/signoz/conversations/[conversationId]/route.ts — Replaces inline time-range math with getConversationTimeRange
  • agents-manage-ui/src/app/api/signoz/spans/[spanId]/route.ts — Same refactor; also passes projectId through to enable bounds lookup
  • agents-manage-ui/src/components/traces/timeline/timeline-wrapper.tsx — Forwards projectId in the span fetch URL and adds it to the effect dependency array

Pullfrog  | View workflow run | Using Claude Code | Triggered by Pullfrogpullfrog.com𝕏

Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend changes look solid — the new /bounds endpoint follows existing authorization, schema, and error-handling patterns correctly, and the SigNoz logging improvements are well-guarded. The frontend utility has two potential bugs worth addressing around timestamp parsing and NaN propagation.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow runpullfrog.com𝕏

const bounds = await makeManagementApiRequest<{
data: { createdAt: string; updatedAt: string };
}>(`tenants/${tenantId}/projects/${projectId}/conversations/${conversationId}/bounds`);
const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appending Z unconditionally is fragile. The runtime schema uses timestamp without withTimezone, so Drizzle returns strings like 2024-01-01 00:00:00 — appending Z works today. But if the column is ever changed to timestamptz, or if the driver behavior changes, the API could return 2024-01-01T00:00:00+00:00 and appending Z would produce an invalid string.

Safer approach:

const ensureUtc = (s: string) => {
  if (/Z$|[+-]\d{2}:\d{2}$/.test(s)) return s;
  return `${s}Z`;
};
const createdMs = new Date(ensureUtc(bounds.data.createdAt)).getTime();
const updatedMs = new Date(ensureUtc(bounds.data.updatedAt)).getTime();

Comment on lines +16 to +17
if (startParam != null && endParam != null) {
return { start: Number(startParam), end: Number(endParam) };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number(startParam) on a non-numeric string (e.g. ?start=abc) returns NaN, which would propagate into the SigNoz query payload silently. Consider validating:

const start = Number(startParam);
const end = Number(endParam);
if (Number.isNaN(start) || Number.isNaN(end)) {
  // fall through to bounds fetch
} else {
  return { start, end };
}

Comment on lines +33 to +34
} catch {
return { start: now - DEFAULT_LOOKBACK_MS, end: now };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The silent catch swallows all errors — including auth failures or network timeouts — and falls back to a 180-day window. Consider logging a warning so debugging is possible:

} catch (err) {
  console.warn('Failed to fetch conversation bounds, using default lookback', err);
  return { start: now - DEFAULT_LOOKBACK_MS, end: now };
}

Comment on lines +46 to +52
const { start, end } = await getConversationTimeRange({
startParam: null,
endParam: null,
projectId,
tenantId,
conversationId,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding startParam: null, endParam: null means every span fetch now makes an extra round-trip to the bounds API (or falls back to 180-day lookback if projectId is missing). The old code used now - DEFAULT_LOOKBACK_MS directly with no network call. If the goal is to narrow the ClickHouse scan window for span lookups too, this trade-off is worth a brief comment explaining the intent.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Mar 2, 2026

🦋 Changeset detected

Latest commit: debf745

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
@inkeep/agents-manage-ui Patch
@inkeep/agents-api Patch
@inkeep/agents-cli Patch
@inkeep/agents-core Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(3) Total Issues | Risk: Low

🟠⚠️ Major (1) 🟠⚠️

🟠 1) signoz-conversation-time-range.ts Missing test coverage for new utility function

Issue: The getConversationTimeRange function has no unit tests despite containing date/time calculation logic with multiple code paths: (1) explicit start/end params, (2) no projectId fallback, (3) successful bounds fetch with padding calculations, and (4) API error fallback.

Why: Without tests, regressions in time range calculation could cause SigNoz queries to search incorrect time windows. The date parsing logic (appending Z to timestamps) and the Math.min/Math.max clamping could produce unexpected results with edge cases (NaN from invalid dates, inverted ranges if timestamps are in the future).

Fix: Add unit tests in agents-manage-ui/src/lib/api/__tests__/signoz-conversation-time-range.test.ts:

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { getConversationTimeRange, BOUNDS_PADDING_MS, DEFAULT_LOOKBACK_MS } from '../signoz-conversation-time-range';

vi.mock('../api-config', () => ({ makeManagementApiRequest: vi.fn() }));

describe('getConversationTimeRange', () => {
  it('should return explicit params when both provided', async () => {
    const result = await getConversationTimeRange({
      startParam: '1000', endParam: '2000',
      projectId: 'proj', tenantId: 'tenant', conversationId: 'conv'
    });
    expect(result).toEqual({ start: 1000, end: 2000 });
  });

  it('should fallback to default when projectId missing', async () => {
    const now = Date.now();
    const result = await getConversationTimeRange({
      startParam: null, endParam: null,
      projectId: undefined, tenantId: 'tenant', conversationId: 'conv'
    });
    expect(result.end).toBeCloseTo(now, -2);
    expect(result.start).toBeCloseTo(now - DEFAULT_LOOKBACK_MS, -2);
  });

  it('should fallback on API error', async () => {
    const { makeManagementApiRequest } = await import('../api-config');
    vi.mocked(makeManagementApiRequest).mockRejectedValue(new Error('Network error'));
    // ... verify fallback behavior
  });
});

Refs:

Inline Comments:

  • 🟠 Major: signoz-conversation-time-range.ts:33-35 Silent error catch without logging

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: signoz-conversation-time-range.ts:28-29 Timestamp parsing assumes format without timezone suffix

💭 Consider (1) 💭

Inline Comments:

  • 💭 Consider: signoz-conversation-time-range.ts:16-17 Validate parsed numbers for NaN

💡 APPROVE WITH SUGGESTIONS

Summary: This PR is a solid improvement - the conversation bounds optimization will meaningfully reduce SigNoz query times, and the enhanced error logging will help debug 500 errors. The main gap is the lack of test coverage for the new getConversationTimeRange utility, which has multiple code paths and date parsing logic that would benefit from unit tests. The silent error catch should also be updated to log when the fallback is used for observability.

Discarded (5)
Location Issue Reason Discarded
nine-turtles-lay.md:6 Changeset message doesn't follow "Fix X" style Nitpick - message is descriptive enough, not blocking
signoz.ts:131-137 Error logging could include tenantId/projectId Already an improvement over existing logging, additional context is nice-to-have
api-config.ts:73 Global API requests lack timeout Pre-existing issue, outside PR scope
conversations.ts:98-140 Endpoint lacks route-level tests Endpoint is simple and delegates to tested getConversation; acceptable for now
spans/[spanId]/route.ts:147 Error handling loses context from bounds failure The bounds function catches internally and falls back, so this is moot
Reviewers (7)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 2 0 0 0 1 0 1
pr-review-consistency 5 0 0 0 0 0 5
pr-review-errors 2 0 0 0 1 0 1
pr-review-frontend 1 0 0 0 0 0 1
pr-review-sre 5 0 0 0 0 0 5
pr-review-tests 2 1 0 0 0 0 1
pr-review-types 3 0 1 0 0 0 2
Total 20 1 1 0 2 0 16

Note: High discard rate due to significant deduplication across reviewers (timestamp parsing, silent catch, NaN validation all raised by 3+ reviewers).

Comment on lines +33 to +35
} catch {
return { start: now - DEFAULT_LOOKBACK_MS, end: now };
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Silent error catch swallows all failures without logging

Issue: The catch block silently returns the fallback time range without any logging or visibility into what failed.

Why: When the bounds API call fails (network issues, auth errors, service down), users will silently get the slower 180-day time range query. This makes it impossible to detect systemic issues - if the bounds endpoint is consistently failing, all trace queries degrade to the expensive path with zero visibility.

Fix: Add logging to track when fallback is used:

} catch (error) {
  console.warn('[getConversationTimeRange] Failed to fetch bounds, using fallback', {
    conversationId,
    projectId,
    tenantId,
    error: error instanceof Error ? error.message : String(error),
  });
  return { start: now - DEFAULT_LOOKBACK_MS, end: now };
}

Refs:

Comment on lines +28 to +29
const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();
const updatedMs = new Date(`${bounds.data.updatedAt}Z`).getTime();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Timestamp parsing assumes format without timezone suffix

Issue: Appending Z to timestamps assumes the API returns timestamps without timezone info. If the backend ever returns timestamps with timezone (e.g., 2024-01-01T00:00:00.000Z), this produces invalid dates like ...ZZ.

Why: While the current Drizzle schema uses mode: 'string' which returns timestamps without Z, this is fragile if the format changes. Invalid dates produce NaN which propagates silently.

Fix: Add validation that parsed dates are valid:

Suggested change
const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();
const updatedMs = new Date(`${bounds.data.updatedAt}Z`).getTime();
const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();
const updatedMs = new Date(`${bounds.data.updatedAt}Z`).getTime();
if (!Number.isFinite(createdMs) || !Number.isFinite(updatedMs)) {
return { start: now - DEFAULT_LOOKBACK_MS, end: now };
}

Refs:

Comment on lines +16 to +17
if (startParam != null && endParam != null) {
return { start: Number(startParam), end: Number(endParam) };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Consider: Validate parsed numbers for NaN

Issue: Number() conversion from query params doesn't validate for NaN. If params are malformed strings like ?start=invalid, this returns { start: NaN, end: NaN }.

Why: NaN values would propagate to SigNoz queries causing unclear failures. The current code assumes callers provide valid numeric strings.

Fix: Consider validating before return:

Suggested change
if (startParam != null && endParam != null) {
return { start: Number(startParam), end: Number(endParam) };
if (startParam != null && endParam != null) {
const start = Number(startParam);
const end = Number(endParam);
if (Number.isFinite(start) && Number.isFinite(end)) {
return { start, end };
}
}

@github-actions github-actions Bot deleted a comment from claude Bot Mar 2, 2026
@vercel vercel Bot temporarily deployed to Preview – agents-docs March 2, 2026 20:03 Inactive
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(3) Total Issues | Risk: Low

🟠⚠️ Major (1) 🟠⚠️

Inline Comments:

  • 🟠 Major: signoz-conversation-time-range.ts:34 External API call without explicit timeout

🟡 Minor (0) 🟡

No new minor issues.

💭 Consider (1) 💭

💭 1) signoz-conversation-time-range.ts:44-52 Fallback metric for operational visibility

Issue: The fallback to 180-day lookback is now properly logged with structured context, which is good. However, if the bounds API starts failing frequently, this would silently degrade query performance without triggering alerts.

Why: Logging alone may not be easily alertable in high-volume environments.

Fix: If operational alerting on fallback frequency is desired, consider emitting a counter metric (e.g., signoz.conversation_bounds.fallback) alongside the log. Otherwise, the current logging is acceptable for debugging.

Refs:

🕐 Pending Recommendations (2)


💡 APPROVE WITH SUGGESTIONS

Summary: Good improvement! This PR successfully addresses the 500 errors and query optimization goal. The delta since the last review shows meaningful improvements: NaN validation was added for timestamp parsing, and the catch block now logs with proper context instead of being silent. The main remaining gap is the lack of a timeout on the bounds API call (Major) and the still-unaddressed test coverage from the prior review. The API endpoint and error logging enhancements are solid.

Discarded (4)
Location Issue Reason Discarded
spans/[spanId]/route.ts:14 Duplicate DEFAULT_LOOKBACK_MS constant Pre-existing issue — this file was NOT changed in the PR
signoz.ts:131-137 Error logging could include more context Already an improvement over existing logging; additional context is nice-to-have
conversations.ts:98-140 New endpoint lacks route-level tests Endpoint is simple, delegates to tested getConversation, acceptable for now
nine-turtles-lay.md:6 Changeset message style Nitpick — message is descriptive enough
Reviewers (6)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-consistency 1 0 0 0 0 0 1
pr-review-errors 0 0 0 0 0 0 0
pr-review-tests 1 0 0 0 0 1 0
pr-review-types 0 0 0 0 0 0 0
pr-review-sre 2 0 1 0 1 0 0
Total 4 0 1 0 1 1 1

Note: Most reviewers found no new issues because the delta addressed prior review feedback (NaN validation, logging). The test coverage finding was already raised in the prior review.

try {
const bounds = await makeManagementApiRequest<{
data: { createdAt: string; updatedAt: string };
}>(`tenants/${tenantId}/projects/${projectId}/conversations/${conversationId}/bounds`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: External API call without explicit timeout

Issue: The makeManagementApiRequest call to fetch conversation bounds does not have a timeout configured. If the management API is slow or unresponsive, this will hang indefinitely, adding significant latency to trace timeline loads.

Why: While the catch block handles total failures, a slow (but not failed) request will still block the calling code. This could cause the trace timeline to appear frozen while waiting for bounds.

Fix: Consider wrapping with a timeout using Promise.race:

const BOUNDS_TIMEOUT_MS = 3000; // 3 seconds

const boundsPromise = makeManagementApiRequest<{
  data: { createdAt: string; updatedAt: string };
}>(`tenants/${tenantId}/projects/${projectId}/conversations/${conversationId}/bounds`);

const timeoutPromise = new Promise<never>((_, reject) =>
  setTimeout(() => reject(new Error('Bounds fetch timeout')), BOUNDS_TIMEOUT_MS)
);

const bounds = await Promise.race([boundsPromise, timeoutPromise]);

Refs:

@github-actions github-actions Bot deleted a comment from claude Bot Mar 2, 2026
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) New Issues | Risk: Low

Delta Review Summary

This re-review is scoped to changes since the last automated review. The delta consists of 1 file with 4 lines changed:

timeline-wrapper.tsx — Removed projectId from span fetch URL and useEffect dependency array.

Verification:

  • ✅ The /api/signoz/spans/[spanId] route does NOT use projectId (only conversationId and tenantId) — removal is correct
  • ✅ URL encoding via encodeURIComponent is properly applied to both dynamic values
  • ✅ Dependency array correctly reflects actual dependencies per React hooks exhaustive-deps rule
  • ✅ No stale closure issues — if projectId changes, the effect doesn't need to re-run since the API doesn't use it

No new issues introduced by this delta.

🕐 Pending Recommendations (3)

These items were raised in prior reviews and remain unaddressed:


💡 APPROVE WITH SUGGESTIONS

Summary: The delta change is clean and correct — it removes an unused projectId parameter from the span fetch call, aligning the client with what the server actually expects. No new issues introduced. The pending recommendations from prior reviews (timeout, tests, timestamp parsing) remain open but don't block this PR. Ship it! 🚀

Reviewers (2)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-frontend 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: Both reviewers confirmed the delta is correct with no issues to report.

@github-actions github-actions Bot deleted a comment from claude Bot Mar 2, 2026
@shagun-singh-inkeep shagun-singh-inkeep enabled auto-merge (squash) March 2, 2026 21:10
@shagun-singh-inkeep shagun-singh-inkeep merged commit 55eb8cb into main Mar 2, 2026
9 checks passed
@shagun-singh-inkeep shagun-singh-inkeep deleted the TI-25 branch March 2, 2026 21:14
@github-actions github-actions Bot deleted a comment from claude Bot Mar 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant