log 500 signoz errors and reduce timerange query by shagun-singh-inkeep · Pull Request #2469 · inkeep/agents

shagun-singh-inkeep · 2026-03-02T19:37:00Z

No description provided.

vercel · 2026-03-02T19:37:05Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agents-api	Ready	Preview, Comment	Mar 2, 2026 9:13pm
agents-docs	Ready	Preview, Comment	Mar 2, 2026 9:13pm
agents-manage-ui	Ready	Preview, Comment	Mar 2, 2026 9:13pm

pullfrog · 2026-03-02T19:37:51Z

Narrows SigNoz trace queries to the conversation's actual time window (with 3-hour padding) instead of the previous 180-day lookback, reducing query load and eliminating 500 errors from overly broad time ranges.

agents-manage-ui/src/lib/api/signoz-conversation-time-range.ts — New shared helper getConversationTimeRange that fetches conversation bounds from the API and computes a tight [createdAt - 3h, updatedAt + 3h] window, falling back to the 180-day default when bounds are unavailable
agents-api/src/domains/manage/routes/conversations.ts — Adds a GET /{id}/bounds endpoint returning conversation createdAt/updatedAt timestamps
agents-api/src/domains/manage/routes/signoz.ts — Enriches SigNoz error logs with error.response?.data from Axios for easier debugging of upstream failures
agents-manage-ui/src/app/api/signoz/conversations/[conversationId]/route.ts — Replaces inline time-range math with getConversationTimeRange
agents-manage-ui/src/app/api/signoz/spans/[spanId]/route.ts — Same refactor; also passes projectId through to enable bounds lookup
agents-manage-ui/src/components/traces/timeline/timeline-wrapper.tsx — Forwards projectId in the span fetch URL and adds it to the effect dependency array

^{｜ View workflow run ｜ Using Claude Code ｜ Triggered by Pullfrog ｜ pullfrog.com ｜ 𝕏}

pullfrog

The backend changes look solid — the new /bounds endpoint follows existing authorization, schema, and error-handling patterns correctly, and the SigNoz logging improvements are well-guarded. The frontend utility has two potential bugs worth addressing around timestamp parsing and NaN propagation.

^{｜ Fix all ➔ ｜ Fix 👍s ➔ ｜ View workflow run ｜ pullfrog.com ｜ 𝕏}

pullfrog · 2026-03-02T19:40:10Z

+    const bounds = await makeManagementApiRequest<{
+      data: { createdAt: string; updatedAt: string };
+    }>(`tenants/${tenantId}/projects/${projectId}/conversations/${conversationId}/bounds`);
+    const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();


Appending Z unconditionally is fragile. The runtime schema uses timestamp without withTimezone, so Drizzle returns strings like 2024-01-01 00:00:00 — appending Z works today. But if the column is ever changed to timestamptz, or if the driver behavior changes, the API could return 2024-01-01T00:00:00+00:00 and appending Z would produce an invalid string.

Safer approach:

const ensureUtc = (s: string) => { if (/Z$|[+-]\d{2}:\d{2}$/.test(s)) return s; return `${s}Z`; }; const createdMs = new Date(ensureUtc(bounds.data.createdAt)).getTime(); const updatedMs = new Date(ensureUtc(bounds.data.updatedAt)).getTime();

pullfrog · 2026-03-02T19:40:10Z

+  if (startParam != null && endParam != null) {
+    return { start: Number(startParam), end: Number(endParam) };


Number(startParam) on a non-numeric string (e.g. ?start=abc) returns NaN, which would propagate into the SigNoz query payload silently. Consider validating:

const start = Number(startParam); const end = Number(endParam); if (Number.isNaN(start) || Number.isNaN(end)) { // fall through to bounds fetch } else { return { start, end }; }

pullfrog · 2026-03-02T19:40:10Z

+  } catch {
+    return { start: now - DEFAULT_LOOKBACK_MS, end: now };


The silent catch swallows all errors — including auth failures or network timeouts — and falls back to a 180-day window. Consider logging a warning so debugging is possible:

} catch (err) { console.warn('Failed to fetch conversation bounds, using default lookback', err); return { start: now - DEFAULT_LOOKBACK_MS, end: now }; }

pullfrog · 2026-03-02T19:40:10Z

+    const { start, end } = await getConversationTimeRange({
+      startParam: null,
+      endParam: null,
+      projectId,
+      tenantId,
+      conversationId,
+    });


Hardcoding startParam: null, endParam: null means every span fetch now makes an extra round-trip to the bounds API (or falls back to 180-day lookback if projectId is missing). The old code used now - DEFAULT_LOOKBACK_MS directly with no network call. If the goal is to narrow the ClickHouse scan window for span lookups too, this trade-off is worth a brief comment explaining the intent.

changeset-bot · 2026-03-02T19:40:37Z

🦋 Changeset detected

Latest commit: debf745

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages

Name	Type
@inkeep/agents-manage-ui	Patch
@inkeep/agents-api	Patch
@inkeep/agents-cli	Patch
@inkeep/agents-core	Patch
@inkeep/agents-mcp	Patch
@inkeep/agents-sdk	Patch
@inkeep/agents-work-apps	Patch
@inkeep/ai-sdk-provider	Patch
@inkeep/create-agents	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

claude

PR Review Summary

(3) Total Issues | Risk: Low

🟠⚠️ Major (1) 🟠⚠️

🟠 1) signoz-conversation-time-range.ts Missing test coverage for new utility function

Issue: The getConversationTimeRange function has no unit tests despite containing date/time calculation logic with multiple code paths: (1) explicit start/end params, (2) no projectId fallback, (3) successful bounds fetch with padding calculations, and (4) API error fallback.

Why: Without tests, regressions in time range calculation could cause SigNoz queries to search incorrect time windows. The date parsing logic (appending Z to timestamps) and the Math.min/Math.max clamping could produce unexpected results with edge cases (NaN from invalid dates, inverted ranges if timestamps are in the future).

Fix: Add unit tests in agents-manage-ui/src/lib/api/__tests__/signoz-conversation-time-range.test.ts:

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { getConversationTimeRange, BOUNDS_PADDING_MS, DEFAULT_LOOKBACK_MS } from '../signoz-conversation-time-range';

vi.mock('../api-config', () => ({ makeManagementApiRequest: vi.fn() }));

describe('getConversationTimeRange', () => {
  it('should return explicit params when both provided', async () => {
    const result = await getConversationTimeRange({
      startParam: '1000', endParam: '2000',
      projectId: 'proj', tenantId: 'tenant', conversationId: 'conv'
    });
    expect(result).toEqual({ start: 1000, end: 2000 });
  });

  it('should fallback to default when projectId missing', async () => {
    const now = Date.now();
    const result = await getConversationTimeRange({
      startParam: null, endParam: null,
      projectId: undefined, tenantId: 'tenant', conversationId: 'conv'
    });
    expect(result.end).toBeCloseTo(now, -2);
    expect(result.start).toBeCloseTo(now - DEFAULT_LOOKBACK_MS, -2);
  });

  it('should fallback on API error', async () => {
    const { makeManagementApiRequest } = await import('../api-config');
    vi.mocked(makeManagementApiRequest).mockRejectedValue(new Error('Network error'));
    // ... verify fallback behavior
  });
});

Refs:

Vitest testing patterns in AGENTS.md

Inline Comments:

🟠 Major: signoz-conversation-time-range.ts:33-35 Silent error catch without logging

🟡 Minor (1) 🟡

Inline Comments:

🟡 Minor: signoz-conversation-time-range.ts:28-29 Timestamp parsing assumes format without timezone suffix

💭 Consider (1) 💭

Inline Comments:

💭 Consider: signoz-conversation-time-range.ts:16-17 Validate parsed numbers for NaN

💡 APPROVE WITH SUGGESTIONS

Summary: This PR is a solid improvement - the conversation bounds optimization will meaningfully reduce SigNoz query times, and the enhanced error logging will help debug 500 errors. The main gap is the lack of test coverage for the new getConversationTimeRange utility, which has multiple code paths and date parsing logic that would benefit from unit tests. The silent error catch should also be updated to log when the fallback is used for observability.

Discarded (5)

Location	Issue	Reason Discarded
`nine-turtles-lay.md:6`	Changeset message doesn't follow "Fix X" style	Nitpick - message is descriptive enough, not blocking
`signoz.ts:131-137`	Error logging could include tenantId/projectId	Already an improvement over existing logging, additional context is nice-to-have
`api-config.ts:73`	Global API requests lack timeout	Pre-existing issue, outside PR scope
`conversations.ts:98-140`	Endpoint lacks route-level tests	Endpoint is simple and delegates to tested `getConversation`; acceptable for now
`spans/[spanId]/route.ts:147`	Error handling loses context from bounds failure	The bounds function catches internally and falls back, so this is moot

Reviewers (7)

Reviewer	Returned	Main Findings	Consider	Inline Comments	Discarded
`pr-review-standards`	2	0	0	1	1
`pr-review-consistency`	5	0	0	0	5
`pr-review-errors`	2	0	0	1	1
`pr-review-frontend`	1	0	0	0	1
`pr-review-sre`	5	0	0	0	5
`pr-review-tests`	2	1	0	0	1
`pr-review-types`	3	0	1	0	2
Total	20	1	1	2	16

Note: High discard rate due to significant deduplication across reviewers (timestamp parsing, silent catch, NaN validation all raised by 3+ reviewers).

claude · 2026-03-02T19:42:48Z

+  } catch {
+    return { start: now - DEFAULT_LOOKBACK_MS, end: now };
+  }


🟠 MAJOR: Silent error catch swallows all failures without logging

Issue: The catch block silently returns the fallback time range without any logging or visibility into what failed.

Why: When the bounds API call fails (network issues, auth errors, service down), users will silently get the slower 180-day time range query. This makes it impossible to detect systemic issues - if the bounds endpoint is consistently failing, all trace queries degrade to the expensive path with zero visibility.

Fix: Add logging to track when fallback is used:

} catch (error) { console.warn('[getConversationTimeRange] Failed to fetch bounds, using fallback', { conversationId, projectId, tenantId, error: error instanceof Error ? error.message : String(error), }); return { start: now - DEFAULT_LOOKBACK_MS, end: now }; }

Refs:

api-logging-guidelines skill

claude · 2026-03-02T19:42:53Z

+    const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();
+    const updatedMs = new Date(`${bounds.data.updatedAt}Z`).getTime();


🟡 Minor: Timestamp parsing assumes format without timezone suffix

Issue: Appending Z to timestamps assumes the API returns timestamps without timezone info. If the backend ever returns timestamps with timezone (e.g., 2024-01-01T00:00:00.000Z), this produces invalid dates like ...ZZ.

Why: While the current Drizzle schema uses mode: 'string' which returns timestamps without Z, this is fragile if the format changes. Invalid dates produce NaN which propagates silently.

Fix: Add validation that parsed dates are valid:

Suggested change

const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();

const updatedMs = new Date(`${bounds.data.updatedAt}Z`).getTime();

const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();

const updatedMs = new Date(`${bounds.data.updatedAt}Z`).getTime();

if (!Number.isFinite(createdMs) || !Number.isFinite(updatedMs)) {

return { start: now - DEFAULT_LOOKBACK_MS, end: now };

}

Refs:

runtime-schema.ts:62 - timestamp mode

claude · 2026-03-02T19:42:57Z

+  if (startParam != null && endParam != null) {
+    return { start: Number(startParam), end: Number(endParam) };


💭 Consider: Validate parsed numbers for NaN

Issue: Number() conversion from query params doesn't validate for NaN. If params are malformed strings like ?start=invalid, this returns { start: NaN, end: NaN }.

Why: NaN values would propagate to SigNoz queries causing unclear failures. The current code assumes callers provide valid numeric strings.

Fix: Consider validating before return:

Suggested change

if (startParam != null && endParam != null) {

return { start: Number(startParam), end: Number(endParam) };

if (startParam != null && endParam != null) {

const start = Number(startParam);

const end = Number(endParam);

if (Number.isFinite(start) && Number.isFinite(end)) {

return { start, end };

}

}

claude

PR Review Summary

(3) Total Issues | Risk: Low

🟠⚠️ Major (1) 🟠⚠️

Inline Comments:

🟠 Major: signoz-conversation-time-range.ts:34 External API call without explicit timeout

🟡 Minor (0) 🟡

No new minor issues.

💭 Consider (1) 💭

💭 1) signoz-conversation-time-range.ts:44-52 Fallback metric for operational visibility

Issue: The fallback to 180-day lookback is now properly logged with structured context, which is good. However, if the bounds API starts failing frequently, this would silently degrade query performance without triggering alerts.

Why: Logging alone may not be easily alertable in high-volume environments.

Fix: If operational alerting on fallback frequency is desired, consider emitting a counter metric (e.g., signoz.conversation_bounds.fallback) alongside the log. Otherwise, the current logging is acceptable for debugging.

Refs:

signoz-conversation-time-range.ts:44-52

🕐 Pending Recommendations (2)

🟠 signoz-conversation-time-range.ts Missing unit tests for getConversationTimeRange (prior review finding, still unaddressed)
🟡 signoz-conversation-time-range.ts:35-36 Timestamp parsing assumes format without timezone suffix (prior Pullfrog finding — NaN validation mitigates but doesn't fully resolve)

💡 APPROVE WITH SUGGESTIONS

Summary: Good improvement! This PR successfully addresses the 500 errors and query optimization goal. The delta since the last review shows meaningful improvements: NaN validation was added for timestamp parsing, and the catch block now logs with proper context instead of being silent. The main remaining gap is the lack of a timeout on the bounds API call (Major) and the still-unaddressed test coverage from the prior review. The API endpoint and error logging enhancements are solid.

Discarded (4)

Location	Issue	Reason Discarded
`spans/[spanId]/route.ts:14`	Duplicate `DEFAULT_LOOKBACK_MS` constant	Pre-existing issue — this file was NOT changed in the PR
`signoz.ts:131-137`	Error logging could include more context	Already an improvement over existing logging; additional context is nice-to-have
`conversations.ts:98-140`	New endpoint lacks route-level tests	Endpoint is simple, delegates to tested `getConversation`, acceptable for now
`nine-turtles-lay.md:6`	Changeset message style	Nitpick — message is descriptive enough

Reviewers (6)

Reviewer	Returned	Consider	Inline Comments	Pending Recs	Discarded
`pr-review-standards`	0	0	0	0	0
`pr-review-consistency`	1	0	0	0	1
`pr-review-errors`	0	0	0	0	0
`pr-review-tests`	1	0	0	1	0
`pr-review-types`	0	0	0	0	0
`pr-review-sre`	2	1	1	0	0
Total	4	1	1	1	1

Note: Most reviewers found no new issues because the delta addressed prior review feedback (NaN validation, logging). The test coverage finding was already raised in the prior review.

claude · 2026-03-02T20:13:36Z

+  try {
+    const bounds = await makeManagementApiRequest<{
+      data: { createdAt: string; updatedAt: string };
+    }>(`tenants/${tenantId}/projects/${projectId}/conversations/${conversationId}/bounds`);


🟠 MAJOR: External API call without explicit timeout

Issue: The makeManagementApiRequest call to fetch conversation bounds does not have a timeout configured. If the management API is slow or unresponsive, this will hang indefinitely, adding significant latency to trace timeline loads.

Why: While the catch block handles total failures, a slow (but not failed) request will still block the calling code. This could cause the trace timeline to appear frozen while waiting for bounds.

Fix: Consider wrapping with a timeout using Promise.race:

const BOUNDS_TIMEOUT_MS = 3000; // 3 seconds const boundsPromise = makeManagementApiRequest<{ data: { createdAt: string; updatedAt: string }; }>(`tenants/${tenantId}/projects/${projectId}/conversations/${conversationId}/bounds`); const timeoutPromise = new Promise<never>((_, reject) => setTimeout(() => reject(new Error('Bounds fetch timeout')), BOUNDS_TIMEOUT_MS) ); const bounds = await Promise.race([boundsPromise, timeoutPromise]);

Refs:

api-logging-guidelines skill — recommends logging slow operations

claude

PR Review Summary

(0) New Issues | Risk: Low

Delta Review Summary

This re-review is scoped to changes since the last automated review. The delta consists of 1 file with 4 lines changed:

timeline-wrapper.tsx — Removed projectId from span fetch URL and useEffect dependency array.

Verification:

✅ The /api/signoz/spans/[spanId] route does NOT use projectId (only conversationId and tenantId) — removal is correct
✅ URL encoding via encodeURIComponent is properly applied to both dynamic values
✅ Dependency array correctly reflects actual dependencies per React hooks exhaustive-deps rule
✅ No stale closure issues — if projectId changes, the effect doesn't need to re-run since the API doesn't use it

No new issues introduced by this delta.

🕐 Pending Recommendations (3)

These items were raised in prior reviews and remain unaddressed:

🟠 signoz-conversation-time-range.ts:34 External API call without explicit timeout
🟠 signoz-conversation-time-range.ts Missing unit tests for getConversationTimeRange
🟡 signoz-conversation-time-range.ts:35-36 Timestamp parsing assumes format without timezone suffix

💡 APPROVE WITH SUGGESTIONS

Summary: The delta change is clean and correct — it removes an unused projectId parameter from the span fetch call, aligning the client with what the server actually expects. No new issues introduced. The pending recommendations from prior reviews (timeout, tests, timestamp parsing) remain open but don't block this PR. Ship it! 🚀

Reviewers (2)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
`pr-review-standards`	0	0	0	0	0	0	0
`pr-review-frontend`	0	0	0	0	0	0	0
Total	0	0	0	0	0	0	0

Note: Both reviewers confirmed the delta is correct with no issues to report.

debug 500 signoz errors and reduce timerange query

1eaac86

vercel Bot deployed to Preview – agents-api March 2, 2026 19:38 View deployment

vercel Bot deployed to Preview – agents-docs March 2, 2026 19:38 View deployment

vercel Bot deployed to Preview – agents-manage-ui March 2, 2026 19:39 View deployment

pullfrog Bot reviewed Mar 2, 2026

View reviewed changes

claude Bot reviewed Mar 2, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Mar 2, 2026

knip and removing from spanid route

a51f5e7

vercel Bot temporarily deployed to Preview – agents-docs March 2, 2026 20:03 Inactive

claude

dfdc887

vercel Bot temporarily deployed to Preview – agents-docs March 2, 2026 20:05 Inactive

github-actions Bot deleted a comment from claude Bot Mar 2, 2026

vercel Bot deployed to Preview – agents-manage-ui March 2, 2026 20:08 View deployment

claude Bot reviewed Mar 2, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Mar 2, 2026

claude

fe6d626

vercel Bot temporarily deployed to Preview – agents-docs March 2, 2026 21:00 Inactive

vercel Bot deployed to Preview – agents-manage-ui March 2, 2026 21:03 View deployment

claude Bot reviewed Mar 2, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Mar 2, 2026

shagun-singh-inkeep enabled auto-merge (squash) March 2, 2026 21:10

shagun-singh-inkeep disabled auto-merge March 2, 2026 21:11

Merge branch 'main' into TI-25

debf745

vercel Bot deployed to Preview – agents-manage-ui March 2, 2026 21:13 View deployment

vercel Bot deployed to Preview – agents-api March 2, 2026 21:13 View deployment

vercel Bot deployed to Preview – agents-docs March 2, 2026 21:13 View deployment

shagun-singh-inkeep changed the title ~~debug 500 signoz errors and reduce timerange query~~ log 500 signoz errors and reduce timerange query Mar 2, 2026

shagun-singh-inkeep merged commit 55eb8cb into main Mar 2, 2026
9 checks passed

shagun-singh-inkeep deleted the TI-25 branch March 2, 2026 21:14

github-actions Bot deleted a comment from claude Bot Mar 2, 2026

This was referenced Mar 2, 2026

feat(agents-api): User scoped webhooks #2442

Merged

fix(agents-cli) Inkeep pull uses human-readable stems for file imports #2468

Closed

		if (startParam != null && endParam != null) {
		return { start: Number(startParam), end: Number(endParam) };

		} catch {
		return { start: now - DEFAULT_LOOKBACK_MS, end: now };

		const createdMs = new Date(`${bounds.data.createdAt}Z`).getTime();
		const updatedMs = new Date(`${bounds.data.updatedAt}Z`).getTime();

-  if (startParam != null && endParam != null) {
-    return { start: Number(startParam), end: Number(endParam) };
+  if (startParam != null && endParam != null) {
+    const start = Number(startParam);
+    const end = Number(endParam);
+    if (Number.isFinite(start) && Number.isFinite(end)) {
+      return { start, end };
+    }
+  }

Conversation

shagun-singh-inkeep commented Mar 2, 2026

Uh oh!

vercel Bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pullfrog Bot commented Mar 2, 2026

Uh oh!

pullfrog Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

changeset-bot Bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🟠⚠️ Major (1) 🟠⚠️

🟡 Minor (1) 🟡

💭 Consider (1) 💭

💡 APPROVE WITH SUGGESTIONS

Uh oh!

claude Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🟠⚠️ Major (1) 🟠⚠️

🟡 Minor (0) 🟡

💭 Consider (1) 💭

🕐 Pending Recommendations (2)

💡 APPROVE WITH SUGGESTIONS

Uh oh!

claude Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

Delta Review Summary

🕐 Pending Recommendations (3)

💡 APPROVE WITH SUGGESTIONS

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented Mar 2, 2026 •

edited

Loading

pullfrog Bot left a comment •

edited

Loading

changeset-bot Bot commented Mar 2, 2026 •

edited

Loading