test(iroh-dns): APK-based smoke test for the Android JNI path#4209

Draft
Frando wants to merge 5 commits into Frando/android-dns from Frando/android-apk-test

Frando commented Apr 29, 2026

Description

This is a follow-up to #4183. That PR gets the iroh resolver working on Android once `ndk_context` is initialized, but the workspace unit tests can't actually exercise the JNI path: they run as plain ELF binaries pushed via `adb shell`, with no JVM in scope. On Android they either fall back to public DNS (debug builds) or would panic (release), but they never go through the real `LinkProperties.getDnsServers()` lookup. So the JNI code is, in practice, untested in CI.

This branch adds a tiny standalone APK that does. It lives at `iroh-dns/tests/android_apk/` as its own `[workspace]` (so `android-activity` and friends do not pollute the main workspace), gets built and launched by `cargo apk run` from the existing x86_64 Android emulator CI job, and resolves `dns.iroh.link` through `DnsResolver::new()`.

`android-activity` populates `ndk_context` for us before `android_main` runs, so the system DNS reader takes the JNI path. To prove that is what actually happened (rather than a silent fallback to public DNS), the smoke test installs an `n0_tracing_test` buffer subscriber and asserts that the `read system DNS via Android JNI` debug line was emitted. That is the only signal that distinguishes "JNI worked" from "JNI silently failed and we used Google".

`cargo apk run` does not propagate the in-app exit code, so the test prints `RESULT=ok` only on full success and the CI step polls logcat for that marker. Failed assertions panic, the process aborts, and the poll loop times out.
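
The marker-polling idea can be sketched in Rust as follows. This is only an illustration: the actual CI step is a shell loop over logcat output, and `poll_for_marker` and its `fetch_log` closure are hypothetical names, not code from this branch.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: repeatedly fetch a log dump and look for a success marker.
/// Returns `true` as soon as the marker appears, `false` once all attempts are used up.
fn poll_for_marker<F>(mut fetch_log: F, marker: &str, attempts: u32, interval: Duration) -> bool
where
    F: FnMut() -> String,
{
    for attempt in 0..attempts {
        if fetch_log().contains(marker) {
            return true;
        }
        // Sleep between attempts, but not after the last one.
        if attempt + 1 < attempts {
            thread::sleep(interval);
        }
    }
    false
}
```

Because success is signalled only by the marker, a panic, an abort, and a hang all look the same to the poller: the attempts run out and the step fails.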

Breaking Changes

None.

Notes & open questions

Based on #4183, which adds the `debug!` line this test asserts on.

Change checklist

  • Self-review.
  • Tests if relevant.
  • All breaking changes documented.

Frando added 2 commits April 29, 2026 13:00
Add `iroh-dns/tests/android_apk/`, a stand-alone NativeActivity cdylib
(own [workspace] so its android-only deps stay out of the main lock)
that runs on the x86_64 Android emulator via cargo-apk.

`android-activity` populates `ndk_context` before `android_main`, so
the system DNS reader runs through real JNI against
`ConnectivityService`. The reader emits a `debug!` line tagged with
the resolved nameserver count on the JNI success branch, and the
smoke test installs an `n0_tracing_test` buffer subscriber and uses
`logs_assert` to verify that line was emitted: a positive proof that
the JNI path executed rather than the public-DNS fallback that
`DnsResolver::new()` would silently take. The test then resolves
`dns.iroh.link` via the resolver to validate the full path.

cargo-apk does not propagate the in-app exit code, so the test prints
`RESULT=ok` only on success and the CI step polls logcat for that
marker. Failed assertions panic, the process aborts, and the poll
loop times out. Wire the smoke test into the existing x86_64-android
emulator job through `cargo-apk`, installed via `cargo-binstall` on
that matrix entry only.

`reactivecircus/android-emulator-runner` runs each line of its
`script:` in a separate `sh -c` invocation, so the multi-line
`for/done` and `if/fi` blocks I had been using were getting torn
apart and dash bailed out with `Syntax error: end of file unexpected
(expecting "done")` (job 73559982095).

Collapse the polling loop and the timeout check onto single lines
each. Same logic, same exit codes; just the layout changes.
@Frando Frando force-pushed the Frando/android-apk-test branch from af3c3b8 to ac0059e Compare April 29, 2026 11:03

github-actions Bot commented Apr 29, 2026

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/4209/docs/iroh/

Last updated: 2026-04-29T20:24:29Z

github-actions Bot commented Apr 29, 2026

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 6516906

Frando added 3 commits April 29, 2026 20:40
NativeActivity processes' stdout and stderr are not captured by
logcat, so the `println!("RESULT=ok")` we relied on never reached the
buffer the CI step polls. The activity launched fine and the test
itself succeeded; the run-time check just had nothing to find.

Replace the println/eprintln pair with a direct call to
`__android_log_write` from `liblog`. The marker now shows up in
`adb logcat -d -v raw` as a plain `RESULT=ok` line, which matches the
existing grep.

The iroh-dns workspace tests separately catch the panic raised by
`ndk_context::android_context()` (the test binaries are pushed via
`adb shell` with no JVM in scope) and warn-fall-back to public DNS,
which is the design we documented; that part of the job already
passed.

The previous run starts the activity and then times out without
`RESULT=ok` showing up in logcat, with no app-tagged log lines
visible in the dump tail. Either the FFI write is silently failing
or the test never reaches the success line.

Add `MARK: ...` checkpoints around `android_main` and `run()` so the
next failure shows how far execution got. Install a panic hook that
routes the message through `__android_log_write` so a panic also
surfaces. Tighten the CI fallback dump to filter on our app's
`iroh_dns_smoke` tag plus `FATAL`/`AndroidRuntime` so diagnostic
output is actually visible instead of buried in 200 lines of
GMSCore noise.
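
A minimal sketch of the panic-hook idea, using only the standard library. The `LogSink` type and `install_panic_hook` function are hypothetical stand-ins: the commit routes the message through `__android_log_write`, whereas this sketch pushes it into an in-memory buffer so the behavior can be checked off-device.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the log sink; the branch writes to logcat instead.
type LogSink = Arc<Mutex<Vec<String>>>;

/// Install a panic hook that forwards the panic message and location to `sink`.
fn install_panic_hook(sink: LogSink) {
    panic::set_hook(Box::new(move |info| {
        // Panic payloads are usually `&str` (literal message) or `String` (formatted).
        let payload = info.payload();
        let message = if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "<non-string panic payload>".to_string()
        };
        let location = info
            .location()
            .map(|l| format!("{}:{}", l.file(), l.line()))
            .unwrap_or_else(|| "<unknown>".to_string());
        // Tolerate a poisoned lock: losing the panic message would defeat the purpose.
        let mut lines = sink.lock().unwrap_or_else(|e| e.into_inner());
        lines.push(format!("PANIC at {location}: {message}"));
    }));
}
```

The hook runs before unwinding or aborting begins, so the message reaches the sink even when the process dies right afterwards.
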

…tracing-android

Two fixes for the symptom we were chasing.

The activity got `freezing 5033 net.iroh.dns.test` from
ActivityManager about 11 seconds after launch, before the resolver
test could finish. That is the cached-app-freezer: when a
NativeActivity does not pump Android lifecycle events, AOSP
classifies it as cached and freezes the cgroup. We were running the
test on the same thread that should have been polling, and we were
not polling at all. Move the test onto a worker thread and have the
main thread loop on `AndroidApp::poll_events`.
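
The thread split described above can be sketched with the standard library alone. Here `run_while_pumping` is a hypothetical helper and `pump_events` is a stand-in closure for a call such as `AndroidApp::poll_events`; the real event loop in the branch will look different.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Run `test` on a worker thread while the calling thread keeps "pumping events".
///
/// `pump_events` is a hypothetical stand-in for the platform event poll; here it
/// is just a closure invoked once per loop turn.
/// Returns `Some(result)` when the worker finishes, or `None` if it panicked.
fn run_while_pumping<T, F, P>(test: F, mut pump_events: P) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
    P: FnMut(),
{
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        // If the test panics, `tx` is dropped without sending and the loop sees a disconnect.
        let _ = tx.send(test());
    });

    let outcome = loop {
        pump_events();
        // The timeout bounds how long the main thread goes without servicing events.
        match rx.recv_timeout(Duration::from_millis(10)) {
            Ok(value) => break Some(value),
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break None,
        }
    };
    let _ = worker.join();
    outcome
}
```

The point of the design is that the main thread never blocks on the test itself, so lifecycle events keep being serviced for as long as the resolver test runs.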

Replace the handrolled `__android_log_write` FFI with the
`tracing-android` layer: it gets composed alongside the existing
`n0_tracing_test` mock-writer layer in a `tracing_subscriber`
`Registry`, so `logs_assert` keeps reading the in-memory buffer for
the JNI proof while everything else (`info!`, `error!`, the panic
hook) reaches logcat under the `iroh_dns_smoke` tag. The CI grep
flips from `^RESULT=ok$` to `grep -F 'RESULT=ok'` because
tracing-android prefixes lines with span context.
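
The difference between the two grep forms can be illustrated in Rust. Both functions are hypothetical helpers written for this comparison; the CI step itself uses `grep`.

```rust
/// Exact-line match, analogous to `grep '^RESULT=ok$'`.
fn has_exact_line(log: &str, marker: &str) -> bool {
    log.lines().any(|line| line == marker)
}

/// Fixed-substring match, analogous to `grep -F 'RESULT=ok'`.
fn has_substring(log: &str, marker: &str) -> bool {
    log.lines().any(|line| line.contains(marker))
}
```

Once anything is prepended to the marker line, only the substring form still matches. The trade-off is that it is looser: any line that merely contains `RESULT=ok` counts as success.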