Platform-agnostic
The interaction tool names are identical on iOS and Android — gesture-tap, gesture-swipe, describe, screenshot, launch-app, etc. — and the tool-server auto-dispatches based on the udid you pass (UUID-shape → iOS, adb serial → Android).
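
The dispatch rule above can be illustrated with a small sketch. This is not the tool-server's actual implementation; it assumes an iOS simulator udid is a standard 8-4-4-4-12 hex UUID and treats anything else as an adb serial. The function name `guess_platform` is hypothetical.

```python
import re

# Assumption: iOS simulator udids are 8-4-4-4-12 hex UUIDs.
_UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def guess_platform(udid: str) -> str:
    """Return "ios" for UUID-shaped ids, otherwise "android" (adb serial)."""
    return "ios" if _UUID_RE.match(udid.strip()) else "android"
```

With this sketch, an id such as `emulator-5554` would be classified as Android because it does not match the UUID shape.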
Before testing, resolve which device to test on. Call list-devices and follow <device_selection_rule>: prefer a running device on any platform;
Once a platform is chosen, the per-platform setup skill takes over:
| Platform | Setup skill | Find devices with |
|---|---|---|
| iOS | argent-ios-simulator-setup | list-devices → boot-device with udid if none booted |
| Android | argent-android-emulator-setup | list-devices → boot-device with avdName if none ready |
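
As a rough sketch of the selection step, the snippet below prefers an already running device and builds the `boot-device` argument named in the table. The shape of the `list-devices` result is an assumption: the keys `state`, `platform`, `udid`, and `avdName` on each entry are hypothetical, as are the helper names. What to do when nothing is running is left to the caller, since the full selection rule is not reproduced here.

```python
from typing import Optional


def pick_device(devices: list[dict]) -> Optional[dict]:
    """Prefer a device that is already running, on any platform."""
    for device in devices:
        if device.get("state") == "running":  # assumed field and value
            return device
    return None  # nothing running: the caller decides what to boot


def boot_args(device: dict) -> dict:
    """Build boot-device arguments: udid on iOS, avdName on Android."""
    if device["platform"] == "ios":
        return {"udid": device["udid"]}
    return {"avdName": device["avdName"]}
```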
1. Workflow
All interactions go through argent MCP tools. Ensure the simulator/emulator is ready before starting.
For implementation tasks that modify visible UI, this workflow can also serve as a visual acceptance path.
- Baseline screenshot: Call `screenshot` to see the current UI state. For visual regression comparison or UI change verification, capture the baseline at `scale: 1.0` with `includeImageInContext: false` and keep the returned `path` before editing whenever feasible.
- Find target: Before tapping, use a discovery tool to get element coordinates:
  - React Native apps: use `debugger-component-tree`, which returns component names with (tap: x,y) coordinates. This is the preferred tool for RN apps on either platform. To use it, resolve the `argent-react-native-app-workflow` skill for setup; on Android you must also run `adb -s <serial> reverse tcp:8081 tcp:8081` so Metro is reachable from the device.
  - Standard app screens and in-app modals: use `describe`. On iOS this returns the AX tree (falls back to native-devtools when AX is empty); on Android it returns the uiautomator tree in the same DescribeNode shape.
  - Permission prompts / system modal overlays: try `describe` first. Fall back to `screenshot` only if the overlay is not exposed reliably.
  - Fallback: use `screenshot` to estimate where the desired component is, then verify immediately after the action.
- Interact: Perform the action (`gesture-tap`, `gesture-swipe`, `keyboard`, `button`, ...); you receive a screenshot automatically.
- Verify: Check the returned screenshot for expected results. If it shows a loading/transitional state, retake with normal downscaled `screenshot`. Pick evidence by what's being asserted:
  - Visual (layout, spacing, color, typography, image/icon rendering, clipping, overflow, text rendering): prefer `screenshot-diff` against the baseline captured in step 1; it surfaces pixel-visible changes the auto-screenshot might miss. Fall back to visual inspection of the auto-screenshot only when a stable baseline isn't available.
  - Structural (navigation state, element existence, accessibility labels/values, selection, hierarchy, route): verify with `describe`, `debugger-component-tree`, or `native-describe-screen`.
  - Runtime / log / network (console errors, API calls, persistence, timing): verify with `view-network-logs`, `debugger-log-registry`, `debugger-evaluate`, or targeted tests.
  - Mixed: collect evidence for each relevant class.
  - Report the combined verdict: expected behavior, observed behavior, evidence used, and any blocker for requested visual diffing.
- Repeat for each step in the flow.
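
The evidence classes in the Verify step can be thought of as a lookup from what is being asserted to the tools that check it. The sketch below is only an illustration of that mapping; the dictionary keys and the `tools_for` helper are hypothetical names, and "targeted tests" is omitted because it is not a tool call. A mixed assertion is expressed by passing several classes.

```python
# Tool names come from the workflow text; the class keys are illustrative.
EVIDENCE_TOOLS = {
    "visual": ["screenshot-diff"],
    "structural": ["describe", "debugger-component-tree", "native-describe-screen"],
    "runtime": ["view-network-logs", "debugger-log-registry", "debugger-evaluate"],
}


def tools_for(classes: list[str]) -> list[str]:
    """Collect candidate tools for each asserted class, in order, without duplicates."""
    tools: list[str] = []
    for cls in classes:
        for tool in EVIDENCE_TOOLS[cls]:
            if tool not in tools:
                tools.append(tool)
    return tools
```

For a mixed check, `tools_for(["visual", "structural"])` would list `screenshot-diff` first, followed by the structural tools.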
2. Template
Goal: Test [feature name]
Steps:
1. Classify expected result: visual / structural / runtime-log-network / mixed → choose evidence
2. [Navigate / tap / type to reach stable comparable starting point] → verify auto-screenshot
3. screenshot { scale: 1.0, includeImageInContext: false } → save baseline path when visual or mixed evidence needs diffing
4. [Perform the action to test] → verify auto-screenshot
5. Use screenshot-diff when requested or when comparable images add useful visual evidence
6. Report: pass / fail with combined visual, structural, runtime/log/network evidence as applicable
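
One way to keep the final report consistent is to collect its parts in a small record before writing it out. The following is a minimal sketch, not a format required by any argent tool; the `Verdict` class and its field names are hypothetical and simply mirror the report contents named in the workflow (expected behavior, observed behavior, evidence used, blocker).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Verdict:
    expected: str
    observed: str
    passed: bool
    evidence: list[str] = field(default_factory=list)  # tools or artifacts used
    blocker: Optional[str] = None  # e.g. why a requested visual diff was not possible

    def summary(self) -> str:
        """Render the verdict as short plain text."""
        status = "PASS" if self.passed else "FAIL"
        lines = [f"{status}: expected {self.expected}; observed {self.observed}"]
        if self.evidence:
            lines.append("evidence: " + ", ".join(self.evidence))
        if self.blocker:
            lines.append("blocker: " + self.blocker)
        return "\n".join(lines)
```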
3. Examples
Login flow
1. screenshot → see login screen
2. gesture-tap { x: 0.5, y: 0.4 } → tap email field
3. paste { text: "user@example.com" } → iOS only; on Android use keyboard
4. gesture-tap { x: 0.5, y: 0.55 } → tap password field
5. paste { text: "password123" } → iOS only; on Android use keyboard
6. gesture-tap { x: 0.5, y: 0.7 } → tap Login button
7. screenshot → verify home screen appeared
Scroll and navigation
1. screenshot → see list at top
2. gesture-swipe { fromY: 0.7, toY: 0.3 } → scroll down
3. gesture-tap item at visible position → verify auto-screenshot
4. screenshot → verify detail view opened
5. button { button: "back" }
6. screenshot → verify returned to list
Visual behavior check
1. Classify expected result as visual or mixed.
2. Navigate to the stable starting state.
3. screenshot { scale: 1.0, includeImageInContext: false } → save baseline path.
4. describe / debugger-component-tree → find the control and use its returned tap coordinates.
5. gesture-tap → perform the visual behavior under test.
6. screenshot-diff { baselinePath, captureCurrent: true, udid, outputDir } → inspect visible change or stability.
7. describe / debugger-component-tree → verify selected state, label, route, or attributes if relevant.
8. Report combined verdict from expected behavior, visual inspection, diff summary, and structural evidence.
---
4. Recovery Pattern
- If screenshot shows loading/transition: wait 500ms, retake with `screenshot`.
- If tap misses target: re-run discovery tool (`describe` / `debugger-component-tree`), retry once with new coordinates.
- If a permission dialog or modal is visible: re-run `describe` first. Stay in screenshot-driven navigation only when the overlay is not exposed reliably, then switch back to `describe` / `debugger-component-tree` as soon as it is dismissed.
- If tap fails twice at same coordinates: stop, re-discover, report if element not found.
- If a saved flow fails during `flow-execute` replay (as opposed to live test steps above): follow `argent-create-flow` skill §10 for structured diagnosis and correction.
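
The tap-related rules above amount to "re-discover, retry once, then stop". The sketch below shows that control flow only. The `discover` and `tap` callables are hypothetical stand-ins for whatever wraps the discovery tool and `gesture-tap`, and pausing 500ms before the retry is borrowed from the loading/transition rule as an assumption.

```python
import time
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def tap_with_recovery(
    discover: Callable[[], Optional[Point]],
    tap: Callable[[Point], bool],
    wait_s: float = 0.5,
) -> str:
    """Tap a target, re-discovering and retrying once before giving up."""
    for attempt in range(2):  # first try plus exactly one retry
        point = discover()  # always use fresh coordinates
        if point is None:
            return "element not found"
        if tap(point):  # tap() reports whether the expected result appeared
            return "ok"
        if attempt == 0:
            time.sleep(wait_s)  # let a loading/transition state settle
    return "failed twice: stop and report"
```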
Tips
- Use `paste` for text entry on iOS: it is faster and more reliable than key-by-key `keyboard`. `paste` is iOS-only; on Android use `keyboard` instead.
- Use `gesture-custom` for long-press context menus (800ms hold).
- Report clearly: state what you expected, what you saw, and the verdict.
- Permission modals: try `describe` first. Use `screenshot` only as fallback, tap one visible button at a time, and verify with the returned screenshot before continuing.
- Record for replay: If a tested flow is likely to be repeated, use the `argent-create-flow` skill to record it as a `.yaml` script. This lets you replay the entire sequence later with a single `flow-execute` call instead of re-running each step manually.
Related Skills
| Skill | When to use |
|---|---|
| `argent-device-interact` | Tool usage for tapping, swiping, typing (iOS + Android) |
| `argent-screenshot-diff` | Visual regression and before/after screenshot comparison |
| `argent-ios-simulator-setup` | Booting and connecting an iOS simulator |
| `argent-android-emulator-setup` | Booting and connecting an Android emulator |
| `argent-react-native-app-workflow` | Starting the app, Metro, build issues |
| `argent-metro-debugger` | Breakpoints, console logs, JS evaluation |
| `argent-create-flow` | Record a test sequence as a replayable flow |

