Wave Status
Active waves can be read and decoded with the --waves command. Avoid issuing this command too often if GFX power gating is enabled. Typically, this command is used when the GPU has hung; the status of the waves aids debugging because it indicates the current state of the shaders.
umr --waves [ ${ringname} | uq | vmid@addr.size | none ]
The parameter to the command can be one of the following: the name of a kernel ring; the word ‘uq’ to use the user queue that the user has bound to; a triple of VMID, virtual address, and buffer size (PM4 is assumed); or the word ‘none’ to indicate that no packet stream is associated with this command.
The specification of a packet stream source is useful because it tells umr where it might find information about how the shader (kernel) being debugged was programmed.
Basic Decoding
If there are active waves the default output format resembles:
------------------------------------------------------
se0.sh0.cu2.simd0.wave0
Main Registers:
ixSQ_WAVE_STATUS: 08010100 | ixSQ_WAVE_PC_LO: 00202128 | ixSQ_WAVE_PC_HI: 00008000 | ixSQ_WAVE_EXEC_LO: a5ca57c8 |
ixSQ_WAVE_EXEC_HI: 855382fa | ixSQ_WAVE_HW_ID: 10300200 | ixSQ_WAVE_INST_DW0: bf8c0071 | ixSQ_WAVE_INST_DW1: 0a0a0217 |
ixSQ_WAVE_GPR_ALLOC: 01060203 | ixSQ_WAVE_LDS_ALLOC: 00000000 | ixSQ_WAVE_TRAPSTS: 20000000 | ixSQ_WAVE_IB_STS: 00000002 |
ixSQ_WAVE_IB_DBG0: 08000b06 | ixSQ_WAVE_M0: e4910000 | ixSQ_WAVE_MODE: 000001c0 |
The output can be fed through the command ‘column -t’ to pretty print it. The first line represents the column headings. When appropriate, SGPRs and (on GFX9+) VGPRs are printed if the wave is halted. Where possible, umr also prints the surrounding instruction words in the shader along with their disassembly.
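For scripted triage it can help to load the "Main Registers" block into a dictionary. The following is a minimal sketch that assumes the `name: value |` layout shown in the sample above; the helper name `parse_main_registers` is hypothetical and not part of umr.

```python
import re

# One "NAME: HEXVALUE" pair as printed in the default --waves output.
# Assumption: values are always printed as exactly 8 hex digits.
PAIR_RE = re.compile(r"(\w+):[ \t]*([0-9a-fA-F]{8})\b")

def parse_main_registers(text):
    """Return {register_name: int} for every 'name: value' pair in text."""
    return {name: int(value, 16) for name, value in PAIR_RE.findall(text)}

# Two lines copied from the sample output above.
sample = """\
ixSQ_WAVE_STATUS: 08010100 | ixSQ_WAVE_PC_LO: 00202128 | ixSQ_WAVE_PC_HI: 00008000 | ixSQ_WAVE_EXEC_LO: a5ca57c8 |
ixSQ_WAVE_EXEC_HI: 855382fa | ixSQ_WAVE_HW_ID: 10300200 | ixSQ_WAVE_INST_DW0: bf8c0071 | ixSQ_WAVE_INST_DW1: 0a0a0217 |
"""

regs = parse_main_registers(sample)
print(hex(regs["ixSQ_WAVE_PC_LO"]))  # 0x202128
```

The pattern matches any `name: value` pair, so it should be applied only to the main register lines of a single wave.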
On live systems, the ‘halt_waves’ option can be used to inspect wave data. It issues an SQ_CMD halt command, which halts any waves currently being processed. If no waves are being processed, the command is effectively ignored.
umr -O halt_waves --waves gfx_0.0.0
Typically, if the command succeeds, the display will hang while umr is running (umr issues a resume before terminating). For instance, if you pipe umr to less, the display will appear frozen while umr is blocked trying to write data to stdout. If you terminate umr uncleanly (say with a SIGINT or SIGKILL), the waves will not resume. This can be cleaned up by re-issuing umr with halt_waves and letting it terminate cleanly.
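One way to keep the halted period short is to let umr write into memory and only inspect the text afterwards. The following is a sketch under that assumption; the function names are hypothetical, and the ring name defaults to the one used in the example above.

```python
import subprocess

def halt_waves_cmd(ring="gfx_0.0.0"):
    """Build the argv for the halt_waves invocation shown above."""
    return ["umr", "-O", "halt_waves", "--waves", ring]

def capture_waves(ring="gfx_0.0.0"):
    """Run umr to completion and return everything it printed.

    The output is buffered in memory, so umr never blocks on a slow
    reader (such as a pager) while the waves are halted. No timeout is
    used on purpose: killing umr would leave the waves halted.
    """
    result = subprocess.run(
        halt_waves_cmd(ring),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Usage (requires umr and sufficient privileges):
#   text = capture_waves()
#   print(text)
```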
The wave status command supports an alternative output format with the ‘bits’ option.
Detailed output
umr -O bits --waves gfx_0.0.0
This produces output that resembles:
se0.sh0.cu0.simd0.wave0
Main Registers:
ixSQ_WAVE_STATUS: 0801a001 | ixSQ_WAVE_PC_LO: 00200a48 | ixSQ_WAVE_PC_HI: 00008000 | ixSQ_WAVE_EXEC_LO: ffffffff |
ixSQ_WAVE_EXEC_HI: ffffffff | ixSQ_WAVE_HW_ID: 0f200000 | ixSQ_WAVE_INST_DW0: bf8c0f70 | ixSQ_WAVE_INST_DW1: d2960000 |
ixSQ_WAVE_GPR_ALLOC: 01000300 | ixSQ_WAVE_LDS_ALLOC: 0000203c | ixSQ_WAVE_TRAPSTS: 20000000 | ixSQ_WAVE_IB_STS: 00000000 |
ixSQ_WAVE_IB_DBG0: 00000026 | ixSQ_WAVE_M0: 80100000 | ixSQ_WAVE_MODE: 000001c0 |
Register Bits:
ixSQ_WAVE_STATUS[0801a001]:
SCC: 00000001 | SPI_PRIO: 00000000 | USER_PRIO: 00000000 | PRIV: 00000000 |
TRAP_EN: 00000000 | TTRACE_EN: 00000000 | EXPORT_RDY: 00000000 | EXECZ: 00000000 |
VCCZ: 00000000 | IN_TG: 00000000 | IN_BARRIER: 00000000 | HALT: 00000001 |
TRAP: 00000000 | TTRACE_CU_EN: 00000001 | VALID: 00000001 | ECC_ERR: 00000000 |
SKIP_EXPORT: 00000000 | PERF_EN: 00000000 | COND_DBG_USER: 00000000 | COND_DBG_SYS: 00000000 |
ALLOW_REPLAY: 00000000 | FATAL_HALT: 00000000 | MUST_EXPORT: 00000001 |
ixSQ_WAVE_PC_LO[00200a48]:
PC_LO: 00200a48 |
ixSQ_WAVE_PC_HI[00008000]:
PC_HI: 00008000 |
ixSQ_WAVE_EXEC_LO[ffffffff]:
EXEC_LO: ffffffff |
ixSQ_WAVE_EXEC_HI[ffffffff]:
EXEC_HI: ffffffff |
ixSQ_WAVE_HW_ID[0f200000]:
WAVE_ID: 00000000 | SIMD_ID: 00000000 | PIPE_ID: 00000000 | CU_ID: 00000000 |
SH_ID: 00000000 | SE_ID: 00000000 | TG_ID: 00000000 | VM_ID: 00000002 |
QUEUE_ID: 00000007 | STATE_ID: 00000001 | ME_ID: 00000000 |
ixSQ_WAVE_INST_DW0[bf8c0f70]:
INST_DW0: bf8c0f70 |
ixSQ_WAVE_INST_DW1[d2960000]:
INST_DW1: d2960000 |
ixSQ_WAVE_GPR_ALLOC[01000300]:
VGPR_BASE: 00000000 | VGPR_SIZE: 00000003 | SGPR_BASE: 00000000 | SGPR_SIZE: 00000001 |
ixSQ_WAVE_LDS_ALLOC[0000203c]:
LDS_BASE: 0000003c | LDS_SIZE: 00000002 |
ixSQ_WAVE_TRAPSTS[20000000]:
EXCP: 00000000 | SAVECTX: 00000000 | ILLEGAL_INST: 00000000 | EXCP_HI: 00000000 |
EXCP_CYCLE: 00000000 | XNACK_ERROR: 00000000 | DP_RATE: 00000001 |
ixSQ_WAVE_IB_STS[00000000]:
VM_CNT: 00000000 | EXP_CNT: 00000000 | LGKM_CNT: 00000000 | VALU_CNT: 00000000 |
FIRST_REPLAY: 00000000 | RCNT: 00000000 | VM_CNT_HI: 00000000 |
ixSQ_WAVE_IB_DBG0[00000026]:
IBUF_ST: 00000006 | PC_INVALID: 00000000 | NEED_NEXT_DW: 00000000 | NO_PREFETCH_CNT: 00000001 |
IBUF_RPTR: 00000000 | IBUF_WPTR: 00000000 | INST_STR_ST: 00000000 | ECC_ST: 00000000 |
IS_HYB: 00000000 | HYB_CNT: 00000000 | KILL: 00000000 | NEED_KILL_IFETCH: 00000000 |
NO_PREFETCH_CNT_HI: 00000000 |
ixSQ_WAVE_M0[80100000]:
M0: 80100000 |
ixSQ_WAVE_MODE[000001c0]:
FP_ROUND: 00000000 | FP_DENORM: 0000000c | DX10_CLAMP: 00000001 | IEEE: 00000000 |
LOD_CLAMPED: 00000000 | DEBUG_EN: 00000000 | EXCP_EN: 00000000 | FP16_OVFL: 00000000 |
POPS_PACKER0: 00000000 | POPS_PACKER1: 00000000 | DISABLE_PERF: 00000000 | GPR_IDX_EN: 00000000 |
VSKIP: 00000000 | CSP: 00000000 |
>SGPRS[0..3] = { 98000000, 00f00000, c0000000, 00000000 }
>SGPRS[4..7] = { b8c7ffb5, 80100000, ffffffff, ffffffff }
>SGPRS[8..11] = { 01018000, 00a00080, 4095c31f, 9190032e }
>SGPRS[12..15] = { 806fe000, 0d000000, 00600000, 0101a400 }
VGPRS: t00 t01 t02 t03 t04 t05 t06 t07 t08 t09 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 t20 t21 t22 t23 t24 t25 t26 t27 t28 t29 t30 t31 t32 t33 t34 t35 t36 t37 t38 t39 t40 t41 t42 t43 t44 t45 t46 t47 t48 t49 t50 t51 t52 t53 t54 t55 t56 t57 t58 t59 t60 t61 t62 t63
[ 0] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 }
[ 1] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 }
[ 2] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 }
[ 3] = { 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 3f800000 }
[ 4] = { 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 00000040 3f66147e 3f6520cc 3f66a9b9 3f65b5bb 3f6d04b7 3f6c0d0b 3f6d9c19 3f6ca41e 3f6b15ec 3f6a1f63 3f6bacb3 3f6ab5da 3f6e33a7 3f6d3b5d 3f6ecb5e 3f6dd2c5 3f6c43a2 3f6b4c7b 3f6cdabc 3f6be346 3f692965 3f6833fc 3f69bf8f 3f68c9d8 3f673f1f 3f664ad4 3f67d4ad 3f66e014 3f6a55e2 3f695fdd 3f6aec5e 3f69f60a }
[ 5] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3c499758 3c3eba4b 3c3320d1 3c2848d1 3c5e1b61 3c532226 3c4779cd 3c3c85b2 3c482f5b 3c3d42fc 3c319808 3c26b0c4 3c30d1ef 3c25e2f8 3c1a23c5 3c0f39f7 3c1afa71 3c10184e 3c045695 3bf2f32a 3c325d02 3c277d66 3c1bcfe7 3c10f564 3c1ca417 3c11d12a 3c06212c 3bf6a6a7 3c053c8f 3bf4ce50 3bdd45ef 3bc7a55f }
[ 6] = { 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000800 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 }
[ 7] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 }
[ 8] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 }
[ 9] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 }
[ 10] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 }
[ 11] = { 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 fffffff0 }
[ 12] = { 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 0002a600 0002a610 0002a620 0002a630 0002a640 0002a650 0002a660 0002a670 0002a680 0002a690 0002a6a0 0002a6b0 0002a6c0 0002a6d0 0002a6e0 0002a6f0 0002a700 0002a710 0002a720 0002a730 0002a740 0002a750 0002a760 0002a770 0002a780 0002a790 0002a7a0 0002a7b0 0002a7c0 0002a7d0 0002a7e0 0002a7f0 }
[ 13] = { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 }
[ 14] = { 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 00000000 00000000 00000008 00000008 }
[ 15] = { 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 }
PGM_MEM:
pgm[2@0x800000200a28 + 0x0 ] = 0xd4000002 v_interp_p1_f32_e32 v0, v2, attr0.x
pgm[2@0x800000200a28 + 0x4 ] = 0xd4040102 v_interp_p1_f32_e32 v1, v2, attr0.y
pgm[2@0x800000200a28 + 0x8 ] = 0xd4010003 v_interp_p2_f32_e32 v0, v3, attr0.x
pgm[2@0x800000200a28 + 0xc ] = 0xd4050103 v_interp_p2_f32_e32 v1, v3, attr0.y
pgm[2@0x800000200a28 + 0x10 ] = 0x86fe067e s_and_b64 exec, exec, s[6:7]
pgm[2@0x800000200a28 + 0x14 ] = 0xbf8cc07f s_waitcnt lgkmcnt(0)
pgm[2@0x800000200a28 + 0x18 ] = 0xf0800f00 image_sample v[0:3], v0, s[8:15], s[0:3] dmask:0xf
pgm[2@0x800000200a28 + 0x1c ] = 0x00020000 ;;
* pgm[2@0x800000200a28 + 0x20 ] = 0xbf8c0f70 s_waitcnt vmcnt(0)
pgm[2@0x800000200a28 + 0x24 ] = 0xd2960000 v_cvt_pkrtz_f16_f32 v0, v0, v1
pgm[2@0x800000200a28 + 0x28 ] = 0x00020300 ;;
pgm[2@0x800000200a28 + 0x2c ] = 0xd2960001 v_cvt_pkrtz_f16_f32 v1, v2, v3
pgm[2@0x800000200a28 + 0x30 ] = 0x00020702 ;;
pgm[2@0x800000200a28 + 0x34 ] = 0xc4001c0f exp mrt0 v0, v0, v1, v1 done compr vm
pgm[2@0x800000200a28 + 0x38 ] = 0x00000100 ;;
pgm[2@0x800000200a28 + 0x3c ] = 0xbf810000 s_endpgm
End of disassembly.
This output format is far more verbose but includes human-readable bitfield decodings, which may aid debugging. Where possible it also prints SGPRs, and on newer platforms (gfx9+) it may also include VGPRs.
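The register dump and the PGM_MEM listing can be cross-checked against each other. In the sample above, joining ixSQ_WAVE_PC_HI and ixSQ_WAVE_PC_LO gives the address of the line marked with `*`, and the word on that line matches ixSQ_WAVE_INST_DW0. A small sketch of the arithmetic follows; `wave_pc` is a hypothetical helper name, and treating PC_HI as the upper 32 bits is an assumption drawn from this sample.

```python
def wave_pc(pc_hi, pc_lo):
    """Combine the two 32-bit PC halves into one 64-bit address."""
    return ((pc_hi & 0xFFFFFFFF) << 32) | (pc_lo & 0xFFFFFFFF)

# Values taken from the sample output above.
pc = wave_pc(0x00008000, 0x00200a48)
print(hex(pc))  # 0x800000200a48

# Offset of the PC inside the listing that starts at 0x800000200a28;
# this is the offset printed on the line marked with '*'.
listing_base = 0x800000200a28
print(hex(pc - listing_base))  # 0x20
```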
Kernel configurations
When a shader (kernel) is found in the packet stream the output of the PGM_MEM area changes:
$ umr --user-queue kfd,comm=test,queue=0 --waves uq -O halt_waves
------------------------------------------------------
se2.sa1.wgp2.simd0.wave0
Main Registers:
ixSQ_WAVE_STATUS: 10010040 | ixSQ_WAVE_PC_LO: a8ff9a08 | ixSQ_WAVE_PC_HI: 00007fb2 | ixSQ_WAVE_EXEC_LO: 00000001 |
ixSQ_WAVE_EXEC_HI: 00000000 | ixSQ_WAVE_HW_ID1: 20090800 | ixSQ_WAVE_HW_ID2: 09000102 | ixSQ_WAVE_GPR_ALLOC: 00001000 |
ixSQ_WAVE_LDS_ALLOC: 00000000 | ixSQ_WAVE_IB_STS: 00000000 | ixSQ_WAVE_IB_STS2: 70000000 | ixSQ_WAVE_IB_DBG1: 01000000 |
ixSQ_WAVE_M0: 80000000 | ixSQ_WAVE_MODE: 000000f0 | ixSQ_WAVE_STATE_PRIV: 00004200 | ixSQ_WAVE_EXCP_FLAG_PRIV: 00000000 |
ixSQ_WAVE_EXCP_FLAG_USER: 00000000 | ixSQ_WAVE_TRAP_CTRL: 00000000 | ixSQ_WAVE_ACTIVE: 00000000 | ixSQ_WAVE_VALID_AND_IDLE: 00000001 |
ixSQ_WAVE_DVGPR_ALLOC_LO: bebebeef | ixSQ_WAVE_DVGPR_ALLOC_HI: bebebeef | ixSQ_WAVE_SCHED_MODE: 00000000 |
>SGPRS[0..3] = { ffffffff, 00007fb1, a928a000, 00007fb2 }
>SGPRS[4..7] = { a4200100, 00007fb1, 00000001, 00000000 }
>SGPRS[8..11] = { 00000000, 002e5e00, 00000000, 00025180 }
>SGPRS[12..15] = { 40605000, 0000751c, 32e00000, 00007520 }
>SGPRS[16..19] = { ffffffff, 00000000, 00000001, 00000000 }
...<snip>...
PGM_MEM: (found shader at: 0@0x7fb2a8ff9a00 of 24 bytes)
Shader registers:
gfx1201.regCOMPUTE_PGM_RSRC1(0@0x7fb2a8ff0940) == 0xe00f0100
gfx1201.regCOMPUTE_PGM_RSRC2(0@0x7fb2a8ff0940) == 0x1390
gfx1201.regCOMPUTE_PGM_RSRC3(0@0x7fb2a8ff0940) == 0x0
pgm[9@0x7fb2a8ff9a00 + 0x0 ] = 0xbea10080 s_mov_b32 s33, 0
pgm[9@0x7fb2a8ff9a00 + 0x4 ] = 0xbf830008 s_sleep 8
* pgm[9@0x7fb2a8ff9a00 + 0x8 ] = 0xbe8000c1 s_mov_b32 s0, -1
pgm[9@0x7fb2a8ff9a00 + 0xc ] = 0x8b6a007e s_and_b32 vcc_lo, exec_lo, s0
pgm[9@0x7fb2a8ff9a00 + 0x10 ] = 0xbfa4fffc s_cbranch_vccnz 65532
pgm[9@0x7fb2a8ff9a00 + 0x14 ] = 0xbfb00000 s_endpgm
End of disassembly.
Here UMR found that the compute kernel (shader) was programmed by an AQL packet at 0x7fb2a8ff9a00 in the client's virtual memory space. The registers printed relate to the programming of the kernel and will change depending on the client. For instance, kgd clients likely program far more registers that control the execution of the shader. The -O bits option can be specified to get bitfield decoding of the kernel (shader) programming registers.
On certain architectures, UMR supports finding AQL data when the PC address of the wave is outside the kernel's understood virtual memory range. For instance, in this demo the programmed kernel jumps to another kernel that was not programmed by an AQL packet directly:
$ umr --user-queue kfd,comm=test2,queue=0 --waves uq -O halt_waves
...<snip>...
PGM_MEM:
Found DISPATCH_KERNEL, Shader registers:
gfx1201.regCOMPUTE_PGM_RSRC1(0@0x7a05428c8940) == 0xe00f0103
gfx1201.regCOMPUTE_PGM_RSRC2(0@0x7a05428c8940) == 0x1391
gfx1201.regCOMPUTE_PGM_RSRC3(0@0x7a05428c8940) == 0x0
pgm[8@0x7a05428d1914 + 0x0 ] = 0xbfc80000 s_wait_loadcnt_dscnt 0x0
pgm[8@0x7a05428d1914 + 0x4 ] = 0xbfc40000 s_wait_expcnt 0x0
pgm[8@0x7a05428d1914 + 0x8 ] = 0xbfc20000 s_wait_samplecnt 0x0
pgm[8@0x7a05428d1914 + 0xc ] = 0xbfc30000 s_wait_bvhcnt 0x0
pgm[8@0x7a05428d1914 + 0x10 ] = 0xbfc70000 s_wait_kmcnt 0x0
pgm[8@0x7a05428d1914 + 0x14 ] = 0xbe810021 s_mov_b32 s1, s33
pgm[8@0x7a05428d1914 + 0x18 ] = 0xbea10020 s_mov_b32 s33, s32
pgm[8@0x7a05428d1914 + 0x1c ] = 0xbf830001 s_sleep 1
* pgm[8@0x7a05428d1914 + 0x20 ] = 0xbe8000c1 s_mov_b32 s0, -1
pgm[8@0x7a05428d1914 + 0x24 ] = 0xbf88fffe s_wait_alu 0xfffe
pgm[8@0x7a05428d1914 + 0x28 ] = 0x8b6a007e s_and_b32 vcc_lo, exec_lo, s0
pgm[8@0x7a05428d1914 + 0x2c ] = 0xbf88fffe s_wait_alu 0xfffe
pgm[8@0x7a05428d1914 + 0x30 ] = 0xbfa4fffa s_cbranch_vccnz 65530
pgm[8@0x7a05428d1914 + 0x34 ] = 0xbea10001 s_mov_b32 s33, s1
pgm[8@0x7a05428d1914 + 0x38 ] = 0xbf88fffe s_wait_alu 0xfffe
pgm[8@0x7a05428d1914 + 0x3c ] = 0xbe80481e s_setpc_b64 s[30:31]
End of disassembly.
In this example, the term “Found DISPATCH_KERNEL” means UMR found the AQL packet that dispatched this wave. In this case the registers displayed are accurate. The disassembly, however, is based on simply rewinding the PC value by 8 words, which may or may not align with a valid opcode in the kernel.
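The rewind is plain address arithmetic, sketched below. The constant and function names are hypothetical; the only inputs taken from the output above are the listing base 0x7a05428d1914 and the `*` marker at offset 0x20.

```python
WORD_BYTES = 4     # shader instruction words are 32 bits wide
REWIND_WORDS = 8   # number of words rewound before the PC, per the text

def fallback_window_start(pc):
    """Address where disassembly starts when only the PC is known."""
    return pc - REWIND_WORDS * WORD_BYTES

# In the listing above the PC is the line marked '*', at base + 0x20.
pc = 0x7a05428d1914 + 0x20
print(hex(fallback_window_start(pc)))  # 0x7a05428d1914
```

Because the start address is derived only from the PC, it can land in the middle of a multi-word instruction, which is why the first few decoded lines of such a listing should be read with caution.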
Full Kernel Text
By default, UMR outputs up to about 16 words of shader data to disassemble. To see the entire kernel program, use the -O full_shader option: when the kernel dispatch opcode is found in the packet stream, UMR disassembles the entire shader. If no packet is found, it reverts to disassembling some data around the wave's PC address.
Using the full shader (kernel) is more reliable for decoding since UMR can start at the beginning of the text section and there is no chance of a misaligned opcode decoding.
$ umr --user-queue kfd,comm=ollama,queue=2 --waves uq -O full_shader,halt_waves
PGM_MEM: (found shader at: 0@0x7e4d3c182f00 of 1836 bytes)
Shader registers:
gfx1201.regCOMPUTE_PGM_RSRC1(0@0x7e4d3c13dd00) == 0x600f0083
gfx1201.regCOMPUTE_PGM_RSRC2(0@0x7e4d3c13dd00) == 0xb84
gfx1201.regCOMPUTE_PGM_RSRC3(0@0x7e4d3c13dd00) == 0x0
pgm[8@0x7e4d3c182f00 + 0x0 ] = 0xf4002080 s_load_b64 s[2:3], s[0:1], 0x10
pgm[8@0x7e4d3c182f00 + 0x4 ] = 0xf8000010 ;;
pgm[8@0x7e4d3c182f00 + 0x8 ] = 0x8b13ff73 s_and_b32 s19, ttmp7, 0xffff
pgm[8@0x7e4d3c182f00 + 0xc ] = 0x0000ffff ;;
pgm[8@0x7e4d3c182f00 + 0x10 ] = 0xbfc70000 s_wait_kmcnt 0x0
pgm[8@0x7e4d3c182f00 + 0x14 ] = 0xbf118002 s_cmp_lg_u64 s[2:3], 0
pgm[8@0x7e4d3c182f00 + 0x18 ] = 0x980c80c1 s_cselect_b32 s12, -1, 0
pgm[8@0x7e4d3c182f00 + 0x1c ] = 0xbf108002 s_cmp_eq_u64 s[2:3], 0
pgm[8@0x7e4d3c182f00 + 0x20 ] = 0xbfa201c2 s_cbranch_scc1 450
pgm[8@0x7e4d3c182f00 + 0x24 ] = 0x84048213 s_lshl_b32 s4, s19, 2
pgm[8@0x7e4d3c182f00 + 0x28 ] = 0xf4000081 s_load_b32 s2, s[2:3], s4 offset:0x0
pgm[8@0x7e4d3c182f00 + 0x2c ] = 0x08000000 ;;
pgm[8@0x7e4d3c182f00 + 0x30 ] = 0xf4006100 s_load_b256 s[4:11], s[0:1], 0x34
pgm[8@0x7e4d3c182f00 + 0x34 ] = 0xf8000034 ;;
pgm[8@0x7e4d3c182f00 + 0x38 ] = 0xbfa60023 s_cbranch_execnz 35
pgm[8@0x7e4d3c182f00 + 0x3c ] = 0xbfc70000 s_wait_kmcnt 0x0
pgm[8@0x7e4d3c182f00 + 0x40 ] = 0x86029f04 s_ashr_i32 s2, s4, 31
pgm[8@0x7e4d3c182f00 + 0x44 ] = 0xbf870499 s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
...<snip>...
pgm[8@0x7e4d3c182f00 + 0x3e8 ] = 0x0000000f ;;
pgm[8@0x7e4d3c182f00 + 0x3ec ] = 0xffffb800 ;;
pgm[8@0x7e4d3c182f00 + 0x3f0 ] = 0xee05007c global_load_b32 v17, v[0:1], off
pgm[8@0x7e4d3c182f00 + 0x3f4 ] = 0x00000011 ;;
pgm[8@0x7e4d3c182f00 + 0x3f8 ] = 0x00000000 ;;
pgm[8@0x7e4d3c182f00 + 0x3fc ] = 0xd7006a00 v_add_co_u32 v0, vcc_lo, 0x120, v0
pgm[8@0x7e4d3c182f00 + 0x400 ] = 0x000200ff ;;
pgm[8@0x7e4d3c182f00 + 0x404 ] = 0x00000120 ;;
pgm[8@0x7e4d3c182f00 + 0x408 ] = 0x40020280 v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
pgm[8@0x7e4d3c182f00 + 0x40c ] = 0xbfc00008 s_wait_loadcnt 0x8
* pgm[8@0x7e4d3c182f00 + 0x410 ] = 0x34282905 v_ashrrev_i32_e32 v20, v5, v20
pgm[8@0x7e4d3c182f00 + 0x414 ] = 0xbfc00007 s_wait_loadcnt 0x7
pgm[8@0x7e4d3c182f00 + 0x418 ] = 0x322e2484 v_lshrrev_b32_e32 v23, 4, v18
...<snip>...
Here UMR found a kernel of 1836 bytes length (at address 0x7e4d3c182f00) where the PC of this particular wave is 0x410 bytes into the kernel text.
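With a long full_shader listing it can be convenient to locate the marked line programmatically. The sketch below assumes the `pgm[vmid@base + offset ] = word text` layout shown above, with `*` marking the wave's PC; `find_pc_line` is a hypothetical helper, not part of UMR.

```python
import re

# Layout assumed from the listings above:
#   [*] pgm[<vmid>@0x<base> + 0x<offset> ] = 0x<word> <disassembly>
PGM_RE = re.compile(
    r"^(?P<star>\*)?\s*pgm\[(?P<vmid>\d+)@0x(?P<base>[0-9a-f]+)\s*\+\s*"
    r"0x(?P<off>[0-9a-f]+)\s*\]\s*=\s*0x(?P<word>[0-9a-f]{8})\s*(?P<text>.*)$"
)

def find_pc_line(lines):
    """Return details of the line marked with '*', or None if absent."""
    for line in lines:
        m = PGM_RE.match(line.strip())
        if m and m.group("star"):
            base = int(m.group("base"), 16)
            off = int(m.group("off"), 16)
            return {
                "vmid": int(m.group("vmid")),
                "offset": off,
                "address": base + off,
                "word": int(m.group("word"), 16),
                "text": m.group("text").strip(),
            }
    return None

# Three lines copied from the listing above.
listing = [
    "pgm[8@0x7e4d3c182f00 + 0x40c ] = 0xbfc00008 s_wait_loadcnt 0x8",
    "* pgm[8@0x7e4d3c182f00 + 0x410 ] = 0x34282905 v_ashrrev_i32_e32 v20, v5, v20",
    "pgm[8@0x7e4d3c182f00 + 0x414 ] = 0xbfc00007 s_wait_loadcnt 0x7",
]

hit = find_pc_line(listing)
print(hex(hit["offset"]), hex(hit["address"]))  # 0x410 0x7e4d3c183310
```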