Add fast path for interpreter-to-interpreter invokes.

This speeds up arm64 golem interpreter benchmarks by 5%
on average, with some invoke-heavy ones up to 40% faster.

Test: test.py --host -b
Change-Id: I66069fd391488409b9e3e32127c88ee3d889b076
2 files changed