_UserExceptionVector and _UserExceptionVector_1 are actually compiled to different sections, so the compiler doesn't know the address of _UserExceptionVector_1 at compile time.
It sounds like you're implying 'j' cannot be retargeted at link time, and that it must be a 'call0' for that reason. I never really considered it, and never noticed because I build everything all together.
Actually, writing this out makes me think that you might be better off creating your own set of vectors as well - just like the esp-open-rtos approach.
This feels like a very heavy approach, and feels like it might not be as compatible with other systems. Though I agree it is a lot cleaner. Speaking of that...
This also means you don't have to --wrap anything at all, or memcpy random bits of memory.
I did... and it works... Sort of... It can enumerate and works. My latency went from 1.86us to 0.78us! (Latency can be decreased further on my end) A savings of about 74 cycles. I say "sort of" because, unless I attach my handler to the existing interrupt system as well, everything breaks. I am guessing sometimes there is an interrupt already being handled and if levels change while in their handler, everything explodes because the handler is null (Which is OKAY! I can simply throw out the interrupt)
(in a .S, anywhere...)
//This code will be memcpy'd over top of _UserExceptionVector, since I can't figure out how to override it with GCC.
.global replacement_user_vect
.align 4
replacement_user_vect:
//Original code:
_wsr.excsave1 a0
_rsr.interrupt a0
_bbci a0, 4, not_a_gpio_interrupt
_nop //Will get replaced with a "call0" to my code. (TODO: Can this be a jump?)
not_a_gpio_interrupt:
_call0 _UserExceptionVector_1
_ill
_ill //Zero padding to make it so we can see it clearly in a memory dump.
And, here's my memcpy surgery nastiness, not yet edited or commented... But it builds the 3-byte opcodes that need to go there by hand, since we can't rely on the compiler to relocate nicely.
int i;
uint8_t * ovect = (uint8_t*)0x40100050;
uint32_t * ovect32 = (uint32_t*)0x40100050;
uint8_t * replacevect8 = (uint8_t*)(&replacement_user_vect);
uint8_t * targ8 = (uint8_t*)(&_UserExceptionVector_1);
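uint8_t vect8copy[16] __attribute__((aligned(4))); //We only need 16 bytes. Aligned because it is read back as 32-bit words below.
ets_memcpy( vect8copy, (&replacement_user_vect), sizeof(vect8copy) ); //Copy exactly as many bytes as the buffer holds.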
//+1 to +4 (When 'call' instruction is at +3)
//+5 to +8 (When 'call' instruction is at +5)
//+5 to +8 (When 'call' instruction is at +6)
//+9 to +12 (When 'call' instruction is at +9)
//+13 to +16 (When 'call' instruction is at +12)
int delta_gp = ((uint8_t*)&gpio_intr) - (ovect+11);
int delta_ue = targ8 - (ovect+14);
delta_ue = (delta_ue & ~0x03)<<4;
delta_gp = (delta_gp & ~0x03)<<4;
//for call0 to the gpio handler.
vect8copy[9] = 0x05 | (delta_gp & 0xff); //lsb of jump
vect8copy[10] = (delta_gp >> 8)&0xff; //...
vect8copy[11] = (delta_gp >> 16)&0xff; //msb of jump.
//For call0 to the normal handler
vect8copy[12] = 0x05 | (delta_ue & 0xff); //lsb of jump
vect8copy[13] = (delta_ue >> 8)&0xff; //...
vect8copy[14] = (delta_ue >> 16)&0xff; //msb of jump.
printf( "%08x %08x %08x\n", (unsigned)ovect, (unsigned)replacevect8, (unsigned)delta_ue);
for( i = 0; i <0x10; i++ )
{
printf( "%02x ", vect8copy[i] );
}
ovect32[0] = ((uint32_t*)vect8copy)[0];
ovect32[1] = ((uint32_t*)vect8copy)[1];
ovect32[2] = ((uint32_t*)vect8copy)[2];
ovect32[3] = ((uint32_t*)vect8copy)[3];
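For what it's worth, the byte poking above boils down to one formula, so here is a rough host-side sketch of it. It assumes the standard Xtensa CALL0 encoding as I understand it: 0x5 in the low nibble, the 18-bit word offset in bits 6 to 23, measured from the call's word-aligned address plus 4. The names encode_call0 and write_le24 are made up for illustration and are not from any SDK.

```c
#include <stdint.h>

/* Hypothetical helper: build the 24-bit CALL0 opcode for a call located at
   address pc that should land on target (target must be 4-byte aligned). */
static uint32_t encode_call0(uint32_t pc, uint32_t target)
{
    /* CALL0 offsets are counted in words from (pc rounded down to 4) + 4. */
    uint32_t base = (pc & ~3u) + 4u;
    int32_t words = (int32_t)(target - base) / 4;

    /* op0 = 0x5 in bits 0-3, n = 0 in bits 4-5, offset in bits 6-23. */
    return 0x05u | (((uint32_t)words & 0x3FFFFu) << 6);
}

/* Hypothetical helper: store the opcode as 3 little-endian bytes. */
static void write_le24(uint8_t *dst, uint32_t opcode)
{
    dst[0] = (uint8_t)(opcode & 0xff);         /* lsb of call */
    dst[1] = (uint8_t)((opcode >> 8) & 0xff);  /* ... */
    dst[2] = (uint8_t)((opcode >> 16) & 0xff); /* msb of call */
}
```

If this holds up, the bytes at vect8copy[12] would come from write_le24( &vect8copy[12], encode_call0( 0x40100050 + 12, (uint32_t)&_UserExceptionVector_1 ) ), and the same with +9 and gpio_intr for the other call. It should give the same bytes as the delta_ue / delta_gp math, just without the magic +11 and +14.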
Regarding the xtos stuff... cool. I didn't see any of that, but I made a random guess and restore the state myself with:
_rsr.excsave1 a0
rsync
rfe
There's a real gotcha with understanding Xtensa: exceptions and interrupts are two different things.
That is a small problem. I still have enough room in my vector to handle that if I have to. It shouldn't be too bad to read in EXCCAUSE and jump to the regular handler if it's not set to 4. But... one of the nice things about this USB mess is that it's okay if I miss interrupts or sometimes call the interrupt handler without anything legitimate. It would be better to be clean, but for initial testing, things can be very dirty. Especially in USB full speed! With full speed, I can tell if the interrupt was spurious in about 0.3-0.4us.
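Just to pin down what I mean by that check, here is the decision written out as plain C. This is only a model of the logic, not what would actually sit in the vector (that would be a few instructions of assembly). It assumes EXCCAUSE 4 means "level-1 interrupt" and that bit 4 of INTERRUPT is the GPIO bit, as in the bbci above. The names choose_route, ROUTE_SDK_HANDLER and ROUTE_GPIO_HANDLER are made up.

```c
#include <stdint.h>

#define EXCCAUSE_LEVEL1_INTERRUPT 4u /* assumed: EXCCAUSE value for a level-1 interrupt */
#define GPIO_INTERRUPT_BIT        4u /* the bit tested by bbci in the vector above */

/* Hypothetical names for the two places the vector can go. */
typedef enum { ROUTE_SDK_HANDLER, ROUTE_GPIO_HANDLER } route_t;

/* Model of the vector's decision: only a level-1 interrupt with the GPIO bit
   pending goes to the fast handler; everything else takes the stock path. */
static route_t choose_route(uint32_t exccause, uint32_t interrupt_reg)
{
    if (exccause != EXCCAUSE_LEVEL1_INTERRUPT)
        return ROUTE_SDK_HANDLER; /* a real exception, not an interrupt */
    if (interrupt_reg & (1u << GPIO_INTERRUPT_BIT))
        return ROUTE_GPIO_HANDLER;
    return ROUTE_SDK_HANDLER;     /* some other level-1 interrupt */
}
```

The point of checking EXCCAUSE first is that a stale GPIO bit in INTERRUPT can't drag a real exception into my handler.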
If you push the non-working wrap code & makefile you have to a branch somewhere, I can probably take a look. But like I said above, maybe you're better off just writing your own set of vectors and leaving the SDK ones as-is.
I don't know if it's worth your time - I have already asked on Stack Overflow, etc. I think I have everything I critically need now to start investigating full-speed.
P.S. I really appreciate your willingness to work with me on this, especially since there's a very high chance it will all be for naught.