Yeah... I screwed up with the CS pin. The pin header had a bad joint, so there was no signal on the header itself. When I probed directly at the module's pin/contact, there it was! Damn!
And CS behaves just right: it goes low (active) when the transfer starts (no delay) and returns high when SPICLK stops.
Anyway, I'm using an ESP-03 or ESP-04 (the one without an antenna), and all the "tied" pins (for boot, flash...) are pulled up or down with resistors; there are no direct VCC/GND connections.
I understand that filling the FIFO takes time and that it's locked while in use, but if you look at the code, the FIFO is filled just once. The only thing done once a transfer finishes (when the SPI_USR bit goes low) is a transfer restart (SPI_USR is set again). Nothing else is touched, and it still needs a LOT of time to start sending data out.
There must be something blocking/slowing this down...
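To make the restart described above concrete, here is a minimal sketch of it in C. It is not the code from the project: the helper names are made up for illustration, the register is passed in as a pointer instead of being hard-coded, and the position of SPI_USR (bit 18 of the SPI command register) is an assumption taken from the SDK's spi_register.h as far as I remember it.

```c
#include <stdint.h>

/* Assumption: SPI_USR is bit 18 of the SPI command register. */
#define SPI_USR_BIT (1u << 18)

/* Start (or restart) a user transfer. The FIFO contents are not touched. */
static inline void spi_kick(volatile uint32_t *spi_cmd)
{
    *spi_cmd |= SPI_USR_BIT;
}

/* Non-zero while a transfer is running; the hardware clears the bit when done. */
static inline int spi_busy(const volatile uint32_t *spi_cmd)
{
    return (*spi_cmd & SPI_USR_BIT) != 0;
}

/* Wait for the previous transfer to finish, then restart the same one. */
static inline void spi_restart_when_idle(volatile uint32_t *spi_cmd)
{
    while (spi_busy(spi_cmd)) {
        /* spin until SPI_USR goes low */
    }
    spi_kick(spi_cmd);
}
```

On the chip, the pointer would be the address of the SPI command register of the SPI block in use. The point is only that the restart is a single read-modify-write of one register.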
I know this should not be an issue in normal SPI, but it's not being used as "normal SPI" here, and the gaps in the data/clk streams make a lot of difference:
Another odd thing I noticed is a "code slowdown" when filling the SPI FIFO.
To be as fast as possible I went the asm route and use a sequence of 16 l32i / s32i pairs to move the data into the FIFO.
The strange thing is that 8 pairs need 8 cpu clk cycles, 12 pairs need 20 cycles and 16 pairs need 64 cycles!!! This is measured with the CCOUNT (cycle count) register.
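For anyone who wants to repeat this kind of measurement, here is a rough sketch of how CCOUNT can be read around a copy. It assumes GCC-style inline asm on the Xtensa toolchain; `read_ccount` and `timed_copy` are illustrative names, and the plain loop only stands in for the unrolled l32i / s32i pairs, so the absolute numbers will differ from the ones above.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the Xtensa cycle counter (CCOUNT special register).
   On a non-Xtensa host this returns 0 so the file still compiles;
   the timing result is then meaningless. */
static inline uint32_t read_ccount(void)
{
#if defined(__XTENSA__)
    uint32_t c;
    __asm__ __volatile__("rsr %0, ccount" : "=a"(c));
    return c;
#else
    return 0u; /* host fallback: no cycle counter */
#endif
}

/* Copy `words` 32-bit words to a volatile destination (a peripheral FIFO
   or a plain RAM buffer) and return the elapsed cycle count. */
static uint32_t timed_copy(volatile uint32_t *dst, const uint32_t *src,
                           size_t words)
{
    uint32_t start = read_ccount();
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
    /* Unsigned subtraction stays correct across one counter wrap. */
    return read_ccount() - start;
}
```

Running the same call with 8, 12 and 16 words, once against the SPI FIFO and once against a buffer in RAM, would show whether the growth comes from the peripheral writes or from the code itself.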
Seems like some "cache miss" or something... but all references state there is no cache in the CPU. I could understand that a write to a peripheral register could be slower (as the GPIOs are), but that cost should be constant and not grow like this.
Any idea why this is happening or, better, how to avoid that slowdown?
Best regards