One difference from resweet's problem is where the delays occur. In his case, each client.println() call took ~400 ms to execute. In my case, only the last one was slow:
~20ms client.println(1st header);
~20ms client.println(2nd header);
~20ms client.println(3rd header);
...
~5 seconds client.println(last header);
Changing the last call from client.println() to client.print() solved the problem.
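
For anyone who wants to try the same workaround, here is a minimal sketch of the pattern: println() for every header except the last, and print() for the last one. The names sendHeaders and RecordingClient are made up for illustration and are not part of any library. RecordingClient is only a stand-in so the snippet compiles off-device; it assumes println() appends CR LF, as the Arduino Print class does.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an Arduino-style client, so the sketch can be
// compiled and checked on a PC. It just records what would be sent.
struct RecordingClient {
    std::string sent;
    void print(const char* s)   { sent += s; }                 // no line ending
    void println(const char* s) { sent += s; sent += "\r\n"; } // appends CR LF
};

// Hypothetical helper: sends every header with println() except the last,
// which goes out with print(), mirroring the change described above.
template <typename ClientT>
void sendHeaders(ClientT& client, const char* const* headers, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (i + 1 < count) {
            client.println(headers[i]);
        } else {
            client.print(headers[i]); // last header: print() instead of println()
        }
    }
}
```

On real hardware you would pass your EthernetClient instead of RecordingClient, and you could wrap each call in millis() readings to compare timings with the ones listed above. Keep in mind that print() does not send the trailing CR LF that println() adds, so check that the other end still receives what it expects.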