I am new to this forum and I am currently struggling with an issue in my ESP8266 software project. I am not sure whether it is caused by me misusing the Arduino libraries or by an actual bug in the libraries themselves. Therefore, I wanted to ask around in the forum before opening a full GitHub issue on the topic.
I have already conducted a bit of debugging, testing and research, but as already stated it was inconclusive. Thus, I hope I can give you a clear enough picture of what I am trying to achieve, and more importantly what kind of problem I am currently facing.
A short overview of what I am trying to achieve
I am currently working on an ESP8266-based IoT device which is supposed to act as a wireless dongle for certain electric smart meters. It listens for data on its serial port, processes it a bit and posts it via WiFi to an MQTT endpoint. For this I use the PubSubClient library with a WiFiClientSecure that has a SHA1 certificate fingerprint set. Every few seconds a serial data packet is read, converted to JSON and sent via MQTT. This configuration has been running for several weeks without any trouble.
Now I have been working on also hosting a webserver on the device, to make configuration a little simpler for users. I want to keep the web traffic secure without rolling some custom encryption scheme, which is why I opted for the BearSSL::ESP8266WebServerSecure. This is where I have been running into problems.
Problem description
To be precise, the problem is twofold: I was able to fix the first one using a rather "dumb" workaround, but the second one is the one this post is concerned with. I am briefly including the first one, as I think that they are either connected or at least in the same vein.
Problem 1: The webserver and wificlient (used by the MQTT PubSubClient) cannot coexist if SSL is enabled for both of them at the same time. The webserver can run and handle clients without disturbing the wificlient, but the wificlient permanently breaks the webserver as soon as it sends something. This seems to be a long-standing and at least somewhat known issue, as can be seen in this Stack Overflow question, where someone encountered the same problem in April 2020. To be honest, I do not fully understand the provided answer (marked as accepted by the OP) of creating a "provisioning" server. I solved the issue on my end in a rather dumb way: whenever an MQTT message is sent through the wificlient, the whole webserver and wificlient are torn down (destructed and deallocated) and reinitialized. This has been working so far, although it is a very unsatisfactory solution.
Problem 2: While the fix described above allows the webserver and wificlient to both be used with SSL, the system now suffers from random crashes. The serial output (listed below) shows that the system crashes due to a software reset triggered by the watchdog timer. At first I did not realize how unstable the device is in this configuration. After switching my terminal emulator to send a serial packet every five seconds, the same way a smart meter would, it turned out that crashes sometimes happen after only 4 to 5 packets. The crashes are completely random, though: sometimes the system remains functional for more than 30 packets, sometimes it crashes after the second. As far as I can tell, removing the call to WebServer::handleClient() prevents the crash, but of course renders the webserver useless.
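To make the teardown-and-rebuild workaround concrete, here is a simplified model of the pattern that runs on a PC. It is not the code from my repository: `SecureServer`, `SecureClient`, `Device` and `publishAndRebuild` are hypothetical stand-ins that only count live instances, so the object lifetimes can be observed without any ESP8266 libraries.

```cpp
#include <cassert>
#include <memory>

// Live-instance counters for the two hypothetical TLS objects.
int g_liveServers = 0;
int g_liveClients = 0;

// Stand-in for the TLS web server (hypothetical, not the real class).
struct SecureServer {
    SecureServer() { ++g_liveServers; }
    ~SecureServer() { --g_liveServers; }
};

// Stand-in for the TLS client used for MQTT (hypothetical, not the real class).
struct SecureClient {
    int published = 0;
    SecureClient() { ++g_liveClients; }
    ~SecureClient() { --g_liveClients; }
    bool publish(const char* /*payload*/) { ++published; return true; }
};

struct Device {
    std::unique_ptr<SecureServer> server;
    std::unique_ptr<SecureClient> client;
    int rebuilds = 0;

    Device() { rebuild(); }

    // Destroy both objects first, then allocate fresh ones, so that old and
    // new instances never exist at the same time.
    void rebuild() {
        server.reset();
        client.reset();
        server.reset(new SecureServer());
        client.reset(new SecureClient());
        ++rebuilds;
    }

    // Publish one message, then tear everything down and start over.
    bool publishAndRebuild(const char* payload) {
        const bool ok = client->publish(payload);
        rebuild();
        return ok;
    }
};
```

Calling `publishAndRebuild()` once per received packet mirrors what is described above: after every message, exactly one fresh server and one fresh client exist, and any state held by the old objects is gone.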
Configuration description
- Hardware: ESP-12E (Arduino NodeMCU 1.0)
- Core Version: v3.1.1 (installed via the Arduino IDE)
- Development Env: Arduino IDE
- Operating System: Windows
- Module: NodeMCU
- Flash Mode: ?
- Flash Size: 4MB
- lwip Variant: v2 Lower Memory
- Reset Method: ?
- Flash Frequency: ?
- CPU Frequency: 80 MHz
- Upload Using: SERIAL
- Upload Speed: 115200
Source code
The source code is publicly accessible on GitHub Tobias0110/EVN_Kaifa_ESP_MQTT in the dev-webserver branch.
Stack trace
I have captured multiple stack traces. One of them is shown below:
--------------- CUT HERE FOR EXCEPTION DECODER --------------
Soft WDT reset
>>>stack>>>
ctx: sys
sp: 3ffff560 end: 3fffffb0 offset: 01a0
3ffff700: 00000000 00051a8d 5645a1ca 05e6b36b
3ffff710: 00000000 c02fc02b c030c02c 4020994a
3ffff720: 00003a98 000679c2 3fff15e4 4020b04f
3ffff730: 000000dc 3fff4e64 3fff4464 4022b390
3ffff740: 00000008 00000001 00050f63 00000000
3ffff750: 3fff4964 4022c4f4 3fff4464 00000000
3ffff760: 3fff4964 3ffef69c 3fff4464 3ffef69c
3ffff770: 00000000 00000001 3fff15e4 4020b17b
3ffff780: 00000000 00000001 3fff15e4 4020b7d3
3ffff790: 00000000 00050dec 3e353f7c 05d81927
3ffff7a0: 00000000 00000000 00000034 00050dec
3ffff7b0: 00003a98 3fff43dc 3fff15e4 4020a183
3ffff7c0: 3fff0cec 00001e00 00000002 00000000
3ffff7d0: 3fff1e00 00000003 00000000 3ffef9c8
3ffff7e0: 000022b3 3ffef69c 3fff15e4 4020b875
3ffff7f0: 40213b6c f683a8c0 40213b6c f683a8c0
3ffff800: 00000000 3fff3831 3fff3846 4020c4dc
3ffff810: 00000008 00000000 000021fe 00000009
3ffff820: 00000000 3fff381c 00000000 00000000
3ffff830: 00000001 0000000b
I am using the CLI version of the ESP8266 stack trace decoder by littleyoda hosted on GitHub. Running my stack dump through it gives me the following stack trace/call stack:
0x4020994a: WiFiClient::available() at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClient.cpp:257
0x4020b04f: esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::DoNothing, esp8266::polledTimeout::TimePolicy::TimeUnit<esp8266::polledTimeout::TimePolicy::TimeSourceMillis, 1000ull> >::expiredOneShot() const at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/PolledTimeout.h:264
(inlined by) esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::DoNothing, esp8266::polledTimeout::TimePolicy::TimeUnit<esp8266::polledTimeout::TimePolicy::TimeSourceMillis, 1000ull> >::expired() at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/PolledTimeout.h:164
(inlined by) esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::DoNothing, esp8266::polledTimeout::TimePolicy::TimeUnit<esp8266::polledTimeout::TimePolicy::TimeSourceMillis, 1000ull> >::operator bool() at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/PolledTimeout.h:170
(inlined by) BearSSL::WiFiClientSecureCtx::_run_until(unsigned int, bool) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:492
0x4022b390: br_ssl_engine_closed at /home/earle/src/esp-quick-toolchain/arduino/tools/sdk/ssl/bearssl/src/inner.h:2211
(inlined by) jump_handshake at /home/earle/src/esp-quick-toolchain/arduino/tools/sdk/ssl/bearssl/src/ssl/ssl_engine.c:1081
0x4022c4f4: br_ssl_hs_client_run at /home/earle/src/esp-quick-toolchain/arduino/tools/sdk/ssl/bearssl/src/ssl/ssl_hs_client.c:913
0x4020b17b: BearSSL::WiFiClientSecureCtx::_wait_for_handshake() at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:609
0x4020b7d3: std::shared_ptr<br_x509_minimal_context>::operator=(std::shared_ptr<br_x509_minimal_context>&&) at ~\appdata\local\arduino15\packages\esp8266\tools\xtensa-lx106-elf-gcc\3.1.0-gcc10.3-e5f9fec\xtensa-lx106-elf\include\c++\10.3.0\bits/shared_ptr.h:384
(inlined by) BearSSL::WiFiClientSecureCtx::_connectSSL(char const*) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:1206
0x4020a183: void esp_delay<ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}>(unsigned int, ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}&&, unsigned int) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/coredecls.h:69
(inlined by) void esp_delay<ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}>(unsigned int, ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}&&) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/coredecls.h:78
(inlined by) ClientContext::connect(ip4_addr*, unsigned short) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/include/ClientContext.h:147
(inlined by) WiFiClient::connect(IPAddress, unsigned short) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClient.cpp:164
0x4020b875: BearSSL::WiFiClientSecureCtx::connect(char const*, unsigned short) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:229
0x40213b6c: std::_Function_handler<bool (), settimeofday::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Function_handler<bool (), settimeofday::{lambda()#1}> const&, std::_Manager_operation) at time.cpp:?
0x40213b6c: std::_Function_handler<bool (), settimeofday::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Function_handler<bool (), settimeofday::{lambda()#1}> const&, std::_Manager_operation) at time.cpp:?
0x4020c4dc: PubSubClient::connect(char const*, char const*, char const*, char const*, unsigned char, bool, char const*, bool) at ~\Documents\Arduino\libraries\PubSubClient\src/PubSubClient.cpp:198
(inlined by) PubSubClient::connect(char const*, char const*, char const*, char const*, unsigned char, bool, char const*, bool) at ~\Documents\Arduino\libraries\PubSubClient\src/PubSubClient.cpp:181
(I replaced part of my folder structure with "~" as a common root in the trace above.)
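For anyone wondering how a raw stack dump turns into function names: as far as I understand it, the basic idea is to pick out the words in the dump that look like code addresses and resolve those against the ELF file. The sketch below covers only the first step and is not the decoder's actual implementation. The address window (ROM, IRAM and flash-mapped code between 0x40000000 and 0x40300000) is my assumption, and `codeAddresses` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Assumed ESP8266 code address window: ROM, IRAM and the flash-mapped region.
constexpr std::uint32_t kCodeStart = 0x40000000u;
constexpr std::uint32_t kCodeEnd   = 0x40300000u;  // exclusive

// Parse one dump line such as
//   "3ffff710: 00000000 c02fc02b c030c02c 4020994a"
// and return the 32-bit words that fall inside the code window.
std::vector<std::uint32_t> codeAddresses(const std::string& line) {
    std::vector<std::uint32_t> out;
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) return out;  // no "address:" prefix

    std::istringstream words(line.substr(colon + 1));
    std::string word;
    while (words >> word) {
        if (word.size() != 8) continue;           // dump words are 8 hex digits
        char* end = nullptr;
        const unsigned long value = std::strtoul(word.c_str(), &end, 16);
        if (*end != '\0') continue;               // contained non-hex characters
        const std::uint32_t addr = static_cast<std::uint32_t>(value);
        if (addr >= kCodeStart && addr < kCodeEnd) out.push_back(addr);
    }
    return out;
}
```

Each returned address could then be passed to a tool such as `addr2line` together with the firmware's ELF file. Words that merely happen to fall into the window are picked up as well, which is probably why decoded traces can contain frames that are not part of the real call chain.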
As you can see, the program calls PubSubClient::connect(), which internally calls BearSSL::WiFiClientSecureCtx::connect(). Finally, BearSSL::WiFiClientSecureCtx::_run_until() is called, which contains a loop. It looks like the system crashes while running this loop.
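To illustrate how I picture a watchdog reset inside a waiting loop, here is a small simulation that runs on a PC. It does not reproduce the actual `_run_until()` implementation. `SoftWatchdog`, `pollUntil` and the numbers used with them (a trip limit of roughly 3 seconds and a 15 second timeout) are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simulated software watchdog driven by a millisecond counter (hypothetical model).
struct SoftWatchdog {
    std::uint32_t limitMs;        // assumed trip threshold
    std::uint32_t lastFedMs = 0;  // time of the last feed
    bool tripped = false;

    explicit SoftWatchdog(std::uint32_t limit) : limitMs(limit) {}

    void feed(std::uint32_t nowMs) { lastFedMs = nowMs; }

    // Called on every simulated tick; latches once the limit is exceeded.
    void tick(std::uint32_t nowMs) {
        if (nowMs - lastFedMs > limitMs) tripped = true;
    }
};

enum class Outcome { DataArrived, TimedOut, WatchdogReset };

// Poll until data arrives at `dataAtMs`, the timeout expires, or the watchdog
// trips. `feedInLoop` selects whether the loop feeds the watchdog while waiting.
Outcome pollUntil(std::uint32_t dataAtMs, std::uint32_t timeoutMs,
                  bool feedInLoop, SoftWatchdog& wdt) {
    for (std::uint32_t now = 0; now <= timeoutMs; ++now) {
        if (feedInLoop) wdt.feed(now);
        wdt.tick(now);
        if (wdt.tripped) return Outcome::WatchdogReset;
        if (now >= dataAtMs) return Outcome::DataArrived;
    }
    return Outcome::TimedOut;
}
```

In this model, a wait that ends quickly never trips the watchdog, even without feeding it. A wait that lasts longer than the trip limit without feeding ends in a reset well before the timeout is reached. Whether something like this is what actually happens on my device is exactly what I am unsure about.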
A different stack trace again shows a crash in ::connect().
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
Soft WDT reset
>>>stack>>>
ctx: sys
sp: 3ffff540 end: 3fffffb0 offset: 01a0
3ffff6e0: 00000000 000034f8 3a1cac08 003d3f03
3ffff6f0: 402612be 3fff0400 402648b3 00000002
3ffff700: 00000001 00000000 3fff4454 40209daa
3ffff710: 00000000 00f55373 00003a98 00000000
3ffff720: 00000000 c02fc02b c030c02c 00000000
3ffff730: 00003a98 0005b4df 3fff15e4 4020b01c
3ffff740: 000000dc 3fff4f1c 3fff451c 4022b354
3ffff750: 00000008 00000001 000034f7 00000000
3ffff760: 3fff4a1c 4022c4b8 3fff451c 00000000
3ffff770: 3fff4a1c 3ffef69c 3fff451c 3ffef69c
3ffff780: 00000000 00000001 3fff15e4 4020b13f
3ffff790: 00000000 00000001 3fff15e4 4020b797
3ffff7a0: 00000000 000034c2 31a9fbe7 003d0089
3ffff7b0: 00000000 00000000 00000034 000034c2
3ffff7c0: 00003a98 3fff4454 3fff15e4 4020a147
3ffff7d0: 3fff0af4 00001e00 00000002 00000000
3ffff7e0: 3fff1e00 00000003 00000000 3ffef9c8
3ffff7f0: 000022b3 3ffef69c 3fff15e4 4020b839
3ffff800: 40213b30 f683a8c0 40213b30 f683a8c0
3ffff810: 00000000 3fff3831
0x402612be: system_get_sdk_version at ??:?
0x402648b3: etharp_output at ??:?
0x40209daa: ClientContext::_write_from_source(char const*, unsigned int) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/include/ClientContext.h:480
0x4020b01c: esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::DoNothing, esp8266::polledTimeout::TimePolicy::TimeUnit<esp8266::polledTimeout::TimePolicy::TimeSourceMillis, 1000ull> >::checkExpired(unsigned long) const at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/PolledTimeout.h:239
(inlined by) esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::DoNothing, esp8266::polledTimeout::TimePolicy::TimeUnit<esp8266::polledTimeout::TimePolicy::TimeSourceMillis, 1000ull> >::expiredOneShot() const at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/PolledTimeout.h:264
(inlined by) esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::DoNothing, esp8266::polledTimeout::TimePolicy::TimeUnit<esp8266::polledTimeout::TimePolicy::TimeSourceMillis, 1000ull> >::expired() at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/PolledTimeout.h:164
(inlined by) esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::DoNothing, esp8266::polledTimeout::TimePolicy::TimeUnit<esp8266::polledTimeout::TimePolicy::TimeSourceMillis, 1000ull> >::operator bool() at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/PolledTimeout.h:170
(inlined by) BearSSL::WiFiClientSecureCtx::_run_until(unsigned int, bool) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:492
0x4022b354: br_ssl_engine_closed at /home/earle/src/esp-quick-toolchain/arduino/tools/sdk/ssl/bearssl/src/inner.h:2211
(inlined by) jump_handshake at /home/earle/src/esp-quick-toolchain/arduino/tools/sdk/ssl/bearssl/src/ssl/ssl_engine.c:1081
0x4022c4b8: br_ssl_hs_client_run at /home/earle/src/esp-quick-toolchain/arduino/tools/sdk/ssl/bearssl/src/ssl/ssl_hs_client.c:913
0x4020b13f: BearSSL::WiFiClientSecureCtx::_wait_for_handshake() at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:609
0x4020b797: std::shared_ptr<br_x509_minimal_context>::operator=(std::shared_ptr<br_x509_minimal_context>&&) at ~\appdata\local\arduino15\packages\esp8266\tools\xtensa-lx106-elf-gcc\3.1.0-gcc10.3-e5f9fec\xtensa-lx106-elf\include\c++\10.3.0\bits/shared_ptr.h:384
(inlined by) BearSSL::WiFiClientSecureCtx::_connectSSL(char const*) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:1206
0x4020a147: void esp_delay<ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}>(unsigned int, ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}&&, unsigned int) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/coredecls.h:69
(inlined by) void esp_delay<ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}>(unsigned int, ClientContext::connect(ip4_addr*, unsigned short)::{lambda()#1}&&) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/coredecls.h:78
(inlined by) ClientContext::connect(ip4_addr*, unsigned short) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/include/ClientContext.h:147
(inlined by) WiFiClient::connect(IPAddress, unsigned short) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClient.cpp:164
0x4020b839: BearSSL::WiFiClientSecureCtx::connect(char const*, unsigned short) at ~\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\ESP8266WiFi\src/WiFiClientSecureBearSSL.cpp:229
0x40213b30: std::_Function_handler<bool (), settimeofday::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Function_handler<bool (), settimeofday::{lambda()#1}> const&, std::_Manager_operation) at time.cpp:?
0x40213b30: std::_Function_handler<bool (), settimeofday::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Function_handler<bool (), settimeofday::{lambda()#1}> const&, std::_Manager_operation) at time.cpp:?
Theories & Assumptions
I do not think I am dealing with an OOM issue here. While I am not super aggressive about saving memory right now, everything except one data structure is stack-allocated. Further, even when the whole webserver infrastructure (certificates, connection cache, page templates and the server itself) is present and only the call to ::handleClient() is removed, everything seems fine. (I say "seems", as the problem happens randomly, and I did not encounter it while testing in this case.)
To me it looks like the Arduino libraries really do not like it if a webserver and a wificlient are both configured to use SSL at the same time. But why would that be, and shouldn't more people have run into this issue? Hosting a secure server and making secure requests on the same device does not sound like an uncommon task.
Am I missing something obvious right here, or is this an actual problem with the Arduino libraries?
If anyone more knowledgeable than me could give me some pointers, it would be greatly appreciated. Please let me know if you need any further information or have any questions!
Thank you in advance. :^)