
WiFiClient fails when called very frequently

Posted: Thu Nov 24, 2016 11:21 am
by fgomes
Hi,

In order to test the software and hardware stability, I've reduced the time between calls to a function that uses WiFiClient to make an HTTP GET and read the server response. The function is the following:

Code:
bool sendGatewayInfo()
{
    WiFiClient client;

    if (client.connect(myHost, myPort)) {
      printLog("Send gateway: connected...");
      // millis() returns unsigned long, so the conversion is %lu (not %ul)
      sprintf(myUrl, "/input/post.json?node=%d&apikey=%s&json={millis:%lu,heap:%d,rssi:%d}", NODEID, apikey, millis(), ESP.getFreeHeap(), WiFi.RSSI());
      client.print(String("GET ") + myUrl + " HTTP/1.1\r\n" + "Host: " + myHost + ":" + myPort + "\r\n" + "User-Agent: VLC/2.2.1 LibVLC/2.2.1\r\nConnection: close\r\n\r\n");
      if(client.connected()){
        unsigned long next = millis() + 5000;  // reply deadline, 5 s from now
        while(!client.available()) {
          yield();
          if(!client.connected()) {
            client.stop();
            serverFail++;
            printLog("Server failed 2.0\r\n");
            return false;
          }
          if ((long)(millis()-next) > 0) {
            Serial.println(">5s waiting for server reply");
            Serial.println(client.available());
            client.stop();
            serverFail++;
            printLog("Server failed 2.1\r\n");
            return false;
          }
        }
        if(client.connected() && client.available()) {
          // Clamp the read to the buffer and NUL-terminate before logging.
          // Assumes buf is a global byte array, so sizeof(buf) is its capacity.
          size_t len = client.available();
          if (len > sizeof(buf) - 1) len = sizeof(buf) - 1;
          int n = client.read(buf, len);
          if (n < 0) n = 0;
          buf[n] = '\0';
          printLog((char*)buf);
          client.stop();
          serverFail = 0;
          return true;
        }
      } else {
        serverFail++;
        printLog("Server failed 2.2\r\n");
      }
    } else {
      serverFail++;
      printLog("Server failed 2.3\r\n");
    }
    client.stop();
    return false;
}


This is the call in the main loop:

Code:
if((long)(millis() - lNext) >= 0) {
    lNext += 500;
    if(!sendGatewayInfo()) {
      printLog("Send gateway info failed!");
    }
}
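
One detail of this scheduling pattern may matter for the stress test: `lNext += 500` keeps a fixed schedule, so if a call blocks for longer than the period (the function above can wait up to 5 s for a reply), the following loop passes are all already due and fire back to back until `lNext` catches up. A minimal sketch of an alternative that re-bases the deadline on the current time, written as plain C++ so it can be run off-device. The `Periodic` struct and its `due` method are hypothetical names, and `now` stands in for `millis()`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scheduler state; `now` stands in for millis() on the device.
struct Periodic {
    std::uint32_t next;    // next deadline in ms
    std::uint32_t period;  // interval in ms

    // Returns true when the task is due. The next deadline is re-based on
    // `now`, so time spent blocked is skipped instead of replayed as a burst.
    bool due(std::uint32_t now) {
        assert(period > 0);  // a zero period would fire on every pass
        // Signed difference keeps the comparison correct across rollover.
        if (static_cast<std::int32_t>(now - next) < 0) return false;
        next = now + period;
        return true;
    }
};
```

Whether a burst of connections is what triggers the failures here is only a guess; the sketch just removes that possibility from the test.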


I usually call this function from loop() only once per minute. To stress test the system I started calling it once per second, and it worked well (more than 8 hours running without any failure). But when I reduced the interval to 500 ms, I started to see failures: the node reports that it has not received the answer from the server, and sometimes it even crashes (with exceptions 28 and 29). So my question is, what could be the cause? Should I guarantee some time between connections so that some 'housekeeping' can be done in the background? I don't need to transmit every 500 ms, but I want to understand the failure, because in the field this node hangs after some days of operation, and the problem could be related to this one. Perhaps two connections made very close together in time could trigger this situation?
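
For reference, exceptions 28 and 29 on the ESP8266 are LoadProhibited and StoreProhibited, which indicate an access to an invalid address. One candidate worth ruling out in a function like this is the reply read: a read sized by `client.available()` alone can exceed the buffer, and a buffer printed as a C string needs a terminating NUL. A minimal sketch of a bounded copy, in plain C++ so it can be run off-device. `copyReply` is a hypothetical helper and not part of the WiFiClient API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies at most cap-1 bytes of a reply into dst and
// always NUL-terminates, so dst can safely be printed as a C string.
// Returns the number of payload bytes copied.
std::size_t copyReply(char* dst, std::size_t cap,
                      const char* src, std::size_t available) {
    assert(dst != nullptr && cap > 0);
    std::size_t n = (available < cap - 1) ? available : cap - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

On the device, the same clamp would apply to the length passed to `client.read()`, with the terminator written at the returned count.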

Best regards

Fernando

Re: WiFiClient fails when called very frequently

Posted: Tue Nov 29, 2016 12:10 pm
by mrburnette
Stress testing is all fine and well, but you are "modifying" the sketch in order to test it, which makes the test invalid, IMO. You must test it as it will be used.

As to stability, consider that your sketch code runs under a simple task manager, the NON-OS SDK.

Your loop() must repeat within 30 ms to 50 ms at most, or the RF section will go stale. Therefore, you must do some software architecture thinking and ensure that if you have a long loop or function, you break it up with delay(0), delay(1), or yield().
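
The idea of breaking up a long function can be sketched as follows. This is plain C++ meant to be run off-device, under the assumption that the work is a simple loop over items. `sumInSlices`, `YieldFn`, and `hostYield` are hypothetical names, and on the ESP8266 the hook would call `yield()`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hook type; on the device the hook would call yield().
using YieldFn = void (*)();

// Stand-in hook for running the sketch on a PC.
inline void hostYield() {}

// Sums `count` items, calling yieldFn after every `slice` items and once
// more after a final partial slice. Returns the number of hook calls made.
std::size_t sumInSlices(const int* items, std::size_t count,
                        std::size_t slice, long& sum, YieldFn yieldFn) {
    assert(slice > 0 && yieldFn != nullptr);
    std::size_t yields = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += items[i];
        if ((i + 1) % slice == 0 || i + 1 == count) {
            yieldFn();  // give the background tasks a chance to run
            ++yields;
        }
    }
    return yields;
}
```

The slice size is the tuning knob: it should be small enough that one slice finishes well inside the loop time budget.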

Ray

Re: WiFiClient fails when called very frequently

Posted: Tue Nov 29, 2016 12:43 pm
by fgomes
Hi Ray

Thanks for your reply. It pointed to something that I'm not currently verifying: the total loop time, and whether the loop must be split or some delays / yields introduced. When I first detected some long-term stability issues I introduced a log file (using SPIFFS) and a web server to read the log file, and things seemed to get worse, so that could be related to taking too much time in the loop (of course it will depend on the SPIFFS and ESP8266WebServer implementations, but the loop will certainly take more time if I use them).

The problem is recreating here an issue that takes about one week to occur on the real site, where I have no one to interact with the system. I'm trying to simulate the possible failure modes: failing WiFi, and a failing internet connection with WiFi still available (both long-term and short-term failures), and I didn't detect any problem. In both cases (WiFi and internet), if it is a long-term failure I force an ESP reset.

On the real site, I have never had it working for more than one week. It reports data every minute, and after some days of operation it stops reporting data. I have also included the heap size and millis() in the cyclic report, and all seems to go well before it stops (with more than 30k of heap available). As additional information, I'm using a LoLin NodeMCU V3 (with ESP-12E), with 1M for code and 3M for SPIFFS.
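
The total loop time mentioned above can be measured directly. A minimal sketch, assuming a budget of 50 ms taken from the upper figure given earlier in the thread. `LoopTimer` is a hypothetical helper, written as plain C++ so it can be run off-device, and `now` stands in for `millis()`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks the longest gap between consecutive loop() passes.
struct LoopTimer {
    std::uint32_t budgetMs;
    std::uint32_t last = 0;
    std::uint32_t worst = 0;   // longest pass seen so far, in ms
    bool started = false;

    explicit LoopTimer(std::uint32_t budget) : budgetMs(budget) {
        assert(budget > 0);
    }

    // Call once per pass with the current time in ms (millis() on the device).
    // Returns true when the pass that just ended exceeded the budget.
    bool tick(std::uint32_t now) {
        bool over = false;
        if (started) {
            std::uint32_t dt = now - last;  // unsigned math survives rollover
            if (dt > worst) worst = dt;
            over = dt > budgetMs;
        }
        last = now;
        started = true;
        return over;
    }
};
```

Adding `worst` to the cyclic report next to the heap size would show whether the log file and web server push the loop past the budget before the node stops reporting.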

Best regards

Fernando