
MQTT serious keepalive bug suspected? Any way around that?

Posted: Fri Jan 22, 2016 6:27 pm
by bubba198
Hi everyone,

I am testing with NodeMcu 0.9.6 (Doit.am Version) build 20150701 powered by Lua 5.1.4.

I've used MQTT and everything works as expected, except that sometimes the MQTT module loses its connection to the broker but does NOT fire its m:on("offline", function(con) callback. As a result, the Lua interpreter, and therefore the script, does NOT know that the low-level client has lost its connection to the broker. The broker knows and publishes /LWT, but that doesn't do me any good since I have a frozen client, so to speak. The only way out is a reboot. Interestingly, if I try to reconnect manually from the console, I get stdin:1: already connected, which clearly is NOT credible.

I then set up a lab environment where I can block the IP to the broker at will to facilitate testing. That led to a sad result: I can make the client lose its connection to the broker in about 15 seconds, and from that point forward the client is frozen. Obviously, re-establishing connectivity to the broker does not fix this, since all along the client believes it is connected and so never kicks off the reconnect function. By "lose connectivity" I mean that it stops receiving messages on the subscribed feeds, and it acts as if published messages go out just fine, except there is no sign of them at the broker. For all intents and purposes there is no longer any meaningful communication between the MQTT client and the broker.
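One possible workaround that does not depend on the "offline" callback firing is an application-level watchdog: the script records when it last received any message and treats the link as dead after a silence longer than some timeout. The sketch below covers only that bookkeeping. The names new_watchdog, feed and expired are hypothetical, and the timeout value is an assumption to be tuned per setup.

```lua
-- Hypothetical helper: tracks when traffic was last seen on the MQTT link.
-- timeout_s: seconds of silence after which the link is considered dead.
-- start_s:   current time in seconds when the watchdog is created.
function new_watchdog(timeout_s, start_s)
  local last_seen = start_s
  local wd = {}

  -- Call whenever a message arrives from the broker.
  function wd.feed(now_s)
    last_seen = now_s
  end

  -- Returns true once more than timeout_s seconds have passed
  -- since the last feed() (or since creation).
  function wd.expired(now_s)
    return (now_s - last_seen) > timeout_s
  end

  return wd
end
```

On the device, wd.feed(tmr.time()) could be called from the m:on("message", ...) handler, with the client periodically publishing a heartbeat to a topic it also subscribes to, so that a healthy link always produces incoming traffic. A repeating timer could then call node.restart() once wd.expired(tmr.time()) returns true, which automates the reboot described above.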

Can anyone share observations, specifically on scenarios where MQTT loses connectivity to the broker and effectively leaves the module frozen in terms of MQTT functionality?

Thank you

Re: MQTT serious keepalive bug suspected? Any way around tha

Posted: Sun Jan 24, 2016 5:20 pm
by devsaurus
On a recent build from dev I observe that the m:on("offline", ...) callback is fired as soon as the server terminates eg.
Reconnecting is a bit tricky but feasible. That's my approach:

Code:
-- m (the MQTT client), broker_ip and tmr_num are assumed to be defined elsewhere
function connect(client)
    client:lwt("status", "offline", 2, 1)

    -- prepare another connection trial
    tmr.alarm(tmr_num, 1000, tmr.ALARM_SINGLE, function() connect(client) end)

    client:connect(broker_ip, 1883, 0, 0, function(conn)
        --print("connected, subscribing...")
        -- stop previously scheduled trial
        tmr.stop(tmr_num)
        tmr.unregister(tmr_num)

        ...

    end)
end

m:on("offline", function(client)
    --print("reconnecting...")
    -- close stale connection before reconnecting
    client:close()
    connect(client)
end)
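
The approach above retries every 1000 ms. If the broker can stay unreachable for long periods, one variation is to lengthen the interval after each failed attempt. This is a minimal sketch, assuming a hypothetical helper next_delay_ms that is not part of the NodeMCU API; the base and maximum delays are illustrative values.

```lua
-- Hypothetical helper: delay before the given retry attempt (1-based).
-- The delay doubles with every attempt and is capped at max_ms.
-- Doubling in a loop avoids the ^ operator, so it works with integers only.
function next_delay_ms(attempt, base_ms, max_ms)
  local delay = base_ms
  for _ = 2, attempt do
    delay = delay * 2
    if delay >= max_ms then break end
  end
  if delay > max_ms then delay = max_ms end
  return delay
end
```

The result could be passed as the interval to tmr.alarm in place of the fixed 1000, with the attempt counter reset to 1 inside the connect callback once the connection succeeds.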

Re: MQTT serious keepalive bug suspected? Any way around tha

Posted: Mon Jan 25, 2016 10:50 am
by bubba198
Thanks devsaurus

However, tmr.unregister(tmr_num) is not a valid timer call on my firmware. I checked the docs and tmr.unregister(x) is not listed as a timer module function. Could it be something else?

http://www.nodemcu.com/docs/timer-module/
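
For firmware builds where tmr.unregister is missing, one way to keep the same script working is to check for the function before calling it. The sketch below assumes that tmr.stop is available on both old and new builds; cancel_retry is a hypothetical helper name, and the timer module is passed in as a parameter so the logic can also be exercised off-device.

```lua
-- Hypothetical helper: cancel a pending retry timer.
-- timer_api: the timer module (on the device this would be tmr).
-- id:        the timer number used for the retry alarm.
-- Returns true if the timer was also unregistered, false if the
-- firmware does not provide unregister and only stop was called.
function cancel_retry(timer_api, id)
  timer_api.stop(id)
  if type(timer_api.unregister) == "function" then
    timer_api.unregister(id)
    return true
  end
  return false
end
```

Inside the connect callback, the two calls tmr.stop(tmr_num) and tmr.unregister(tmr_num) could then be replaced by cancel_retry(tmr, tmr_num).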

Re: MQTT serious keepalive bug suspected? Any way around tha

Posted: Mon Jan 25, 2016 2:36 pm
by devsaurus
bubba198 wrote: http://www.nodemcu.com/docs/timer-module/

Hm, I see. These are probably outdated. The current docs for the firmware in development (this is what you get with http://nodemcu-build.com/) are at https://nodemcu.readthedocs.org/en/dev/.