-->
Page 1 of 2

nodemcu-firmware sntp weirdness

PostPosted: Wed Jul 29, 2020 12:58 pm
by gws
Posting here rather than as a bug report because I do not have a MCVE, have not tried additional SNTP servers, have not tried the non-lua OEM nodemcu firmware, do not have a packet dump, etc (omg, I feel like the most terrible engineer)... but I'm wondering if atypical SNTP client behavior is a thing that others have seen. My searching hasn't turned anything up, but a quick ping to the community feels helpful to push me down the path of discovery. I tend to believe that my busybox ntp server is more reliable than the sntp client here, which may be a terrible assumption. It also does not seem like time slews slowly on the client, which makes me think the client is misbehaving, even if the server returns jumps in time.

In a nutshell, I occasionally see time jump around wildly across several syncs (offset>=-4295s) before coming back. In the log below (taken from mqtt output), the two nodes are reporting back their sntp return values via mqtt, so the timestamp on the left is local linux time (accurate). I have manually confirmed that the output of rtctime.get() jumps around on these nodes when the ntp offset is large. There are usually only a couple of strange offsets before time returns to normal, but in the case below it lasted several hours before returning (back to normal @ ~13:25).

Code: Select allJul 28 06:29:01 node2/debug sntp sync success: curr=1595917741s offset=25086us server=192.168.2.1 delay=4226us
Jul 28 06:29:14 node1/debug sntp sync success: curr=1595917754s offset=27651us server=192.168.2.1 delay=4333us
Jul 28 06:45:42 node2/debug sntp sync success: curr=1595918742s offset=-4295s server=192.168.2.1 delay=112731us
Jul 28 06:45:54 node1/debug sntp sync success: curr=1595918754s offset=110634us server=192.168.2.1 delay=3952us
Jul 28 07:02:26 node2/errors sntp sync failed! (pre-MQTT) type=4 err=nil
Jul 28 07:02:39 node1/errors sntp sync failed! (pre-MQTT) type=4 err=nil
Jul 28 07:19:01 node2/debug sntp sync success: curr=1595920741s offset=-8590s server=192.168.2.1 delay=6423us
Jul 28 07:19:14 node1/debug sntp sync success: curr=1595920754s offset=-4295s server=192.168.2.1 delay=4165us
Jul 28 07:35:41 node2/debug sntp sync success: curr=1595921741s offset=-111417us server=192.168.2.1 delay=4898us
Jul 28 07:35:54 node1/debug sntp sync success: curr=1595921754s offset=-4295s server=192.168.2.1 delay=3631us
Jul 28 07:52:22 node2/debug sntp sync success: curr=1595961395s offset=-184899us server=192.168.2.1 delay=313262us
Jul 28 07:52:34 node1/debug sntp sync success: curr=1595927049s offset=-4296s server=192.168.2.1 delay=4928us
Jul 28 08:09:01 node2/debug sntp sync success: curr=1595923741s offset=-42949s server=192.168.2.1 delay=3463us
Jul 28 08:09:14 node1/debug sntp sync success: curr=1595923754s offset=-4295s server=192.168.2.1 delay=3753us
Jul 28 08:25:46 node2/errors sntp sync failed! (pre-MQTT) type=4 err=nil
Jul 28 08:25:59 node1/errors sntp sync failed! (pre-MQTT) type=4 err=nil
Jul 28 08:42:21 node2/debug sntp sync success: curr=1595925741s offset=-4295s server=192.168.2.1 delay=3601us
Jul 28 08:42:34 node1/debug sntp sync success: curr=1595925754s offset=-8590s server=192.168.2.1 delay=62850us
Jul 28 08:59:01 node2/debug sntp sync success: curr=1595926741s offset=-4295s server=192.168.2.1 delay=3936us


Code: Select all> for k,v in pairs(node.info("sw_version")) do print (k,v) end
git_branch      master
git_release     3.0-master_20200610
git_commit_id   8d091c476edf6ae2977a5f2a74bf5824d07d6183
node_version_minor      0
git_commit_dts  202006092026
node_version_revision   0
node_version_major      3
> for k,v in pairs(node.info("build_config")) do print (k,v) end
ssl     true
number_type     float
modules adc,bit,dht,file,gpio,i2c,mqtt,net,node,ow,rtcmem,rtctime,sntp,spi,tls,tmr,uart,wifi,ws2801,ws2812,ws2812_effects
lfs_size        0


Server is openwrt, CHAOS CALMER (15.05.1, r48532) busybox ntpd.

Re: nodemcu-firmware sntp weirdness

PostPosted: Thu Jul 30, 2020 1:49 pm
by gws
Oh gosh, as frequently happens, attempting to clarify and document the problem may have inadvertently led to a solution. New question: does anyone NOT do this AND use sntp on nodemcu-firmware?

wifi.sta.sleeptype (wifi.NONE_SLEEP)

Disabling wifi sleep from its default state seems to have made a difference - 3 of 3 nodes have been up around 24 hours with NO time glitches (including the less frequent but annoying sntp.sync type 4 network timeout error). I had assumed that power saving was okay to leave on, but if the sntp.sync code does not leave time to spin up the radio, that's problematic.

I'll leave this up as a breadcrumb for anyone receiving sntp.sync() type 4 errors or large offsets, and at some point lobby the devs on github to update the documentation to suggest disabling modem sleeping when using SNTP. Please do chime in if you have any knowledge about this or are affected. Cheers!

Re: nodemcu-firmware sntp weirdness

PostPosted: Thu Jul 30, 2020 2:31 pm
by marcelstoer
I am sorry, no idea. I assume other (non-IoT) devices sourcing their time from the NTP pool don't show anything remotely similar?

Try to create a MCVE logging time to console. Ideally also try to eliminate modules you don't need for the MCVE to rule out side effects. Try with and without rtctime (i.e. only sntp) to see identify the culprit.

Re: nodemcu-firmware sntp weirdness

PostPosted: Fri Jul 31, 2020 2:46 pm
by marcelstoer
gws wrote:I had assumed that power saving was okay to leave on, but if the sntp.sync code does not leave time to spin up the radio, that's problematic


Not sure I fully understand why can't you trigger the sync manually every time you (re-)connect the WiFi?