On those systems, CPU load is the percentage of time spent running tasks. The ESP8266 has no "tasks" as such*, hence the only sensible figure you could get using that argument is 0% - ALL the CPU power is available to you. (*I'm excluding the "background" WiFi and any RTOS SDK for the sake of simplicity.) Another argument is that since yours is the only code and is always running, the load is always 100%.
So the first question must be: "what are you trying to measure / show"?
If the answer is "to show how much of the time my CPU is doing useful work", then YOU must define what "useful work" is. No MCU or software framework can know that, which is why there is no standard or "easy" way to do this: it's user-defined.
I have firmware that does something like this and even shows a rolling graph in the web UI... but the firmware includes its own scheduler, so I have a useful metric: Tsched / Tloop * 100, i.e. the time spent in scheduled tasks as a percentage of the time spent in the main loop NOT running a scheduled task. In effect the main loop becomes the "idle" task. (See video https://www.youtube.com/watch?v=i9hjpYnfQoc @ 4:00 onwards. N.B. The firmware no longer includes CPU%, mainly because there are much better / more useful metrics.)
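To make that metric concrete, here is a minimal sketch of the bookkeeping. It is not the firmware's actual code: the LoadMeter struct and its member names are hypothetical, and it assumes the scheduler can report how many microseconds each task took and how long the main loop spent with no task running.

```cpp
#include <cstdint>

// Hypothetical accumulator for the Tsched / Tloop * 100 metric.
// All names here are illustrative only.
struct LoadMeter {
    uint32_t schedMicros = 0;  // Tsched: time spent inside scheduled tasks
    uint32_t idleMicros  = 0;  // Tloop: main-loop time NOT running a task

    void addTask(uint32_t us) { schedMicros += us; }
    void addIdle(uint32_t us) { idleMicros += us; }

    // Tsched / Tloop * 100; returns 0 if no idle time has been recorded yet.
    float percent() const {
        if (idleMicros == 0) return 0.0f;
        return 100.0f * schedMicros / idleMicros;
    }

    // Call after each reading so the next window starts from zero.
    void reset() { schedMicros = 0; idleMicros = 0; }
};
```

With 250 us recorded in tasks and 1000 us recorded as idle main loop, percent() gives 25. Note that this only works because the scheduler knows where every task starts and ends.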
But see what just happened there - I am talking again about task scheduling and idle time: like an OS. It is only in this context that the question makes any sense at all. With just your own code, the question is meaningless - hence perhaps why there have been no answers - the simple truth is: there isn't one!
I suspect what the OP is asking is how to measure the amount of time spent inside certain functions, i.e. real-time profiling. He will need to "instrument" those functions and maintain his own figures - there is no automatic way to do this. If he takes the difference between the cycle count on entry and on exit, he can compare this with the maximum possible cycles in a given "time slice" (known from the clock speed) and thus show the % of CPU used by the instrumented functions (only) in that time slice. He would also need to ensure that all function exit points were timed, or that there was always a single common exit point.
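One way to cover every exit point without restructuring each function is a small scope guard whose destructor does the exit-side measurement. The following is only a sketch under assumptions: CycleScope, readCycles, fakeCycles and instrumented are hypothetical names, and the cycle counter is faked so the example can run anywhere. On the ESP8266 Arduino core the real source would be ESP.getCycleCount().

```cpp
#include <cstdint>

// Stand-in for the hardware cycle counter (hypothetical, for illustration).
// On the ESP8266 Arduino core, readCycles() would return ESP.getCycleCount().
static uint32_t fakeCycles = 0;
static uint32_t readCycles() { return fakeCycles; }

// Global running total of cycles spent in instrumented functions.
static uint32_t cyclesUsed = 0;

// Records the counter on construction and adds (EXIT - ENTRY) on destruction,
// so every return path of the enclosing function is timed.
class CycleScope {
    uint32_t start_;
public:
    CycleScope() : start_(readCycles()) {}
    // Unsigned subtraction still gives the right answer if the 32-bit
    // counter wrapped once between entry and exit.
    ~CycleScope() { cyclesUsed += readCycles() - start_; }
    CycleScope(const CycleScope&) = delete;
    CycleScope& operator=(const CycleScope&) = delete;
};

// Example instrumented function with two exit points.
int instrumented(int x) {
    CycleScope scope;
    if (x < 0) {
        fakeCycles += 10;  // pretend the early-exit path cost 10 cycles
        return -1;
    }
    fakeCycles += 40;      // pretend the normal path cost 40 cycles
    return x * 2;
}
```

Calling instrumented(-5) and then instrumented(3) leaves cyclesUsed at 50, with both the early return and the normal return accounted for.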
This would require something like a Ticker function once per second (the actual length of the slice is irrelevant) and a global variable cyclesUsed. Each instrumented function adds its own number of cycles used (Ccount EXIT - Ccount ENTRY) to the global. On each tick, calculate the proportion of used cycles vs total available cycles in one time slice, then reset the global for the next slice. If he wanted accurate values he would also need to subtract the number of cycles used by the ticker function and the % calculation. If he doesn't care about accuracy, then it again raises the question: what is he trying to measure and why?
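The per-slice arithmetic can be sketched as follows. The names sliceCyclesUsed, addCycles and onTick are hypothetical, and the cycles-per-slice figure is passed in rather than read from hardware. On an ESP8266 clocked at 80 MHz with a one-second slice it would be 80,000,000, and onTick would be called from a once-per-second Ticker callback.

```cpp
#include <cstdint>

// Global total for the current time slice (the "cyclesUsed" of the text).
static uint32_t sliceCyclesUsed = 0;

// Called by each instrumented function with its entry and exit cycle counts.
// Unsigned subtraction copes with a single wrap of the 32-bit counter.
void addCycles(uint32_t entryCount, uint32_t exitCount) {
    sliceCyclesUsed += exitCount - entryCount;
}

// Called once per slice. Returns the % of available cycles that were spent
// in instrumented functions, then resets the total for the next slice.
// NOTE: the cost of this function itself is not subtracted here.
float onTick(uint32_t cyclesPerSlice) {
    float pct = 100.0f * sliceCyclesUsed / cyclesPerSlice;
    sliceCyclesUsed = 0;
    return pct;
}
```

For example, if the instrumented functions together used 20,000,000 cycles in a slice of 80,000,000, onTick returns 25. Anything that was not instrumented, including the core and WiFi code, is simply invisible to this figure.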
So while that is a working example of real-time profiling, the figure it produces is not very helpful as... what can you do with it? What differences / changes can you make based upon it? What is it actually telling you? It tells you nothing about true CPU load unless you a) instrument every single function and b) edit the whole body of the ESP core to instrument all those functions too.
I hope this helps explain the difficulties in answering the question. The only better answers will come when we know what the OP thinks he wants to measure.