Saturday, January 14, 2023

Internet of Crosstrainers, part 2

Introduction

As mentioned in the original post here, there were some issues in the described implementation. I had some difficulty finding the time and energy to fix them, but the fixes are now here. What's more, I have finally managed to share my first project on GitHub! Please don't ask me why it took so long.

Simple fixes 

The first issue was the upload getting stuck whenever there were WiFi problems. The simple solution is to add a task with a timeout delay. The task just ends the operation regardless of whether the upload was completed or not. In esp-idf it looks pretty much like this.

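static void timeout_task(void* arg) {
  // Wait for the upload timeout, then shut down whether or not the upload finished.
  vTaskDelay(1000ULL * UPLOAD_TIMEOUT_S / portTICK_PERIOD_MS);
  power_off();
  // A FreeRTOS task must never return, so delete it in case power_off() comes back.
  vTaskDelete(NULL);
}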

The power_off() function should do the same things the device would normally do after an upload. This however brings up a new issue: data loss, since the device simply shuts down without uploading or storing the data. The answer is of course to store the data. The easiest solution is a ring buffer that keeps the measurement data in the RTC memory of the ESP. This way the upload can be retried the next time the device starts up.

The functionality of the device should thus be changed a little. First, all the data should go through the ring buffer, since the device might time out in the middle of an upload. This can be implemented in other ways, but that's how I did it. After that, the upload should continue until the ring buffer is empty. It's also important to check whether the buffer is full before adding any data. This shouldn't be an issue though, because the ESP has lots of memory, the data is small and this is just a backup feature.

Another quite important detail is to add some kind of magic word to the RTC memory, since I'm not really sure how well it retains data and when it is actually cleared. If the magic word doesn't match, the data is lost and the buffer counters (which should also be in the RTC memory) should be cleared.
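
To make the idea more concrete, here is a minimal sketch of such a backup buffer. It is not the code from the repository: the record fields, the magic value, the capacity and all the function names are made up for illustration. On the ESP32 the RTC_DATA_ATTR attribute from esp_attr.h places a variable in RTC memory; the empty fallback definition is there only so that the sketch also compiles on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// On the ESP32, esp_attr.h defines RTC_DATA_ATTR (data placed in RTC memory).
// The empty fallback only lets this sketch compile on a PC.
#ifndef RTC_DATA_ATTR
#define RTC_DATA_ATTR
#endif

#define BACKUP_MAGIC 0xC0FFEE42u  // hypothetical magic word
#define BACKUP_SLOTS 8u           // hypothetical capacity

// Hypothetical measurement record; the real project may store other fields.
typedef struct {
  uint32_t duration_s;
  uint32_t revolutions;
  uint32_t avg_load;
} measurement_t;

typedef struct {
  uint32_t magic;  // tells valid contents apart from cleared or random memory
  uint32_t head;   // index of the oldest stored record
  uint32_t count;  // number of stored records
  measurement_t slots[BACKUP_SLOTS];
} backup_buffer_t;

static RTC_DATA_ATTR backup_buffer_t backup;

// Call once at startup. Returns true if the old contents were kept,
// false if the magic word did not match and the counters were cleared.
bool backup_init(void) {
  if (backup.magic == BACKUP_MAGIC && backup.head < BACKUP_SLOTS &&
      backup.count <= BACKUP_SLOTS) {
    return true;
  }
  backup.magic = BACKUP_MAGIC;
  backup.head = 0;
  backup.count = 0;
  return false;
}

// Stores one record. Returns false (and stores nothing) if the buffer is full.
bool backup_push(const measurement_t* m) {
  assert(m != NULL);
  if (backup.count == BACKUP_SLOTS) return false;
  backup.slots[(backup.head + backup.count) % BACKUP_SLOTS] = *m;
  backup.count++;
  return true;
}

// Copies the oldest record to *out without removing it.
bool backup_peek(measurement_t* out) {
  assert(out != NULL);
  if (backup.count == 0) return false;
  *out = backup.slots[backup.head];
  return true;
}

// Removes the oldest record; call this only after a successful upload.
void backup_drop(void) {
  if (backup.count == 0) return;
  backup.head = (backup.head + 1) % BACKUP_SLOTS;
  backup.count--;
}
```

The peek and drop steps are separate on purpose: a record is removed only after its upload has succeeded, so a timeout in the middle of an upload leaves it in the buffer for the next startup. A typical wakeup would call backup_init(), push the new measurement, and then loop on backup_peek(), uploading and dropping records until the buffer is empty.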

There is one thing missing here however. The original idea was that the ESP doesn't need to know what time it is, only how long the exercise has lasted. The Google Apps Script creates a timestamp based on the upload time. In this case the data is stored without a timestamp and uploaded later on, so the actual exercise start time is lost. The only solution would be to implement an RTC in the ULP code, but I was simply too lazy to do that. The timestamp can be added manually and it doesn't need to be that accurate. So let's leave it at that.

New findings

While cleaning up stuff, I decided to update the PlatformIO ESP platform to the latest version. The platform includes the esp-idf framework and other stuff. This update caused much more fun than I anticipated. First of all, the upload would randomly end in a stack overflow. Secondly, that stack overflow would cause a boot loop with a "divide by zero" error. Luckily I realised that I should fix the boot loop first and then deal with the occasional stack overflow. While fixing these I found another way to get into a completely unrelated boot loop. Well, not really a boot loop. The device would operate quite normally without crashing, but still in a way it really shouldn't. However, let me explain these in chronological order.

The first boot loop was an interesting one. I made sure that there is no division by zero in my own code, for example when averaging the load over the number of revolutions (which could be zero for some reason). What I didn't expect was that the division by zero came from example code taken from the esp-idf pages. The function rtc_clk_cal() simply returns 0 after a restart caused by a stack overflow. That is a nice thing to know. It was also completely unnecessary code that I had just left in for future reference. So that's one possible explanation for the device being seemingly stuck, as I have observed before.
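
Both divisions can be guarded in the same way. The sketch below is only an illustration with made-up function names, not the code from the project. It also assumes that the calibration value is the slow clock period in microseconds as a fixed-point number with 19 fractional bits, which is how I read the esp-idf documentation of rtc_clk_cal(); treat that format as an assumption.

```c
#include <assert.h>
#include <stdint.h>

// Assumed number of fractional bits in the calibration value
// (believed to match RTC_CLK_CAL_FRACT in esp-idf).
#define CAL_FRACT_BITS 19

// Average load per revolution; returns 0 when there were no revolutions.
uint32_t average_load(uint32_t load_sum, uint32_t revolutions) {
  if (revolutions == 0) return 0;
  return load_sum / revolutions;
}

// Hypothetical helper: converts a calibrated clock period into a frequency
// in Hz. A period of 0 means the calibration failed, so the caller-supplied
// nominal frequency is returned instead of dividing by zero.
uint64_t freq_from_cal(uint32_t cal_period, uint32_t fallback_hz) {
  assert(fallback_hz != 0);  // the fallback itself must be usable as a divisor
  if (cal_period == 0) return fallback_hz;
  return 1000000ULL * (1ULL << CAL_FRACT_BITS) / cal_period;
}
```

On the device the first argument of freq_from_cal() would be the value returned by rtc_clk_cal(), and the fallback would be the nominal frequency of the clock in question.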

The stack overflow itself was probably caused by doing the upload directly in the main function without a separate task. At this point I was lazy again and simply made one more array global so that the main function would use less stack. Not the proper way to do it, but I tested it multiple times and it seemed to work.

The last boot loop was the most interesting one. I stumbled upon it by accident and had to figure out how to reproduce it. I then figured out that it was caused by simply holding down the IO0 button, which is used for reading the reed switch state. I then spent a few hours trying to figure out why it caused a boot loop. At first I thought it was because the previous state of the reed pin* is always reset to zero and thus a new edge is detected at each startup of the ULP. I tried to fix that but it didn't solve the problem. I tried to fix it in another way and it did something, but not in the way I intended. Which led to a revelation.

I originally understood that the ULP core would be shut down after starting the main CPU, but apparently I was wrong. The conclusion was that the ULP program continues to run in the background and immediately wakes up the main CPU after it has initiated deep sleep. It shouldn't, but it does. I first checked from the code that this is exactly how it works. I then remembered that I have the diagram on which this ASM code was based. I checked the diagram and the issue was very obvious there too. I have no idea how I made such a silly mistake, but it is now fixed and the code actually got shorter by three lines. Below is the fixed diagram.

Fixed diagram of the ULP functionality

* This idea by itself was a mistake. The state is reset to zero and it first has to go high to be able to detect a falling edge. I realised it while writing this and decided to leave it in.
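
The footnote is easier to see in code. The real detection runs in ULP assembly, so this C version is only an illustration of the logic, with made-up names. With the previous state reset to zero, a pin that is already low at startup cannot produce a falling edge; the state first has to go high.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Remembers the previous pin level between samples.
typedef struct {
  bool prev;
} edge_detector_t;

// State at ULP startup: the previous level is reset to zero (low).
void edge_reset(edge_detector_t* det) {
  assert(det != NULL);
  det->prev = false;
}

// Returns true only when the level changes from high to low.
bool falling_edge(edge_detector_t* det, bool level) {
  assert(det != NULL);
  bool fell = det->prev && !level;
  det->prev = level;
  return fell;
}
```

After edge_reset(), feeding in a low level any number of times returns false. Only a high sample followed by a low sample returns true, so holding the button down at startup cannot trigger an edge by itself.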

Final words

This was a rather fun update with lots of revelations. The good thing is that the major issues have now been fixed and that updating the ESP platform to 5.3.0 didn't cause any issues (related to the platform itself, that is). The greatest thing however is that I've managed to share the source on GitHub. Hopefully it will be much easier for me to do in the future.
