Aug-10-2022, 08:42 PM
Hey everybody,
I'm having trouble finding an elegant solution to my parallelism problem. The task is quite simple: I get data, I process the data, I output the result. This runs forever, or at least as long as the program is meant to run. Since this is the sole purpose of the program (and in fact of the entire Raspberry Pi it runs on), I don't mind it using all the CPU power; I actually want that.
The interesting part: the processing step is largely unchanging. Basically, some parameters are computed from the data, and then the final result is computed from the input and those params. These two steps differ a lot in runtime, the first taking about 5 seconds, the second only around 0.05 s. Luckily, the params are almost unchanging, since the input data always has the same characteristics.
So what I obviously do is calculate the params once on the initial data and then only apply them in the loop. However, to account for possible drift in the params, I want to recalculate them periodically.
I've read a lot about parallelism, concurrency, and asyncio. While I'd love a solution with asyncio, I don't see how non-preemptive scheduling helps with such a CPU-bound task (correct me if I'm wrong). Another simple idea is to just run the recalculation in a separate thread. That works, but because of the GIL it slows my program down too much (the main processing needs to run in about 0.2 s max). So I landed at multiprocessing, and I found a working solution, yet I find it very ugly.
The main problem is that the computation of the params has to return its result to the main process. My approach is to check in the main loop whether a recalculation is already in progress and, if not, start a new daemon process. Since I can't just join that process in the main loop, I create a thread that joins the process and then updates the parameters. As if that didn't sound complicated enough, the question is how to transfer the result. Right now I'm using the Value object from the multiprocessing module, but that only supports basic data types or custom ctypes structures. My example code is the following (the sleeps are just to slow everything down a little):
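A stripped-down version, where calc_params, apply_params, and the data are dummy placeholders; only the process/thread/Value wiring matches the real program:

```python
import threading
import time
import multiprocessing as mp

def calc_params(data, result):
    """Slow step (~5 s in reality): recompute the params from fresh data."""
    time.sleep(0.3)               # placeholder for the heavy computation
    result.value = sum(data)      # dummy "parameter"

def apply_params(data, param):
    """Fast step (~0.05 s in reality): apply the current params to the input."""
    return [x * param for x in data]

def main(iterations=5):
    shared = mp.Value('i', 1)     # receives the recalculated param
    param = 1                     # param currently used by the main loop
    recalc = None

    def join_and_update(proc):
        nonlocal param
        proc.join()               # wait for the recalculation to finish
        param = shared.value      # copy the result into the main process

    for _ in range(iterations):   # stands in for the endless loop
        data = [1, 2, 3]          # stands in for "get data"
        # start a new recalculation only if none is currently running
        if recalc is None or not recalc.is_alive():
            recalc = mp.Process(target=calc_params, args=(data, shared),
                                daemon=True)
            recalc.start()
            threading.Thread(target=join_and_update, args=(recalc,),
                             daemon=True).start()
        out = apply_params(data, param)   # the fast main processing
        time.sleep(0.1)           # just to slow the loop down a little
    return out

if __name__ == "__main__":
    print(main())
```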
Now, thanks for reading this far; my questions are the following: Is this the right approach, or is there a better or more elegant way to handle the concurrency? And second, what about the result communication? The example works, but that's just a single integer; I have multiple ints, a numpy array, and some more stuff.
Thanks again,
Plexian