Apr-26-2017, 07:16 PM
I have stumbled several times in my career over cases where badly written Python code - or duplicated processing - was the bottleneck in a data-processing project - those infamous 3%.
In one case, analysis code running over hundreds of millions of lines of text was comparing a one-digit integer with a threshold value. Removing the conversion to integer sped up the comparison 11 times. Those were microseconds, but still. Removing the conversion was a simple matter.
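A hypothetical reconstruction of that situation (the actual code and the `> 5` threshold are assumptions): for one-digit values, lexical string comparison agrees with numeric comparison, so the `int()` call can be dropped entirely.

```python
import timeit

# Hypothetical stand-in for the text data: a stream of one-digit fields.
digits = [str(d % 10) for d in range(1000)]

# For single-digit strings, '0' < '1' < ... < '9' matches numeric order,
# so comparing the raw string gives the same result as converting first.
assert (sum(1 for d in digits if int(d) > 5)
        == sum(1 for d in digits if d > "5"))

with_conversion = timeit.timeit(
    "sum(1 for d in digits if int(d) > 5)",
    globals={"digits": digits}, number=1000)
without_conversion = timeit.timeit(
    "sum(1 for d in digits if d > '5')",
    globals={"digits": digits}, number=1000)

print(f"with int():    {with_conversion:.3f}s")
print(f"without int(): {without_conversion:.3f}s")
```

The exact speedup depends on the interpreter and data, but skipping a per-element function call is reliably faster.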
In another case, a graphical implementation slowed to a crawl due to duplication in code that was very WET - as opposed to DRY. I suggested a caching mechanism to solve the issue - I was patted on the shoulder, and mostly ignored.
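The kind of caching I had in mind can be sketched in a few lines; `render_glyph` here is a made-up stand-in for the duplicated expensive computation, not the actual project code. If the WET call sites keep asking for the same result, memoization removes the repeated work without restructuring the callers:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def render_glyph(char, size):
    # Stand-in for an expensive graphical computation.
    return [(char, size, i * i) for i in range(size)]

# Repeated calls with the same arguments hit the cache
# instead of recomputing the result.
for _ in range(3):
    render_glyph("a", 12)

print(render_glyph.cache_info())  # hits grow on repeated calls
```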
Test everything in a Python shell (IPython, Azure Notebook, etc.):
- Someone gave you advice you liked? Test it - maybe the advice was actually bad.
- Someone gave you advice you think is bad? Test it before arguing - maybe it was good.
- You posted a claim that something you did not test works? Be prepared to eat your hat.
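Such a test usually takes a minute. As one illustration (the claim under test is just a common example, not from any of the cases above): before repeating the advice that `''.join()` beats `+=` for building strings, measure it in a shell with `timeit`:

```python
import timeit

parts = ["x"] * 10_000

# Claim under test: joining is faster than repeated concatenation.
t_plus = timeit.timeit(
    "s = ''\nfor c in parts:\n    s += c",
    globals={"parts": parts}, number=100)
t_join = timeit.timeit(
    "''.join(parts)",
    globals={"parts": parts}, number=100)

print(f"+= loop: {t_plus:.4f}s")
print(f"join():  {t_join:.4f}s")
```

Whatever the numbers say on your machine, now you are arguing from a measurement rather than from a hat you may have to eat.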