Python Forum
Suggestion on how to speed up this code?
#1
import time
from datetime import datetime
import pandas as pd

# completedf, backtestdf_complete, unique_dates, progressformat and timer() are defined elsewhere.
tester_list = []
def day_test(df_row):
    day_start_time = time.time()
    key = df_row.index[0]  # (volwagroup, int_dur, int_calc, stopgroup, daylossgrp, timegroup, ...)
    day = unique_dates[df_row['b_index'].iloc[0]]
    # Boolean mask over every row of completedf; it matches at most one row.
    tester = completedf[(completedf['date'] == day) & (completedf['volwagroup'] == key[0])
                        & (completedf['int_dur'] == key[1]) & (completedf['int_calc'] == key[2])
                        & (completedf['stopgroup'] == key[3]) & (completedf['daylossgrp'] == key[4])
                        & (completedf['timegroup'] == key[5])]
    ror = 0 if tester.empty else tester['%ROR'].iloc[0]
    print(f"Finished day testing {day} sample_dur: {df_row['sample_dur'].iloc[0]} @ {datetime.now().strftime(progressformat)} - Execution time (HH:MM:SS.xx) : {timer(day_start_time)}")
    tester_list.append((df_row['sample_dur'].iloc[0], ror))


backtestdf_complete.groupby(['b_index', 'sample_dur']).apply(day_test)
test_df = pd.DataFrame(tester_list, columns=['day_dur', '%ROR'])
In the code above, I run a groupby on backtestdf_complete (a pandas DataFrame) and use .apply() to call day_test on each group. day_test applies a boolean filter to the DataFrame completedf, and I append two values to a list (sample_dur from the group and %ROR from the filtered row), which I later convert into another DataFrame called test_df.

For my test sample, completedf has about 1.2 million rows and day_test is called about 10,000 times by .apply(); the whole run takes about an hour and 15 minutes. FYI, each boolean filter returns only a single row from completedf. Basically I'm picking out about 10,000 rows from a 1.2-million-row DataFrame, and the rows are neither sequential nor at any fixed interval.

In another version I tried parallelizing this with joblib, but that's actually a bit slower: each call executes fast, but the function gets called about 10,000 times, so with the joblib overhead it ends up slightly slower than running single-threaded. Single-threaded takes about 0.2-0.25 seconds per call; with joblib it's about 0.25-0.3 seconds.

I've also considered getting rid of the function altogether and just using a for loop to avoid the function-call overhead, but I know how slow loops can be compared with using .apply() on a DataFrame...

So, any suggestions? Is there a better way to pick out about 10,000 rows from 1.2 million? I will have bigger sample sizes in the future, so I want to make this efficient.
#2
Solved my own issue actually - I converted the relevant columns in completedf to a multi-index, now the operation is virtually instantaneous.
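
For anyone finding this later, here is a minimal sketch of what the conversion to a multi-index might look like. The column names come from the thread, but the tiny completedf, its values, and the helper lookup_ror are made up for illustration, and the level order is only an assumption. The speed-up comes from replacing a scan over every row with an index lookup.

```python
import pandas as pd

# Hypothetical stand-in for completedf; column names are from the thread, values are invented.
completedf = pd.DataFrame({
    "volwagroup": [1, 1, 2],
    "int_dur": [5, 5, 10],
    "int_calc": [0, 1, 0],
    "stopgroup": [1, 1, 2],
    "daylossgrp": [0, 0, 1],
    "timegroup": [3, 3, 4],
    "date": ["2020-01-02", "2020-01-03", "2020-01-02"],
    "%ROR": [0.5, -0.2, 1.1],
})

# Assumed level order: the six group columns, then the date.
key_cols = ["volwagroup", "int_dur", "int_calc", "stopgroup",
            "daylossgrp", "timegroup", "date"]

# Build the MultiIndex once, up front; sorting keeps label lookups efficient.
indexed_ror = completedf.set_index(key_cols)["%ROR"].sort_index()


def lookup_ror(key, default=0):
    """Return %ROR for a full 7-part key, or `default` if no row matches."""
    try:
        return indexed_ror.loc[key]
    except KeyError:
        return default


print(lookup_ror((1, 5, 1, 1, 0, 3, "2020-01-03")))  # key present in the sample data
print(lookup_ror((9, 9, 9, 9, 9, 9, "2020-01-03")))  # key absent, falls back to default
```

The index is built a single time outside the function, so each of the roughly 10,000 calls only pays for the lookup itself.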

def day_test(df_row):
    day_start_time = time.time()
    # Key order must match the index levels of completedf: the six group values, then the date.
    key = tuple(df_row.index[0][:6]) + (unique_dates[df_row['b_index'].iloc[0]],)
    try:
        tester = completedf.loc[key]
        ror = tester['%ROR']
    except KeyError:  # no row with this key
        ror = 0
    #print('Finished day testing ' + unique_dates[df_row['b_index'].iloc[0]] + ' sample_dur: ' + str(df_row['sample_dur'].iloc[0]) + ' @ ' + datetime.now().strftime(progressformat) + ' - Execution time (HH:MM:SS.xx) : ' + timer(day_start_time))
    tester_list.append((df_row['sample_dur'].iloc[0], ror))
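
A possible alternative, not something tried in this thread: if the lookup keys can be collected into a DataFrame of their own, a single left merge can fetch all the rows at once instead of calling a function per group. The sketch below assumes a hypothetical frame named wanted with one row per (b_index, sample_dur) group, holding the six group values, the date and sample_dur; all data values are invented.

```python
import pandas as pd

key_cols = ["volwagroup", "int_dur", "int_calc", "stopgroup",
            "daylossgrp", "timegroup", "date"]

# Hypothetical stand-in for completedf (values invented).
completedf = pd.DataFrame({
    "volwagroup": [1, 1, 2],
    "int_dur": [5, 5, 10],
    "int_calc": [0, 1, 0],
    "stopgroup": [1, 1, 2],
    "daylossgrp": [0, 0, 1],
    "timegroup": [3, 3, 4],
    "date": ["2020-01-02", "2020-01-03", "2020-01-02"],
    "%ROR": [0.5, -0.2, 1.1],
})

# Hypothetical table of keys to look up, one row per group, plus sample_dur.
wanted = pd.DataFrame({
    "volwagroup": [1, 2],
    "int_dur": [5, 10],
    "int_calc": [1, 0],
    "stopgroup": [1, 2],
    "daylossgrp": [0, 1],
    "timegroup": [3, 4],
    "date": ["2020-01-03", "2020-01-05"],  # the second key has no match
    "sample_dur": [30, 60],
})

# Left merge keeps every wanted row in order; unmatched keys get NaN for %ROR.
# validate="many_to_one" raises if completedf holds duplicate keys.
merged = wanted.merge(completedf[key_cols + ["%ROR"]],
                      on=key_cols, how="left", validate="many_to_one")

# Mirror the original behaviour: a missing row means %ROR of 0.
merged["%ROR"] = merged["%ROR"].fillna(0)

test_df = merged[["sample_dur", "%ROR"]].rename(columns={"sample_dur": "day_dur"})
print(test_df)
```

This relies on each key matching at most one row of completedf, which is what the first post describes.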