Python Forum
Get Difference whenever transition value increases


Get Difference whenever transition value increases - v_mn - Jun-30-2023

I have the following code:

import pandas as pd

# Create the DataFrame
data = {'dt': ['1990-06-05 15:59:57', '1990-06-05 21:59:54', '1990-02-20 14:19:08', '1990-03-01 23:01:44',
                '1990-05-27 21:59:55', '1990-05-27 22:59:55', '1990-05-28 02:59:57', '1990-05-28 22:59:56',
                '1990-05-29 02:59:56', '1990-05-30 21:59:54', '1990-06-10 22:59:54', '1990-06-11 02:59:55',
                '1990-06-11 09:59:55', '1990-06-11 22:59:55'],
        'type': ['event'] * 14,
        'value': [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0]}
df = pd.DataFrame(data)

# Identify transitions from 1 to 0.
# Note: this mask is computed before the sort below, so it follows the original row order.
m = (df['value'].diff() == -1)
df['dt'] = pd.to_datetime(df['dt'])
df.sort_values(['dt'], inplace=True)
df.reset_index(drop=True, inplace=True)
# Count transitions per group
df['Transition_Count'] = m.groupby((df['type'] != df['type'].shift()).cumsum()).cumsum()

# Calculate difference between 'dt' values for increasing transitions
df['Transition_Difference'] = df.groupby('Transition_Count')['dt'].diff()

# Filter rows where transition occurs and 'Transition_Difference' is not null
transition_df = df[(df['Transition_Count'] > 0) & ~df['Transition_Difference'].isnull()]

print(transition_df[['Transition_Count', 'dt', 'Transition_Difference']])
I want a dataframe (transition_df) containing all the rows where the 'value' column changes from 1 to 0. The exception: whenever there are two consecutive 0s in the column, it should take the last occurrence of 0, not the one directly below the 1. The resulting dataframe I have is missing two entries: '1990-05-28 22:59:56' and '1990-05-30 21:59:54'. How can I write a mask condition to achieve the required result?
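
One possible mask for the requirement described above, offered as a sketch rather than a definitive answer. It assumes the rows should be sorted by 'dt' before any mask is computed and that 'value' only holds 0 and 1. The names run_id, follows_one and is_run_end are made up for illustration. The idea is to label each run of equal values, keep the runs of 0s that start right after a 1, and then keep only the last row of each such run.

```python
import pandas as pd

data = {'dt': ['1990-06-05 15:59:57', '1990-06-05 21:59:54', '1990-02-20 14:19:08', '1990-03-01 23:01:44',
               '1990-05-27 21:59:55', '1990-05-27 22:59:55', '1990-05-28 02:59:57', '1990-05-28 22:59:56',
               '1990-05-29 02:59:56', '1990-05-30 21:59:54', '1990-06-10 22:59:54', '1990-06-11 02:59:55',
               '1990-06-11 09:59:55', '1990-06-11 22:59:55'],
        'type': ['event'] * 14,
        'value': [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0]}
df = pd.DataFrame(data)

# Assumption: sort chronologically first, so every mask sees the final row order
df['dt'] = pd.to_datetime(df['dt'])
df = df.sort_values('dt').reset_index(drop=True)

# Label each run of consecutive equal values with its own id
run_id = df['value'].ne(df['value'].shift()).cumsum()

# True for every row of a run whose first row is a 1 -> 0 transition
follows_one = df['value'].diff().eq(-1).groupby(run_id).transform('any')

# True on the last row of each run (the next row belongs to another run, or there is none)
is_run_end = run_id.ne(run_id.shift(-1))

# Keep the last 0 of every run of 0s that comes directly after a 1
mask = df['value'].eq(0) & follows_one & is_run_end
transition_df = df.loc[mask, ['dt', 'value']]

print(transition_df)
```

With the sample data this keeps the rows dated 1990-05-27 22:59:55, 1990-05-28 22:59:56, 1990-05-30 21:59:54, 1990-06-10 22:59:54 and 1990-06-11 22:59:55. The leading 0s in February and March are dropped because no 1 precedes them.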