Pandas hierarchical sum - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Pandas hierarchical sum (/thread-41500.html) |
Pandas hierarchical sum - mariostg - Jan-26-2024 I have a data structure that looks like so What I am trying to achieve is create an extra column that would be a sum based on the hierarchy of the leftmost column. Exemple : 1.1 would have a total of spent for all rows that begins with 1.1. (15) 1.2.1 and 1.2.1.0 would have a total of 12 I have done it without using pandas, but wondering if a simple method could be achieve with pandas. RE: Pandas hierarchical sum - mariostg - Jan-26-2024 This is what I come up with, attempting to answer my one question. I was hoping to avoid loop. There must be a cleaner way. data = { "Cost Elements": [ "2184AA", "2184DA", "2184A6", "8484XA", "8484YA", "2184BE", "2184B1", "8484B2", "8484B3", ], "parent": [None, "2184AA", "2184DA", "2184A6", "2184A6", "2184AA", "2184BE", "2184BE", "2184B1"], "spent": [0, 0, 0, 10, 5, 0, 0, 4, 8], "total": [0, 0, 0, 0, 0, 0, 0, 0, 0], } index = ["1.1", "1.1.1", "1.1.1.0", "1.1.1.0.1", "1.1.1.0.2", "1.2.1", "1.2.1.0", "1.2.1.0.1", "1.2.1.0.2"] df = pd.DataFrame(data, index=index) df.reset_index(inplace=True) grouped = df.groupby(df["parent"]) sequences = df.index for s in sequences: # print(f"Descendants of {df.iat[s, 0]}") descendants = df[df["index"].str.startswith(df.iat[s, 0])] # print(descendants) # print(f"Sum:{descendants['spent'].sum()}") df.iat[s, 4] = descendants["spent"].sum() print("++++++++++++++++") print(df) print("++++++++++++++++") |