Python Forum
Pandas hierarchical sum - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Pandas hierarchical sum (/thread-41500.html)



Pandas hierarchical sum - mariostg - Jan-26-2024

I have a data structure that looks like so
Output:
Cost Elements parent spent 1,1 2184AA None 0 1.1.1 2184DA 2184AA 0 1.1.1.0 2184A6 2184DA 0 1.1.1.0.1 8484XA 2184A6 10 1.1.1.0.2 8484YA 2184A6 5 1.2.1 2184BE 2184AA 0 1.2.1.0 2184B1 2184BE 0 1.2.1.0.1 8484B2 2184BE 4 1.2.1.0.2 8484B3 2184B1 8
What I am trying to achieve is create an extra column that would be a sum based on the hierarchy of the leftmost column.
Exemple :
1.1 would have a total of spent for all rows that begins with 1.1. (15)
1.2.1 and 1.2.1.0 would have a total of 12

I have done it without using pandas, but wondering if a simple method could be achieve with pandas.


RE: Pandas hierarchical sum - mariostg - Jan-26-2024

This is what I come up with, attempting to answer my one question. I was hoping to avoid loop. There must be a cleaner way.

data = {
    "Cost Elements": [
        "2184AA",
        "2184DA",
        "2184A6",
        "8484XA",
        "8484YA",
        "2184BE",
        "2184B1",
        "8484B2",
        "8484B3",
    ],
    "parent": [None, "2184AA", "2184DA", "2184A6", "2184A6", "2184AA", "2184BE", "2184BE", "2184B1"],
    "spent": [0, 0, 0, 10, 5, 0, 0, 4, 8],
    "total": [0, 0, 0, 0, 0, 0, 0, 0, 0],
}
index = ["1.1", "1.1.1", "1.1.1.0", "1.1.1.0.1", "1.1.1.0.2", "1.2.1", "1.2.1.0", "1.2.1.0.1", "1.2.1.0.2"]

df = pd.DataFrame(data, index=index)
df.reset_index(inplace=True)

grouped = df.groupby(df["parent"])

sequences = df.index
for s in sequences:
    # print(f"Descendants of {df.iat[s, 0]}")
    descendants = df[df["index"].str.startswith(df.iat[s, 0])]
    # print(descendants)
    # print(f"Sum:{descendants['spent'].sum()}")
    df.iat[s, 4] = descendants["spent"].sum()
print("++++++++++++++++")
print(df)
print("++++++++++++++++")