Pandas hierarchical sum - Printable Version

Pandas hierarchical sum - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Pandas hierarchical sum (/thread-41500.html)

Pandas hierarchical sum - mariostg - Jan-26-2024

I have a data structure that looks like so

Output:            Cost Elements  parent  spent
1,1              2184AA    None      0
1.1.1            2184DA  2184AA      0
1.1.1.0          2184A6  2184DA      0
1.1.1.0.1        8484XA  2184A6     10
1.1.1.0.2        8484YA  2184A6      5
1.2.1            2184BE  2184AA      0
1.2.1.0          2184B1  2184BE      0
1.2.1.0.1        8484B2  2184BE      4
1.2.1.0.2        8484B3  2184B1      8

What I am trying to achieve is create an extra column that would be a sum based on the hierarchy of the leftmost column.
Exemple :
1.1 would have a total of spent for all rows that begins with 1.1. (15)
1.2.1 and 1.2.1.0 would have a total of 12

I have done it without using pandas, but wondering if a simple method could be achieve with pandas.

RE: Pandas hierarchical sum - mariostg - Jan-26-2024

This is what I come up with, attempting to answer my one question. I was hoping to avoid loop. There must be a cleaner way.

data = {
    "Cost Elements": [
        "2184AA",
        "2184DA",
        "2184A6",
        "8484XA",
        "8484YA",
        "2184BE",
        "2184B1",
        "8484B2",
        "8484B3",
    ],
    "parent": [None, "2184AA", "2184DA", "2184A6", "2184A6", "2184AA", "2184BE", "2184BE", "2184B1"],
    "spent": [0, 0, 0, 10, 5, 0, 0, 4, 8],
    "total": [0, 0, 0, 0, 0, 0, 0, 0, 0],
}
index = ["1.1", "1.1.1", "1.1.1.0", "1.1.1.0.1", "1.1.1.0.2", "1.2.1", "1.2.1.0", "1.2.1.0.1", "1.2.1.0.2"]

df = pd.DataFrame(data, index=index)
df.reset_index(inplace=True)

grouped = df.groupby(df["parent"])

sequences = df.index
for s in sequences:
    # print(f"Descendants of {df.iat[s, 0]}")
    descendants = df[df["index"].str.startswith(df.iat[s, 0])]
    # print(descendants)
    # print(f"Sum:{descendants['spent'].sum()}")
    df.iat[s, 4] = descendants["spent"].sum()
print("++++++++++++++++")
print(df)
print("++++++++++++++++")