How is pandas modifying all rows in an assignment - python-newbie question - markm74 - Nov-28-2023

Hi all,

This may be a data science question or just a Python newbie question, but I'm trying to fundamentally understand how pandas modifies all records in this assignment:

    import pandas as pd

    df = pd.read_csv('example.csv')
    print(df.head())
    df['input'] = 'TEXT1: ' + df.context + ' TEXT2: ' + df.target + ' TEXT3: ' + df.anchor
    print(df.head())

The csv has existing columns called 'context', 'target' and 'anchor', and I'm adding 'input' as a column in the above code. With what looks like a single concatenated string assignment, pandas has created a new column for all rows and filled it in according to the logic in the string concatenation.

I'm coming to Python from other languages, so is this a pythonic thing, or has pandas overridden object property assignment and they're using the expression as a shortcut to modify all rows? In another language you'd just end up with df['input'] as a property with a single string value, or it might throw an error because e.g. df.context isn't a variable that can be concatenated.

Thanks in advance,
Mark.

RE: How is pandas modifying all rows in an assignment - python-newbie question - deanhystad - Nov-28-2023

It is doing exactly what I would expect. What were you expecting? Maybe this will help explain.

    import pandas as pd

    df = pd.DataFrame(
        {"context": list("ABC"), "target": list("DEF"), "anchor": list("GHI")}
    )
    new_series = "TEXT1: " + df.context + " TEXT2: " + df.target + " TEXT3: " + df.anchor
    print("target", df.target, "", sep="\n")
    print("new_series", new_series, "", sep="\n")

df.target is a Series, an array-like object that is the "target" column of the df dataframe. The result of this operation that uses multiple Series is a new Series.

    new_series = "TEXT1: " + df.context + " TEXT2: " + df.target + " TEXT3: " + df.anchor
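On the "is this a pythonic thing" part of the question: the mechanism is ordinary Python operator overloading rather than anything special about assignment. A Series defines `__add__` and `__radd__`, so `+` works element by element, and a DataFrame defines `__setitem__`, so `df['input'] = ...` stores the resulting Series as a column. The following is only a rough sketch of that idea in plain Python. `Column` and `Table` are hypothetical toy classes made up for illustration, not pandas code, and they pair elements by position, whereas pandas aligns on the index.

```python
class Column:
    """Toy stand-in for a pandas Series (hypothetical, illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Column + Column pairs elements by position;
        # Column + scalar appends the scalar to every element.
        if isinstance(other, Column):
            return Column(a + b for a, b in zip(self.values, other.values))
        return Column(a + other for a in self.values)

    def __radd__(self, other):
        # Reached for `scalar + Column`: str.__add__ returns NotImplemented,
        # so Python falls back to the right-hand operand's __radd__.
        return Column(other + a for a in self.values)


class Table:
    """Toy stand-in for a pandas DataFrame (hypothetical, illustration only)."""

    def __init__(self, columns):
        self.columns = {name: Column(vals) for name, vals in columns.items()}

    def __getattr__(self, name):
        # table.context looks up the "context" column.
        try:
            return self.__dict__["columns"][name]
        except KeyError:
            raise AttributeError(name)

    def __setitem__(self, name, column):
        # table["input"] = ... stores a whole column under that name.
        self.columns[name] = column


table = Table({"context": list("ABC"), "target": list("DEF"), "anchor": list("GHI")})
table["input"] = (
    "TEXT1: " + table.context + " TEXT2: " + table.target + " TEXT3: " + table.anchor
)
print(table.columns["input"].values)
```

Each `+` in the expression returns a new `Column`, so by the time the assignment runs, the right-hand side is already a full column with one value per row. The assignment itself does no looping.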