Python Forum
How is pandas modifying all rows in an assignment - python-newbie question


How is pandas modifying all rows in an assignment - python-newbie question - markm74 - Nov-28-2023

Hi all,

This may be a data science question or just a Python newbie question, but I'm trying to fundamentally understand how pandas modifies all records in this assignment:

import pandas as pd
df = pd.read_csv('example.csv')
print(df.head())
df['input'] = 'TEXT1: ' + df.context + ' TEXT2: ' + df.target + ' TEXT3: ' + df.anchor
print(df.head())
The CSV has existing columns called 'context', 'target' and 'anchor', and the code above adds 'input' as a new column. With what looks like a single concatenated string assignment, pandas has created the new column for all rows and filled in each row according to the string concatenation. I'm coming to Python from other languages, so is this a Pythonic thing, or has pandas overridden object property assignment so that the expression acts as a shortcut to modify all rows? In another language you'd just end up with df['input'] as a property holding a single string value, or it might throw an error because e.g. df.context isn't a variable that can be concatenated with a string.

Thanks in advance,

Mark.


RE: How is pandas modifying all rows in an assignment - python-newbie question - deanhystad - Nov-28-2023

It is doing exactly what I would expect. What were you expecting?

Maybe this will help explain.
import pandas as pd

df = pd.DataFrame(
    {"context": list("ABC"), "target": list("DEF"), "anchor": list("GHI")}
)
new_series = "TEXT1: " + df.context + " TEXT2: " + df.target + " TEXT3: " + df.anchor
print("target", df.target, "", sep="\n")
print("new_series", new_series, "", sep="\n")
Output:
target
0    D
1    E
2    F
Name: target, dtype: object

new_series
0    TEXT1: A TEXT2: D TEXT3: G
1    TEXT1: B TEXT2: E TEXT3: H
2    TEXT1: C TEXT2: F TEXT3: I
dtype: object
df.target is a Series, an array-like object that holds the "target" column of the df DataFrame.
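
To check this for yourself, here is a small sketch that reuses the same three-column DataFrame. The variable name prefixed is only for illustration. It shows that attribute access and bracket access return the same Series, and that adding a string to a Series applies the addition to every element.

```python
import pandas as pd

df = pd.DataFrame(
    {"context": list("ABC"), "target": list("DEF"), "anchor": list("GHI")}
)

# df.target and df["target"] are two spellings for the same column.
print(type(df.target).__name__)        # Series
print(df.target.equals(df["target"]))  # True

# str + Series is applied element by element and returns a new Series.
prefixed = "TEXT1: " + df.context
print(prefixed.tolist())  # ['TEXT1: A', 'TEXT1: B', 'TEXT1: C']

# Series.radd is the method form of the same "other + series" operation.
print(prefixed.equals(df.context.radd("TEXT1: ")))  # True
```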

The result of this operation that uses multiple Series is a new Series.
new_series = "TEXT1: " + df.context + " TEXT2: " + df.target + " TEXT3: " + df.anchor
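
As for whether this is a Python thing or a pandas thing: it is a bit of both. Python lets any class define what + and obj[key] = value mean through special methods (__add__, __radd__, __setitem__), and pandas implements those for Series and DataFrame. Below is a rough sketch of the idea using two made-up classes, Column and Table. They are not pandas code and skip a lot (real pandas aligns values by index label, not by position), but they show how a single expression can end up touching every row.

```python
class Column:
    """Toy stand-in for a pandas Series (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def _combine(self, other, op):
        # Pair up with another Column element by element,
        # or reuse a single scalar value for every element.
        if isinstance(other, Column):
            return Column(op(a, b) for a, b in zip(self.values, other.values))
        return Column(op(a, other) for a in self.values)

    def __add__(self, other):
        # Handles: column + something
        return self._combine(other, lambda a, b: a + b)

    def __radd__(self, other):
        # Handles: something + column, e.g. "TEXT1: " + column.
        # Python tries str.__add__ first; that fails, so it falls back here.
        return self._combine(other, lambda a, b: b + a)


class Table:
    """Toy stand-in for a pandas DataFrame (hypothetical, for illustration only)."""

    def __init__(self, columns):
        self.columns = {name: Column(vals) for name, vals in columns.items()}

    def __getitem__(self, name):
        # Handles: table["name"]
        return self.columns[name]

    def __setitem__(self, name, column):
        # Handles: table["name"] = column
        self.columns[name] = column


table = Table({"context": "ABC", "target": "DEF"})
table["input"] = "TEXT1: " + table["context"] + " TEXT2: " + table["target"]
print(table["input"].values)
# ['TEXT1: A TEXT2: D', 'TEXT1: B TEXT2: E', 'TEXT1: C TEXT2: F']
```

So the right-hand side is evaluated first and produces a whole new column, and the assignment then stores that column under the new name.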