please help
#6
You can also use Polars to speed things up.
For a 1-GB .csv file, Pandas takes ca. 13.5 seconds versus 350 milliseconds in Polars.

In Pandas your code could look like this.
If this turns out to be too slow, use Polars; another option is Dask (sketches of both follow the Pandas example below).
import pandas as pd

def extract_data(csv_file):
    # Use chunksize to read the file in chunks and keep memory usage low
    chunksize = 10000
    for chunk in pd.read_csv(csv_file, chunksize=chunksize):
        # Select columns 1, 3, and 5 (0-indexed) and convert to a list of rows
        data = chunk.iloc[:, [1, 3, 5]].values.tolist()
        process_data(data)

def process_data(data):
    # Do something with the extracted rows; here just print them
    for row in data:
        print(row)

csv_file = 'large_file.csv'
extract_data(csv_file)
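
A minimal Polars sketch of the same column extraction, assuming the same file and the same 0-indexed columns 1, 3, and 5 as in the Pandas example (the function name extract_data_polars is just a placeholder):

import polars as pl

def extract_data_polars(csv_file):
    # Polars reads the whole file with a fast multi-threaded CSV parser
    df = pl.read_csv(csv_file)
    # Select columns 1, 3, and 5 by position (0-indexed)
    selected = df.select([df.columns[i] for i in (1, 3, 5)])
    # rows() gives a list of tuples, similar to values.tolist() in Pandas
    for row in selected.rows():
        print(row)

extract_data_polars('large_file.csv')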
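
And a minimal Dask sketch, which partitions the file and processes it lazily, assuming dask[dataframe] is installed (extract_data_dask is again a placeholder name, and compute() here materializes the selected columns as a regular Pandas DataFrame, so for truly huge results you would process partition by partition instead):

import dask.dataframe as dd

def extract_data_dask(csv_file):
    # Dask splits the CSV into partitions and builds a lazy task graph
    df = dd.read_csv(csv_file)
    # Select the same three columns by position (0-indexed)
    cols = [df.columns[i] for i in (1, 3, 5)]
    # compute() materializes the selection as a Pandas DataFrame
    for row in df[cols].compute().itertuples(index=False):
        print(row)

extract_data_dask('large_file.csv')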