please help
#6
You can also use Polars to speed things up.
For a 1-GB .csv file, Pandas takes ca. 13.5 seconds versus 350 milliseconds in Polars.

In Pandas your code could look like this.
If this turns out to be too slow, use Polars; another option is Dask (sketches of both follow the Pandas example below).
import pandas as pd

def extract_data(csv_file):
    # Use chunksize to read the file in chunks and keep memory usage low
    chunksize = 10000
    for chunk in pd.read_csv(csv_file, chunksize=chunksize):
        # Select columns 1, 3, and 5 (0-indexed) and convert to a list of rows
        data = chunk.iloc[:, [1, 3, 5]].values.tolist()
        process_data(data)

def process_data(data):
    # Do something with the extracted rows; here just print them
    for row in data:
        print(row)

csv_file = 'large_file.csv'
extract_data(csv_file)
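
A minimal Polars sketch of the same column extraction, assuming the same file and the same 0-indexed columns 1, 3, and 5 as in the Pandas example (the function name extract_data_polars is just a placeholder):

import polars as pl

def extract_data_polars(csv_file):
    # Polars reads the whole file with a fast multi-threaded CSV parser
    df = pl.read_csv(csv_file)
    # Select columns 1, 3, and 5 by position (0-indexed)
    selected = df.select([df.columns[i] for i in (1, 3, 5)])
    # rows() gives a list of tuples, similar to values.tolist() in Pandas
    for row in selected.rows():
        print(row)

extract_data_polars('large_file.csv')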
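
And a minimal Dask sketch, which partitions the file and processes it lazily, assuming dask[dataframe] is installed (extract_data_dask is again a placeholder name, and compute() here materializes the selected columns as a regular Pandas DataFrame, so for truly huge results you would process partition by partition instead):

import dask.dataframe as dd

def extract_data_dask(csv_file):
    # Dask splits the CSV into partitions and builds a lazy task graph
    df = dd.read_csv(csv_file)
    # Select the same three columns by position (0-indexed)
    cols = [df.columns[i] for i in (1, 3, 5)]
    # compute() materializes the selection as a Pandas DataFrame
    for row in df[cols].compute().itertuples(index=False):
        print(row)

extract_data_dask('large_file.csv')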