How to read a 6 GB csv file with pandas

Asked on December 19, 2018 in Pandas.

  • 2 Answer(s)

    Try the code below, which reads the file in chunks and aggregates each chunk as it is read:

    import pandas as pd

    # read the 6 GB file in chunks of 1,000,000 rows instead of all at once
    chunks = pd.read_csv('aphro.csv', chunksize=1000000, sep=';',
                         names=['lat', 'long', 'rf', 'date', 'slno'],
                         index_col='slno', header=None, parse_dates=['date'])

    # aggregate each chunk as it arrives, then concatenate the small per-chunk results
    # (%time is an IPython/Jupyter magic; drop it in a plain Python script)
    %time df = pd.concat(chunk.groupby(['lat', 'long', chunk['date'].map(lambda x: x.year)])['rf'].agg(['sum']) for chunk in chunks)
    
    Answered on December 19, 2018.

    There are a few cases to consider apart from chunking:

    1. Large data size due to empty/repeated columns:

    In this case you can save memory by storing repetitive columns as the category dtype and by loading only the columns you actually need with the usecols parameter of pd.read_csv (see the first sketch after this list).

    2. Are slicing, manipulating and exporting involved in the workflow?

    In this case you can use dask.dataframe to slice the data and perform the operations. Dask does the chunking silently and supports a subset of the pandas API (see the dask sketch after this list).

    3. Chunks can also be used to read and process the file a piece at a time (see the last sketch after this list).
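
    A minimal sketch of point 1, reusing the file name and column layout from the first answer purely as placeholders:

    import pandas as pd

    # load only the columns that are actually needed and store the
    # repetitive ones as 'category' to cut memory use
    df = pd.read_csv('aphro.csv', sep=';', header=None,
                     names=['lat', 'long', 'rf', 'date', 'slno'],
                     usecols=['lat', 'long', 'rf'],
                     dtype={'lat': 'category', 'long': 'category'})

    print(df.memory_usage(deep=True))   # compare against the default dtypes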
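
    A minimal sketch of point 2 with dask.dataframe (assuming dask is installed); the file name and columns are the same placeholders:

    import dask.dataframe as dd

    # dask reads the csv lazily in partitions and runs the pandas-style
    # operations partition by partition behind the scenes
    ddf = dd.read_csv('aphro.csv', sep=';', header=None,
                      names=['lat', 'long', 'rf', 'date', 'slno'],
                      parse_dates=['date'])

    ddf['year'] = ddf['date'].dt.year
    # nothing is computed until .compute() is called; the result is an
    # ordinary pandas object small enough to fit in memory
    yearly_rf = ddf.groupby(['lat', 'long', 'year'])['rf'].sum().compute()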
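
    And a minimal sketch of point 3, processing the file a piece at a time with an ordinary loop (the per-chunk filter is only an example):

    import pandas as pd

    # read a few hundred thousand rows at a time so the full 6 GB file
    # never has to sit in memory at once
    pieces = []
    for chunk in pd.read_csv('aphro.csv', sep=';', header=None,
                             names=['lat', 'long', 'rf', 'date', 'slno'],
                             chunksize=500000):
        pieces.append(chunk[chunk['rf'] > 0])   # keep only the rows of interest

    df = pd.concat(pieces, ignore_index=True)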

    Answered on December 19, 2018.