Convert categorical data in pandas dataframe

Asked on December 24, 2018 in Pandas.

  • 2 Answer(s)

    For a column with the category dtype, dataframe['c'].cat.codes returns its integer codes directly.

    The function select_dtypes can be used to select all the columns with a particular dtype.

    In [75]: df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'), 'col3':list('ababb')})
     
    In [76]: df['col2'] = df['col2'].astype('category')
     
    In [77]: df['col3'] = df['col3'].astype('category')
     
    In [78]: df.dtypes
    Out[78]:
    col1       int64
    col2    category
    col3    category
    dtype: object
    

    Applying the .cat.codes accessor to each categorical column gives the following result:

    In [80]: cat_columns = df.select_dtypes(['category']).columns
     
    In [81]: cat_columns
    Out[81]: Index([u'col2', u'col3'], dtype='object')
     
    In [83]: df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)
     
    In [84]: df
    Out[84]:
       col1  col2  col3
    0     1     0     0
    1     2     1     1
    2     3     2     0
    3     4     0     1
    4     5     1     1
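
The session above can also be written as a standalone script. This is a minimal sketch, assuming a reasonably recent pandas; the names `encoded` and `category_maps` are illustrative and not part of the original answer. It works on a copy and saves the code-to-label mapping first, because replacing a column with its codes discards the labels. The explicit loop is meant to be equivalent to the apply call shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": list("abcab"),
    "col3": list("ababb"),
})

# Convert the string columns to the category dtype.
df["col2"] = df["col2"].astype("category")
df["col3"] = df["col3"].astype("category")

# Select every categorical column by dtype.
cat_columns = df.select_dtypes(["category"]).columns

# Save the code -> label mapping before the labels are replaced
# (category_maps is an illustrative helper, not a pandas feature).
category_maps = {
    col: dict(enumerate(df[col].cat.categories)) for col in cat_columns
}

# Replace each categorical column with its integer codes, on a copy.
encoded = df.copy()
for col in cat_columns:
    encoded[col] = encoded[col].cat.codes

print(encoded)
print(category_maps)
```

Note that .cat.codes numbers the categories in their stored order (sorted labels by default when using astype('category')), and missing values receive the code -1.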
    
    Answered on December 24, 2018.

    pandas.factorize also encodes each distinct value as an integer, without converting to the category dtype first:

    pandas.factorize(['B', 'C', 'D', 'B'])[0]
    

    yields:

    [0, 1, 2, 0]
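
A slightly fuller sketch of the same call, assuming a recent pandas: factorize returns both the codes and the unique labels, so the labels can be recovered later. The variable names here are illustrative only. The input is wrapped in a Series because newer pandas versions warn when a plain list is passed.

```python
import numpy as np
import pandas as pd

values = pd.Series(["B", "C", "D", "B", np.nan])

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(values)

# Missing values get the sentinel code -1 and are left out of uniques.
print(codes.tolist())
print(list(uniques))

# Map the codes back to labels, using None for the missing entry.
decoded = [uniques[c] if c >= 0 else None for c in codes]
print(decoded)

# With sort=True the codes follow sorted label order instead,
# which matches the default ordering used by .cat.codes.
unsorted_codes, _ = pd.factorize(pd.Series(["b", "a", "b"]))
sorted_codes, _ = pd.factorize(pd.Series(["b", "a", "b"]), sort=True)
print(unsorted_codes.tolist(), sorted_codes.tolist())
```

The ordering difference matters when comparing the two answers: by default factorize numbers labels in order of first appearance, while .cat.codes numbers them in category order.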
    
    Answered on December 24, 2018.

