pandas unique values multiple columns

Asked on December 21, 2018 in Pandas.


  • 2 Answer(s)

    You can use pd.unique, which returns the unique values of a one-dimensional input (a NumPy array, a DataFrame column, or an index) in order of first appearance.

    The input must be one-dimensional, so the values of the multiple columns are first flattened into a single 1-D array:

    >>> pd.unique(df[['Col1', 'Col2']].values.ravel('K'))
    array(['Bob', 'Joe', 'Bill', 'Mary', 'Steve'], dtype=object)
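A self-contained version of the call above, as a sketch. The DataFrame here is hypothetical: the question's original data is not shown, so the names in `Col1` and `Col2` are illustrative only. `to_numpy()` is used as the current equivalent of `.values`. The exact order of the result depends on the memory layout that `ravel('K')` follows, so no output is asserted beyond the set of values.

```python
import pandas as pd

# Hypothetical example data; column names match the snippet above.
df = pd.DataFrame({
    "Col1": ["Bob", "Joe", "Bill", "Mary", "Joe"],
    "Col2": ["Joe", "Steve", "Bob", "Bob", "Steve"],
})

# Flatten both columns into one 1-D array (memory order), then deduplicate.
# pd.unique keeps values in order of first appearance; it does not sort.
flat = df[["Col1", "Col2"]].to_numpy().ravel("K")
uniques = pd.unique(flat)
print(uniques)
```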
    

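    ravel() returns a flattened, one-dimensional view of a multidimensional array, making a copy only when necessary. The argument 'K' flattens the elements in the order they are stored in memory. Because pandas typically stores same-dtype columns in column-major (Fortran) order, 'K' can avoid a copy and is faster than the default 'C' (row-major) order.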
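The difference between 'K' and 'C' can be seen on a plain NumPy array stored in Fortran order. This sketch assumes nothing about pandas internals; it only uses `np.asfortranarray` to mimic a column-major layout and `np.shares_memory` to check whether each result is a view or a copy.

```python
import numpy as np

# A 3x2 array stored in column-major (Fortran) order.
arr = np.asfortranarray(np.array([["Bob", "Joe"],
                                  ["Bill", "Mary"],
                                  ["Steve", "Bob"]], dtype=object))

flat_k = arr.ravel("K")  # memory order: no copy needed, returns a view
flat_c = arr.ravel("C")  # row-major order: must copy a Fortran-ordered array

print(np.shares_memory(arr, flat_k))  # True: flat_k is a view of arr
print(np.shares_memory(arr, flat_c))  # False: flat_c is a new copy
print(flat_k)  # elements read column by column
print(flat_c)  # elements read row by row
```

The set of elements is the same either way; only the order and whether a copy is made differ. pd.unique does not care about order for correctness, which is why the cheaper 'K' order is a reasonable choice here.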

    Another solution uses np.unique, which does not need ravel() because it flattens a multidimensional input by itself. However, it is slower than pd.unique because it uses a sort-based algorithm (pd.unique is hash-table based), and the difference becomes noticeable on larger DataFrames:

    >>> df1 = pd.concat([df]*100000, ignore_index=True) # DataFrame with 500000 rows
    >>> %timeit np.unique(df1[['Col1', 'Col2']].values)
    1 loop, best of 3: 1.12 s per loop
     
    >>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel('K'))
    10 loops, best of 3: 38.9 ms per loop
     
    >>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel()) # ravel using C order
    10 loops, best of 3: 49.9 ms per loop
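Besides speed, the two functions differ in the order of their output: np.unique returns the values sorted, while pd.unique keeps first-appearance order. A minimal sketch on a hypothetical two-column frame (the data is illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame used only for illustration.
df = pd.DataFrame({"Col1": ["Steve", "Bob", "Mary"],
                   "Col2": ["Bob", "Amy", "Steve"]})

values = df[["Col1", "Col2"]].to_numpy()

# np.unique flattens the 2-D input itself and returns sorted values.
sorted_uniques = np.unique(values)

# pd.unique needs 1-D input and keeps order of first appearance.
ordered_uniques = pd.unique(values.ravel("K"))

print(sorted_uniques)
print(ordered_uniques)
```

If sorted output is needed anyway, np.unique saves a separate sort step; otherwise pd.unique is usually the faster choice, as the timings above suggest.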
    
    Answered on December 21, 2018.

    Consider the DataFrame below:

    >>> df
      a b
    0 a g
    1 b h
    2 d a
    3 e e
    

    Concatenate the required columns into a single Series and call its unique() method:

    >>> pandas.concat([df['a'], df['b']]).unique()
    array(['a', 'b', 'd', 'e', 'g', 'h'], dtype=object)
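The same approach generalizes to any list of columns. In this sketch, `unique_across` is a hypothetical helper name (not a pandas function); it rebuilds the DataFrame shown above and concatenates the selected columns column by column, so the result follows first-appearance order.

```python
import pandas as pd

# Rebuild the example DataFrame shown above.
df = pd.DataFrame({"a": ["a", "b", "d", "e"],
                   "b": ["g", "h", "a", "e"]})


def unique_across(frame, columns):
    """Hypothetical helper: unique values across several columns,
    in order of first appearance (all of column 1, then column 2, ...)."""
    return pd.concat([frame[c] for c in columns]).unique()


result = unique_across(df, ["a", "b"])
print(list(result))
```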
    
    Answered on December 21, 2018.

