python - Renaming column names in Pandas

Tags : python, pandas, replace, dataframe, rename





Top 5 Answers for python - Renaming column names in Pandas

96 votes

RENAME SPECIFIC COLUMNS

Use df.rename() and pass a mapping from old column names to new ones. Columns that are not in the mapping keep their names:

    df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})

    # Or rename the existing DataFrame (rather than creating a copy)
    df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Minimal Code Example

    import pandas as pd

    df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
    df

       a  b  c  d  e
    0  x  x  x  x  x
    1  x  x  x  x  x
    2  x  x  x  x  x

The following methods all work and produce the same output:

    df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)  # new method
    df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
    df2 = df.rename(columns={'a': 'X', 'b': 'Y'})  # old method

    df2

       X  Y  c  d  e
    0  x  x  x  x  x
    1  x  x  x  x  x
    2  x  x  x  x  x

Remember to assign the result back, because rename() returns a new DataFrame and leaves the original untouched by default. Alternatively, specify inplace=True:

    df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
    df

       X  Y  c  d  e
    0  x  x  x  x  x
    1  x  x  x  x  x
    2  x  x  x  x  x
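To make the difference between the two styles concrete, here is a small sketch using the same sample frame. The variable names (unchanged, renamed, result) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Without assignment the renamed copy is discarded and df keeps its labels.
df.rename({'a': 'X'}, axis=1)
unchanged = list(df.columns)

# Assigning the result back makes the rename stick.
df = df.rename({'a': 'X'}, axis=1)
renamed = list(df.columns)

# With inplace=True the frame is modified and the call returns None,
# so do not assign its result to anything you intend to use.
result = df.rename({'b': 'Y'}, axis=1, inplace=True)
print(unchanged, renamed, list(df.columns), result)
```

Because the inplace=True call returns None, writing df = df.rename(..., inplace=True) would replace the DataFrame with None.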

From v0.25, you can also specify errors='raise' to raise an error if a column to be renamed does not exist. The default, errors='ignore', silently skips missing labels. See v0.25 rename() docs.
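A short sketch of the errors parameter, assuming pandas 0.25 or later. The label 'z' is deliberately absent from the sample frame, and the names ignored and raised are illustrative.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Default behaviour (errors='ignore'): the unknown key 'z' is skipped silently.
ignored = df.rename(columns={'a': 'X', 'z': 'Q'})
print(list(ignored.columns))

# errors='raise': a label that is not present raises KeyError.
try:
    df.rename(columns={'a': 'X', 'z': 'Q'}, errors='raise')
    raised = False
except KeyError:
    raised = True
print(raised)
```

Using errors='raise' is a cheap way to catch typos in the mapping, which would otherwise go unnoticed.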


REASSIGN COLUMN HEADERS

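Use df.set_axis() with axis=1. On older pandas releases, also pass inplace=False to get a copy back; the inplace parameter was deprecated in 1.5 and removed in 2.0, where set_axis() always returns a new DataFrame.

    # pandas < 2.0
    df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)

    # pandas >= 2.0 (inplace no longer exists)
    df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)
    df2

       V  W  X  Y  Z
    0  x  x  x  x  x
    1  x  x  x  x  x
    2  x  x  x  x  x

This returns a copy. In versions before 2.0 you could modify the DataFrame in place by setting inplace=True (this was the default behaviour for versions <=0.24; later releases changed the default to False before removing the parameter).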

You can also assign headers directly:

    df.columns = ['V', 'W', 'X', 'Y', 'Z']
    df

       V  W  X  Y  Z
    0  x  x  x  x  x
    1  x  x  x  x  x
    2  x  x  x  x  x

85 votes

Just assign it to the .columns attribute:

    >>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
    >>> df
       $a  $b
    0   1  10
    1   2  20

    >>> df.columns = ['a', 'b']
    >>> df
       a   b
    0  1  10
    1  2  20
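One caveat with direct assignment is that the new list must have exactly one label per existing column. The sketch below shows what happens otherwise; the flag name mismatch_rejected is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# Assigning a list of the wrong length is rejected with a ValueError.
try:
    df.columns = ['a']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# A list of the right length replaces every label positionally.
df.columns = ['a', 'b']
print(mismatch_rejected, list(df.columns))
```

Since the labels are matched by position, this approach suits cases where you want to replace all headers at once and know their order.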

78 votes

The rename method can take a function, for example:

    In [11]: df.columns
    Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

    In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

    In [13]: df.columns
    Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)
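Note that slicing with x[1:] drops the first character of every label, whether or not it is a '$'. As a hedged alternative, the sketch below uses str.lstrip('$'), which only touches labels that actually start with '$'. The sample frame with the unprefixed column 'c' is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], 'c': [5, 6]})

# Slicing removes the first character unconditionally: 'c' becomes ''.
sliced = df.rename(columns=lambda x: x[1:])
print(list(sliced.columns))

# lstrip('$') removes only leading '$' characters: 'c' is left alone.
stripped = df.rename(columns=lambda x: x.lstrip('$'))
print(list(stripped.columns))
```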

67 votes

As documented in Working with text data:

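    # regex=False treats '$' as a literal character rather than the regex end-of-string anchor
    df.columns = df.columns.str.replace('$', '', regex=False)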
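Because '$' is a regular-expression metacharacter, it helps to be explicit about how the pattern is interpreted. The following sketch, assuming a pandas version where str.replace accepts the regex keyword, shows a literal replacement and an equivalent escaped regex; the frame names are illustrative.

```python
import pandas as pd

literal_df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
regex_df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# Literal match: every '$' character is removed.
literal_df.columns = literal_df.columns.str.replace('$', '', regex=False)

# Regex match: the escaped pattern removes a '$' only at the start of a label.
regex_df.columns = regex_df.columns.str.replace(r'^\$', '', regex=True)

print(list(literal_df.columns), list(regex_df.columns))
```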

51 votes

Pandas 0.21+ Answer

There have been some significant updates to column renaming in version 0.21.

  • The rename method gained an axis parameter, which may be set to 'columns' or 1. This update makes the method match the rest of the pandas API. It still has the index and columns parameters, but you are no longer forced to use them.
  • The set_axis method with inplace set to False lets you rename all the index or column labels with a list.

Examples for Pandas 0.21+

Construct sample DataFrame:

    df = pd.DataFrame({'$a':[1,2], '$b': [3,4],
                       '$c':[5,6], '$d':[7,8],
                       '$e':[9,10]})

       $a  $b  $c  $d  $e
    0   1   3   5   7   9
    1   2   4   6   8  10

Using rename with axis='columns' or axis=1

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns') 

or

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1) 

Both result in the following:

       a  b  c  d   e
    0  1  3  5  7   9
    1  2  4  6  8  10

It is still possible to use the old method signature:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}) 

The rename method also accepts a function, which is applied to each column name.

df.rename(lambda x: x[1:], axis='columns') 

or

df.rename(lambda x: x[1:], axis=1) 

Using set_axis with a list and inplace=False

You can supply a list to the set_axis method that is equal in length to the number of columns (or index). In 0.21, inplace defaults to True. Later releases changed the default to False, and pandas 2.0 removed the inplace parameter entirely, so omit it on 2.0 and newer.

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False) 

or

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False) 

Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?

There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.

The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.

    # new for pandas 0.21+
    (df.some_method1()
       .some_method2()
       .set_axis(columns, axis=1)
       .some_method3())

    # old way
    df1 = (df.some_method1()
             .some_method2())
    df1.columns = columns
    df1.some_method3()
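As a runnable illustration of the chaining pattern, here is a minimal sketch assuming pandas 1.0 or later, where set_axis returns a new DataFrame without an inplace argument. The choice of sort_values and assign as the surrounding steps, and the total column, are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6]})

result = (
    df.sort_values('$a', ascending=False)            # step before the rename
      .set_axis(['a', 'b', 'c'], axis=1)             # replace all column labels
      .assign(total=lambda d: d['a'] + d['b'])       # step that uses the new labels
)

print(list(result.columns))
print(list(df.columns))  # the original frame keeps its '$' labels
```

The later steps in the chain can refer to the new labels immediately, and no intermediate variable is needed.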
