Reindexing in Python Pandas

Reindexing changes the row labels and column labels of a DataFrame.

To reindex means to conform the data to a given set of labels along a particular axis.

Reindexing lets us perform several operations in one step, such as:

· Inserting missing-value (NaN) markers at label locations where no data existed for the label.

· Reordering the existing data to match a new set of labels.

Example:

import pandas as pd
import numpy as np

N = 20

# Build a sample DataFrame with 20 rows and the columns A, x, y, C and D
data = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=N).tolist()
})

# Reindex the DataFrame: keep rows 0, 2 and 5 and columns A, C and B.
# Column B does not exist in data, so it is filled with NaN.
data_reindexed = data.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])

print(data_reindexed)

Output:

           A     C   B
0 2016-01-01  High NaN
2 2016-01-03   Low NaN
5 2016-01-06  High NaN
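
The two effects listed at the start of this article (reordering existing data and inserting NaN for new labels) can also be seen on a small Series with fixed values, which makes the result reproducible. This is only a minimal sketch: the Series s and its labels are made up for illustration, and the fill_value argument shown at the end is a reindex() option that the article does not otherwise cover.

```python
import pandas as pd

# A small Series with fixed values (hypothetical data for illustration).
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# 'c' and 'a' are reordered; 'z' has no data in s, so it becomes NaN.
result = s.reindex(['c', 'a', 'z'])
print(result)

# Optionally, new labels can receive a constant instead of NaN.
filled = s.reindex(['c', 'a', 'z'], fill_value=0)
print(filled)
```

Label 'b' is dropped in both results because it is not part of the new set of labels.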

How to Reindex to Align with Other Objects?

Suppose you want to take an object and reindex its axes so that they are labeled the same as another object. The reindex_like() method does this.

Let's take an example for a better understanding.

Example:

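import pandas as pd
import numpy as np

# data1 has 10 rows and data2 has 7 rows; both share the same column names
data1 = pd.DataFrame(np.random.randn(10, 3), columns=['column1', 'column2', 'column3'])
data2 = pd.DataFrame(np.random.randn(7, 3), columns=['column1', 'column2', 'column3'])

# Reindex data1 so that its row and column labels match those of data2
data1 = data1.reindex_like(data2)

print(data1)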

Output:

    column1   column2   column3
0  0.271240  0.201199 -0.151743
1 -0.269379  0.262300  0.019942
2  0.685737 -0.233194 -0.652832
3 -1.416394 -0.587026  1.065789
4 -0.590154 -2.194137  0.707365
5  0.393549  1.801881 -2.529611
6  0.062660 -0.996452 -0.029740

Note: reindex_like() returns a new DataFrame, which is assigned back to data1 here, so data1 now has the same row and column labels as data2 (rows 0 to 6). If a column name in data2 has no match in data1, that entire column is filled with NaN.
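
The note about unmatched column names is not exercised by the example above, because both DataFrames share the same columns. A minimal sketch of that case follows. The names left, right and other are hypothetical and chosen only for illustration.

```python
import pandas as pd

# 'left' has columns column1 and column2; 'right' has column1 and other.
left = pd.DataFrame({'column1': [1.0, 2.0, 3.0], 'column2': [4.0, 5.0, 6.0]})
right = pd.DataFrame({'column1': [0.0, 0.0], 'other': [0.0, 0.0]})

# The result takes the row labels (0, 1) and column labels of 'right'.
# 'other' has no match in 'left', so the whole column is NaN.
aligned = left.reindex_like(right)
print(aligned)

# 'left' itself is not modified by the call.
print(left.shape)
```

Only the labels of right are used. Its values play no part in the result.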

How to Fill Values While Reindexing?

We can also fill in the missing values while reindexing.

The reindex() method takes an optional parameter, method, which controls how values are filled for labels that have no match. It accepts the following values:

· pad/ffill – Fill forward: each new label takes the last valid value before it.

· bfill/backfill – Fill backward: each new label takes the next valid value after it.

· nearest – Fill from the nearest index label.

Example

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

# Without a fill method, the rows missing from df2 are padded with NaN
print(df2.reindex_like(df1))

# Now fill the NaNs with the preceding values
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1, method='ffill'))

Output

       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

Data Frame with Forward Fill:

       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2 -0.396384 -0.176895 -1.896393
3 -0.396384 -0.176895 -1.896393
4 -0.396384 -0.176895 -1.896393
5 -0.396384 -0.176895 -1.896393

Note: In the above example, the last four rows (index 2 to 5) are filled with the values from row 1.
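
The example above only uses forward fill. To compare it with backward fill and nearest, here is a small sketch that calls reindex() directly on a Series with fixed values. The Series s and the list new_index are made up for illustration. The fill methods require the original index to be monotonically increasing or decreasing, which holds here.

```python
import pandas as pd

# Known values at labels 0, 10 and 20 (hypothetical data for illustration).
s = pd.Series([1.0, 2.0, 3.0], index=[0, 10, 20])

# New labels, some of which fall between the known ones.
new_index = [0, 4, 6, 10, 14, 20]

forward = s.reindex(new_index, method='ffill')    # last known value before the label
backward = s.reindex(new_index, method='bfill')   # next known value after the label
nearest = s.reindex(new_index, method='nearest')  # value at the closest known label

# Put the three results side by side for comparison.
comparison = pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest})
print(comparison)
```

For instance, label 4 takes 1.0 under ffill and nearest but 2.0 under bfill, while label 6 takes 1.0 under ffill and 2.0 under both bfill and nearest.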

How to Limit Filling While Reindexing?

The reindex() method also takes a limit parameter, which sets the maximum number of consecutive labels to fill when a fill method is used.

Let’s understand this with an example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

# Without a fill method, the rows missing from df2 are padded with NaN
print(df2.reindex_like(df1))

# Now fill the NaNs with the preceding values, at most one row in a row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1, method='ffill', limit=1))

Output

       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

Data Frame with Forward Fill limiting to 1:

       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2  0.528174 -1.140847 -1.158778
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

Note: In the above output, only the row with index 2 (the third row) is filled from the preceding row with index 1. Because limit=1, the remaining rows are left as NaN.
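
The effect of limit can also be checked on fixed values. The sketch below calls reindex() directly on a two-element Series and compares an unlimited forward fill with limit=1. The Series s is hypothetical and chosen only for illustration.

```python
import pandas as pd

# Two known values at labels 0 and 1 (hypothetical data for illustration).
s = pd.Series([1.0, 2.0], index=[0, 1])
new_index = [0, 1, 2, 3, 4, 5]

# Unlimited forward fill: labels 2 to 5 all take the value at label 1.
unlimited = s.reindex(new_index, method='ffill')

# limit=1: only the first unmatched label (2) is filled; 3 to 5 stay NaN.
limited = s.reindex(new_index, method='ffill', limit=1)

print(unlimited)
print(limited)
```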

How to Rename in Pandas?

Pandas provides a rename() method, which lets us relabel an axis based on some mapping (a dict or a Series) or an arbitrary function.

Let’s take an example to understand this:

import pandas as pd
import numpy as np

data1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
print(data1)

# Rename two columns and the first three row labels; unmapped labels are unchanged
print("After renaming the rows and columns:")
print(data1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                   index={0: 'apple', 1: 'banana', 2: 'mango'}))

Output

       col1      col2      col3
0  0.047170  0.378306 -1.198150
1  1.183208 -2.195630 -0.798192
2  0.256581  0.627994 -0.674260
3  0.240853  1.677340  1.497613
4  0.820688  0.920151 -1.431485
5 -0.010474 -0.228373 -0.392640

After renaming the rows and columns:

              c1        c2      col3
apple   0.047170  0.378306 -1.198150
banana  1.183208 -2.195630 -0.798192
mango   0.256581  0.627994 -0.674260
3       0.240853  1.677340  1.497613
4       0.820688  0.920151 -1.431485
5      -0.010474 -0.228373 -0.392640

The rename() method also takes an inplace parameter. It defaults to False, in which case rename() returns a new, relabeled object and leaves the original unchanged. Pass inplace=True to rename the labels of the original object in place.
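
The example above renames with dicts only. As a minimal sketch of the other two points, the code below passes a function (str.upper) instead of a mapping and then uses inplace=True. The DataFrame df and the label 'first' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function is applied to every label on the chosen axis.
# The default inplace=False returns a new object; df keeps its column names.
upper = df.rename(columns=str.upper)
print(upper.columns.tolist())
print(df.columns.tolist())

# With inplace=True, df itself is modified and the call returns None.
returned = df.rename(index={0: 'first'}, inplace=True)
print(df.index.tolist())
```

Because the in-place call returns None, its result should not be assigned back to the variable that holds the DataFrame.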
