Reindexing changes the row labels and column labels of a DataFrame.
To reindex means to conform the data to a given set of labels along a particular axis.
It lets us perform several operations through indexing, such as:
· Inserting missing value (NaN) markers at label locations where no data for the label existed before.
· Reordering the existing data to match a new set of labels.
Example:
import pandas as pd
import numpy as np

N = 20
data = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
})

# Reindex the DataFrame: keep rows 0, 2 and 5 and columns 'A', 'C' and 'B'.
# Column 'B' does not exist in data, so it is filled with NaN.
data_reindexed = data.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
print(data_reindexed)
Output:
           A     C   B
0 2016-01-01  High NaN
2 2016-01-03   Low NaN
5 2016-01-06  High NaN
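Because the example above uses random data, the effect can be easier to check on a tiny, fixed DataFrame. The following is a minimal sketch, with made-up values and names (df, out), that shows both operations at once: reordering existing labels and inserting NaN for labels that have no data.

```python
import pandas as pd

# Hypothetical three-row DataFrame with fixed values.
df = pd.DataFrame({'A': [1, 2, 3], 'C': ['x', 'y', 'z']}, index=[0, 1, 2])

# Rows 2 and 0 exist and are reordered; row 5 does not exist, so it becomes NaN.
# Columns 'C' and 'A' are reordered; column 'B' does not exist, so it becomes NaN.
out = df.reindex(index=[2, 0, 5], columns=['C', 'A', 'B'])
print(out)
```

Note that reindex() returns a new object, so df itself is left unchanged.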
How to Reindex to Align with Other Objects?
Suppose you want to take an object and reindex its axes so that they are labeled the same as another object's.
The following example illustrates this.
Example:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.randn(10,3),columns=['column1','column2','column3'])
data2 = pd.DataFrame(np.random.randn(7,3),columns=['column1','column2','column3'])
data1 = data1.reindex_like(data2)
print(data1)
Output:
    column1   column2   column3
0  0.271240  0.201199 -0.151743
1 -0.269379  0.262300  0.019942
2  0.685737 -0.233194 -0.652832
3 -1.416394 -0.587026  1.065789
4 -0.590154 -2.194137  0.707365
5  0.393549  1.801881 -2.529611
6  0.062660 -0.996452 -0.029740
Note − Here, the data1 DataFrame is reindexed like data2, so it keeps only the seven row labels that data2 has. If a column name in data2 does not exist in data1, NaN is added for that entire column.
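The column-mismatch case mentioned in the note can be sketched as follows. The frames left and right and the column name 'other' are made up for illustration.

```python
import pandas as pd

# Hypothetical frames: 'left' has four rows, 'right' has two rows
# and a column ('other') that 'left' does not have.
left = pd.DataFrame({'column1': [1, 2, 3, 4], 'column2': [5, 6, 7, 8]})
right = pd.DataFrame({'column1': [0, 0], 'other': [0, 0]})

# 'aligned' takes its row and column labels from 'right' but its values from 'left'.
aligned = left.reindex_like(right)
print(aligned)
```

Here 'column2' is dropped because right does not have it, and 'other' is filled entirely with NaN because left has no data for it.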
How to Fill Values while Reindexing?
We can also fill the missing values while reindexing the dataset.
The pandas reindex() method takes an optional parameter, method, which selects how the values are filled. Its options are as follows:
· pad/ffill – Fills values in the forward direction.
· bfill/backfill – Fills values in the backward direction.
· nearest – Fills values from the nearest index value.
Example
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Padding NaNs
print(df2.reindex_like(df1))
# Now fill the NaNs with the preceding values
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1,method='ffill'))
Output
       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill:
       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2 -0.396384 -0.176895 -1.896393
3 -0.396384 -0.176895 -1.896393
4 -0.396384 -0.176895 -1.896393
5 -0.396384 -0.176895 -1.896393
Note – In the above example the last four rows are padded.
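To compare the three fill options side by side, here is a small sketch on a fixed Series. The values and the names s and new_index are made up, and it calls reindex() with the method argument directly rather than reindex_like().

```python
import pandas as pd

# Hypothetical Series with gaps between its (sorted) index labels.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 5, 10])
new_index = [0, 2, 4, 6, 10, 12]

forward = s.reindex(new_index, method='ffill')     # take the last label at or before
backward = s.reindex(new_index, method='bfill')    # take the next label at or after
nearest = s.reindex(new_index, method='nearest')   # take the closest label

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest}))
```

With bfill, label 12 stays NaN because no later label exists to fill from. These fill methods expect the original index to be sorted (monotonic).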
How to Limit Filling Values while Reindexing?
The reindex() function also takes a limit parameter, which sets the maximum number of consecutive labels to fill.
Let's understand with an example:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Padding NaNs
print(df2.reindex_like(df1))
# Now fill the NaNs with the preceding values, at most one row at a time
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1,method='ffill',limit=1))
Output
       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2  0.528174 -1.140847 -1.158778
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Note – In the above output, only row 2 is filled from the preceding row 1. The remaining rows are left as NaN.
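The same behaviour can be checked on fixed values. This is a minimal sketch with a made-up single-column frame, using limit=2 instead of 1 to show that the limit counts consecutive filled rows.

```python
import pandas as pd

# Hypothetical two-row frame with index labels 0 and 1.
df = pd.DataFrame({'col1': [1.0, 2.0]})

# Extend to six rows, forward-filling at most two consecutive missing rows.
limited = df.reindex(list(range(6)), method='ffill', limit=2)
print(limited)
```

Rows 2 and 3 take the value from row 1, while rows 4 and 5 stay NaN because the limit of two consecutive fills has been reached.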
How to Rename in Pandas?
Pandas provides a rename() method, which lets us relabel an axis based on a mapping (a dict or a Series) or an arbitrary function.
Let's take an example to understand:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(data1)
print("After renaming the rows and columns:")
print(data1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},
                   index={0 : 'apple', 1 : 'banana', 2 : 'mango'}))
Output
       col1      col2      col3
0  0.047170  0.378306 -1.198150
1  1.183208 -2.195630 -0.798192
2  0.256581  0.627994 -0.674260
3  0.240853  1.677340  1.497613
4  0.820688  0.920151 -1.431485
5 -0.010474 -0.228373 -0.392640
After renaming the rows and columns:
              c1        c2      col3
apple   0.047170  0.378306 -1.198150
banana  1.183208 -2.195630 -0.798192
mango   0.256581  0.627994 -0.674260
3       0.240853  1.677340  1.497613
4       0.820688  0.920151 -1.431485
5      -0.010474 -0.228373 -0.392640
The rename() method provides an inplace parameter, which is False by default, so the method returns a new DataFrame and leaves the original unchanged. Pass inplace=True to rename the data in place.
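The example above uses dict mappings only. As a rough sketch of the other two points, the snippet below renames with a function and then with inplace=True. The frame and the 'row' prefix are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function is applied to every label on the chosen axis.
upper = df.rename(columns=str.upper, index=lambda i: f"row{i}")
print(upper)

# The default (inplace=False) left df untouched.
print(list(df.columns))

# inplace=True modifies df itself and returns None.
df.rename(columns={'col1': 'c1'}, inplace=True)
print(list(df.columns))
```

Labels missing from a dict mapping, such as 'col2' in the last call, are left as they are.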