About Me

Mumbai, Maharashtra, India
He has more than 7.6 years of experience in software development and has spent most of that time on web and desktop application development. He has sound knowledge of various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari


Sunday, May 19, 2019

How the object is handled when inplace=True is passed vs. when inplace=False, in Pandas dataframe

Let’s understand inplace with the code block below, where df3 ends up as None.

df3=df1.set_index('account_no',inplace=True, drop=False)
#df3.set_index('account_no',inplace=False, drop=True)
print("Index value of DF3 : \n", df3)

set_index: the set_index method of a dataframe sets the index of the dataframe using one or more existing columns or arrays.
Syntax: dataframe.set_index(keys, drop=True, append=False, inplace=False)

Let’s play with the data source loaded into df1 (its rows appear in the outputs below).


Let’s understand two important parameters, inplace and drop.

When inplace is set to True, the change is made in the original dataframe and nothing is returned (the call returns None).
Whereas when inplace is set to False, the change is made on a copy of the dataframe, which is returned as a new object without affecting the original dataframe.

Let’s check with an example.

#inplace is set to True
df3=df1.set_index('account_no',inplace=True, drop=False)
#df3.set_index('account_no',inplace=False, drop=True)
print("DF3 with inplace set to True : \n", df3)

print("\nDF1 Data : \n", df1)

Output:

DF3 with inplace set to True :
 None

DF1 Data :
             account_no  branch  city_code customer_name  amount
account_no                                                    
2112              2112  3212.0      321.0       Sidhika   19000
2119              2119     NaN      215.0      Prayansh   12000
2115              2115  4321.0      212.0       Rishika   15000
2435              2435  2312.0        NaN      Sagarika   13000
2356              2356  7548.0      256.0           NaN   15000

So here we can see that DF3 is None, because df1.set_index is called with inplace=True, which says that the change is made in the original dataframe and nothing is returned.
df3=df1.set_index('account_no',inplace=True, drop=False)

Now let’s change inplace to False. By definition, the change should be made on a copy of the dataframe object, and that dataframe object is returned.

#inplace is set to False
df3=df1.set_index('account_no',inplace=False, drop=False)
#df3.set_index('account_no',inplace=False, drop=True)
print("DF3 with inplace set to False : \n", df3)

print("\nDF1 Data : \n", df1)

Output:
DF3 with inplace set to False :
             account_no  branch  city_code customer_name  amount
account_no                                                    
2112              2112  3212.0      321.0       Sidhika   19000
2119              2119     NaN      215.0      Prayansh   12000
2115              2115  4321.0      212.0       Rishika   15000
2435              2435  2312.0        NaN      Sagarika   13000
2356              2356  7548.0      256.0           NaN   15000

DF1 Data :
    account_no  branch  city_code customer_name  amount
0        2112  3212.0      321.0       Sidhika   19000
1        2119     NaN      215.0      Prayansh   12000
2        2115  4321.0      212.0       Rishika   15000
3        2435  2312.0        NaN      Sagarika   13000
4        2356  7548.0      256.0           NaN   15000

From the output we can see that the changes are made on a copy of the original dataframe, without affecting the original one.
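The two behaviours can also be checked without the CSV file. The sketch below is only an illustration: it assumes a small in-memory dataframe that copies three rows from the output above, and the variable names index_name_before and result are made up for this example.

```python
import pandas as pd

# Hypothetical in-memory stand-in for a few rows of NullFilterExample.csv
df1 = pd.DataFrame({
    "account_no": [2112, 2119, 2115],
    "customer_name": ["Sidhika", "Prayansh", "Rishika"],
    "amount": [19000, 12000, 15000],
})

# inplace=False (the default): a new dataframe is returned, df1 is untouched
df3 = df1.set_index("account_no", inplace=False, drop=False)
index_name_before = df1.index.name  # df1 still has its default, unnamed index

# inplace=True: df1 itself is modified and the call returns None
result = df1.set_index("account_no", inplace=True, drop=False)

print(type(df3).__name__)  # DataFrame
print(index_name_before)   # None, df1 was not re-indexed by the first call
print(result)              # None, nothing is returned
print(df1.index.name)      # account_no
```

Because the inplace=True call returns None, assigning its result to a variable (as df3 and df4 do in this post) only makes sense as a demonstration.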

Complete Program:

import pandas as pd
import numpy as np

df1 = pd.read_csv(
"NullFilterExample.csv")

print('Original DF rows \n',df1 , '\n')

#inplace is set to False
df3=df1.set_index('account_no',inplace=False, drop=False)
#df3.set_index('account_no',inplace=False, drop=True)
print("DF3 with inplace set to False : \n", df3)

print("\nDF1 Data : \n", df1)


#inplace is set to True
df4=df1.set_index('account_no',inplace=True, drop=False)
#df3.set_index('account_no',inplace=False, drop=True)
print("DF4 with inplace set to True : \n", df4)

print("\nDF1 Data : \n", df1)
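The programs in this post always pass drop=False, so the effect of the drop parameter is never shown on its own. As a rough sketch, assuming a tiny in-memory dataframe rather than the CSV file (the names kept and moved are made up for this example), the difference looks like this:

```python
import pandas as pd

# Hypothetical two-row frame, values borrowed from the output above
df = pd.DataFrame({
    "account_no": [2112, 2119],
    "amount": [19000, 12000],
})

# drop=False: account_no becomes the index and also stays as a column
kept = df.set_index("account_no", drop=False)

# drop=True (the default): account_no moves into the index only
moved = df.set_index("account_no", drop=True)

print(list(kept.columns))   # ['account_no', 'amount']
print(list(moved.columns))  # ['amount']
print(list(moved.index))    # [2112, 2119]
```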



Data Science with…Python :)
Post Reference: Vikram Aristocratic Elfin Share

Sunday, April 21, 2019

Reindexing Dataframe after rows filter in Pandas and preserving previous index

I have a sheet, saved as NullFilterExample.csv, with the records shown under “Original DF rows” in the output below.

Let’s filter out all those rows where the customer name contains “ika”.

import pandas as pd
import numpy as np

df1 = pd.read_csv(
"NullFilterExample.csv")

print('Original DF rows \n',df1 , '\n')

#implementing filter  not like condition
df2 = df1[~df1.customer_name.str.contains('ika', na=False)]
print('Rows where customer Name not contain ika \n',df2)

Output
Original DF rows
    account_no  branch  city_code customer_name  amount
0        2112  3212.0      321.0       Sidhika   19000
1        2119     NaN      215.0      Prayansh   12000
2        2115  4321.0      212.0       Rishika   15000
3        2435  2312.0        NaN      Sagarika   13000
4        2356  7548.0      256.0           NaN   15000

Rows where customer Name not contain ika
    account_no  branch  city_code customer_name  amount
1        2119     NaN      215.0      Prayansh   12000
4        2356  7548.0      256.0           NaN   15000

Now if you look at the output of df2, you will see two rows with index 1 and 4, which indicates that the index needs to be reset. Let’s add the reindexing logic.

#reindexing DF2 dataframe
df3 = df2.reset_index(drop=True)
print('After reindexing of DF3 \n',df3)

Output:
After reindexing of DF3
    account_no  branch  city_code customer_name  amount
0        2119     NaN      215.0      Prayansh   12000
1        2356  7548.0      256.0           NaN   15000

Now, in the output, the index values are correct and in sequence, i.e. 0 and 1.

Now let’s check the syntax
df2.reset_index(drop=True)

There is a parameter “drop=True”; it drops the existing index of the rows and creates a new one starting from 0.

But what if we want to preserve the original index of the rows? Simply remove the optional parameter “drop=True”.

#what if we remove the "drop=True" parameter of reset_index

df4 = df2.reset_index()
print('After reindexing of DF2 \n',df4)

Output
After reindexing of DF2
    index  account_no  branch  city_code customer_name  amount
0      1        2119     NaN      215.0      Prayansh   12000
1      4        2356  7548.0      256.0           NaN   15000

So here you can see that it creates a new column called “index”, which preserves the existing index numbering.
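The same comparison can be tried without the CSV file. This is a minimal sketch, assuming a hand-built dataframe whose index is already 1 and 4, as after the filter; the names fresh, kept and named, and the column name original_row, are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the filtered df2: two rows left at index 1 and 4
df2 = pd.DataFrame(
    {"account_no": [2119, 2356], "amount": [12000, 15000]},
    index=[1, 4],
)

# drop=True discards the old index and numbers the rows from 0
fresh = df2.reset_index(drop=True)

# Without drop=True the old index is kept as a column named "index"
kept = df2.reset_index()

# Optionally give that column a more descriptive (hypothetical) name
named = kept.rename(columns={"index": "original_row"})

print(list(fresh.index))     # [0, 1]
print(list(kept.columns))    # ['index', 'account_no', 'amount']
print(list(kept["index"]))   # [1, 4]
print(list(named.columns))   # ['original_row', 'account_no', 'amount']
```

In both cases df2 itself is left unchanged, since reset_index returns a new dataframe by default.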

Full code:
import pandas as pd
import numpy as np

df1 = pd.read_csv(
"NullFilterExample.csv")

print('Original DF rows \n',df1 , '\n')

#implementing filter  not like condition
df2 = df1[~df1.customer_name.str.contains('ika', na=False)]
print('Rows where customer Name not contain ika \n',df2)

#reindexing DF2 dataframe
df3 = df2.reset_index(drop=True)
print('After reindexing of DF2 \n',df3)

#what if we remove the "drop=True" parameter of reset_index

df4 = df2.reset_index()
print('After reindexing of DF2 \n',df4)



Data Science with…Python :)

Fetch rows on the basis of condition in Pandas Dataframe

I have a sheet, saved as NullFilterExample.csv, with the records shown under “Original DF rows” in the output below.
Here we are trying to implement various filter criteria:
· Implementing value search
· Implementing like condition
· Implementing not like condition


And while doing these we will handle NULL (NaN) values with the help of the “na=False” parameter, which treats a missing customer name as not matching. Let’s code it.

import pandas as pd
import numpy as np

df1 = pd.read_csv(
"NullFilterExample.csv")

print('Original DF rows \n',df1 , '\n')

#implementing value search
df2=df1[df1.customer_name == 'Rishika']
print('Rows where customer Name like Rishika \n',df2)

#implementing like condition
df3 = df1[df1.customer_name.str.contains('ika', na=False)]
print('Rows where customer Name contain ika \n',df3)

#implementing not like condition
df4 = df1[~df1.customer_name.str.contains('ika', na=False)]
print('Rows where customer Name not contain ika \n',df4)

Output:
Original DF rows
    account_no  branch  city_code customer_name  amount
0        2112  3212.0      321.0       Sidhika   19000
1        2119     NaN      215.0      Prayansh   12000
2        2115  4321.0      212.0       Rishika   15000
3        2435  2312.0        NaN      Sagarika   13000
4        2356  7548.0      256.0           NaN   15000

Rows where customer Name like Rishika
    account_no  branch  city_code customer_name  amount
2        2115  4321.0      212.0       Rishika   15000

Rows where customer Name contain ika
    account_no  branch  city_code customer_name  amount
0        2112  3212.0      321.0       Sidhika   19000
2        2115  4321.0      212.0       Rishika   15000
3        2435  2312.0        NaN      Sagarika   13000

Rows where customer Name not contain ika
    account_no  branch  city_code customer_name  amount
1        2119     NaN      215.0      Prayansh   12000
4        2356  7548.0      256.0           NaN   15000
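The effect of na=False is easiest to see on the boolean mask itself. The following is a small sketch, assuming a three-element in-memory Series in place of the customer_name column; the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the customer_name column, with one missing name
names = pd.Series(["Sidhika", "Prayansh", np.nan], name="customer_name")

# na=False makes the missing entry count as "does not contain"
mask = names.str.contains("ika", na=False)

print(mask.tolist())     # [True, False, False]
print((~mask).tolist())  # [False, True, True]
```

This is why the row with the missing customer name shows up in the “not contain” result above: its mask value is False, so the negated mask selects it. Without na=False, the missing name may come back as a missing value in the mask instead of True or False (the exact behaviour depends on the pandas version and column dtype), so passing it explicitly is the safer choice.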

Data Science with…Python :) 
