Aristocratic Elfin Share: April 2019

Sunday, April 21, 2019

Reindexing Dataframe after rows filter in Pandas and preserving previous index

I have an Excel sheet with the records shown under “Original DF rows” below.

Let's filter out all the rows where the customer name contains “ika”.

Output

Original DF rows

   account_no  branch  city_code  customer_name  amount
0        2112  3212.0      321.0        Sidhika   19000
1        2119     NaN      215.0       Prayansh   12000
2        2115  4321.0      212.0        Rishika   15000
3        2435  2312.0        NaN       Sagarika   13000
4        2356  7548.0      256.0            NaN   15000

Rows where customer Name not contain ika

   account_no  branch  city_code  customer_name  amount
1        2119     NaN      215.0       Prayansh   12000
4        2356  7548.0      256.0            NaN   15000

If you look at the output of df2, you will see two rows with index 1 and 4. The filter keeps the original row labels, so the index is no longer sequential and needs to be reset. Let's add the reindexing logic.

#reindexing DF2 dataframe
df3 = df2.reset_index(drop=True)
print('After reindexing of DF3 \n',df3)

Output:

After reindexing of DF3

   account_no  branch  city_code  customer_name  amount
0        2119     NaN      215.0       Prayansh   12000
1        2356  7548.0      256.0            NaN   15000

In this output the index is correct and in sequence again, i.e. 0 and 1.

Now let's look at the syntax:

df2.reset_index(drop=True)

The parameter “drop=True” discards the existing index of the rows and creates a new one starting at 0.

But what if we want to preserve the original index of the rows? Simply remove the optional parameter “drop=True”.

#what if we remove the "drop=True" parameter of reset_index

df4 = df2.reset_index()
print('After reindexing of DF2 \n',df4)

Output

After reindexing of DF2

   index  account_no  branch  city_code  customer_name  amount
0      1        2119     NaN      215.0       Prayansh   12000
1      4        2356  7548.0      256.0            NaN   15000

So here you can see that it creates a new column called “index”, which preserves the original index values, while the rows themselves are renumbered from 0.
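Here is a small self-contained sketch of the two variants side by side. It assumes an in-memory copy of the sample records instead of the CSV file, and the column name "original_row" is only an illustrative choice, not something from the original sheet.

```python
import numpy as np
import pandas as pd

# Assumption: an in-memory copy of the sample records, so no CSV file is needed.
df1 = pd.DataFrame({
    "account_no": [2112, 2119, 2115, 2435, 2356],
    "branch": [3212.0, np.nan, 4321.0, 2312.0, 7548.0],
    "city_code": [321.0, 215.0, 212.0, np.nan, 256.0],
    "customer_name": ["Sidhika", "Prayansh", "Rishika", "Sagarika", np.nan],
    "amount": [19000, 12000, 15000, 13000, 15000],
})

# Same "not like" filter as in the post: the surviving rows keep labels 1 and 4.
df2 = df1[~df1.customer_name.str.contains("ika", na=False)]

# drop=True throws the old labels away and renumbers from 0.
dropped = df2.reset_index(drop=True)

# Without drop=True the old labels move into a new column named "index".
kept = df2.reset_index()

# Naming the index first gives that column a more descriptive name
# ("original_row" is a hypothetical name chosen for this example).
renamed = df2.rename_axis("original_row").reset_index()

print(dropped)
print(kept)
print(renamed)
```

Note that reset_index returns a new DataFrame in both cases, so df2 itself still carries the labels 1 and 4 afterwards.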

Full code:

import pandas as pd
import numpy as np

df1 = pd.read_csv("NullFilterExample.csv")

print('Original DF rows \n',df1 , '\n')

#implementing filter not like condition
df2 = df1[~df1.customer_name.str.contains('ika', na=False)]
print('Rows where customer Name not contain ika \n',df2)

#reindexing DF2 dataframe
df3 = df2.reset_index(drop=True)
print('After reindexing of DF2 \n',df3)

#what if we remove the "drop=True" parameter of reset_index

df4 = df2.reset_index()
print('After reindexing of DF2 \n',df4)

Data Science with…Python :)

Post Reference: Vikram Aristocratic Elfin Share

Fetch rows on the basis of condition in Pandas Dataframe

I have an Excel sheet with the records shown under “Original DF rows” below.

Here we are trying to implement various filter criteria

· Implementing value search

· Implementing like condition

· Implementing not like condition

While doing these we will ignore the rows where the customer name is NULL with the help of the “na=False” parameter, which makes a missing name count as “no match”. Let's code it.

import pandas as pd
import numpy as np

df1 = pd.read_csv("NullFilterExample.csv")

print('Original DF rows \n',df1 , '\n')

#implementing value search
df2=df1[df1.customer_name == 'Rishika']
print('Rows where customer Name like Rishika \n',df2)

#implementing like condition
df3 = df1[df1.customer_name.str.contains('ika', na=False)]
print('Rows where customer Name contain ika \n',df3)

#implementing not like condition
df4 = df1[~df1.customer_name.str.contains('ika', na=False)]
print('Rows where customer Name not contain ika \n',df4)

Output:

Original DF rows

   account_no  branch  city_code  customer_name  amount
0        2112  3212.0      321.0        Sidhika   19000
1        2119     NaN      215.0       Prayansh   12000
2        2115  4321.0      212.0        Rishika   15000
3        2435  2312.0        NaN       Sagarika   13000
4        2356  7548.0      256.0            NaN   15000

Rows where customer Name like Rishika

   account_no  branch  city_code  customer_name  amount
2        2115  4321.0      212.0        Rishika   15000

Rows where customer Name contain ika

   account_no  branch  city_code  customer_name  amount
0        2112  3212.0      321.0        Sidhika   19000
2        2115  4321.0      212.0        Rishika   15000
3        2435  2312.0        NaN       Sagarika   13000

Rows where customer Name not contain ika

   account_no  branch  city_code  customer_name  amount
1        2119     NaN      215.0       Prayansh   12000
4        2356  7548.0      256.0            NaN   15000
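To see what the “na” parameter actually controls, here is a minimal sketch on just the customer name column. The Series below is an assumed stand-in for the column in the CSV file. The value passed as “na” is what a missing name is turned into in the boolean mask, so na=False treats it as “no match” and na=True treats it as “match”.

```python
import numpy as np
import pandas as pd

# Assumption: a stand-in for the customer_name column of the sample file.
names = pd.Series(
    ["Sidhika", "Prayansh", "Rishika", "Sagarika", np.nan],
    name="customer_name",
)

# na=False: the missing name is reported as False (no match).
mask_false = names.str.contains("ika", na=False)

# na=True: the missing name is reported as True (match).
mask_true = names.str.contains("ika", na=True)

# With na=False the mask is fully boolean, so it can be used
# directly for the "like" filter and negated for the "not like" filter.
like_rows = names[mask_false]
not_like_rows = names[~mask_false]

print(mask_false.tolist())
print(mask_true.tolist())
print(like_rows)
print(not_like_rows)
```

With na=False the NULL name ends up in the “not like” result, which is why row 4 shows up in the last output above. If that row should not appear in either result, it would have to be removed separately.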

Data Science with…Python :)

Post Reference: Vikram Aristocratic Elfin Share

Saturday, April 20, 2019

Removing Null Value rows from Dataframe in Pandas

I have an Excel sheet with the records shown under “Original DF with NULL value in rows” below.

Here you can see that the second and fourth rows have a null value. Our objective is to remove these rows from processing, so let's see how we can do that with the pandas package of Python.

import pandas as pd
import numpy as np

df1 = pd.read_csv("NullFilterExample.csv")

print('Original DF with NULL value in rows \n',df1 , '\n')

#keep only the rows that are not null in each column, one column at a time
for col in df1.columns:
    df1 = df1[df1[col].notnull()]

print('After removing rows with NULL Value \n',df1)

Output:

Original DF with NULL value in rows

   account_no  branch  city_code  customer_name  amount
0        2112  3212.0      321.0        Sidhika   19000
1        2119     NaN      215.0       Prayansh   12000
2        2115  4321.0      212.0        Rishika   15000
3        2435  2312.0        NaN       Sagarika   13000

After removing rows with NULL Value

   account_no  branch  city_code  customer_name  amount
0        2112  3212.0      321.0        Sidhika   19000
2        2115  4321.0      212.0        Rishika   15000
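As an alternative to looping over the columns, pandas also has a built-in dropna method that does the same job in one call. The sketch below is not the code from this post; it assumes an in-memory copy of the four records shown above instead of the CSV file.

```python
import numpy as np
import pandas as pd

# Assumption: an in-memory copy of the four records printed above.
df1 = pd.DataFrame({
    "account_no": [2112, 2119, 2115, 2435],
    "branch": [3212.0, np.nan, 4321.0, 2312.0],
    "city_code": [321.0, 215.0, 212.0, np.nan],
    "customer_name": ["Sidhika", "Prayansh", "Rishika", "Sagarika"],
    "amount": [19000, 12000, 15000, 13000],
})

# Drop every row that has a null in any column.
all_clean = df1.dropna()

# Drop only the rows where "branch" is null; nulls in other columns are kept.
branch_clean = df1.dropna(subset=["branch"])

print(all_clean)
print(branch_clean)
```

The first call keeps the rows with index 0 and 2, the same as the loop. The subset form is handy when only some columns are mandatory.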

Data Science with…Python :)

Post Reference: Vikram Aristocratic Elfin Share
