
Top Pandas Functions You Should Know

  • By Preethi
  • April 13, 2020

Python is a versatile, easy-to-learn programming language, and this post concentrates on one of its libraries, Pandas. Pandas is a go-to tool for data analysis and data manipulation, which here means organizing data into a more user-friendly format. It is built on top of NumPy and is commonly used alongside libraries such as SciPy and Matplotlib. Here is what you need to work with Pandas.

Install Pandas in Python

  • Run this command to install pandas:
  $ pip install pandas
  • A decent knowledge of Python. If you are new to Python, I recommend going through some basic tutorials first and then continuing here.


Goals of this blog

This post walks through the basic commands in Pandas so you can understand them and play around with them. For better understanding, you can use the repo here.
Before getting into the commands, let’s quickly go through the basic terms generally used in Pandas. A DataFrame is a two-dimensional table made up of rows and one or more columns, and a Series is one-dimensional: a single column of a DataFrame.
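
As a minimal sketch of these two terms, the snippet below builds a tiny DataFrame by hand and pulls one column out of it. The values are illustrative and are not read from sample.csv.

```python
import pandas as pd

# A tiny hand-made table; the values are illustrative, not taken from sample.csv.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
    "HP": [45, 60, 80],
})

name_column = df["Name"]  # selecting one column returns a Series

print(type(df).__name__)           # DataFrame
print(type(name_column).__name__)  # Series
print(df.shape)                    # (3, 2): three rows, two columns
```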


Now, let’s start by reading a CSV file. A CSV (comma-separated values) file is a table stored as plain text: each line is one row, and the values in a row are separated by commas.
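
To make the "table as text" idea concrete, here is a small sketch that parses CSV text held in a string rather than a file on disk. The variable csv_text and its three columns are assumptions made up for this illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content: a header line followed by one line per row.
csv_text = "Name,Type 1,Speed\nBulbasaur,Grass,45\nIvysaur,Grass,60\n"

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # ['Name', 'Type 1', 'Speed']
print(len(df))              # 2
```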

Read a CSV file

import pandas as pd
df = pd.read_csv('sample.csv')
print(df)  # print the whole DataFrame; pandas abbreviates long output

Output:

       #        Name   Type 1  ...  Speed  Generation  Legendary
0      1   Bulbasaur    Grass  ...     45           1      False
1      2     Ivysaur    Grass  ...     60           1      False
2      3    Venusaur    Grass  ...     80           1      False
3      3    Venusaur    Grass  ...     80           1      False
4      4  Charmander     Fire  ...     65           1      False
..   ...         ...      ...  ...    ...         ...        ...
795  719     Diancie     Rock  ...     50           6       True
796  719     Diancie     Rock  ...    110           6       True
797  720    Confined  Psychic  ...     70           6       True
798  720     Unbound  Psychic  ...     80           6       True
799  721   Volcanion     Fire  ...     70           6       True

To read the first few rows, we use the head function. The argument is the number of rows to return.

import pandas as pd
df = pd.read_csv('sample.csv')
print(df.head(3))

Output:

   #       Name  Type 1  Type 2  ...  Sp. Def  Speed  Generation  Legendary
0  1  Bulbasaur   Grass  Poison  ...       65     45           1      False
1  2    Ivysaur   Grass  Poison  ...       80     60           1      False
2  3   Venusaur   Grass  Poison  ...      100     80           1      False

To read the last few rows, we use the tail function.

import pandas as pd
df = pd.read_csv('sample.csv')
print(df.tail(3))

Output:

       #       Name   Type 1  ...  Speed  Generation  Legendary
797  720   Confined  Psychic  ...     70           6       True
798  720    Unbound  Psychic  ...     80           6       True
799  721  Volcanion     Fire  ...     70           6       True
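
A quick sketch of how head and tail behave with and without an argument. It uses a stand-in DataFrame with a made-up column n, so sample.csv is not needed to try it.

```python
import pandas as pd

# A stand-in DataFrame with ten rows (0 through 9).
df = pd.DataFrame({"n": range(10)})

first_rows = df.head()   # no argument: the first 5 rows
last_rows = df.tail(2)   # the last 2 rows

print(len(first_rows))          # 5
print(last_rows["n"].tolist())  # [8, 9]
```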

To read the column names of the CSV file

import pandas as pd
df = pd.read_csv('sample.csv')
print(df.columns)

Output:

Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk',
      'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')

To read a specific column by name

import pandas as pd
df = pd.read_csv('sample.csv')
print(df['Name'])

Output:

0                  Bulbasaur
1                    Ivysaur
2                   Venusaur
3                   Venusaur
4                 Charmander
              ...
795                  Diancie
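
Selecting more than one column works the same way with a list of names. The sketch below uses a small hand-made DataFrame (illustrative values) to show that a single name gives a Series while a list of names gives a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur"],
    "Type 1": ["Grass", "Grass"],
    "Speed": [45, 60],
})

one = df["Name"]                 # single label -> Series
several = df[["Name", "Speed"]]  # list of labels -> DataFrame

print(type(one).__name__)        # Series
print(several.columns.tolist())  # ['Name', 'Speed']
```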

To read a specific row by position (positions start at 0, so iloc[1] is the second row)

import pandas as pd
df = pd.read_csv('sample.csv')
print(df.iloc[1])

Output:

#                   2
Name          Ivysaur
Type 1          Grass
Type 2         Poison
HP                 60
Attack             62
Defense            63
Sp. Atk            80
Sp. Def            80

To print the value at a specific row and column (here the second row and the third column)

import pandas as pd
df = pd.read_csv('sample.csv')
print(df.iloc[1,2])

Output:

Grass
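
For comparison, here is a hedged sketch of the same lookup done by position with iloc and by label with loc. The three-row DataFrame is a hand-made stand-in for sample.csv.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [1, 2, 3],
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
    "Type 1": ["Grass", "Grass", "Grass"],
})

by_position = df.iloc[1, 2]     # second row, third column (zero-based positions)
by_label = df.loc[1, "Type 1"]  # row labelled 1, column named "Type 1"

print(by_position, by_label)    # Grass Grass
```

With the default integer index the row label and the row position coincide, which is why both calls return the same value here.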

To view only the results for a particular condition

import pandas as pd
df = pd.read_csv('sample.csv')
test = df.loc[df['Type 1']=='Grass']
print(test)

Output: 

     #       Name  Type 1  ...  Speed  Generation  Legendary
0    1  Bulbasaur   Grass  ...     45           1      False
1    2    Ivysaur   Grass  ...     60           1      False
2    3   Venusaur   Grass  ...     80           1      False
3    3   Venusaur   Grass  ...     80           1      False
48  43     Oddish   Grass  ...     30           1      False
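
The condition inside loc is itself a Series of True/False values, one per row. A minimal sketch with a hand-made DataFrame (illustrative rows) makes that intermediate step visible:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Oddish"],
    "Type 1": ["Grass", "Fire", "Grass"],
})

mask = df["Type 1"] == "Grass"  # boolean Series, one value per row
grass = df.loc[mask]            # keeps only the rows where mask is True

print(mask.tolist())            # [True, False, True]
print(grass["Name"].tolist())   # ['Bulbasaur', 'Oddish']
```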

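To find the statistics for numerical columns

The sample data also contains text columns such as Name, so pass numeric_only=True where shown; recent pandas versions raise an error otherwise.

df.mean(numeric_only=True) This will return the mean of each numeric column
df.corr(numeric_only=True) This will return the correlation between numeric columns in a DataFrame
df.count() This will return the number of non-null values in each DataFrame column
df.max(numeric_only=True) This will return the highest value in each numeric column
df.min(numeric_only=True) This will return the lowest value in each numeric column
df.median(numeric_only=True) This will return the median of each numeric column
df.std(numeric_only=True) This will return the standard deviation of each numeric column

import pandas as pd
df = pd.read_csv('sample.csv')
print(df.describe())  # describe() reports count, mean, std, min, quartiles and max in one call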

Output:

count  800.000000  800.000000  800.000000  ...  800.000000  800.000000   800.00000
mean   362.813750   69.258750   79.001250  ...   71.902500   68.277500     3.32375
std    208.343798   25.534669   32.457366  ...   27.828916   29.060474     1.66129
min      1.000000    1.000000    5.000000  ...   20.000000    5.000000     1.00000
25%    184.750000   50.000000   55.000000  ...   50.000000   45.000000     2.00000
50%    364.500000   65.000000   75.000000  ...   70.000000   65.000000     3.00000
75%    539.250000   80.000000  100.000000  ...   90.000000   90.000000     5.00000
max    721.000000  255.000000  190.000000  ...  230.000000  180.000000     6.00000
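
If a DataFrame mixes text and numbers, one way to compute statistics on the numeric part only is sketched below. The small DataFrame is hand-made for illustration; select_dtypes and numeric_only are standard pandas features.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
    "HP": [45, 60, 80],
    "Attack": [49, 62, 82],
})

# Option 1: keep only the numeric columns, then aggregate.
numeric = df.select_dtypes(include="number")
print(numeric.columns.tolist())  # ['HP', 'Attack']

# Option 2: ask the aggregation itself to skip non-numeric columns.
means = df.mean(numeric_only=True)
print(means)
```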

To add a column to the existing DataFrame

For example, let’s add a column that holds the sum of two existing columns, HP and Attack.

import pandas as pd
df = pd.read_csv('sample.csv')
df['total']=df['HP']+df['Attack']
print(df)


To delete a column

import pandas as pd
df = pd.read_csv('sample.csv')
# drop() returns a new DataFrame without the named column
df = df.drop(columns=['Generation'])
print(df)

Output:

   #        Name  Type 1  ...  Sp. Def  Speed  Legendary
0  1   Bulbasaur   Grass  ...       65     45      False
1  2     Ivysaur   Grass  ...       80     60      False
2  3    Venusaur   Grass  ...      100     80      False
3  3    Venusaur   Grass  ...      120     80      False
4  4  Charmander    Fire  ...       50     65      False
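
Two details of drop are worth a short sketch: it returns a new DataFrame instead of changing the original, and it raises KeyError for a column that does not exist unless errors="ignore" is passed. The one-row DataFrame below is a made-up stand-in.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Bulbasaur"], "HP": [45], "Attack": [49]})
df["total"] = df["HP"] + df["Attack"]

trimmed = df.drop(columns=["total"])  # df itself is left unchanged
print("total" in df.columns)          # True
print("total" in trimmed.columns)     # False

# 'total' is already gone from trimmed; errors="ignore" skips it instead of raising.
same = trimmed.drop(columns=["total"], errors="ignore")
print(same.columns.tolist())          # ['Name', 'HP', 'Attack']
```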

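To sum multiple columns into a new column

In the following code, ‘:’ selects all the rows, and 4:9 selects the columns at positions 4 through 8 (HP, Attack, Defense, Sp. Atk, and Sp. Def). Positions are zero-based and the end of the slice is excluded, so the column at position 9 (Speed) is not part of the sum.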

import pandas as pd
df = pd.read_csv('sample.csv')
df['total']=df.iloc[:,4:9].sum(axis=1)
print(df)

Output:

       #        Name  Type 1  ...  Generation  Legendary  total
0      1   Bulbasaur   Grass  ...           1      False    273
1      2     Ivysaur   Grass  ...           1      False    345
2      3    Venusaur   Grass  ...           1      False    445
3      3    Venusaur   Grass  ...           1      False    545
4      4  Charmander    Fire  ...           1      False    244
..   ...         ...     ...  ...         ...        ...    ...
795  719     Diancie    Rock  ...           6       True    550
796  719     Diancie    Rock  ...           6       True    590
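
The slicing rule is easier to see on a narrower table. This sketch uses a hand-made DataFrame with fewer columns than sample.csv, so the slice 1:4 here plays the role that 4:9 plays above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur"],
    "HP": [45, 60],
    "Attack": [49, 62],
    "Defense": [49, 63],
    "Speed": [45, 60],
})

# 1:4 selects positions 1, 2 and 3; position 4 (Speed) is excluded.
selected = df.iloc[:, 1:4]
print(selected.columns.tolist())  # ['HP', 'Attack', 'Defense']

# axis=1 sums across the columns of each row.
df["total"] = selected.sum(axis=1)
print(df["total"].tolist())       # [143, 185]
```

Selecting by name, for example df[["HP", "Attack", "Defense"]].sum(axis=1), gives the same result and keeps working if the column order changes.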

To filter the data on more than one condition

import pandas as pd
df = pd.read_csv('sample.csv')
# Wrap each condition in parentheses and combine them with & (and)
test = df.loc[(df['Type 1'] == "Grass") & (df['Type 2'] == "Poison")]
print(test)

Output (first rows):

   #       Name  Type 1  ...  Speed  Generation  Legendary
0  1  Bulbasaur   Grass  ...     45           1      False
1  2    Ivysaur   Grass  ...     60           1      False
2  3   Venusaur   Grass  ...     80           1      False
3  3   Venusaur   Grass  ...     80           1      False
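
Conditions can be combined with & (and) or | (or), and each condition needs its own parentheses because these operators bind more tightly than ==. A minimal sketch on a hand-made DataFrame with illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Tangela"],
    "Type 1": ["Grass", "Fire", "Grass"],
    "Type 2": ["Poison", None, None],
})

# Rows where both conditions hold.
both = df.loc[(df["Type 1"] == "Grass") & (df["Type 2"] == "Poison")]
# Rows where at least one condition holds.
either = df.loc[(df["Type 1"] == "Fire") | (df["Type 2"] == "Poison")]

print(both["Name"].tolist())    # ['Bulbasaur']
print(either["Name"].tolist())  # ['Bulbasaur', 'Charmander']
```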

This blog covered the basic commands used in pandas to help you better understand the essential Pandas functions.

Preethi

Preethi is an enthusiastic developer and a quick learner. She loves reading books and gardening.