Welcome to one of the most useful libraries in Python! I'm Brahma, a passionate software developer documenting my learning journey through a series of blog posts. Stay tuned!
Introduction
Pandas, unlike the ones in the cover picture, can be complicated to understand and use at times. So I'm going to share the top 20 must-know snippets that will let you call yourself a LinkedIn Pandas expert. Jokes aside, let's dive in.
Creating a Pandas DataFrame
The first step before performing any operation on data is importing it. Of course!
The if-you-don't-have-data method:
Sounds funny? Because it is!
import numpy as np
import pandas as pd

# Generate a small demo DataFrame from a NumPy array
data = pd.DataFrame(np.arange(10).reshape(5, 2),
                    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                    columns=['Column1', 'Column2'])
data.head()
So, if Kaggle is alien territory for you, you can use the above method to generate a demo dataset. I wouldn't advise it, but it's okay if you don't have a dataset.
The all-customised method:
Using dictionaries and lists to create a DataFrame.

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

# Creating a DataFrame from a list of lists
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
No idea why you would do that, though.
The famous method:
Importing data from a CSV file.
import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv('data.csv')
print(df)

# Make the first column the index column
df = pd.read_csv('data.csv', index_col=0)
print(df)
The jsonified method:
As the name suggests, it's importing from a JSON file.
import pandas as pd

# Read a JSON file into a DataFrame
df = pd.read_json('data.json')

# Display the DataFrame
print(df)
Accessing the Data
The loc method:
df.loc[row_label, column_label]
The iloc method (my fav):
df.iloc[row_index, column_index]
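To make the difference concrete, here is a minimal sketch (the DataFrame, names, and labels are made up for illustration): loc selects by label, iloc by integer position.

```python
import pandas as pd

# Sample DataFrame with string labels as the index
df = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['Alice', 'Bob', 'Charlie']
)

# loc selects by label
print(df.loc['Bob', 'Age'])      # 30

# iloc selects by integer position; row 1, column 0 is the same cell
print(df.iloc[1, 0])             # 30

# Both accept slices; note that loc slices include the end label
print(df.loc['Alice':'Bob', 'Age'])
```

Note that df.loc['Alice':'Bob'] returns both rows, because label slices are inclusive of the end point, unlike positional slices.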
Inspecting the Data
head & tail:

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}
df = pd.DataFrame(data)

# View the first few rows
print("First few rows:")
print(df.head())

# View the last few rows
print("\nLast few rows:")
print(df.tail())
This is probably the first thing anyone does on receiving a dataset.
The Summary: info and describe
# Get information about the DataFrame
# (df.info() prints directly and returns None, so don't wrap it in print)
print("DataFrame info:")
df.info()

# Get summary statistics for numerical columns
print("\nSummary statistics:")
print(df.describe())
Miscellaneous:
import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}
df = pd.DataFrame(data)

# Get the values of the DataFrame as a NumPy array
print("DataFrame values:")
print(df.values)

# Get the counts of unique values
print("Value counts:")
print(df['Name'].value_counts())

# Get the unique values in the 'City' column
print("Unique cities:")
print(df['City'].unique())

# Inspect data types of columns
print("Data types of columns:")
print(df.dtypes)
Cleaning the Data
The Null Detector:
import pandas as pd

# Create a sample DataFrame with missing values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', None, 'Emily'],
    'Age': [25, None, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', None]
}
df = pd.DataFrame(data)

# Detect missing values
print("Missing values:")
print(df.isnull())
The Null Remover:
# Drop rows with any missing values
print("Drop rows with any missing values:")
print(df.dropna())

# Drop columns with any missing values
print("Drop columns with any missing values:")
print(df.dropna(axis=1))
The Null Filler:
# Fill missing values with a specified value
print("Fill missing values with 0:")
print(df.fillna(0))

# Forward fill missing values (fillna(method='ffill') is deprecated)
print("Forward fill missing values:")
print(df.ffill())
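A common variant worth knowing is filling a numeric column with its own mean instead of a constant. A minimal sketch, assuming data like this section's sample:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None, 'Emily'],
    'Age': [25, None, 35, 40, 45],
})

# Fill missing ages with the mean of the non-missing values
# (mean of 25, 35, 40, 45 is 36.25)
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
```

This keeps the column's overall average unchanged, which is often preferable to filling with 0.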
The Convertor:
Convert data types of columns.
# Convert the 'Age' column to integers
print("Convert 'Age' column to integers:")
df['Age'] = df['Age'].fillna(0)  # Fill missing values first
print(df['Age'].astype(int))
The Doglapan Detector:
Doglapan, a.k.a. duplicates.
import pandas as pd

# Create a sample DataFrame with duplicates
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Emily'],
    'Age': [25, 30, 35, 25, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Miami']
}
df = pd.DataFrame(data)

# Detect duplicate rows
print("Duplicate rows:")
print(df.duplicated())
The Doglapan Remover:
# Drop duplicate rows
print("Drop duplicate rows:")
print(df.drop_duplicates())
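In practice you often want to deduplicate on specific columns rather than whole rows; drop_duplicates takes subset and keep parameters for that. A quick sketch with the same made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Emily'],
    'Age': [25, 30, 35, 25, 45],
})

# Deduplicate on Name only, keeping the last occurrence of each name
deduped = df.drop_duplicates(subset=['Name'], keep='last')
print(deduped)
```

With keep='last', the first Alice row (index 0) is dropped and the later one (index 3) survives; keep='first' is the default.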
Manipulating the Data
The filter:
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}
df = pd.DataFrame(data)

# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print("Filtered DataFrame:\n", filtered_df)
The sorted method:
# Sort by Age in ascending order
sorted_df = df.sort_values(by='Age')
print("DataFrame sorted by Age:\n", sorted_df)

# Sort by index in descending order
sorted_index_df = df.sort_index(ascending=False)
print("\nDataFrame sorted by index:\n", sorted_index_df)
Adding & Removing Columns:
# Add a new column
df['Salary'] = [70000, 80000, 90000, 100000, 110000]
print("DataFrame with new column:\n", df)

# Remove a column
df = df.drop(columns=['City'])
print("\nDataFrame after removing column:\n", df)
The Aggregator:
# Aggregation using sum
sum_df = df.groupby('Name').sum()
print("Sum Aggregation:\n", sum_df)

# Aggregation using mean
mean_df = df.groupby('Name').mean()
print("\nMean Aggregation:\n", mean_df)

# Aggregation using count
count_df = df.groupby('Name').count()
print("\nCount Aggregation:\n", count_df)
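Beyond sum, mean, and count, groupby pairs with agg to compute several statistics in one call. A minimal sketch with made-up salary data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
    'Salary': [70000, 80000, 90000, 100000],
})

# Multiple aggregations per group in one call
stats = df.groupby('Name')['Salary'].agg(['min', 'max', 'mean'])
print(stats)
```

The result is one row per group with a column per aggregation, which beats calling sum, mean, and count separately.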
Transforming the Data
apply and map:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Applying a function to each element of a column
df['A_squared'] = df['A'].apply(lambda x: x ** 2)

# Mapping values of a column to new values
df['B_mapped'] = df['B'].map({5: 'Five', 6: 'Six', 7: 'Seven', 8: 'Eight'})

print(df)
The Vectors:
# Vectorized addition of two columns
df['A_plus_B'] = df['A'] + df['B']
print(df)
Conclusion
So, that's all folks. Those were the 20 most important Pandas snippets (sounds weird, I know).
Keep coding, keep learning, and enjoy the endless possibilities that Python has to offer!
Leave a like and some lovely critiques in the comments.
Signing off!!