Top 20 Must-Know Pandas Code for Newbies

Photo by Stone Wang on Unsplash

Welcome to an interesting library in Python! I'm Brahma πŸ‘‹, a passionate software developer. I am documenting my learning journey through a series of blog posts. Stay tuned!!

Introduction

Pandas, unlike the ones in the cover picture πŸ˜‚, can be complicated at times to understand and implement. So, I am going to share the top 20 must-know snippets you need to call yourself a LinkedIn Pandas expert πŸ˜‚. Jokes aside, let's delve into it.

Creating a Pandas DataFrame

The first step before performing any operation on some data is importing it. Ofc!!πŸ˜‚

  1. The if-you-don't-have-data method:

    Sounds funny πŸ˜‚. Coz it is!!

     import pandas as pd
     import numpy as np
    
     data = pd.DataFrame(np.arange(10).reshape(5, 2),
                         index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                         columns=['Column1', 'Column2'])
     data.head()
    

    So, if Kaggle sounds like an alien to you, you can use the above method to generate a demo dataset. Ofc I wouldn't advise that, but it's okay if you don't have a dataset 🫠.

  2. The all-customised method:
    Using dictionaries and lists to create a dataframe.

     import pandas as pd
    
     # Creating a DataFrame from a dictionary
     data = {
         'Name': ['Alice', 'Bob', 'Charlie'],
         'Age': [25, 30, 35],
         'City': ['New York', 'Los Angeles', 'Chicago']
     }
     df = pd.DataFrame(data)
     print(df)
    
     # Creating a DataFrame from a list of lists
     data = [
         ['Alice', 25, 'New York'],
         ['Bob', 30, 'Los Angeles'],
         ['Charlie', 35, 'Chicago']
     ]
     df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
     print(df)
    

    I do wonder why you would do that, though 🫠.

  3. The famous method:

    Importing data from a csv file.

     # Reading the CSV file into a DataFrame
     df = pd.read_csv('data.csv')
     print(df)
     df = pd.read_csv('data.csv', index_col=0)  # use the 1st column as the index
     print(df)
    
  4. The jsonified method:

    As the name suggests, it's importing from a JSON file.

     import pandas as pd
     # Read JSON file into a DataFrame
     df = pd.read_json('data.json')
     # Display the DataFrame
     print(df)
    

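Don't have a data.csv lying around to try this on? A handy trick (not from the original post, just a sketch): read_csv accepts any file-like object, so you can wrap a plain CSV string in io.StringIO and feed it in as if it were a file.

```python
import io
import pandas as pd

# A CSV payload as a plain string -- stands in for data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
"""

# read_csv accepts any file-like object, so StringIO works too
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# index_col=0 promotes the first column (Name) to the index
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed)
```

The same trick works for pd.read_json with a JSON string, which makes it easy to test your parsing options before touching real files.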
Accessing the Data

  1. The loc method:

     df.loc[row_label, column_label]
    
  2. The iloc method (my fav 😍):

     df.iloc[row_index, column_index]
    
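To make the difference concrete, here's a small runnable sketch (the DataFrame is my own, not from the post): loc is label-based, iloc is position-based.

```python
import pandas as pd

# Labelled rows so loc and iloc clearly differ
df = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['Alice', 'Bob', 'Charlie']
)

# loc is label-based: row label + column label
print(df.loc['Bob', 'City'])   # Los Angeles

# iloc is position-based: integer row + integer column (0-indexed)
print(df.iloc[1, 1])           # Los Angeles

# Both accept slices; note that loc slices INCLUDE the end label
print(df.loc['Alice':'Bob', 'Age'])  # rows Alice and Bob
print(df.iloc[0:2, 0])               # rows 0 and 1 (end excluded)
```

The inclusive-end behaviour of loc slices trips up a lot of newbies, so it's worth remembering.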

Inspecting the Data

  1. head & tail:

     import pandas as pd
    
     # Create a sample DataFrame
     data = {
         'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
         'Age': [25, 30, 35, 40, 45],
         'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
     }
     df = pd.DataFrame(data)
    
     # View the first few rows
     print("First few rows:")
     print(df.head())
    
     # View the last few rows
     print("\nLast few rows:")
     print(df.tail())
    

    This is probably the 1st thing anyone does on receiving a dataset.

  2. The Summary:

    info and describe

     # Get information about the DataFrame
     print("DataFrame info:")
     print(df.info())
    
     # Get summary statistics for numerical columns
     print("\nSummary statistics:")
     print(df.describe())
    
  3. Miscellaneous:

     import pandas as pd
    
     # Create a sample DataFrame
     data = {
         'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
         'Age': [25, 30, 35, 40, 45],
         'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
     }
     df = pd.DataFrame(data)
    
     # Get the values of the DataFrame as a NumPy array
     print("DataFrame values:")
     print(df.values)
    
     # Get the counts of unique values
     print("Value counts:")
     print(df['Name'].value_counts())
    
     # Get the unique values in the 'City' column
     print("Unique cities:")
     print(df['City'].unique())
    
     # Inspect data types of columns
     print("Data types of columns:")
     print(df.dtypes)
    

Cleaning the Data

  1. The Null Detector:

     import pandas as pd
    
     # Create a sample DataFrame with missing values
     data = {
         'Name': ['Alice', 'Bob', 'Charlie', None, 'Emily'],
         'Age': [25, None, 35, 40, 45],
         'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', None]
     }
     df = pd.DataFrame(data)
    
     # Detect missing values
     print("Missing values:")
     print(df.isnull())
    
  2. The Null Remover:

     # Drop rows with any missing values
     print("Drop rows with any missing values:")
     print(df.dropna())
    
     # Drop columns with any missing values
     print("Drop columns with any missing values:")
     print(df.dropna(axis=1))
    
  3. The Null Filler:

     # Fill missing values with a specified value
     print("Fill missing values with 0:")
     print(df.fillna(0))
    
     # Forward fill missing values
     print("Forward fill missing values:")
     print(df.ffill())  # fillna(method='ffill') is deprecated in newer pandas
    
  4. The Converter:

    Convert data types of columns.

     # Convert the 'Age' column to integers
     print("Convert 'Age' column to integers:")
     df['Age'] = df['Age'].fillna(0)  # Fill missing values first
     print(df['Age'].astype(int))
    
  5. The Doglapan Detector:

    Doglapan, a.k.a. duplicates.

     # Create a sample DataFrame with duplicates
     data = {
         'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Emily'],
         'Age': [25, 30, 35, 25, 45],
         'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Miami']
     }
     df = pd.DataFrame(data)
    
     # Detect duplicate rows
     print("Duplicate rows:")
     print(df.duplicated())
    
  6. The Doglapan Remover:

     # Drop duplicate rows
     print("Drop duplicate rows:")
     print(df.drop_duplicates())
    
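drop_duplicates has two parameters worth knowing beyond the default call: subset (which columns to compare) and keep (which occurrence survives). A quick sketch with my own sample data, where the two "Alice" rows differ only in Age:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Emily'],
    'Age':  [25, 30, 35, 26, 45],   # note: the second Alice has a different Age
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Miami']
})

# Default: only fully identical rows count as duplicates, so nothing is dropped
print(df.drop_duplicates())

# subset= compares only the listed columns, so the second Alice is dropped
print(df.drop_duplicates(subset=['Name']))

# keep='last' keeps the most recent occurrence instead of the first
print(df.drop_duplicates(subset=['Name'], keep='last'))
```

Choosing keep='first' (the default) vs keep='last' matters when later rows are fresher, e.g. deduplicating log entries by user ID.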

Manipulating the Data

  1. The filter:

     import pandas as pd
    
     # Sample DataFrame
     data = {
         'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
         'Age': [25, 30, 35, 40, 45],
         'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
     }
     df = pd.DataFrame(data)
    
     # Filter rows where Age is greater than 30
     filtered_df = df[df['Age'] > 30]
     print("Filtered DataFrame:\n", filtered_df)
    
  2. The sorted method:

     # Sort by Age in ascending order
     sorted_df = df.sort_values(by='Age')
     print("DataFrame sorted by Age:\n", sorted_df)
    
     # Sort by index in descending order
     sorted_index_df = df.sort_index(ascending=False)
     print("\nDataFrame sorted by index:\n", sorted_index_df)
    
  3. Adding & Removing Columns:

     # Add a new column
     df['Salary'] = [70000, 80000, 90000, 100000, 110000]
     print("DataFrame with new column:\n", df)
    
     # Remove a column
     df = df.drop(columns=['City'])
     print("\nDataFrame after removing column:\n", df)
    
  4. The Aggregator:

     # Aggregation using sum
     sum_df = df.groupby('Name').sum()
     print("Sum Aggregation:\n", sum_df)
    
     # Aggregation using mean (numeric_only skips any non-numeric columns)
     mean_df = df.groupby('Name').mean(numeric_only=True)
     print("\nMean Aggregation:\n", mean_df)
    
     # Aggregation using count
     count_df = df.groupby('Name').count()
     print("\nCount Aggregation:\n", count_df)
    
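Instead of three separate groupby calls, agg lets you run several aggregations in one pass, and the named-aggregation form gives the output columns friendlier labels. A minimal sketch (sample data is my own, not from the post):

```python
import pandas as pd

df = pd.DataFrame({
    'Name':   ['Alice', 'Bob', 'Alice', 'Bob'],
    'Salary': [70000, 80000, 72000, 81000]
})

# agg runs several aggregations in one pass
summary = df.groupby('Name')['Salary'].agg(['sum', 'mean', 'count'])
print(summary)

# Named aggregations rename the output columns directly
named = df.groupby('Name').agg(
    total_salary=('Salary', 'sum'),
    avg_salary=('Salary', 'mean'),
)
print(named)
```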

Transforming the Data

  1. apply and map:

     import pandas as pd
    
     # Sample DataFrame
     df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
    
     # Applying a function to each element of a column
     df['A_squared'] = df['A'].apply(lambda x: x ** 2)
    
     # Mapping values of a column to new values
     df['B_mapped'] = df['B'].map({5: 'Five', 6: 'Six', 7: 'Seven', 8: 'Eight'})
    
     print(df)
    
  2. The Vectors:

     # Vectorized addition of two columns
     df['A_plus_B'] = df['A'] + df['B']
    
     print(df)
    
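Vectorization goes beyond plain addition: NumPy functions operate on whole columns at once without a Python loop, which is typically much faster than apply. A small sketch (my own example, not from the post) using np.where as a vectorized if/else:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Whole-column arithmetic runs in compiled code, no Python loop needed
df['ratio'] = df['B'] / df['A']

# np.where is a vectorized if/else over the entire column
df['size'] = np.where(df['A'] > 2, 'big', 'small')

print(df)
```

As a rule of thumb: reach for apply only when no vectorized equivalent exists.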

Conclusion

So, that's all folks. Those were the top 20 must-know Pandas snippets (sounds weird πŸ˜‚).

Keep coding, keep learning, and enjoy the endless possibilities that Python has to offer!

Leave a like and some lovely critiques in the comments 😁.

Signing off!!!πŸ‘‹
