In today’s data-driven world, managing and processing large datasets efficiently is crucial for businesses of all sizes. However, manual data entry can be time-consuming, error-prone, and monotonous. Fortunately, automation can significantly streamline these processes, saving valuable time and reducing the likelihood of mistakes. In this blog post, I’ll walk you through a Python script that automates the tedious task of data entry, specifically transposing inventory data from a CSV file into a formatted Excel spreadsheet.
Table of Contents
- Introduction
- Prerequisites
- Understanding the Workflow
- Step-by-Step Breakdown
- Benefits of Automation
- Conclusion
Introduction
Manual data entry is often necessary for businesses to manage their inventories, track sales, and perform various analyses. However, this process can be tedious and inefficient, especially when dealing with large datasets. Automating data entry not only accelerates the workflow but also minimizes errors, ensuring data integrity. In this guide, we’ll explore a Python script designed to automate the transformation and formatting of inventory data, turning a cumbersome manual task into a seamless automated process.
Prerequisites
Before diving into the code, ensure you have the following:
- Python Installed: Make sure you have Python 3.x installed on your machine. You can download it from the official website.
- Required Libraries: The script utilizes several Python libraries. You can install them using pip:

pip install pandas numpy openpyxl chardet

  - pandas: For data manipulation and analysis.
  - numpy: For numerical operations.
  - openpyxl: For working with Excel files.
  - chardet: For detecting file encoding.
- Input CSV File: Prepare your CSV file (e.g., inventory_data.csv) containing the inventory data.
- Output Directory: Decide where you want to save the transformed Excel file (e.g., transformed_inventory.xlsx).
Understanding the Workflow
The script performs the following key tasks:
- Import Libraries: Load necessary Python libraries.
- Define Transformation Function: Create a function to transpose data from wide to long format.
- Set File Paths: Specify the locations of the input CSV and output Excel files.
- Detect File Encoding: Automatically detect the encoding of the CSV file to ensure it’s read correctly.
- Read CSV File: Load the CSV data into a pandas DataFrame.
- Transform Data: Transpose the DataFrame to a long format with specific headers.
- Save to Excel: Export the transformed data to an Excel file with proper formatting.
- Display Preview: Show a preview of the transformed data.
Let’s delve into each of these steps in detail.
Step-by-Step Breakdown
1. Importing Necessary Libraries
import pandas as pd
import numpy as np
import openpyxl
- pandas: Essential for data manipulation and analysis.
- numpy: Provides support for large, multi-dimensional arrays and matrices.
- openpyxl: Enables reading and writing Excel files.
2. Defining the Data Transformation Function
def transpose_inventory_data(df):
    """
    Transposes inventory data from wide to long format with specific headers for all rows.

    Parameters:
        df (DataFrame): The DataFrame containing the data.

    Returns:
        DataFrame: The transposed DataFrame for all rows.
    """
    # Define the exact headers in order
    headers = [
        'Item ID', 'Description', 'Category', 'Batch Number',
        'Location', 'Quantity', 'Unit Price', 'Total Value',
        'Supplier', 'Order Date', 'Delivery Date', 'Status'
    ]

    # Ensure all headers are present in the DataFrame
    missing_headers = [h for h in headers if h not in df.columns]
    if missing_headers:
        raise ValueError(f"The following headers are missing in the DataFrame: {missing_headers}")

    # Reset index to get a 'Row Index' column
    df = df.reset_index().rename(columns={'index': 'Row Index'})

    # Melt the DataFrame from wide to long format
    df_melted = df.melt(id_vars=['Row Index'], value_vars=headers,
                        var_name='Header Name', value_name='Values')

    # Format numeric values to remove trailing zeros
    def format_values(x):
        try:
            if pd.notna(x) and isinstance(x, (int, float)) and x == int(x):
                return int(x)
            return x
        except (TypeError, ValueError):
            return x

    df_melted['Values'] = df_melted['Values'].apply(format_values)
    return df_melted
Function Breakdown:
- Headers Definition: Specifies the exact order and names of headers expected in the data.
- Header Validation: Checks if all required headers are present in the input DataFrame. Raises an error if any are missing.
- Index Resetting: Adds a ‘Row Index’ column to keep track of the original row numbers.
- Data Melting: Transforms the DataFrame from wide to long format using pandas.melt(), which is essential for certain types of data analysis and visualization.
- Value Formatting: Cleans up numeric values by removing unnecessary trailing zeros, enhancing data readability.
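To see what the melt step does on a small scale, here's a minimal, self-contained sketch using a toy two-column subset of the inventory headers (the sample values are invented for illustration):

```python
import pandas as pd

# Toy wide-format data: one row per item, one column per attribute
df = pd.DataFrame({
    'Item ID': ['A-1', 'A-2'],
    'Quantity': [10, 5],
})

# Same steps as the function above: add a 'Row Index', then melt
df = df.reset_index().rename(columns={'index': 'Row Index'})
long_df = df.melt(id_vars=['Row Index'],
                  value_vars=['Item ID', 'Quantity'],
                  var_name='Header Name', value_name='Values')

print(long_df)  # 2 rows x 2 value columns -> 4 long-format rows
```

Each original cell becomes one row in the long format, which is why the transposed shape printed later works out to (original rows × 12 headers) rows by 3 columns.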
3. Setting File Paths
# Example usage
# Update these paths to the actual locations of your files
input_file = r"C:/path/to/your/input/inventory_data.csv"
output_file = r"C:/path/to/your/output/transformed_inventory.xlsx"
- Input File Path: Path to the CSV file containing the original inventory data.
- Output File Path: Path where the transformed Excel file will be saved.
Note: Replace the paths with the actual locations of your files. Ensure that the paths do not expose any sensitive company information.
4. Detecting File Encoding
# Detect the file encoding
try:
    import chardet
    with open(input_file, 'rb') as file:
        rawdata = file.read(100000)
    result = chardet.detect(rawdata)
    detected_encoding = result['encoding'] or 'utf-8'  # detect() can return None
    print(f"Detected encoding: {detected_encoding}")
except ImportError:
    detected_encoding = 'utf-8'
    print("chardet module not found, defaulting to 'utf-8' encoding.")
Purpose:
- Encoding Detection: Ensures that the CSV file is read correctly by identifying its encoding. This is crucial for preventing errors related to character encoding mismatches.
How It Works:
- Attempt to Import chardet: If available, it reads a portion of the file to guess the encoding.
- Fallback Mechanism: If chardet isn't installed, it defaults to 'utf-8'.
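The same fallback pattern can be exercised on an in-memory byte string rather than a file; a minimal sketch (the sample text is invented):

```python
# Same try/except ImportError pattern as above, on in-memory bytes
raw = "café inventory".encode("utf-8")  # stand-in for bytes read from a file

try:
    import chardet
    detected = chardet.detect(raw)["encoding"] or "utf-8"  # detect() may return None
except ImportError:
    detected = "utf-8"  # safe default when chardet isn't available

print(f"Using encoding: {detected}")
```

Note that chardet's guess is probabilistic, so treating its result as a hint with a sensible default is safer than trusting it blindly.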
5. Reading the CSV File
# Read the CSV file with the detected encoding
try:
    df = pd.read_csv(input_file, encoding=detected_encoding)
except UnicodeDecodeError:
    # Try a different encoding if the detected one doesn't work
    df = pd.read_csv(input_file, encoding='latin1')

print(f"DataFrame shape after reading CSV: {df.shape}")
Steps:
- Attempt to Read CSV: Uses the detected encoding to read the CSV file into a pandas DataFrame.
- Error Handling: If a UnicodeDecodeError occurs (due to incorrect encoding), it retries with 'latin1' encoding.
- Output DataFrame Shape: Prints the shape of the DataFrame to confirm successful loading.
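If you expect files from several sources, the detect-then-retry logic can be generalized into a small helper that walks a list of candidate encodings. Here's one possible sketch, assuming latin1 (which never raises a decode error) is an acceptable last resort; the helper name read_csv_with_fallbacks and the demo file are invented for illustration:

```python
import os
import tempfile

import pandas as pd

def read_csv_with_fallbacks(path, encodings=("utf-8", "latin1", "cp1252")):
    """Try each encoding in turn; return the first DataFrame that parses."""
    last_err = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as err:
            last_err = err
    raise last_err

# Quick demonstration with a throwaway latin1-encoded file
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as f:
    f.write("Item ID,Supplier\nA-1,Café Médoc\n".encode("latin1"))
    tmp_path = f.name

df = read_csv_with_fallbacks(tmp_path)  # utf-8 fails, latin1 succeeds
os.remove(tmp_path)
print(df)
```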
6. Transforming the Data
# Execute the transformation
result_df = transpose_inventory_data(df)
print(f"Transposed DataFrame shape: {result_df.shape}")
Process:
- Data Transformation: Calls the previously defined transpose_inventory_data function to reshape the data.
- Output Transformed Shape: Prints the shape of the transformed DataFrame to verify the operation.
7. Saving the Transformed Data to Excel
# Save to Excel with proper formatting
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
    # Write the DataFrame to Excel
    result_df.to_excel(writer, index=False, sheet_name='Sheet1')

    # Get the worksheet
    worksheet = writer.sheets['Sheet1']

    # Adjust column widths
    worksheet.column_dimensions['A'].width = 10  # Row Index column
    worksheet.column_dimensions['B'].width = 25  # Header Name column
    worksheet.column_dimensions['C'].width = 30  # Values column

    # Add borders and formatting
    thin_border = openpyxl.styles.Border(
        left=openpyxl.styles.Side(style='thin'),
        right=openpyxl.styles.Side(style='thin'),
        top=openpyxl.styles.Side(style='thin'),
        bottom=openpyxl.styles.Side(style='thin')
    )
    for row in worksheet.iter_rows(min_row=1, max_row=result_df.shape[0] + 1, min_col=1, max_col=3):
        for cell in row:
            cell.border = thin_border
Steps:
- Initialize Excel Writer: Uses pandas.ExcelWriter with the openpyxl engine to write the DataFrame to an Excel file.
- Write DataFrame to Excel: Exports result_df to Sheet1 without including the DataFrame index.
- Adjust Column Widths: Sets specific widths for each column to enhance readability.
- Column A (Row Index): Width set to 10.
- Column B (Header Name): Width set to 25.
- Column C (Values): Width set to 30.
- Add Borders: Applies thin borders to all cells in the worksheet for a clean and professional look.
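If you also want the header row to stand out, openpyxl's Font and PatternFill styles can be applied to row 1 after writing. A minimal standalone sketch (the light-gray fill color is an arbitrary choice, and the small workbook here is just for demonstration):

```python
import openpyxl
from openpyxl.styles import Font, PatternFill

wb = openpyxl.Workbook()
ws = wb.active
ws.append(['Row Index', 'Header Name', 'Values'])  # header row
ws.append([0, 'Item ID', 'A-1'])                   # one sample data row

header_font = Font(bold=True)
header_fill = PatternFill(start_color='DDDDDD', end_color='DDDDDD', fill_type='solid')

for cell in ws[1]:  # ws[1] is the first (header) row
    cell.font = header_font
    cell.fill = header_fill

wb.save('styled_example.xlsx')
```

The same loop can be dropped into the ExcelWriter block above, using writer.sheets['Sheet1'] in place of ws.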
8. Displaying a Preview
# Display the result
print("Preview of the transposed data:")
print(result_df.head(50).to_string(index=False))
Purpose:
- Preview Output: Prints the first 50 rows of the transformed DataFrame to the console, allowing you to verify the transformation before delving into the Excel file.
Complete Script
For your convenience, here’s the complete Python script with sample headers and generic file paths:
import pandas as pd
import numpy as np
import openpyxl
def transpose_inventory_data(df):
    """
    Transposes inventory data from wide to long format with specific headers for all rows.

    Parameters:
        df (DataFrame): The DataFrame containing the data.

    Returns:
        DataFrame: The transposed DataFrame for all rows.
    """
    # Define the exact headers in order
    headers = [
        'Item ID', 'Description', 'Category', 'Batch Number',
        'Location', 'Quantity', 'Unit Price', 'Total Value',
        'Supplier', 'Order Date', 'Delivery Date', 'Status'
    ]

    # Ensure all headers are present in the DataFrame
    missing_headers = [h for h in headers if h not in df.columns]
    if missing_headers:
        raise ValueError(f"The following headers are missing in the DataFrame: {missing_headers}")

    # Reset index to get a 'Row Index' column
    df = df.reset_index().rename(columns={'index': 'Row Index'})

    # Melt the DataFrame from wide to long format
    df_melted = df.melt(id_vars=['Row Index'], value_vars=headers,
                        var_name='Header Name', value_name='Values')

    # Format numeric values to remove trailing zeros
    def format_values(x):
        try:
            if pd.notna(x) and isinstance(x, (int, float)) and x == int(x):
                return int(x)
            return x
        except (TypeError, ValueError):
            return x

    df_melted['Values'] = df_melted['Values'].apply(format_values)
    return df_melted
# Example usage
# Update these paths to the actual locations of your files
input_file = r"C:/path/to/your/input/inventory_data.csv"
output_file = r"C:/path/to/your/output/transformed_inventory.xlsx"
# Detect the file encoding
try:
    import chardet
    with open(input_file, 'rb') as file:
        rawdata = file.read(100000)
    result = chardet.detect(rawdata)
    detected_encoding = result['encoding'] or 'utf-8'  # detect() can return None
    print(f"Detected encoding: {detected_encoding}")
except ImportError:
    detected_encoding = 'utf-8'
    print("chardet module not found, defaulting to 'utf-8' encoding.")
# Read the CSV file with the detected encoding
try:
    df = pd.read_csv(input_file, encoding=detected_encoding)
except UnicodeDecodeError:
    # Try a different encoding if the detected one doesn't work
    df = pd.read_csv(input_file, encoding='latin1')

print(f"DataFrame shape after reading CSV: {df.shape}")
# Execute the transformation
result_df = transpose_inventory_data(df)
print(f"Transposed DataFrame shape: {result_df.shape}")
# Save to Excel with proper formatting
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
    # Write the DataFrame to Excel
    result_df.to_excel(writer, index=False, sheet_name='Sheet1')

    # Get the worksheet
    worksheet = writer.sheets['Sheet1']

    # Adjust column widths
    worksheet.column_dimensions['A'].width = 10  # Row Index column
    worksheet.column_dimensions['B'].width = 25  # Header Name column
    worksheet.column_dimensions['C'].width = 30  # Values column

    # Add borders and formatting
    thin_border = openpyxl.styles.Border(
        left=openpyxl.styles.Side(style='thin'),
        right=openpyxl.styles.Side(style='thin'),
        top=openpyxl.styles.Side(style='thin'),
        bottom=openpyxl.styles.Side(style='thin')
    )
    for row in worksheet.iter_rows(min_row=1, max_row=result_df.shape[0] + 1, min_col=1, max_col=3):
        for cell in row:
            cell.border = thin_border
# Display the result
print("Preview of the transposed data:")
print(result_df.head(50).to_string(index=False))
Important: Replace the input_file and output_file paths with the actual locations of your CSV and desired Excel files. Ensure that the headers in the headers list match those in your CSV file to avoid any ValueError.
Benefits of Automation
Automating data entry and transformation offers numerous advantages:
- Time Efficiency: Reduces the time spent on repetitive tasks, allowing employees to focus on more strategic activities.
- Accuracy: Minimizes human errors associated with manual data handling.
- Consistency: Ensures uniform data formatting and structure, which is essential for reliable analysis.
- Scalability: Easily handles large volumes of data without a proportional increase in effort.
- Reusability: Once set up, the automation script can be reused for similar tasks, enhancing productivity.
Conclusion
Automating manual data entry tasks not only enhances efficiency but also ensures data integrity and consistency. The Python script outlined in this guide demonstrates how you can transform and format inventory data seamlessly, turning a once tedious process into a streamlined automated workflow. By leveraging powerful libraries like pandas and openpyxl, you can customize and extend this script to suit various data management needs, ultimately empowering your business to operate more effectively in a data-centric environment.
Tip: Always back up your original data before running any automation scripts to prevent accidental data loss. Additionally, consider adding more error handling and logging to make the script robust and easier to debug in different environments.
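As a starting point for that extra robustness, here's one possible sketch of a backup step using the standard library's logging and shutil modules (the back_up helper and the demo file name are illustrative, not part of the script above):

```python
import logging
import shutil
from pathlib import Path

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('inventory_transform')

def back_up(path):
    """Copy a file to <name>.bak next to it before processing."""
    src = Path(path)
    backup = src.with_suffix(src.suffix + '.bak')
    shutil.copy2(src, backup)  # copy2 preserves timestamps and metadata
    log.info('Backed up %s to %s', src, backup)
    return backup

# Demonstration with a throwaway file
demo = Path('demo_inventory.csv')
demo.write_text('Item ID,Quantity\nA-1,10\n')
backup_path = back_up(demo)
print(backup_path)  # demo_inventory.csv.bak
```

Calling back_up(input_file) at the top of the script gives you a restore point before any transformation runs.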
Happy automating!