In this post, I’ll walk you through a practical automation that efficiently extracts data from PDF invoices and inputs it into Google Sheets. This process not only streamlines your invoicing but also leverages AI to enhance data accuracy.
This automation can handle both machine-readable and scanned PDFs, ensuring that regardless of the format, the data gets processed accurately. The system takes advantage of advanced invoice parsing technology and AI to refine the data extraction process, making it more efficient and reliable.
With this setup, I can easily create reports, analyze vendor data, and manage invoices without manual data entry. The automation streamlines workflows, allowing for quick access to financial data, which is crucial for decision-making.
Understanding Machine Readable vs. Scanned PDFs
Understanding the difference between machine-readable and scanned PDFs is crucial for effective invoice processing. Machine-readable PDFs contain actual text data that can be selected and copied. These types of documents allow for straightforward data extraction using standard parsing techniques.
In contrast, scanned PDFs are essentially images of documents. The text in these files isn’t selectable, making traditional parsing methods ineffective. For these documents, I use the vision capability of OpenAI’s GPT4o to convert the images into machine-readable text. This additional step is necessary to ensure that all invoices, regardless of their format, are processed correctly.

Setting Up the Make.com Scenario
To begin, I set up a scenario in Make.com specifically for invoice parsing. This scenario monitors a designated Google Drive folder for new PDF files. Whenever a new invoice is added, the automation triggers the parsing process, extracting relevant data and inputting it into Google Sheets.
Initially, I selected the Google Drive module and configured it to watch for new files by their creation time. This ensures that only the latest invoices are processed, preventing duplicates and streamlining the workflow.

Downloading PDF Files from Google Drive
Once the automation detects a new PDF, the next step is to download the file from Google Drive. I manually enter the file ID to ensure the correct document is downloaded. This step is essential as it allows the automation to access the invoice data for parsing.
After downloading the file, the automation prepares to send it to an invoice parsing service. This service will analyze the document and extract key information such as invoice number, date, vendor name, and total amounts.

Choosing an Invoice Parsing Service
For the invoice parsing service, I opted for pdf.co due to its ease of integration with Make.com. It’s user-friendly and offers a free trial, making it accessible for anyone looking to automate their invoice processing. Other alternatives exist, such as Mindy.com, but they may require more complex setups or have limited free pages.
When selecting a parsing service, consider factors such as compatibility with your automation platform, pricing, and the quality of data extraction. The right service can significantly enhance the efficiency of your invoice processing workflow.

Updating Google Sheets with Parsed Data
Once the invoice data is parsed, the next step is to update Google Sheets with this information. I set up a module in Make.com that adds a new row to my designated Google Sheet. The data extracted from the invoice includes fields like invoice number, date, vendor name, and total amount.
For this, I selected the Google Sheets module and configured it to add a row to my accounts payable spreadsheet. It’s important to ensure that the correct headers in the spreadsheet match the data being input. I made sure to specify that the table contains headers, which allows for accurate mapping of extracted data to the appropriate columns.

Mapping Invoice Data to Google Sheets
Mapping the data correctly is crucial. For example, I assigned the invoice number and date fields directly to their corresponding columns in the Google Sheet. Vendor information, such as the street address, city, and country, were also mapped accordingly. I had to combine the street address and city into one field, as the original invoice data did not separate them.
After setting up the mapping, I ran the automation again to ensure everything was working correctly. The data from the parsed invoice populated the Google Sheet as expected, giving me a clear view of the information without needing to enter it manually.

Enhancing Data with OpenAI
After updating Google Sheets with the parsed invoice data, I took an additional step to refine this information using OpenAI’s capabilities. This part of the process helps fill in gaps and infer missing details from the parsed data.
I integrated an OpenAI module where I generated a prompt to request additional information based on the existing data. For example, if the parsed data did not include the currency or tax rate, I used OpenAI to infer these values based on the context provided in the invoice.

Creating the OpenAI Prompt
In crafting the prompt, I included specific instructions for OpenAI to analyze the parsed invoice data. I guided it to focus on extracting structured information, like the currency and tax details. By providing examples of the expected output format, I aimed to improve the accuracy of the inferred data.
After running this module, I reviewed the output to ensure that OpenAI had successfully filled in the missing information. The results were often impressive, with the AI effectively inferring values that aligned with the invoice context.

Handling Scanned PDF Invoices
Dealing with scanned PDF invoices presents unique challenges. These documents often contain text that cannot be selected, requiring special handling. For these invoices, I utilized Optical Character Recognition (OCR) technology to convert the scanned images into machine-readable text.
Once the OCR process was complete, I passed the results to the invoice parsing service to extract the necessary data. This dual-step approach ensures that I can handle both machine-readable PDFs and scanned documents effectively.

Integrating OCR Technology
To implement the OCR technology, I added a step in my Make.com scenario specifically for scanned PDFs. If the initial parsing attempt returned no data, the automation would trigger the OCR process. This step is crucial for ensuring that all invoices, regardless of format, are processed correctly.
After OCR, I would run the parsing service again to extract the relevant invoice details. This additional layer of processing allows me to maintain a high level of data accuracy.

Creating Pivot Tables for Analysis
With all the data now in Google Sheets, I wanted to analyze it further. I created pivot tables to summarize the invoice data effectively. Pivot tables allow me to group invoices by vendor, total amounts, or other categories, providing valuable insights into spending patterns.
To create a pivot table, I selected the relevant data range in Google Sheets and inserted a new pivot table. From there, I could choose to display totals by vendor or even filter by date ranges to evaluate spending over specific periods.
Using Google Sheets Functions
I also leveraged Google Sheets’ built-in functions to enhance my analysis. For example, I used the Google Finance function to convert different currencies to a base currency. This capability is essential if I deal with international vendors and need to compare costs accurately.
The combination of pivot tables and functions provides a powerful way to manage financial data, making it easy to generate reports and track expenses over time. Each time I update the data, the pivot tables refresh automatically, ensuring I always have the latest insights at my fingertips.
Filtering PDF Files
To ensure that only relevant PDF invoices are processed, I implemented a filtering system in my automation. This step helps to avoid processing unnecessary files, such as images or unrelated documents, which could disrupt the workflow.
I added conditions to my Make.com scenario that specifically check for the file type. If the file is not a PDF, the automation skips it. This streamlining helps maintain a clean and efficient processing pipeline.

Document Type Field Implementation
I added a document type field to the Google Sheets to help distinguish between invoices and receipts. This is essential because documents sent together can vary in type, and categorizing them simplifies management.
The document type is inferred using OpenAI. I updated the prompt to extract this information based on the content of the document. If the parsing service fails to determine the document type, the automation will still attempt to classify it using the AI model.
This classification allows me to filter out any documents that are not invoices directly in Google Sheets. It’s a straightforward way to maintain a clean and organized dataset.

Next Steps in Automation
Once the basic automation is set up, it’s time to consider enhancements. I plan to implement additional features that could further streamline the process. For example, setting up alerts or notifications for processed invoices can help keep track of submissions and approvals.
Another idea is to integrate a dashboard that provides a visual overview of the invoice data. This can be done using Google Data Studio or similar tools, which can pull directly from the Google Sheets. It enables better insights into spending and can highlight trends over time.

Final Tips and Considerations
Maintaining the automation requires regular checks. It’s important to review the parsing results periodically to ensure data accuracy. Sometimes, the AI may not infer certain values correctly, so having a human in the loop for validation is really important.
Updating the parsing service and OpenAI prompts as new invoice formats are encountered will also keep the automation effective. As your business evolves, so too will your document types and formats, so staying adaptable is key.
