Extracting information from price lists using AI


About Preskok

Preskok is on a mission to disrupt the car remarketing industry and make it more efficient. They are one of the leaders in this market, with EUR 200 million in annual revenue. As a high-tech company, they recognised the power of AI to build a competitive advantage.

We partnered up to support them with AI expertise and to identify their main challenges in automating the processes of buying and reselling new and young used cars. There are many opportunities, so we have built a close relationship and set ambitious plans together.


Detailed understanding of the car market

The car remarketing industry is complex: it has many moving parts (brands, models, versions, engine types, equipment and so on) and numerous markets, each with its own specifics. Against this background, our client faced several key challenges:

  • The market is very dynamic, which makes it hard to track and analyze and therefore hard to understand market conditions thoroughly.
  • Most processes and trades are still managed manually. As a result, very little data is available, even though data is essential for making informed decisions.
  • Their main goal is therefore to automate data collection, so that they can understand the market better and offer the best experience on the market.

An iterative approach to solving hard problems

At Pareto AI, we take a holistic approach to helping our clients. We work closely with them to understand their technical challenges and plan an AI strategy to overcome them.

We collaborate closely with their software engineering team and domain experts. Taking an agile approach with regular fortnightly demos, we aim to build transparency and keep the focus on business value. This helps to ensure a long-term relationship and to achieve the ambitious plans we set together.


Multiple AI models to automatically parse price lists

We love challenges, so we were excited to partner up with Preskok and help them with their ambitious plan to automate the extraction of information from price lists and build the most comprehensive vehicle database. Here’s how we help them:

  • We built a machine learning model that processes price list PDFs and annotates every page, based on the extracted text and images, to determine what information is available on each page. These annotations are then used by other machine learning models to process the data properly.
  • One of the main challenges in price lists is tables, because they come in so many different formats. We used some existing table extraction tools as a basis, but we needed to implement a bespoke AI model to properly process and connect the information within the tables.
  • Besides tables, there is a lot of free-form text, especially about additional equipment, which is equally important for understanding the full vehicle price. We therefore built another ensemble of machine learning models to parse this data.
  • Since there is no standard format and price lists are constantly changing, we implemented a feedback loop that improves the algorithm as more price lists are processed. This allows the algorithm to adapt to new formats and to process formats it has not seen before.
  • To ensure the scalability of the system, the algorithms are implemented as microservices in AWS, which also helps to run the system in a cost-effective way.
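
To make the flow above more concrete, here is a minimal Python sketch of the "annotate pages first, then route them to specialised parsers" idea. It is not the production system: simple rules stand in for the machine learning models, and every name in it (Page, annotate, parse_table, parse_equipment, process, the two labels) is hypothetical. It also assumes prices are written with "." as the thousands separator and "," as the decimal mark, followed by EUR or €.

```python
import re
from dataclasses import dataclass, field

# Hypothetical page labels; the article does not describe the real annotation scheme.
TABLE = "table"
EQUIPMENT_TEXT = "equipment_text"

# Assumed price format: "." thousands separator, "," decimal mark, then EUR or €.
PRICE = re.compile(r"(?<![\d.,])(\d+(?:\.\d{3})*(?:,\d{2})?)\s*(?:EUR|€)")


@dataclass
class Page:
    number: int
    text: str
    labels: set = field(default_factory=set)


def split_cells(line):
    """Split a line into cells on tabs or runs of two or more spaces."""
    return [c for c in re.split(r"\t|\s{2,}", line.strip()) if c]


def is_tabular(line):
    """Treat a line with three or more cells as part of a table."""
    return len(split_cells(line)) >= 3


def parse_price(raw):
    """Convert a price string such as '1.250,00' to a float."""
    return float(raw.replace(".", "").replace(",", "."))


def annotate(page):
    """Rule-based stand-in for the page-annotation model."""
    lines = [l for l in page.text.splitlines() if l.strip()]
    if sum(is_tabular(l) for l in lines) >= 2:
        page.labels.add(TABLE)
    if any(PRICE.search(l) and not is_tabular(l) for l in lines):
        page.labels.add(EQUIPMENT_TEXT)
    return page


def parse_table(page):
    """Use the first tabular line as the header and map each row onto it."""
    rows = [split_cells(l) for l in page.text.splitlines() if is_tabular(l)]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, cells)) for cells in body]


def parse_equipment(page):
    """Extract (description, price) pairs from free-form priced lines."""
    items = []
    for line in page.text.splitlines():
        m = PRICE.search(line)
        if m and not is_tabular(line):
            items.append((line[:m.start()].strip(), parse_price(m.group(1))))
    return items


def process(page):
    """Annotate a page, then run only the parsers its labels call for."""
    annotate(page)
    result = {"rows": [], "equipment": []}
    if TABLE in page.labels:
        result["rows"] = parse_table(page)
    if EQUIPMENT_TEXT in page.labels:
        result["equipment"] = parse_equipment(page)
    return result
```

Keeping annotation separate from parsing means each parser only sees the pages it is meant for, and a page whose output is corrected by a reviewer can be fed back as a new labelled example, which is one plausible way to realise the feedback loop described above.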

Price lists processed in seconds instead of hours

We are still working towards fully automating the extraction and collection of data from vehicle price lists for our client. The current system can process a PDF in seconds, a significant improvement over the hours of manual work each price list used to require.

However, many challenges remain before we reach a fully automated workflow for tracking and understanding the car sales market in detail. The main one is to build and train a system of machine learning models that can process virtually any price list on the market, regardless of brand or country of origin.