RealdataAPI Store - Browse tools published by our community and use them for your projects right away

AI Product Matcher - Real-Time AI Product Matching

RealdataAPI / ai-product-matcher

Match e-commerce products gathered from various websites by comparing them with AI Product Matcher. Use the tool whenever you need to match products across multiple e-commerce stores for market research, competitive analysis, and dynamic pricing. Our real-time AI product matching tool is available in countries like the USA, UK, UAE, Canada, France, Germany, Singapore, Spain, Mexico, etc.

How Does an AI Product Matcher Work?

The AI product matching tool uses a custom machine-learning model to resolve product mapping issues across online stores. Use it to discover similar products across various e-commerce sites for competitor analysis, dynamic pricing, and market research, or to replace manual product mapping. See the input section below for detailed settings.

To use the enterprise AI data matching tool, you need datasets of the products you want to match. If you don't have a dataset, you can scrape the products using any of our scrapers from the store page and use the generated dataset. If you already have the datasets, import them into your console account using the API. Note that the tool can only match product data in English.

Contact our enterprise team if you want us to manage your data funnel or design a scraper based on your custom requirements. Meanwhile, check out the e-commerce data scrapers below, available on our platform:

  • Amazon Product Scraper
  • eBay Scraper
  • Shopify Scraper
  • Walmart Scraper
  • Google Shopping Scraper

What Should the Input Format Be?

This section explains how to prepare the AI product matcher input. Sample inputs appear at the end of the section.

How to Specify Input Datasets?

There are two ways to use the scraper, depending on the dataset format:

Dataset With Candidate Pairs

You may have a dataset in which each row contains information about two products to compare and match. In that case, enter the pair dataset IDs in the input pair_dataset_ids. You can enter multiple IDs if you have several pair datasets to match at once. The AI product matching tool will go through all the data rows, compare the two products in each, and decide whether they are the same.

Two Different Product Datasets

In the other case, you may have separate datasets of products from different e-commerce stores. Put the dataset IDs into the inputs dataset1_ids and dataset2_ids. The tool will then go through both datasets, look for possible product matches between them, and return the output.
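The two layouts can be told apart from the input alone. The following is a minimal sketch, not part of the tool: `detect_input_mode` is a hypothetical helper, and treating the two layouts as mutually exclusive is an assumption rather than documented behavior.

```python
def detect_input_mode(run_input: dict) -> str:
    """Return "pairs" or "separate" depending on which dataset inputs are set."""
    has_pairs = bool(run_input.get("pair_dataset_ids"))
    has_first = bool(run_input.get("dataset1_ids"))
    has_second = bool(run_input.get("dataset2_ids"))

    # Assumption: an input uses exactly one of the two layouts.
    if has_pairs and not (has_first or has_second):
        return "pairs"
    if has_first and has_second and not has_pairs:
        return "separate"
    raise ValueError(
        "Provide either pair_dataset_ids, or both dataset1_ids and dataset2_ids."
    )


print(detect_input_mode({"pair_dataset_ids": ["<PAIR_DATASET_ID>"]}))
print(detect_input_mode({"dataset1_ids": ["<ID_1>"], "dataset2_ids": ["<ID_2>"]}))
```

Running a check like this before starting a run can catch a half-filled input, such as dataset1_ids without dataset2_ids.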

How to Specify the Format of the Input Dataset?

The next part of the input tells the AI product matcher which dataset format you use, through the scraper input input_mapping. Provide it in JSON format with eshop1 and eshop2 attributes, which describe where the scraper will find the required data for each e-commerce store. Each of these attributes must contain an object like the following example:

{
  "id": "productUrl",
  "name": "productName",
  "price": "currentPrice",
  "short_description": "short_description",
  "long_description": "long_description",
  "specification": "specification",
  "code": [
    "SKU",
    "ASIN"
  ]
}

Each attribute of the object tells the matcher where to find the corresponding product attribute in the dataset. For example, with the mapping above, the matcher looks for the product name in the productName attribute of the dataset you provided. Here are the product attributes:

  • name: The product name.
  • id: The unique product identifier. The product matching tool needs it to identify the matched products, but it is not used as input to the machine learning model.
  • price: The selling price of the product. If the price is not available, leave it blank. You can include the currency symbol with the price figure, but the product matcher can't match products priced in different currencies, so you must perform the currency conversion up front.
  • short_description: Many e-commerce stores show a short description near the product name, image, and price. Typically it lists the most significant specifications or features of the product, like a 500GB hard drive, Intel Core i3, and 32GB RAM for a laptop.
  • long_description: Many e-commerce platforms display a longer textual description of the product, which they get from the manufacturer.
  • specification: Provide the specification as a JSON array of product parameters, like color, components, weight, dimensions, etc., that you can get from the product page. Represent each parameter as a JSON object with key and value properties. Here is an example of a complete product specification:
    [ { "key": "RAM memory", "value": "16 GB" }, { "key": "CPU", "value": "Intel Core i3" }, { "key": "Display resolution", "value": "1920:1080" } ]
  • code: Unlike the other attributes, it accepts a list of several dataset attributes, as in the sample above. The listed attributes should contain product codes like ASIN, EAN, SKU, etc.
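Two of the attributes above usually need some preparation before you upload a dataset: the specification must be an array of key/value objects, and prices must share one currency. The sketch below shows one way to do both; `to_specification` and `convert_price` are hypothetical helpers, and the exchange rate is a placeholder for illustration only.

```python
def to_specification(params: dict) -> list:
    """Turn {"CPU": "Intel Core i3"} into [{"key": "CPU", "value": "Intel Core i3"}]."""
    return [{"key": str(key), "value": str(value)} for key, value in params.items()]


def convert_price(amount: float, currency: str, rates_to_target: dict) -> float:
    """Convert a price into the target currency using caller-supplied rates."""
    return round(amount * rates_to_target[currency], 2)


# Placeholder rates for illustration only; use real exchange rates in practice.
rates_to_usd = {"USD": 1.0, "EUR": 1.1}

row = {
    "productName": "Example laptop",
    "currentPrice": convert_price(500.0, "EUR", rates_to_usd),
    "specification": to_specification({"RAM memory": "16 GB", "CPU": "Intel Core i3"}),
}
print(row)
```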

You don't need to provide every one of the above attributes; some e-commerce stores don't offer all of them. But the more attributes you leave out, the less accurate the matcher's output may be. See the accuracy section to learn more.
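To make the role of the mapping concrete, here is a rough sketch of how a mapping object could be applied to one dataset row, tolerating attributes that are missing. It is an illustration only, not the matcher's actual implementation; `resolve_row` and the sample row are hypothetical.

```python
SINGLE_FIELDS = ("id", "name", "price", "short_description",
                 "long_description", "specification")


def resolve_row(row: dict, mapping: dict) -> dict:
    """Pick product attributes out of a dataset row as directed by the mapping."""
    product = {}
    for field in SINGLE_FIELDS:
        source = mapping.get(field)
        # The value is None when the field is unmapped or absent from the row.
        product[field] = row.get(source) if source else None
    # "code" lists several dataset attributes; keep those that have a value.
    product["code"] = [row[c] for c in mapping.get("code", []) if row.get(c)]
    return product


mapping = {
    "id": "productUrl",
    "name": "productName",
    "price": "currentPrice",
    "code": ["SKU", "ASIN"],
}
row = {
    "productUrl": "https://shop.example/item/1",
    "productName": "Example laptop",
    "currentPrice": "499.00",
    "SKU": "EX-001",
}
product = resolve_row(row, mapping)
print(product)
```

Here the row has no ASIN and the mapping has no descriptions, so those values come out empty instead of causing an error.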

How to Specify the Format of the Output Dataset?

Once you have specified the input dataset format, you must list the attributes you want the product matcher to include in the output dataset. You do this with the scraper input output_mapping, which is similar to input_mapping, as shown in the sample below:

{
  "eshop1": {
    "id_source": "productUrl",
    "name_source": "productName"
  },
  "eshop2": {
    "id_target": "EAN",
    "name_target": "productName"
  }
}

As with input_mapping, specify the attributes separately for each store. Each entry maps the name of an output dataset attribute, for instance id_source, to the input dataset attribute it is taken from, like productUrl. Additionally, the resulting dataset will include two attributes for each product pair:

  • predicted_match: It is one if the product matcher thinks the two items in the pair are the same product and zero if not.
  • predicted_scores: It shows how confident the product matcher is that the two products are the same. The score will be close to one if the matcher considers both products the same and close to zero if not. This attribute is useful if you want to apply your own threshold. For instance, you can keep only the pairs with high predicted scores if you want more certainty.
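Applying your own threshold to the output could look like the sketch below. The sample rows are made up, and it assumes each output row carries predicted_scores as a single number between zero and one; `filter_by_score` is a hypothetical helper.

```python
def filter_by_score(rows: list, threshold: float = 0.9) -> list:
    """Keep only the pairs whose predicted score reaches the threshold."""
    return [row for row in rows if row.get("predicted_scores", 0.0) >= threshold]


# Made-up output rows for illustration.
rows = [
    {"id_source": "a1", "id_target": "b1", "predicted_match": 1, "predicted_scores": 0.97},
    {"id_source": "a2", "id_target": "b2", "predicted_match": 1, "predicted_scores": 0.71},
    {"id_source": "a3", "id_target": "b3", "predicted_match": 0, "predicted_scores": 0.08},
]

confident = filter_by_score(rows, threshold=0.9)
print([row["id_source"] for row in confident])
```

A stricter threshold keeps fewer pairs but makes each kept pair more trustworthy.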

Precision/Recall Tradeoff Setting

As mentioned above, you can use AI Product Matcher either to replace manual product matching or to make it more efficient. No machine-learning model is flawless, so some of its output will be wrong. The precision/recall tradeoff setting lets you choose which kind of mistake the tool should minimize. Set it with the precision/recall tradeoff field in the input form or with the attribute precision_recall in the JSON input. Use one of these two settings:

  • precision: When the tool marks two products as the same, it will try to be as accurate as possible, so it gives reliable pairs of products. The price of this high precision is that it will mark more genuinely matching products as different.
  • recall: The tool will try to find as many pairs as possible where both products are the same, at the risk of also picking some pairs of different products by mistake.

See the accuracy section below for specific performance numbers.
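The two terms can be illustrated with a small calculation. The sketch below computes precision and recall on made-up scores and manual labels at several thresholds; it shows the general tradeoff only and is not a description of how the precision_recall setting works internally.

```python
def precision_recall(predicted: list, actual: list) -> tuple:
    """Compute (precision, recall) for 0/1 predictions against 0/1 labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up similarity scores and manual labels (1 = same product).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
actual = [1, 1, 0, 1, 0]

for threshold in (0.9, 0.5, 0.3):
    predicted = [1 if score >= threshold else 0 for score in scores]
    precision, recall = precision_recall(predicted, actual)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, the strictest threshold gives perfect precision but misses two of the three true matches, while the loosest one finds every match at the cost of one wrong pair.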

Example Input for Candidate Pair Datasets

{
  "pair_dataset_ids": [
    "Insert your dataset IDs here"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "id1",
      "name": "name1",
      "price": "price1",
      "short_description": "short_description1",
      "long_description": "long_description1",
      "specification": "specification1",
      "code": [
        "SKU",
        "ASIN"
      ]
    },
    "eshop2": {
      "id": "id2",
      "name": "name2",
      "price": "price2",
      "short_description": "short_description2",
      "long_description": "long_description2",
      "specification": "specification2",
      "code": [
        "EAN",
        "ASIN"
      ]
    }
  },
  "output_mapping": {
    "eshop1": {
      "id_source": "id1",
      "name_source": "name1"
    },
    "eshop2": {
      "id_target": "id2",
      "name_target": "name2"
    }
  },
  "precision_recall": "precision"
}

Example Input for Datasets of Two Separate Products

{
  "dataset1_ids": [
    "Insert your dataset IDs here"
  ],
  "dataset2_ids": [
    "Insert your dataset IDs here"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "SKU",
        "ASIN"
      ]
    },
    "eshop2": {
      "id": "productUrl",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specifications",
      "code": [
        "EAN",
        "ASIN"
      ]
    }
  },
  "output_mapping": {
    "eshop1": {
      "id_source": "url",
      "name_source": "name"
    },
    "eshop2": {
      "id_target": "productUrl",
      "name_target": "name"
    }
  },
  "precision_recall": "precision"
}

Where to Find Output and What Will It Look Like?

The tool stores the real-time product matching results in the default dataset of the scraper run, which you can find on the run page of your console account. You can export the results manually or using an API, in Excel, CSV, or JSON format.

See the subsection on the output dataset format above for more details on what the output contains.
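If you fetch the result items through the API and want a CSV file locally, a conversion along these lines is one option. It is a sketch using only Python's standard csv module; `items_to_csv` is a hypothetical helper and the sample items are made up.

```python
import csv
import io


def items_to_csv(items: list) -> str:
    """Serialize a list of result dicts to CSV text, one column per attribute."""
    fieldnames = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)  # attributes missing from an item become empty cells
    return buffer.getvalue()


# Made-up result items for illustration.
items = [
    {"id_source": "a1", "predicted_match": 1},
    {"id_source": "a2", "predicted_match": 0, "predicted_scores": 0.1},
]
csv_text = items_to_csv(items)
print(csv_text)
```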

What is the Accuracy of the AI Product Matcher?

Our team continuously works to make the AI product matcher more accurate through experimentation and analysis. We gathered thousands of manually annotated product pairs from various categories and used that data to train the model. We also prepared a separate dataset of product pairs that the tool saw for the first time only after training, and we checked its performance on that unseen data. The accuracy of the results depends on the precision/recall tradeoff setting.

In our tests, the AI product matcher reached around 95 percent precision and found around 60 percent of the product pairs that contain the same product.

Even though we train and test the AI model for precision and recall, we recommend evaluating it on a sample of your own data before importing large-scale data into the tool, because results may vary when the data comes from different sources.

Important Notes:

  • The data you provide has a major effect on the performance of the product matcher. It may perform poorly if attributes such as code, name, or price are missing.
  • We have trained the AI product matching tool to consider multiple color variants of the same product, like a T-shirt.
  • The underlying ML model changes between versions as we improve its general performance, so the decisions of the tool may differ from one version to another. If you need consistent decisions from the matcher, use a specific version instead of the latest one.

What is the Cost of Using This AI Product Matcher?

The tool uses a pay-per-result pricing model, which means you pay a small amount for each result. See our pricing page for detailed pricing. The amount charged depends on the number of results, which in turn depends on the input type for the product pairs:

  • Candidate pair dataset: the simple case, where you get one result for each input row in the dataset.
  • Two separate product datasets: the more complex case, because every product from the first dataset is paired with every product from the second. For example, if you have 100 products from the first store and 50 from the second store, you will get 100×50=5000 results.

If you wish to limit the number of results and your budget, you can do so in the scraper options.
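The result counts for the two input types can be estimated before a run. The sketch below only restates the arithmetic above; the function names are hypothetical, and the price per result is a placeholder, not an actual price.

```python
def results_for_pairs(pair_rows: int) -> int:
    """A candidate pair dataset yields one result per input row."""
    return pair_rows


def results_for_two_datasets(first_size: int, second_size: int) -> int:
    """Two separate datasets yield one result per combination of products."""
    return first_size * second_size


def estimated_cost(results: int, price_per_result: float) -> float:
    """Estimate the charge; price_per_result comes from the pricing page."""
    return results * price_per_result


results = results_for_two_datasets(100, 50)
print(results)

# Placeholder price for illustration only.
print(estimated_cost(results, price_per_result=0.001))
```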

You need a Real Data API account to run the program examples. Replace <YOUR_API_TOKEN> in the program with your API token. See the Real Data API docs for more details on the live APIs.

import { RealdataAPIClient } from 'RealdataAPI-Client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "dataset1_ids": [
        "GYVCj4hEeqnX3dJyu"
    ],
    "dataset2_ids": [
        "OmzHV4VEByO4KohMF"
    ],
    "input_mapping": {
        "eshop1": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel"
            ]
        },
        "eshop2": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel"
            ]
        }
    }
};

(async () => {
    // Run the Actor and wait for it to finish
    const run = await client.actor("equidem/ai-product-matcher").call(input);

    // Fetch and print Actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();

from RealdataAPI_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "dataset1_ids": ["GYVCj4hEeqnX3dJyu"],
    "dataset2_ids": ["OmzHV4VEByO4KohMF"],
    "input_mapping": {
        "eshop1": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel",
            ],
        },
        "eshop2": {
            "id": "url",
            "name": "name",
            "price": "price",
            "short_description": "shortDescription",
            "long_description": "longDescription",
            "specification": "specification",
            "code": [
                "sku",
                "productModel",
            ],
        },
    },
}

# Run the Actor and wait for it to finish
run = client.actor("equidem/ai-product-matcher").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# Set API token
API_TOKEN="<YOUR_API_TOKEN>"

# Prepare Actor input
cat > input.json <<'EOF'
{
  "dataset1_ids": [
    "GYVCj4hEeqnX3dJyu"
  ],
  "dataset2_ids": [
    "OmzHV4VEByO4KohMF"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    },
    "eshop2": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    }
  }
}
EOF

# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/equidem~ai-product-matcher/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'

Dataset IDs from the First Store

dataset1_ids Optional Array

IDs of the datasets containing product information from the first store.

Dataset IDs from the Second Store

dataset2_ids Optional Array

IDs of the datasets containing product information from the second store.

IDs of Pair Datasets

pair_dataset_ids Optional Array

IDs of the datasets containing the product pairs to match.

Mapping of Input Attributes

input_mapping Required Object

Mapping object specifying which dataset attributes the product matching model will use.

Mapping of Output Attributes

output_mapping Optional Object

Mapping object specifying which data attributes you want in the output dataset and under what names.

Precision/Recall Tradeoff

precision_recall Optional Enum

Specify whether you prioritize recall or precision.

Options:

recall string, precision string

{
  "dataset1_ids": [
    "GYVCj4hEeqnX3dJyu"
  ],
  "dataset2_ids": [
    "OmzHV4VEByO4KohMF"
  ],
  "input_mapping": {
    "eshop1": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    },
    "eshop2": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [
        "sku",
        "productModel"
      ]
    }
  },
  "precision_recall": "precision"
}
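The field list above can be turned into a quick local check before you start a run. This is a sketch based only on the types and options listed here; `validate_input` is a hypothetical helper, not part of the Real Data API client.

```python
ALLOWED_TRADEOFFS = {"precision", "recall"}
ARRAY_FIELDS = ("dataset1_ids", "dataset2_ids", "pair_dataset_ids")


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in the input; empty means it looks fine."""
    errors = []

    # input_mapping is the only field marked as required.
    if not isinstance(run_input.get("input_mapping"), dict):
        errors.append("input_mapping is required and must be an object")

    for field in ARRAY_FIELDS:
        if field in run_input and not isinstance(run_input[field], list):
            errors.append(f"{field} must be an array")

    if "output_mapping" in run_input and not isinstance(run_input["output_mapping"], dict):
        errors.append("output_mapping must be an object")

    tradeoff = run_input.get("precision_recall")
    if tradeoff is not None and tradeoff not in ALLOWED_TRADEOFFS:
        errors.append("precision_recall must be 'precision' or 'recall'")

    return errors


print(validate_input({"input_mapping": {}, "precision_recall": "precision"}))
print(validate_input({"dataset1_ids": "abc", "precision_recall": "speed"}))
```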