
Amazon Product Scraper - Web Scraping Amazon Product Data

RealdataAPI / amazon-product-scraper

You can use the Amazon Product Data Scraper to obtain Amazon product information, such as prices, reviews, and ASINs, without using the official Amazon API. The service is available in many countries, including Australia, Canada, Germany, France, Singapore, the USA, the UK, the UAE, and India. It is considered the best Amazon product data scraping service provider.

What is Amazon Data Scraper, and how does it work?

Amazon Data Scraper is a data scraping actor that extracts Amazon product data from product URLs or subcategory URLs like the following:

https://www.amazon.com/s?i=specialty-aps&bbn=16225007011&rh=n%3A16225007011%2Cn%3A1292115011

Amazon subcategory URLs generally include /s right after the Amazon domain, so make sure your URLs follow the format of the example above.
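To check inputs before a run, a small helper can sort URLs into subcategory and product links. The sketch below is a heuristic, not part of the scraper: it treats a path of /s as a subcategory page, following the note above, and assumes product pages contain /dp/ followed by a 10-character ASIN, as in the https://www.amazon.com/dp/B09X7MPX8L URL from the sample result on this page. The function name classify_amazon_url is hypothetical.

```python
import re
from urllib.parse import urlparse


def classify_amazon_url(url: str) -> str:
    """Return 'subcategory', 'product', or 'unknown' (heuristic sketch)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Accept amazon.com as well as country domains such as amazon.co.uk
    if not re.search(r"(^|\.)amazon\.[a-z.]+$", host):
        return "unknown"
    path = parsed.path
    # Subcategory/search pages: the path is /s right after the domain
    if path == "/s" or path.startswith("/s/"):
        return "subcategory"
    # Product pages (assumption): /dp/ followed by a 10-character ASIN
    if re.search(r"/dp/[A-Z0-9]{10}(/|$)", path):
        return "product"
    return "unknown"
```

Run it over your input list and fix or drop anything reported as unknown before starting the scraper.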

Add as many links to the input as you like, choose how many products you want to collect, and then export the output. You can also retrieve this data directly through the API without logging in to the Real Data API platform.
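Exporting usually means turning the scraped items into a flat table. A minimal sketch, assuming you already have the items as Python dictionaries (for example, from the dataset client), uses the standard csv module. The column choice and the helper name items_to_csv are illustrative; the field names come from the sample result on this page, and the nested price object is flattened into two columns.

```python
import csv
import io

# Columns picked from the sample result; "price" is split into value and currency
COLUMNS = ["asin", "title", "price", "currency", "stars", "reviewsCount", "url"]


def items_to_csv(items: list) -> str:
    """Flatten scraped product items into CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for item in items:
        price = item.get("price") or {}  # price information can be missing
        writer.writerow({
            "asin": item.get("asin"),
            "title": item.get("title"),
            "price": price.get("value"),
            "currency": price.get("currency"),
            "stars": item.get("stars"),
            "reviewsCount": item.get("reviewsCount"),
            "url": item.get("url"),
        })
    return buffer.getvalue()
```

Missing values are written as empty cells, so products without price information still produce a row.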

Why extract products from Amazon?

Extracting products from Amazon can help you:

  • Track the performance of Amazon subcategories and categories to put them into context.
  • Improve your messaging and advertising campaigns.
  • Uncover emerging and growing brands to benchmark your product performance within its category based on reviews, traffic, and conversions.
  • Use Amazon data to stay ahead in competitive intelligence.

For more ideas, explore how data extraction is transforming e-commerce analytics.

Is it legal to extract Amazon product data?

You can scrape publicly available Amazon data, such as product prices, descriptions, or ratings. To learn more, check out our blog.

How can I extract product data from Amazon?

You can follow this step-by-step tutorial for the Amazon product data scraping process.

Do you want more Amazon scraping alternatives?

Check out these other Amazon scrapers:

  • Amazon ASINs Scraper
  • Amazon Reviews Scraper
  • Amazon Best Sellers Scraper

Input options

Before running this scraper, you must specify what you want it to extract. You can provide the input as a JSON file or through the Real Data API editor. Most input fields have default values.

See the dedicated input tab for detailed examples and descriptions of each input field.

Keep the following points in mind when using this actor to scrape Amazon products.

You may not get price information if no sellers ship to a particular delivery country. Setting a specific Real Data API proxy country in the proxy configuration should help. Even then, prices may differ from those shown in the United States, because Amazon displays offers based on your proxy's geolocation.
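When your URLs target a specific marketplace, a natural choice of proxy country is that marketplace's country. The sketch below maps the Amazon domains for the countries named at the top of this page to ISO 3166-1 alpha-2 codes. This page does not show which proxyConfiguration field holds the country, so the hypothetical helper proxy_country_for only returns the code for you to place in your configuration.

```python
from typing import Optional
from urllib.parse import urlparse

# Amazon marketplace domains for the countries named on this page,
# mapped to ISO 3166-1 alpha-2 codes (the UK's code is GB)
MARKETPLACE_COUNTRY = {
    "amazon.com": "US",
    "amazon.ca": "CA",
    "amazon.co.uk": "GB",
    "amazon.de": "DE",
    "amazon.fr": "FR",
    "amazon.in": "IN",
    "amazon.ae": "AE",
    "amazon.sg": "SG",
    "amazon.com.au": "AU",
}


def proxy_country_for(url: str) -> Optional[str]:
    """Return a proxy country code for an Amazon URL, or None if unknown."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return MARKETPLACE_COUNTRY.get(host)
```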

Sample result of Amazon Data Scraper
{
    "title": "SanDisk 1TB Extreme microSDXC UHS-I Memory Card with Adapter - Up to 190MB/s, C10, U3, V30, 4K, 5K, A2, Micro SD Card- SDSQXAV-1T00-GN6MA",
    "url": "https://www.amazon.com/dp/B09X7MPX8L",
    "asin": "B09X7MPX8L",
    "inStock": true,
    "inStockText": "Only 8 left in stock - order soon.       Only 8 left in stock - order soon.",
    "brand": "SanDisk",
    "price": {
        "value": 145.5,
        "currency": "$"
    },
    "listPrice": {
        "value": 299.99,
        "currency": "$"
    },
    "shippingPrice": null,
        "stars": 4.8,
        "starsBreakdown": {
        "5star": 0.86,
        "4star": 0.09,
        "3star": 0.02,
        "2star": 0.01,
        "1star": 0.01
    },
    "reviewsCount": 36704,
    "answeredQuestions": 151,
    "breadCrumbs": "Electronics › Computers & Accessories › Computer Accessories & Peripherals › Memory Cards › Micro SD Cards",
    "thumbnailImage": "https://m.media-amazon.com/images/I/716kSUlHouL.__AC_SX300_SY300_QL70_FMwebp_.jpg",
    "description": null,
    "features": [
    "Save time with card offload speeds of up to 190MB/s powered by SanDisk QuickFlow Technology (Up to 190MB/s read speeds, engineered with proprietary technology to reach speeds beyond UHS-I 104MB/s, requires compatible devices capable of reaching such speeds. Based on internal testing; performance may be lower depending upon host device interface, usage conditions and other factors. 1MB=1,000,000 bytes. SanDisk QuickFlow Technology is only available for 64GB, 128GB, 256GB, 400GB, 512GB, and 1TB capacities. 1GB=1,000,000,000 bytes and 1TB=1,000,000,000,000 bytes. Actual user storage less.)",
    "Pair with the SanDisk Professional PRO-READER SD and microSD to achieve maximum speeds (sold separately)",
    "Up to 130MB/s write speeds for fast shooting (Based on internal testing; performance may be lower depending upon host device interface, usage conditions and other factors. 1MB=1,000,000 bytes.)",
    "4K and 5K UHD-ready with UHS Speed Class 3 (U3) and Video Speed Class 30 (V30) (Compatible device required. Full HD (1920x1080), 4K UHD (3840 x 2160), and 5K UHD (5120 X 2880) support may vary based upon host device, file attributes and other factors. See HD page on SanDisk site. UHS Speed Class 3 (U3) designates a performance option designed to support real-time video recording with UHS-enabled host devices. Video Speed Class 30 (V30), sustained video capture rate of 30MB/s, designates a performance option designed to support real-time video recording with UHS-enabled host devices. See the SD Association’s official website.)",
    "Rated A2 for faster loading and in-app performance (A2 performance is 4000 read IOPS, 2000 write IOPS. Results may vary based on host device, app type and other factors)"
    ],
    "variantAsins": [],
    "reviewsLink": "/SanDisk-Extreme-microSDXC-Memory-Adapter/product-reviews/B09X7MPX8L?reviewerType=all_reviews",
    "delivery": "Thursday, January 26",
    "fastestDelivery": "Sunday, January 22",
    "returnPolicy": "Eligible for Return, Refund or Replacement within 30 days of receipt  Eligible for Return, Refund or Replacement within 30 days of receipt",
    "support": "Free Amazon tech support included",
    "variantAttributes": [],
    "priceVariants": null,
    "seller": {
        "name": "Direct Suppliers US",
        "id": "A210SJF12S88M5",
        "url": "/gp/help/seller/at-a-glance.html/ref=dp_merchant_link?ie=UTF8&seller=A210SJF12S88M5&asin=B09X7MPX8L&ref_=dp_merchant_link&isAmazonFulfilled=1",
        "reviewsCount": null,
        "averageRating": null
    },
    "bestsellerRanks": null,
    "locationText": "Select your address"
}
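Some fields in this result are easier to use after light post-processing. The hypothetical summarize helper below computes the discount from price and listPrice, recomputes an average rating from starsBreakdown (normalizing, because the shares in the sample add up to 0.99 rather than 1), and turns the relative reviewsLink into an absolute URL. It assumes the item came from amazon.com; pass a different base for other marketplaces.

```python
from urllib.parse import urljoin


def summarize(item: dict, base: str = "https://www.amazon.com") -> dict:
    """Derive a few convenience fields from one scraped product (sketch)."""
    price = (item.get("price") or {}).get("value")
    list_price = (item.get("listPrice") or {}).get("value")
    discount = None
    if price is not None and list_price:
        discount = round(100 * (1 - price / list_price), 1)

    # Weighted average over "5star".."1star"; divide by the total share
    # because the rounded shares may not add up to exactly 1
    breakdown = item.get("starsBreakdown") or {}
    total_share = sum(breakdown.values())
    avg_stars = None
    if total_share:
        weighted = sum(int(key[0]) * share for key, share in breakdown.items())
        avg_stars = round(weighted / total_share, 1)

    # reviewsLink is a relative path in the sample output
    reviews_link = item.get("reviewsLink")
    return {
        "asin": item.get("asin"),
        "discountPercent": discount,
        "avgStarsFromBreakdown": avg_stars,
        "reviewsUrl": urljoin(base, reviews_link) if reviews_link else None,
    }
```

Applied to the sample result above, it reports a 51.5% discount and a breakdown average of 4.8, which agrees with the stars field.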
Integrations with Amazon Data Scraper

You can connect this scraper to almost any web application or cloud service using Real Data API integrations, including Slack, GitHub, Zapier, Make, Google Drive, and Google Sheets. You can also use webhooks to trigger actions when events occur, such as receiving an alert when the Amazon scraper finishes its run.
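A webhook receiver typically parses the JSON body it is sent and turns it into an alert. The payload format is not documented on this page, so the field names below ("resource", "id", "status", "defaultDatasetId") are assumptions, apart from defaultDatasetId, which the client examples on this page read from a finished run. The helper describe_run_event is hypothetical.

```python
import json


def describe_run_event(body: bytes) -> str:
    """Build a short alert message from a webhook request body.

    Assumed payload shape (not a documented schema):
    {"resource": {"id": ..., "status": ..., "defaultDatasetId": ...}}
    """
    payload = json.loads(body)
    run = payload.get("resource") or {}
    status = run.get("status", "UNKNOWN")
    message = f"Amazon scraper run {run.get('id', '?')} finished with status {status}"
    # Point to the results only when the run succeeded and produced a dataset
    if status == "SUCCEEDED" and run.get("defaultDatasetId"):
        message += f"; results in dataset {run['defaultDatasetId']}"
    return message
```

You would call this from the POST handler of whatever web server receives the webhook and forward the message to Slack or email.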

Executing Amazon Data Scraping Actor with Real Data API

The Real Data API gives you programmatic access to the Real Data API platform. It is organized around RESTful HTTP endpoints that let you schedule, manage, and run Real Data API actors. Through the API you can also access datasets, fetch outputs, track performance, and create and update actor versions.
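The curl example on this page starts a run by POSTing the input JSON to https://api.realdataapi.com/v2/acts/junglee~amazon-crawler/runs?token=$API_TOKEN. The same request can be built with Python's standard library, as sketched below. Only the endpoint pattern comes from this page; the helper name build_run_request is hypothetical, and because the response format is not documented here, the sketch stops at building the request.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Base URL taken from the curl example on this page
API_BASE = "https://api.realdataapi.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> Request:
    """Build (but do not send) a POST request that starts an actor run."""
    # The curl example writes "junglee/amazon-crawler" as "junglee~amazon-crawler"
    actor_path = actor_id.replace("/", "~")
    url = f"{API_BASE}/acts/{actor_path}/runs?{urlencode({'token': token})}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually start the run, pass the request to urllib.request.urlopen(...) and inspect the JSON it returns.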

To use the API from Node.js, try the Real Data API client NPM package; from Python, try the Real Data API PyPI package.

See the Real Data API reference documentation for details, or open the API tab to explore code examples.

Industries

See how industries around the world are using Amazon Data Scraper.


E-commerce & Retail

You need a Real Data API account to run the code examples. Replace the token placeholder in each example with your API token. See the Real Data API docs for more details on the live APIs.

import { RealdataAPIClient } from 'RealDataAPI-client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '',
});

// Prepare actor input
const input = {
    "categoryOrProductUrls": [
        {
            "url": "https://www.amazon.com/s?i=specialty-aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A2811119011&ref=nav_em__nav_desktop_sa_intl_cell_phones_and_accessories_0_2_5_5"
        }
    ],
    "maxItems": 100,
    "proxyConfiguration": {
        "useRealDataAPIProxy": true
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("junglee/amazon-crawler").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();

from realdataapi_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("")

# Prepare the actor input
run_input = {
    "categoryOrProductUrls": [{ "url": "https://www.amazon.com/s?i=specialty-aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A2811119011&ref=nav_em__nav_desktop_sa_intl_cell_phones_and_accessories_0_2_5_5" }],
    "maxItems": 100,
    "proxyConfiguration": { "useRealDataAPIProxy": True },
}

# Run the actor and wait for it to finish
run = client.actor("junglee/amazon-crawler").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# Set API token (quoted so the shell does not treat < and > as redirections)
API_TOKEN="<YOUR_API_TOKEN>"

# Prepare actor input
cat > input.json <<'EOF'
{
  "categoryOrProductUrls": [
    {
      "url": "https://www.amazon.com/s?i=specialty-aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A2811119011&ref=nav_em__nav_desktop_sa_intl_cell_phones_and_accessories_0_2_5_5"
    }
  ],
  "maxItems": 100,
  "proxyConfiguration": {
    "useRealDataAPIProxy": true
  }
}
EOF

# Run the actor
curl "https://api.realdataapi.com/v2/acts/junglee~amazon-crawler/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'

Amazon product URLs

productUrls Required Array

Enter one or more URLs of the Amazon products you want to scrape.

Max reviews

Max reviews Optional Integer

Enter the maximum number of reviews to scrape. To scrape all reviews, leave this field blank.

Link selector

linkSelector Optional String

A CSS selector specifying which links on the page (<a> elements with an href attribute) should be followed and added to the request queue. To filter the links added to the queue, use the Pseudo-URLs and/or Glob patterns setting. If Link selector is empty, links on the page are ignored. For details, see Link selector in the README.
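The link selector behaviour can be approximated with Python's standard library. This sketch is not the scraper's implementation: html.parser cannot evaluate arbitrary CSS selectors, so it hard-codes the equivalent of a[href], and it uses fnmatch.fnmatchcase as a stand-in for the Glob patterns setting. Keeping every selected link when no patterns are given is an assumption; an empty Link selector would simply skip this step.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href values of <a> elements (roughly "a[href]")."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL
                self.links.append(urljoin(self.base_url, href))


def links_to_enqueue(html: str, base_url: str, globs: list) -> list:
    """Return links that would be queued (assumption: no globs keeps all)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [
        link for link in collector.links
        if not globs or any(fnmatchcase(link, pattern) for pattern in globs)
    ]
```

For example, the glob https://www.amazon.com/dp/* keeps product pages and drops help or navigation links.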

Include personal data

includeGdprSensitive Optional Array

Personal information, such as names, IDs, or profile pictures, is protected by the GDPR in Europe and by other regulations worldwide. You must not extract personal information without a legitimate legal reason.

Reviews sort

sort Optional String

Choose the order in which reviews are scraped. The default is Amazon's HELPFUL sort order.

Options:

RECENT, HELPFUL

Proxy configuration

proxyConfiguration Required Object

You can select proxy groups from specific countries. Amazon shows products that can be delivered to your location, which it infers from your proxy. If globally shipped products are enough for your needs, you don't need to change this setting.

Extended output function

extendedOutputFunction Optional String

Enter a function that receives the jQuery handle as its argument and returns custom scraped data. This data is merged with the default result.
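The extended output function itself runs in the page with jQuery, but the merge step it feeds into can be pictured as combining two objects. This Python sketch assumes that custom fields override default fields on a key clash and that the function must return an object; neither rule is stated on this page, and the helper name and the "badge" field in the usage note are hypothetical.

```python
def merge_extended_output(default_item: dict, custom_fields: dict) -> dict:
    """Combine the default result with extended-output fields (sketch).

    Assumptions: the custom value wins when both define the same key,
    and anything other than an object is rejected.
    """
    if not isinstance(custom_fields, dict):
        raise TypeError("the extended output function should return an object")
    return {**default_item, **custom_fields}
```

For instance, merging {"asin": "B09X7MPX8L", "stars": 4.8} with {"badge": "Amazon's Choice"} yields a single record with all three keys.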

{
  "categoryOrProductUrls": [
    {
      "url": "https://www.amazon.com/s?i=specialty-aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A2811119011&ref=nav_em__nav_desktop_sa_intl_cell_phones_and_accessories_0_2_5_5"
    }
  ],
  "maxItems": 100,
  "detailedInformation": false,
  "useCaptchaSolver": false,
  "proxyConfiguration": {
    "useRealDataAPIProxy": true
  }
}
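Before sending an input like the one above, you can catch mistakes locally. This hypothetical validator checks only the fields shown in the example and the Required markers in the field list. The field list names the URL field productUrls while the examples use categoryOrProductUrls, so the sketch follows the examples. It is not the actor's official input schema.

```python
def validate_input(run_input: dict) -> list:
    """Return a list of problems in a scraper input; empty means none found."""
    problems = []

    urls = run_input.get("categoryOrProductUrls")
    if not isinstance(urls, list) or not urls:
        problems.append("categoryOrProductUrls must be a non-empty list")
    elif not all(isinstance(entry, dict) and entry.get("url") for entry in urls):
        problems.append('each categoryOrProductUrls entry needs a "url" key')

    # bool is a subclass of int in Python, so reject it explicitly
    max_items = run_input.get("maxItems")
    if max_items is not None and (
        isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1
    ):
        problems.append("maxItems must be a positive integer")

    # proxyConfiguration is marked Required in the field list
    if not isinstance(run_input.get("proxyConfiguration"), dict):
        problems.append("proxyConfiguration is required and must be an object")

    return problems
```

Load your input.json with json.load and stop if validate_input returns anything before calling the API.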