RealdataAPI Store - Browse tools published by our community and use them for your projects right away

Walmart Product Data Scraper - Scrape Walmart Product Data

RealdataAPI / walmart-scraper

With our Walmart Product Data Scraper, you can quickly gather important product information such as descriptions, images, customer feedback, questions, prices, and shipping details. You can customize your search by selecting your preferred country, language, and shipping region. Supported countries include Australia, Canada, Germany, France, Singapore, the USA, the UK, the UAE, and India.

What is Walmart Scraper, and How does it Work?

Since Walmart doesn't provide an official API for this product data, this scraper helps you extract it directly from the Walmart website.

The Walmart product data scraper offers the following features.

  • Extract product data: you can extract attributes such as seller information, brands, photos, variants, product IDs, and more.
  • Extract Walmart search results: you can scrape the results of a particular Walmart search using search terms.
  • Extract and filter any product category: you can provide the URL of any Walmart product category, with any filters applied, and scrape it.
  • Scrape a source category and collect every product in its subcategories: you can provide any large category and let the API extract its subcategories.
  • Set the maximum number of pages to scrape: for example, if you only need the first five pages, you can limit the run to those.

Walmart Specific

Don't be surprised if the scraped products differ from the ones you see in your browser. Walmart orders products slightly differently for every buyer.

Updates, Bugs, Fixes, and Changelog

This Walmart Scraper is under active development. Contact us if you face any issues or have any feature requests.

Upcoming Changes
  • Fetch Questions and Answers
  • Change delivery location
  • Performance upgrades
  • Fetching Walmart product reviews

Setup and Usage

Check out the below video to learn how this scraper works.

Start URLs

Here is the link to watch the output video.

Search

Here is the link to watch the output video.

Input Parameters

Provide the Walmart scraper with a JSON input that lists the pages to scrape, using the following fields.

Field                 Type     Description
startUrls             Array    URLs to start from. Only product detail, category, or search URLs are accepted.
maxItems              Integer  Limits the number of extracted products. Useful when you explore large subcategories on Walmart.
endPage               Integer  Last page number to extract. Unlimited by default. Applies to each list request.
search                String   Keyword to look up in the Walmart search engine.
proxy                 Object   Proxy configuration.
extendOutputFunction  String   Function that takes the jQuery handle ($) as an argument and returns a data object.
outputFilterFunction  String   Function that takes a result item as an argument and returns the mapped data.

You need to use proxy servers to run this solution. You can use your own proxies or Real Data API proxies.
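
The fields described above can be checked locally before a run is started. The following is a minimal sketch, not the scraper's own validation: the helper name `build_input` is hypothetical, and its rules (types taken from the field descriptions, proxy treated as mandatory, at least one of startUrls or search) are assumptions made for illustration.

```python
# Hypothetical helper: checks an input dict against the documented fields.
# The rules are assumptions drawn from the field descriptions, not the
# scraper's real validation.

EXPECTED_TYPES = {
    "startUrls": list,
    "maxItems": int,
    "endPage": int,
    "search": str,
    "proxy": dict,
    "extendOutputFunction": str,
    "outputFilterFunction": str,
}


def build_input(**fields):
    """Return a validated input dict or raise ValueError naming the bad field."""
    for name, value in fields.items():
        expected = EXPECTED_TYPES.get(name)
        if expected is None:
            raise ValueError(f"unknown field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"{name} must be of type {expected.__name__}")
    if "proxy" not in fields:
        raise ValueError("proxy is required")
    if not fields.get("startUrls") and not fields.get("search"):
        # Assumption: a run needs at least one start URL or a search keyword.
        raise ValueError("provide startUrls or search")
    for entry in fields.get("startUrls", []):
        if not isinstance(entry, dict) or "url" not in entry:
            raise ValueError('each startUrls entry must look like {"url": "..."}')
    return dict(fields)


run_input = build_input(
    search="apples",
    endPage=1,
    maxItems=50,
    proxy={"useRealdataAPIProxy": True},
)
print(run_input)
```

Catching a wrong type locally is cheaper than waiting for a run to fail, but the scraper remains the final authority on what it accepts.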

Advice

Note that, to avoid losing data, the API returns every available result field by default. We suggest always using outputFilterFunction to keep only the fields you need.

To scrape a filtered category, visit Walmart, apply the filters on the product category page, and copy the resulting URL into startUrls.

To extract only the first page of a Walmart category or search list, provide the link to that list and set endPage to 1.

The same approach lets you retrieve any page interval. If you provide the 7th page of a Walmart category and set endPage to 8, you will get only the seventh and eighth pages.
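
The page-interval rule can be sketched in a few lines. This is an illustration under assumptions: it supposes that Walmart list URLs select a page through a `page` query parameter, and the helper names `page_interval_input` and `pages_covered` are hypothetical. Copying the URL of the desired page from the browser achieves the same thing.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_interval_input(list_url, first_page, end_page):
    """Build an input that covers pages first_page..end_page of one list URL."""
    if first_page < 1 or end_page < first_page:
        raise ValueError("need 1 <= first_page <= end_page")
    parts = urlparse(list_url)
    # Drop any existing page parameter, then set the first page to scrape.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    query.append(("page", str(first_page)))  # assumption: "page" selects the page
    start_url = urlunparse(parts._replace(query=urlencode(query)))
    return {"startUrls": [{"url": start_url}], "endPage": end_page}


def pages_covered(first_page, end_page):
    """Pages the run is expected to visit under the rule described above."""
    return list(range(first_page, end_page + 1))


run_input = page_interval_input(
    "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets", 7, 8
)
print(run_input["startUrls"][0]["url"])
print(pages_covered(7, 8))  # [7, 8]
```

The returned dict holds only the paging fields; the proxy configuration and any other fields still need to be added before a run.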

Function for Output Filter

The Walmart Scraper uses this function to map the output data that the API scrapes from Walmart. Internally, it executes the following.

data = eval(outputFilterFunction)(data);

You can therefore use this function to keep only the attributes you need. The example below keeps only the id and name attributes.

(object) => ({
    id: object.id,
    name: object.name
})
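
Since the filter is passed to the scraper as a string, it can be convenient to generate it from a list of field names when calling the scraper from Python. A minimal sketch follows. The helper names are hypothetical, and `keep_fields` is only a local Python approximation for trying out the mapping on items you already have, not something the scraper runs.

```python
def make_output_filter(fields):
    """Return a JavaScript arrow function, as a string, that keeps only `fields`."""
    for name in fields:
        # Rough safeguard: accept simple identifier-like names only.
        if not name.isidentifier():
            raise ValueError(f"not a simple field name: {name!r}")
    body = ", ".join(f"{name}: object.{name}" for name in fields)
    return f"(object) => ({{ {body} }})"


def keep_fields(item, fields):
    """Local approximation of the generated filter (missing fields become None)."""
    return {name: item.get(name) for name in fields}


filter_source = make_output_filter(["id", "name"])
print(filter_source)  # (object) => ({ id: object.id, name: object.name })

# Made-up item for illustration; the price is not real data.
sample = {"id": "155345382", "name": "Mainstays Blue Sunflower Mix Bouquet", "price": 10.0}
print(keep_fields(sample, ["id", "name"]))
```

The generated string can be placed in the outputFilterFunction field of the input.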

Consumption of Compute Units

We've optimized this API to run fast and extract as many products as possible, so it prioritizes product detail requests. If Walmart doesn't block the scraper frequently, it scrapes about 50 products in 120 seconds, consuming 0.3 to 0.5 compute units.
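
The quoted figures allow a rough back-of-the-envelope estimate for larger runs. The sketch below assumes that time and compute units scale linearly with the number of products, which is not guaranteed: blocking, retries, and list pages will shift the real numbers.

```python
# Reference figures quoted above: about 50 products in 120 seconds
# for 0.3 to 0.5 compute units.
REF_PRODUCTS = 50
REF_SECONDS = 120
REF_CU_LOW, REF_CU_HIGH = 0.3, 0.5


def estimate_run(products):
    """Linear extrapolation (an assumption) of run time and compute units."""
    scale = products / REF_PRODUCTS
    return {
        "seconds": scale * REF_SECONDS,
        "cu_low": scale * REF_CU_LOW,
        "cu_high": scale * REF_CU_HIGH,
    }


estimate = estimate_run(200)
print(
    f"200 products: ~{estimate['seconds']:.0f} s, "
    f"{estimate['cu_low']:.1f} to {estimate['cu_high']:.1f} compute units"
)
```

Under that assumption, 200 products come to roughly 480 seconds and 1.2 to 2.0 compute units.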

Input Example for Walmart Scraper

{
    "startUrls": [
        {
            "url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
        },
        {
            "url": "https://www.walmart.com/browse/home/"
        },
        {
            "url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
        },
        {
            "url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
        }
    ],
    "search": "apples",
    "endPage": 6,
    "maxItems": 100,
    "outputFilterFunction": "(object) => ({...object})"
}

During the Execution

During a run, the scraper displays log messages describing what is happening. Every message contains a short label indicating which page it is scraping.

After items are loaded, you will see a message with the total and loaded item counts for every page.

If you provide invalid input, the run fails and the reason for the failure is displayed in the output.

Walmart Export

During a run, the API saves the output into a dataset, with every item stored as a separate record.

You can access the results from any programming language, such as PHP, Node.js, or Python.
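
Once the items are fetched, they are ordinary records that can be written to any format. As a sketch, the following converts a list of items to CSV text using only the Python standard library. The helper name `items_to_csv` is hypothetical and the sample items are made up for illustration; real items contain whatever fields the scraper or your outputFilterFunction returns.

```python
import csv
import io


def items_to_csv(items):
    """Serialize a list of flat dicts to CSV text; missing fields become empty cells."""
    # Collect column names in first-seen order across all items.
    columns = []
    for item in items:
        for key in item:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Made-up items standing in for dataset results.
items = [
    {"id": "1", "name": "Example product A", "price": 9.99},
    {"id": "2", "name": "Example product B"},
]
print(items_to_csv(items))
```

Nested values such as lists of images would need to be flattened or serialized first, since CSV cells hold plain text.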

You need a Real Data API account to run the code examples. Replace <YOUR_API_TOKEN> in the code with your API token. See the Real Data API docs for more details about the live APIs.

import { RealdataAPIClient } from 'RealdataAPI-Client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
        },
        {
            "url": "https://www.walmart.com/browse/home/"
        },
        {
            "url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
        },
        {
            "url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
        }
    ],
    "maxItems": 50,
    "endPage": 1,
    "extendOutputFunction": ($) => {
        const result = {};
        // Uncomment to add a title to the output
        // result.title = $('title').text().trim();
    
        return result;
    },
    "outputFilterFunction": (object) => ({...object}),
    "proxy": {
        "useRealdataAPIProxy": true
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("epctex/walmart-scraper").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();

from RealdataAPI_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "startUrls": [
        { "url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920" },
        { "url": "https://www.walmart.com/browse/home/" },
        { "url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets" },
        { "url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382" },
    ],
    "maxItems": 50,
    "endPage": 1,
    "extendOutputFunction": """($) => {
    const result = {};
    // Uncomment to add a title to the output
    // result.title = $('title').text().trim();

    return result;
}""",
    "outputFilterFunction": "(object) => ({...object})",
    "proxy": { "useRealdataAPIProxy": True },
}

# Run the actor and wait for it to finish
run = client.actor("epctex/walmart-scraper").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# Set API token
API_TOKEN='<YOUR_API_TOKEN>'

# Prepare actor input
cat > input.json <<'EOF'
{
  "startUrls": [
    {
      "url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
    },
    {
      "url": "https://www.walmart.com/browse/home/"
    },
    {
      "url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
    },
    {
      "url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
    }
  ],
  "maxItems": 50,
  "endPage": 1,
  "extendOutputFunction": "($) => {\n    const result = {};\n    // Uncomment to add a title to the output\n    // result.title = $('title').text().trim();\n\n    return result;\n}",
  "outputFilterFunction": "(object) => ({...object})",
  "proxy": {
    "useRealdataAPIProxy": true
  }
}
EOF

# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/epctex~walmart-scraper/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'

Start URLs

startUrls Optional Array

URLs to start with. Provide a list of product detail, category, or search URLs.

Maximum Item Count

maxItems Optional Integer

Maximum number of items to extract.

Category End Page

endPage Optional Integer

The page number at which to finish the run. The default value is 0.

Search Keyword

search Optional String

Search keywords you want to explore on the source platform.

Extend Output Function

extendOutputFunction Optional String

The output of this function is merged with the default results.

Output Filter Function

outputFilterFunction Optional String

This function maps each scraped result to the shape you choose.

Proxy Configuration

proxy Required Object

Choose the proxy servers that the crawler will use.

{
  "startUrls": [
    {
      "url": "https://www.walmart.com/browse/auto-tires/brake-pads/91083_1074765_9038935_4582920"
    },
    {
      "url": "https://www.walmart.com/browse/home/"
    },
    {
      "url": "https://www.walmart.com/search?grid=true&query=Mixed+Bouquets"
    },
    {
      "url": "https://www.walmart.com/ip/Mainstays-Blue-Sunflower-Mix-Bouquet/155345382"
    }
  ],
  "maxItems": 50,
  "endPage": 1,
  "extendOutputFunction": "($) => {\n    const result = {};\n    // Uncomment to add a title to the output\n    // result.title = $('title').text().trim();\n\n    return result;\n}",
  "outputFilterFunction": "(object) => ({...object})",
  "proxy": {
    "useRealdataAPIProxy": true
  }
}