
Content Checker - Scrape & Monitor Website Content

RealdataAPI / content-checker

Scrape and monitor website content for changes on web pages. Automatically store before-and-after snapshots and get email alerts using the content checker. Use the content-checking tool in countries including the USA, UK, UAE, France, Australia, Germany, Spain, Singapore, Mexico, and more.

How Does the Content Checker Work?

The content checker lets you track website content on any web page. When it detects a change, it sends an email alert with before and after screenshots. Use these alerts and screenshots to build your own watchdog for product sales, updates, prices, and competitors, or to monitor content changes on selected web pages.

Technically, it scrapes the textual content of the website using a CSS selector and compares it with the result of the previous run. If the content has changed, it runs another scraper to capture screenshots and sends them by email.
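The comparison step can be sketched in a few lines. This is illustrative only: detect_change and the store dict are hypothetical stand-ins for the scraper's internals and the platform's key-value store, not part of any real API:

```python
# Sketch of the content checker's core decision: compare freshly
# scraped text against the previous run and flag a change.
# The `store` dict stands in for the platform's key-value store.

def detect_change(current_text: str, store: dict) -> bool:
    """Record the latest snapshot and return True if the content changed."""
    previous = store.get("previousData")   # text saved by the last run
    store["previousData"] = current_text   # persist for the next run
    # The very first run has nothing to compare against, so no alert.
    return previous is not None and previous != current_text
```

When detect_change returns True, the scraper goes on to capture screenshots and trigger the email alert.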

Input Of the Content Checker

The content-checking tool needs a content selector, a URL, and an email address as input. You can also define a separate screenshot selector; if you don't, the content selector is used for screenshots as well.

Check out the input tab to learn more about the detailed input description.

The Output of the Content Checker

After each run, the scraper updates the content and screenshots in the key-value store associated with the scraper task.

If the content has changed, the content checker calls another scraper to send the email alert.
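For illustration, the records stored in the key-value store after a run might look like the fragment below; the key names here are assumptions chosen for readability, not documented identifiers:

```json
{
  "previousData": "<text scraped on the previous run>",
  "currentData": "<text scraped on this run>",
  "previousScreenshot": "<link to the stored before screenshot>",
  "currentScreenshot": "<link to the stored after screenshot>"
}
```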

Check out the example below of an email alert showing the changed and previous content with screenshots:

Content Checker with Integrations

You can connect the content checker to any web application or cloud service using the integrations available on our platform, including Zapier, Make, Airbyte, Google Drive, Google Sheets, Slack, GitHub, and more. The content-checking tool also supports webhooks, so you can trigger an action when an event occurs. For example, you can receive an alert after each successful run of the content-scraping tool.
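As an illustration, a webhook that fires after a successful run could be described with a definition like the one below; the field names follow common webhook conventions and are assumptions, not our exact schema:

```json
{
  "eventTypes": ["RUN.SUCCEEDED"],
  "requestUrl": "https://example.com/my-endpoint"
}
```

The platform would then POST the run details to your endpoint whenever the event fires.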

Using Content Checker with Real Data API

Our platform provides programmatic access through an API organized around RESTful HTTP endpoints, letting you schedule, manage, and run the scrapers available on our platform. The Real Data API also lets you retrieve results, check scraper performance, create and update scraper versions, access datasets, and more. Use our client NPM package or client PyPI package to access the scraper from Node.js or Python, respectively.

Need Help Getting the Expected Output? Develop a Customized Scraper

If the content checker can't deliver what you need, you can develop a customized scraper to match your requirements. Multiple scraper templates on our platform support TypeScript, Python, and JavaScript to get you started. Alternatively, you can write the code directly using Crawlee, the open-source scraping library.

If you would rather not develop it yourself, contact us for a customized scraping solution.

Your Feedback on Content Checker

Our team constantly works on improving the scraper's performance. If you want to suggest a feature or report a bug, please create an issue from the issue tab or email us.

Industries

Check out how industries are using Content Checker around the world.


E-commerce & Retail

To run the code examples, you need a RealdataAPI account. Replace <YOUR_API_TOKEN> in the code with your API token.

Node.js

import { RealdataAPIClient } from 'RealdataAPI-Client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "url": "https://www.RealdataAPI.com/change-log",
    "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "sendNotificationText": "RealdataAPI found a new change!",
    "proxy": {
        "useRealdataAPIProxy": false
    },
    "navigationTimeout": 30000
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("jakubbalada/content-checker").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();

Python

from RealdataAPI_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "url": "https://www.RealdataAPI.com/change-log",
    "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
    "sendNotificationText": "RealdataAPI found a new change!",
    "proxy": { "useRealdataAPIProxy": False },
    "navigationTimeout": 30000,
}

# Run the actor and wait for it to finish
run = client.actor("jakubbalada/content-checker").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

cURL

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare actor input
cat > input.json <<'EOF'
{
  "url": "https://www.RealdataAPI.com/change-log",
  "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
  "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
  "sendNotificationText": "RealdataAPI found a new change!",
  "proxy": {
    "useRealdataAPIProxy": false
  },
  "navigationTimeout": 30000
}
EOF

# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/jakubbalada~content-checker/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'

URL to Check

url Required String

Choose the webpage URL you want to monitor.

Monitored Area Selector

contentSelector Required String

The CSS selector of the target area you want to monitor.

Screenshot Selector

screenshotSelector Optional String

The CSS selector of the area to capture in screenshots.

Email Address

sendNotificationTo Required String

The email address that receives the notification.

Notification Text

sendNotificationText Optional String

Optional text to include in the email alert.

Error Notification

informOnError Optional Enum

If a selector fails on the webpage, you will receive an email alert with screenshots.

Options:

false (string), true (string)

Proxy Server Configuration

proxy Optional Object

Configure a proxy server if the source website blocks the scraper's IP address.
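The sample inputs above run with the platform proxy disabled; presumably, switching it on is a matter of flipping the flag in the proxy object (shown here in isolation as a fragment of the full input):

```json
{
  "proxy": {
    "useRealdataAPIProxy": true
  }
}
```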

Navigation Timeout

navigationTimeout Optional Integer

The maximum time, in milliseconds, that the scraper waits for the page to load before timing out.

How to Retry

retryStrategy Optional Enum

Sometimes the webpage fails to load correctly, or the source website blocks the scraper; in such cases retrying the check can help, and retrying may also recover from a selector that fails intermittently. Note, however, that blocked-page recognition is not 100 percent accurate.

Options:

never-retry (string), on-all-errors (string), on-block (string)

Maximum Count of Retries

maxRetries Optional Integer

The maximum number of times the scraper retries when an error occurs.
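Together, retryStrategy and maxRetries behave roughly like the loop below. This is a simplified sketch, not the scraper's actual code: BlockedError and the check callable are hypothetical stand-ins, and real blocked-page detection is heuristic:

```python
# Simplified sketch of how retryStrategy and maxRetries could interact.
# BlockedError and the `check` callable are hypothetical stand-ins.

class BlockedError(Exception):
    """Raised when the page looks blocked (detection is heuristic)."""

def run_with_retries(check, retry_strategy: str, max_retries: int):
    attempts = 1 + max_retries  # one initial attempt plus the retries
    for attempt in range(attempts):
        last = attempt == attempts - 1
        try:
            return check()
        except BlockedError:
            # Blocked pages are retried under "on-block" and "on-all-errors".
            if retry_strategy not in ("on-block", "on-all-errors") or last:
                raise
        except Exception:
            # Other errors are retried only under "on-all-errors".
            if retry_strategy != "on-all-errors" or last:
                raise
```

With retryStrategy set to never-retry, any failure surfaces immediately; with on-block, only suspected blocks are retried, up to maxRetries times.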

{
  "url": "https://www.RealdataAPI.com/change-log",
  "contentSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
  "screenshotSelector": "[class^=change-log__MonthBox-]:nth-of-type(1) ul",
  "sendNotificationText": "RealdataAPI found a new change!",
  "informOnError": "false",
  "proxy": {
    "useRealdataAPIProxy": false
  },
  "navigationTimeout": 30000,
  "retryStrategy": "on-block",
  "maxRetries": 5
}