
Reddit Data Scraper - Scrape Reddit Data

RealdataAPI / reddit-scraper

Scrape Reddit data such as subreddits, categories, comments, likes, and user profiles using Reddit Data Scraper, and boost your data scraping activity on Reddit. Use the scraped data for multiple business requirements, including data projects, business reports, market research, and more. Reddit Data Scraper is accessible in Canada, France, Australia, Germany, the USA, the UK, Spain, the UAE, and other countries.

What does Reddit Web Data Scraper do?

This unofficial Reddit API collects unlimited data from Reddit without authentication. It allows you to extract posts and comments, along with some information about the user, without logging in. We developed it with the Real Data API SDK; you can use it locally or on our platform.

Reddit Scraper permits you to:

  • Scrape top posts from subreddits with community details, including member count, moderator username, category, and URL.
  • Scrape popular subreddits and leaderboards.
  • Get Reddit timestamps, usernames, comments, points, posts, and comment URLs.
  • Sort extracted data by relevance categories such as Top, Hot, and New.
  • Scrape the latest posts, comments, and respective user details.
  • Scrape Reddit data using specific keywords or URLs.

Do You Need Only a Few Results From Reddit?

Try our dedicated free Reddit Scraper if you want to extract Reddit data quickly on a smaller scale. Simply enter keywords or Reddit URLs and tap the scrape option. Remember that the free Reddit Scraper can scrape up to 10 comments, 10 posts, 2 leaderboard items, and 2 subreddits for you.
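Those limits roughly correspond to the following input, a minimal sketch assuming the free tool accepts the same parameter names as the full scraper:

{
  "maxPostCount": 10,
  "maxComments": 10,
  "maxLeaderBoardItems": 2,
  "maxCommunitiesAndUsers": 2
}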

What is the Cost of Using Reddit Scraper?

Reddit Scraper on the Real Data API platform gives you one thousand results for 4 USD in platform credits. You can cover this with the free 5 USD platform credit included in our monthly free plan.
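In other words, the free credit covers 5 / 4 × 1,000 = 1,250 results per month at no cost.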

However, if you want to scrape more Reddit data, try our monthly personal plan at 49 USD to get more than ten thousand results per month.

How to Scrape Data From Reddit?

You don't need any coding knowledge or skills to use Reddit Data Scraper API. If you don't know where to begin, follow the stepwise video tutorial below. You can also use the same tutorial for the free Reddit Scraper.

How to Use Scraped Data From Reddit?

  • Research your favorite topics and opinions from a wide range of audiences.
  • Monitor discussions about your products and brand across relevant subreddits.
  • Track debates over high-stakes subjects such as new technologies, politics, general news, finance, and more.
  • Track your business mentions or your favorite topic automatically.
  • Explore the latest trends, PR opportunities, and attitudes.
  • Search and scrape Reddit comments to begin and support the sentiment analysis.

Input Parameters

There are two methods to scrape Reddit if you run Reddit Scraper on the Real Data API platform.

  • Using the Start URLs input field - it will collect all the details from any Reddit URL, whether for user, post, or community data.
  • Using the Search Term input field - it will crawl all matching Reddit data, including posts, communities, and users, for the given search keywords.

How to scrape Reddit data by URLs

Almost any link from Reddit will return a dataset. If a URL is not supported, the scraper will display a message before scraping the page.

Input Examples:

These are a few input examples of Reddit URLs that you can scrape.
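For instance, a community page, a post, and a user profile could all be passed as start URLs. The URLs below are taken from the output examples in this document and are illustrative only:

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/pizza/" },
    { "url": "https://www.reddit.com/r/Pizza/comments/sud2hm/tomato_pie_from_sallys_apizza_stamford_ct/" },
    { "url": "https://www.reddit.com/user/AutoModerator/" }
  ]
}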

Note: The scraper will only scrape posts if you use a search link as a startUrls parameter. Use a specific URL or the search field for Reddit user search and community search.

Scraping Reddit using Search Term

Search Types: It denotes which part of Reddit you are scraping - users, communities, or posts.

Search Term: It is the keyword you want to search on the Reddit search engine. You can add multiple terms or keep only one. Don't use this together with startUrls.

Sort Search: It will sort search outputs by Top, Hot, most comments, or Relevance.

Filter by Time or Date: It will narrow the search to the last hour, day, week, month, or year. You can use it only while scraping posts.

To check the entire parameter list, how to set default values, and the actual default values, go to the Input Schema tab.

Input Example:

This is an input example of how the scraper displays the input fields if you scrape all Reddit users and communities with the keyword parrots. The output is sorted by newest first.

{
  "maxItems": 10,
  "maxPostCount": 10,
  "maxComments": 10,
  "maxCommunitiesAndUsers": 10,
  "maxLeaderBoardItems": 10,
  "scrollTimeout": 40,
  "proxy": {
    "useRealdataAPIProxy": true
  },
  "debugMode": false,
  "searches": ["parrots"],
  "type": "communities_and_users",
  "sort": "new",
  "time": "all"
}

Results

The scraper stores the output in a dataset, with one item for each comment, community, leaderboard list, or user. Once the Reddit API finishes the run, you can export the scraped Reddit data to your device or to any web application in multiple usable formats. Check out the output examples below for the various input examples.

Example Reddit Post

{
  "id": "ss5c25",
  "title": "Weekly Questions Thread / Open Discussion",
  "description": "For any questions regarding dough, sauce, baking methods, tools, and more, comment below.You can also post any art, tattoos, comics, etc here. Keep it SFW, though.As always, our wiki has a few sauce recipes and recipes for dough.Feel free to check out threads from weeks ago.This post comes out every Monday and is sorted by 'new'.",
  "numberOfVotes": "4",
  "createdAt": "3 days ago",
  "scrapedAt": "2022-01-09T22:52:48.489Z",
  "username": "u/AutoModerator",
  "numberOfComments": "19",
  "mediaElements": [],
  "tag": "HELP",
  "dataType": "post"
}

Example Reddit Comment

{
  "url": "https://www.reddit.com/r/Pizza/comments/sud2hm/tomato_pie_from_sallys_apizza_stamford_ct/t1_hx9k9it",
  "username": "Acct-404",
  "createdAt": "9 h ago",
  "scrapedAt": "2022-03-09T12:52:48.547Z",
  "description": "Raises handUhhhh can I get some cheese on my pizza please?",
  "numberOfVotes": "3",
  "postUrl": "https://www.reddit.com/r/Pizza/comments/sud2hm/tomato_pie_from_sallys_apizza_stamford_ct/",
  "postId": "sud2hm",
  "dataType": "comment"
}

Example Reddit Community

{
  "title": "Pizza",
  "alternativeTitle": "r/Pizza",
  "createdAt": "Created Aug 26, 2008",
  "scrapedAt": "2022-03-09T12:54:42.721Z",
  "members": 366000,
  "moderatos": [
    "6745408",
    "AutoModerator",
    "BotTerminator",
    "DuplicateDestroyer"
  ],
  "url": "https://www.reddit.com/r/pizza/",
  "dataType": "community",
  "categories": ["hot", "new", "top", "rising"]
}

Notes for Developers

Limiting outputs with maxItems

If you need to restrict the search scope, you can set the maximum number of items to scrape for each user or community. Using the parameters below, you can also restrict the comment count for every post, the community and user count, and the number of leaderboard items.

{
  "maxPostCount": 50,
  "maxComments": 10,
  "maxCommunitiesAndUsers": 5,
  "maxLeaderBoardsItems": 5
}

If you want to prevent a long actor run, you can set maxItems. The scraper stops once it reaches the result count you asked for, so take care that this limit doesn't trim your outputs unintentionally.

Visit the Input Schema tab to check the entire list of ways to limit Reddit Scraper with maxLeaderBoardItems, maxComments, maxItems, maxCommunitiesAndUsers, and maxPostCount.

Extend Output Function

You can use this function to update the output results of this scraper. Select the data you want to scrape from the Reddit page; the object this function returns will be merged with the scraper's own output.

You can return the fields below to achieve three different things:

  • Remove a field - return an existing field with an undefined value.
  • Add a new field - return a field that is not in the resulting output.
  • Change a field - return an existing field with a new value.
async () => {
  return {
    pageTitle: document.querySelector("title").innerText,
  };
};

For example, the function above adds the page title, so the final object looks like this:

{
  "title": "Pizza",
  "alternativeTitle": "r/Pizza",
  "createdAt": "Created Aug 26, 2008",
  "scrapedAt": "2022-03-08T21:57:25.832Z",
  "members": 366000,
  "moderators": [
    "6745408",
    "AutoModerator",
    "BotTerminator",
    "DuplicateDestroyer"
  ],
  "url": "https://www.reddit.com/r/pizza/",
  "categories": ["hot", "new", "top", "rising"],
  "dataType": "community",
  "pageTitle": "homemade chicken cheese masala pasta"
}
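If you want to combine all three operations, a sketch of an extend output function might look like this (the alternativeTitle and categories fields come from the community example above; the new values are illustrative):

async () => {
  return {
    // Add a new field that the scraper does not output by itself
    pageTitle: document.querySelector("title").innerText,
    // Change an existing field by returning it with a new value
    alternativeTitle: "r/pizza (renamed)",
    // Remove an existing field by returning it with an undefined value
    categories: undefined,
  };
};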

Industries

Check out how industries use Reddit Scraper worldwide.


You need a Real Data API account to execute the program examples below. Replace <YOUR_API_TOKEN> in the program with your API token. Read about the live APIs in the Real Data API docs for more explanation. The same run is shown below in Node.js, Python, and cURL.

Node.js:

import { RealdataAPIClient } from 'RealdataAPI-Client';

// Initialize the RealdataAPIClient with API token
const client = new RealdataAPIClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/"
        }
    ],
    "maxItems": 10,
    "maxPostCount": 10,
    "maxComments": 10,
    "maxCommunitiesAndUsers": 2,
    "maxLeaderBoardItems": 2,
    "scrollTimeout": 40,
    "proxy": {
        "useRealdataAPIProxy": true
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("trudax/reddit-scraper").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();

Python:

from RealdataAPI_client import RealdataAPIClient

# Initialize the RealdataAPIClient with your API token
client = RealdataAPIClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "startUrls": [{ "url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/" }],
    "maxItems": 10,
    "maxPostCount": 10,
    "maxComments": 10,
    "maxCommunitiesAndUsers": 2,
    "maxLeaderBoardItems": 2,
    "scrollTimeout": 40,
    "proxy": { "useRealdataAPIProxy": True },
}

# Run the actor and wait for it to finish
run = client.actor("trudax/reddit-scraper").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

cURL:

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare actor input
cat > input.json <<'EOF'
{
  "startUrls": [
    {
      "url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/"
    }
  ],
  "maxItems": 10,
  "maxPostCount": 10,
  "maxComments": 10,
  "maxCommunitiesAndUsers": 2,
  "maxLeaderBoardItems": 2,
  "scrollTimeout": 40,
  "proxy": {
    "useRealdataAPIProxy": true
  }
}
EOF

# Run the actor
curl "https://api.RealdataAPI.com/v2/acts/trudax~reddit-scraper/runs?token=$API_TOKEN" /
  -X POST /
  -d @input.json /
  -H 'Content-Type: application/json'

Start URLs

startUrls Optional Array

If you already have page URLs that you want to scrape, you can enter them here. To use the search field below instead, remove each start URL.

Search Term

searches Optional Array

You can enter a search term here to search on the Reddit search engine.

Search for Posts

searchPosts Optional Boolean

It will search for posts using the provided search term.

Reddit Comment Search

searchComments Optional Boolean

It will search for comments using the provided search term.

Search for communities

searchCommunities Optional Boolean

It will search for communities using the provided search term.

Reddit User Search

searchUsers Optional Boolean

It will search for Reddit users using the provided search term.

Sort Search

sort Optional String

Sort the search results by comments, top, relevance, hot, or new.

Options:

"hot","relevance","new","comment","top", etc.

Filter Posts by Date

time Optional String

Filter posts by the last hour, day, week, month, or year.

Options:

"hour","year","all","week","day","month".

Maximum Item Count to Save

maxItems Optional Integer

The maximum number of items to save in the dataset. If you are scraping users and communities, remember that the scraper saves every category inside the dataset as a separate item.

Scraped Post Limit on a Single Page

maxPostCount Optional Integer

The maximum number of posts the scraper will store for every post, community, or user page.

Scraped Comment Limit on a Single Page

maxComments Optional Integer

The maximum number of comments the scraper will extract from every comment page. You can set it to 0 if you don't plan to extract comments.

Limit of User and Community Pages

maxCommunitiesAndUsers Optional Integer

The maximum number of community and user pages to scrape when your start URL or search targets a user or community.

Limit Leaderboard Items

maxLeaderBoardItems Optional Integer

The maximum number of communities to scrape from the leaderboard page.

Extended Output Function

extendOutputFunction Optional String

You can write custom JavaScript code to scrape custom data from the Reddit page.

Page Scroll Timeout

scrollTimeout Optional Integer

The timeout, in seconds, after which the scraper stops scrolling the page down to load new items.

Proxy Configuration

proxy Required Object

Choose a Real Data API proxy server or use your own proxy to support the scraper.

Debug Mode

debugMode Optional Boolean

See detailed logs by activating debug mode.

{
  "startUrls": [
    {
      "url": "https://www.reddit.com/r/pasta/comments/vwi6jx/pasta_peperoni_and_ricotta_cheese_how_to_make/"
    }
  ],
  "searchPosts": true,
  "searchComments": false,
  "searchCommunities": false,
  "searchUsers": false,
  "maxItems": 10,
  "maxPostCount": 10,
  "maxComments": 10,
  "maxCommunitiesAndUsers": 2,
  "maxLeaderBoardItems": 2,
  "scrollTimeout": 40,
  "proxy": {
    "useRealdataAPIProxy": true
  },
  "debugMode": false
}