YouTube Video Data Scraper - YouTube Video Data Collection
RealdataAPI / youTube-video-data-scraper
Use YouTube Video Data Scraper to scrape and store the channel name, number of subscribers, video views, likes, comments, and more. It is a replacement for the YouTube API without quotas or limits. Use it in Australia, Canada, Germany, France, Singapore, the USA, the UK, the UAE, India, and other countries.
What is YouTube Video Data Scraper and How Does it Work?
It is a simple, easy-to-use YouTube video data scraping tool that lets you go beyond the limits of the official YouTube Data API. It crawls YouTube and extracts video information without being restricted by quota units. It returns data on:
YouTube search result list depending on preferred search queries.
YouTube channel details, including videos, descriptions, subscriber counts, and more.
Individual videos, including the number of likes, release date, view count, duration, comments, URL, description, and more.
How to Scrape Video Data from YouTube?
The YouTube video metadata scraper lets you extract YouTube data using video URLs, search terms, channels, or search result pages as input. If you fill in both the URL and the search term fields, the scraper prioritizes the URL input.
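The precedence rule above can be sketched as a small helper. This is only an illustration, not the scraper's actual code: `effective_mode` is a hypothetical name, and the sketch assumes that an empty list or empty string counts as "not provided".

```python
def effective_mode(run_input: dict) -> str:
    """Return the input field the scraper would act on, per the rule above.

    Hypothetical helper: assumes empty values count as "not provided".
    """
    # URL input wins whenever it is present, even if keywords are also set.
    if run_input.get("startUrls"):
        return "startUrls"
    if run_input.get("searchKeywords"):
        return "searchKeywords"
    raise ValueError("Provide either startUrls or searchKeywords")


if __name__ == "__main__":
    both = {
        # The {"url": ...} shape is an assumption made for this sketch.
        "startUrls": [{"url": "https://www.youtube.com/watch?v=oxy8udgWRmo"}],
        "searchKeywords": "terminator dark fate trailer",
    }
    print(effective_mode(both))
```

Checking this before a run helps avoid surprises when a leftover URL silently overrides a new search term.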
Why Scrape YouTube Video Data?
Track the market: monitor where content ranks in search results, follow brand mentions, and get insights into competitor activity.
Discover illegal or harmful content and comments.
Filter your search outputs using advanced criteria.
Discover the latest trends and opinions from user comments and content creators.
Collect video subtitles to increase accessibility or offline reading of the video.
Compile product and service-based information from relevant videos to automate purchasing decisions.
Use it as a YouTube video analytics tool.
Can I Scrape YouTube Video Data Legally?
You can scrape YouTube legally as long as you follow the regulations on personal data and copyright. Our YouTube scraper handles privacy consent dialogs and cookies on your behalf, so keep in mind that your scraper output may contain some personal data.
GDPR and other regulations worldwide protect personal data. You should only scrape personal data for a legitimate purpose.
If you are unsure whether your reason for scraping is legitimate, please consult your lawyer before starting YouTube video data collection.
What is the cost of using YouTube Video Data Scraper?
We provide 5 USD monthly platform credit in our free plan. That can help you scrape around two thousand YouTube items. Visit our pricing page to extract YouTube video data at scale.
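As a rough worked example of the figures above, 5 USD for about 2,000 items comes to roughly 0.0025 USD per item. The sketch below assumes that cost scales linearly with the number of items, which the pricing page may not guarantee; `estimated_cost_usd` is a hypothetical helper name.

```python
FREE_CREDIT_USD = 5.0         # monthly free-plan credit stated above
ITEMS_PER_FREE_CREDIT = 2000  # "around two thousand" items, an approximation


def estimated_cost_usd(item_count: int) -> float:
    """Rough estimate, assuming cost grows linearly with item count."""
    return item_count * FREE_CREDIT_USD / ITEMS_PER_FREE_CREDIT


if __name__ == "__main__":
    for count in (500, 2000, 10000):
        print(count, "items ->", estimated_cost_usd(count), "USD (estimate)")
```

Treat the result as an order-of-magnitude guide only; comments, subtitles, and proxy usage may change the real cost per item.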
Do I Need to Use Proxy Server to Scrape YouTube Data?
As with other social media data scrapers on our platform, a proxy server is essential for YouTube Video Data Extractor to scrape data smoothly. You can set up your own proxy or use our default proxy. However, you can't use a data center proxy server to run this YouTube video scraping tool.
Input Parameters of YouTube Video Data Scraper
You can provide JSON input or use a user-friendly interface in your console account. YouTube Video Scraper identifies the following input fields:
startUrls - enter the URLs of the YouTube videos, channels, or search result pages you want to extract.
searchKeywords - you can use YouTube search terms instead of a link.
maxResults - set how many videos you want to scrape from each channel or search term.
maxComments - limit the number of comments extracted from each video.
subtitlesLanguage - export subtitles only in the selected language.
downloadSubtitles - Scrape auto-generated or user-created video captions and transform them to .srt data format.
preferAutoGeneratedSubtitles - prioritize auto-generated video subtitles that convert speech to text over user-created subtitles.
proxyConfiguration - you can set up proxy server settings.
saveSubsToKVS - store the scraped video subtitles in a key-value store on our platform.
verboseLog - switch on verbose logging to get more detailed information about scraper runs.
Visit the input tab of the scraper to learn more about the input parameters of the YouTube video analysis tool in detail.
For different input types, here are a few JSON examples.
Data scraping from YouTube Videos by URL
Input a search result page, video link, or YouTube channel:
Enter the search keywords you would normally type into YouTube to find the video:
{
  "downloadSubtitles": false,
  "maxResults": 10,
  "preferAutoGeneratedSubtitles": false,
  "proxyConfiguration": { "useApifyProxy": true },
  "saveSubsToKVS": false,
  "searchKeywords": "terminator dark fate trailer",
  "simplifiedInformation": false,
  "verboseLog": false
}
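For the URL-based input described above, a comparable input can be assembled in Python and serialized to JSON. This is a sketch under one stated assumption: that `startUrls` takes a list of objects with a `url` key. This page does not confirm that shape, so check the input tab of the scraper before relying on it. `build_url_input` is a hypothetical helper.

```python
import json


def build_url_input(urls, max_results=10):
    """Build a URL-based run input mirroring the keyword example above.

    Assumption: startUrls is a list of {"url": ...} objects.
    """
    return {
        "startUrls": [{"url": u} for u in urls],
        "maxResults": max_results,
        "downloadSubtitles": False,
        "proxyConfiguration": {"useApifyProxy": True},
    }


if __name__ == "__main__":
    example = build_url_input(["https://www.youtube.com/watch?v=oxy8udgWRmo"])
    print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the JSON input editor in your console account.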
Output Sample of YouTube Scraper
After completing the scraping process successfully, you can save and export the data in multiple formats, including RSS, HTML, XML, CSV, or JSON. Here is an example output in JSON format.
{
  "title": "Terminator: Dark Fate - Official Trailer (2019) - Paramount Pictures",
  "id": "oxy8udgWRmo",
  "url": "https://www.youtube.com/watch?v=oxy8udgWRmo",
  "viewCount": 19826925,
  "date": "2019-08-29T00:00:00+00:00",
  "likes": 144263,
  "dislikes": null,
  "location": "DOUBLE DOSE CAFÉ",
  "channelName": "Paramount Pictures",
  "channelUrl": "https://www.youtube.com/c/paramountpictures",
  "numberOfSubscribers": 2680000,
  "duration": "2:34",
  "commentsCount": 25236,
  "details": "<span dir=\"auto\" class=\"style-sco..."
}
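Fields such as `duration` and `date` arrive as strings, so a little post-processing usually helps before analysis. The following sketch uses only the field names visible in the sample output above; `duration_to_seconds` and `normalize` are hypothetical helpers, and the sketch assumes durations look like `M:SS` or `H:MM:SS`.

```python
from datetime import datetime


def duration_to_seconds(duration: str) -> int:
    """Convert a 'M:SS' or 'H:MM:SS' string to a number of seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def normalize(item: dict) -> dict:
    """Return a compact record with typed values from one output item."""
    views = item.get("viewCount")
    likes = item.get("likes")
    # Likes per 1,000 views; None when either count is missing or zero views.
    ratio = round(1000 * likes / views, 2) if likes is not None and views else None
    return {
        "id": item["id"],
        "title": item["title"],
        "published": datetime.fromisoformat(item["date"]),
        "durationSeconds": duration_to_seconds(item["duration"]),
        "likesPerThousandViews": ratio,
    }


if __name__ == "__main__":
    sample = {
        "id": "oxy8udgWRmo",
        "title": "Terminator: Dark Fate - Official Trailer (2019) - Paramount Pictures",
        "viewCount": 19826925,
        "date": "2019-08-29T00:00:00+00:00",
        "likes": 144263,
        "duration": "2:34",
    }
    print(normalize(sample))
```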
Important Notes to Customize the YouTube Data Scraper
Extend Output Function
This function lets you drop results, remove properties, or add new output properties, optionally using the page variable:
async ({ item }) => {
    // remove information from the item
    item.details = undefined; // or delete item.details;
    return item;
}

async ({ item, page }) => {
    // add more info, in this case, the shortLink for the video
    const shortLink = await page.evaluate(() => {
        const link = document.querySelector('link[rel="shortlinkUrl"]');
        if (link) {
            return link.href;
        }
    });
    return {
        ...item,
        shortLink,
    };
}

async ({ item }) => {
    // omit item, just return null
    return null;
}
Extend Scraper Function
It allows you to add functionality to the baseline scraper behavior. For instance, you can enqueue related YouTube videos without adding them recursively.
async ({ page, request, requestQueue, customData, Apify }) => {
    if (request.userData.label === 'DETAIL' && !request.userData.isRelated) {
        await page.waitForSelector('ytd-watch-next-secondary-results-renderer');

        const related = await page.evaluate(() => {
            return [...document.querySelectorAll('ytd-watch-next-secondary-results-renderer a[href*="watch?v="]')].map(a => a.href);
        });

        for (const url of related) {
            await requestQueue.addRequest({
                url,
                userData: {
                    label: 'DETAIL',
                    isRelated: true,
                },
            });
        }
    }
}
NB: If the above function throws an exception, the scraper will retry the same video link.
Do you need to scrape other social media and video data?
We have dedicated and general scrapers to help you scrape video and social media data from various platforms. You can visit the store page and filter the video or social media category to use the relevant scraper.
YouTube Video Scraper with Integrations
Lastly, you can connect the YouTube video data scraper with almost any web application or cloud service using integrations on our platform. You can integrate the scraper with Slack, Google Drive, GitHub, Zapier, Airbyte, Google Sheets, Make, and other platforms. You can also use webhooks to trigger an action when an event occurs, such as getting an alert when the YouTube video data crawler finishes successfully.
Using YouTube Video Scraper with the Real Data API Actor
The Real Data API gives you programmatic access to the platform. It is organized around RESTful HTTP endpoints that let you schedule, run, and manage actors. It also lets you track performance, create and update scraper versions, retrieve outputs, access datasets, and more.
You can use our client NPM and client PyPI packages to access the API from Node.js and Python, respectively. Visit the API tab of the scraper to see sample code.
Share Your Feedback
Our team constantly works to improve the scraper's performance. If you find a bug or have technical suggestions or feedback, you can create an issue from the issue tab in your console account.
You need a Real Data API account to run the program examples. Replace <YOUR_API_TOKEN> in the program with your API token. See the Real Data API docs for more details on the live APIs.
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": async ({ data, item, page, request, customData }) => {
        return item;
    },
    "extendScraperFunction": async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {
    },
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyCountry": "US"
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("bernardo/youtube-scraper").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the actor input
run_input = {
    "searchKeywords": "Crawlee",
    "maxResults": 10,
    "maxResultsShorts": 10,
    "maxResultStreams": 10,
    "extendOutputFunction": """async ({ data, item, page, request, customData }) => {
    return item;
}""",
    "extendScraperFunction": """async ({ page, request, requestQueue, customData, Apify, extendOutputFunction }) => {
}""",
    "customData": {},
    "handlePageTimeoutSecs": 3600,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyCountry": "US",
    },
}

# Run the actor and wait for it to finish
run = client.actor("bernardo/youtube-scraper").call(run_input=run_input)

# Fetch and print actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
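If you consume results through the Python client rather than the platform's export options, the items can be written to CSV locally. The sketch below is one possible approach using only the standard library; the column list is an assumption based on the field names in the sample output, and `items_to_csv` is a hypothetical helper.

```python
import csv
import io

# Columns chosen for illustration; taken from the sample output's field names.
FIELDS = ["id", "title", "url", "viewCount", "likes", "channelName", "duration"]


def items_to_csv(items) -> str:
    """Serialize an iterable of result dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        # Keep only the chosen columns; missing keys become empty cells.
        writer.writerow({key: item.get(key, "") for key in FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    demo = [{"id": "oxy8udgWRmo", "title": "Terminator: Dark Fate - Official Trailer",
             "viewCount": 19826925, "duration": "2:34"}]
    print(items_to_csv(demo))
```

In practice you would pass the iterator returned by `iterate_items()` from the example above instead of the demo list.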
Enter a search query (searchKeywords) just as you would type it into YouTube's search bar.
Maximum Search Results
maxResults Optional Integer
Limit the number of videos you want to extract. Leave the field blank to get all results.
Direct URLs
startUrls Optional Array
Enter the URL of a YouTube video, search result page, or channel. You can also upload a Google Sheet or a CSV file with a list of URLs.
Important Note: if you use this input field, the scraper will ignore the search query input.
Only Collect Basic Channel Information
simplifiedInformation Optional Boolean
If set to true, the tool only collects the data available on the channel page, so the data for individual videos will be limited.
Save Short Videos
saveShorts Optional Boolean
If set to true, the scraper will store short videos from the selected channel.
Maximum Shorts Videos
maxResultsShorts Optional Integer
Limit the number of short videos you want to scrape from the selected YouTube channel. Leave the field empty to scrape all short videos.
Save Streams
saveStreams Optional Boolean
If set to true, the scraper will store the live-stream videos from the selected YouTube channel.
Maximum Streams
maxResultStreams Optional Integer
Limit the number of live-stream videos you want to extract from the selected channel. Leave the field blank to get all of them.
Maximum Comments
maxComments Optional Integer
Limit the number of comments you want to extract from each video. Leave the field blank or set it to zero if you don't want to scrape any comments.
Download Subtitles
downloadSubtitles Optional Boolean
If set to true, the tool will export video subtitles and convert them to the .srt format.
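Once subtitles are exported as .srt, they can be turned into plain cues for offline reading or search. The sketch below assumes the standard SubRip layout (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, then one or more text lines, with cues separated by blank lines). `parse_srt` is a hypothetical helper and the sample text is made up.

```python
import re

TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text: str):
    """Parse SubRip text into a list of (start_s, end_s, text) tuples."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIME_RE.search(lines[1])
        if not match:
            continue  # skip blocks without a valid time line
        parts = match.groups()
        cues.append(
            (_to_seconds(*parts[:4]), _to_seconds(*parts[4:]), " ".join(lines[2:]))
        )
    return cues


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello there\n\n2\n00:00:04,000 --> 00:00:06,000\nSecond cue\n"
    for start, end, caption in parse_srt(sample):
        print(f"{start:7.2f} -> {end:7.2f}  {caption}")
```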
Store Video Subtitles to Key-Value Store
saveSubsToKVS Optional Boolean
If set to true, the crawler will store the downloaded video subtitles in the key-value store.
Important Note: you must turn on Download Subtitles to use this option.
Subtitle Language
subtitlesLanguage Optional Enum
The language of the subtitles to download.
Important Note: you must turn on Download Subtitles to use this option.
Options:
en string
de string
it string
fr string
pt string
ko string
ja string
ru string
nl string
es string
Choose Autogenerated Subtitles
preferAutoGeneratedSubtitles Optional Boolean
If set to true, the scraper will prefer auto-generated video subtitles. Remember that you must select a subtitle language to use this option.
Extend Output Function
extendOutputFunction Optional String
Add or remove properties on the result object, or omit the result entirely by returning null.
Extend Scraper Function
extendScraperFunction Optional String
An advanced function that lets you extend the functionality of the default scraper. It allows you to perform page actions manually.
Custom Data
customData Optional Object
Any data you want to pass to the extend scraper function or the extend output function.
Handle Page Timeout
handlePageTimeoutSecs Optional Integer
Set the handle page timeout in seconds.
Proxy Configuration
proxyConfiguration Required Object
Use custom proxies or the proxies available on our platform.
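Several of the fields above depend on one another, so it can be useful to check an input locally before starting a run. The sketch below encodes only the dependencies stated in this reference (required proxy configuration, subtitle options that need Download Subtitles, and the listed language codes). `validate_input` is a hypothetical helper and is not part of the scraper or its client libraries.

```python
# Language codes listed under Subtitle Language above.
SUBTITLE_LANGUAGES = {"en", "de", "it", "fr", "pt", "ko", "ja", "ru", "nl", "es"}


def validate_input(run_input: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if "proxyConfiguration" not in run_input:
        problems.append("proxyConfiguration is required")

    subtitles_on = bool(run_input.get("downloadSubtitles"))
    language = run_input.get("subtitlesLanguage")

    if run_input.get("saveSubsToKVS") and not subtitles_on:
        problems.append("saveSubsToKVS requires downloadSubtitles to be true")
    if language is not None:
        if language not in SUBTITLE_LANGUAGES:
            problems.append(f"unsupported subtitlesLanguage: {language}")
        if not subtitles_on:
            problems.append("subtitlesLanguage requires downloadSubtitles to be true")
    if run_input.get("preferAutoGeneratedSubtitles") and language is None:
        problems.append("preferAutoGeneratedSubtitles requires subtitlesLanguage")
    return problems


if __name__ == "__main__":
    candidate = {
        "searchKeywords": "Crawlee",
        "saveSubsToKVS": True,
        "proxyConfiguration": {"useApifyProxy": True},
    }
    for problem in validate_input(candidate):
        print("Input problem:", problem)
```

Returning a list instead of raising on the first issue lets you report every problem in one pass.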
Disclaimer : RealData API functions solely as an independent data infrastructure and technology solutions provider. We build customized automation workflows designed to collect publicly accessible web data based exclusively on client instructions. RealData API neither owns proprietary datasets nor engages in the sale or redistribution of extracted information. Our operations are limited strictly to lawful public web data processing and never involve unauthorized access to restricted systems or private networks. Any company names, trademarks, logos, or brand references displayed on this website are used purely for demonstrative and illustrative purposes to showcase our technical capabilities and do not imply endorsement, partnership, or affiliation. Use of our platform and services remains subject to our Terms of Service.