YouTube makes people go viral, and the platform itself keeps growing. It is often called the second-largest search engine after Google.
Launched in 2005, the platform now has over 2 billion monthly active users. YouTube has helped content creators gain exposure and earn revenue from their work.
In this article, we will use the YouTube Data API from Python to extract data from YouTube that can be used in many different projects.
Don’t worry if some of these terms are unclear for now; they will all be explained as we move through the article. So, let us get started.
What is an API?
API is short for Application Programming Interface. It is a software interface that defines how one program can request data or services from another, giving programmers an efficient way to handle client-server communication.
Developers often use APIs to build client-server applications: the client sends a request with some input, and the API returns the specific information from the backend. APIs can also carry out operations, not just return information.
Here we will use the YouTube Data API, which is provided officially by YouTube. It lets developers retrieve various attributes of channels, videos, and playlists.
Suppose you wanted to build an application that shows the most-liked video of a channel entered by the user. With the API, you could retrieve that information in no time.
What an API can return depends on the platform and its owner. Let us now see what we can extract with the YouTube API.
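Under the hood, the Python client sends ordinary HTTPS GET requests to the YouTube Data API's REST endpoints. The sketch below only builds such a request URL, to make the client-server exchange concrete; `build_request_url` is a hypothetical helper of our own, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL of the YouTube Data API v3; each resource (channels, videos,
# playlistItems, ...) is a path under it.
BASE_URL = "https://www.googleapis.com/youtube/v3"


def build_request_url(resource, api_key, **params):
    """Return the GET URL for a YouTube Data API resource (hypothetical helper)."""
    # Query parameters keep their keyword order, with the API key appended last.
    query = dict(params, key=api_key)
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


url = build_request_url("channels", "YOUR_API_KEY",
                        part="statistics", id="UCr2dD3s19bdcw4qjuUTQKiQ")
print(url)
```

Sent with a valid key, a URL like this returns JSON; the Python client performs the same kind of request and hands you the parsed result as a dictionary.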
What Can You Extract With the YouTube API?
There is plenty of information you can extract from the YouTube API using Python. Some of the important attributes are listed below:
Channel’s Statistics: Returns important statistical information about the specified channel.
No. of Videos: The total number of public videos uploaded by the channel.
Total Views: The total number of views of the channel's videos. Total watch time in minutes is not part of these public statistics; it is available only to the channel owner through the YouTube Analytics API.
Total No. of Subscribers: As the name suggests, this fetches the channel's subscriber count.
Snippet: Lets you fetch descriptive details from the channel’s data, such as its title and description.
Logo: Returns a link to the channel's profile image, which is available in several sizes.
Content Details: Returns the ID of the channel's uploads playlist, from which you can list all of its videos and then fetch per-video statistics such as like, view, and comment counts. Dislike counts are no longer public; YouTube removed them from the API in December 2021.
The YouTube API exposes a lot more than this. Next, we will look at the Python code we used to extract the most important information.
But first, we will set up our machine and its environment. We used Jupyter Notebook to run this code, but you can use any other Python IDE.
Working on YouTube API with Python
1. Installation of Google API Client
The Google API Client library provides the build() function that creates our YouTube client, so we need to install it first. Commands for different platforms are given below.
For Windows:
pip install google-api-python-client
For Ubuntu:
sudo pip install google-api-python-client
For Anaconda:
conda install google-api-python-client
2. Importing Libraries
We need the build() function from the Google API Client library to work with the YouTube API. Import it using the code below:
from googleapiclient.discovery import build
3. Creating Object
We will create an object to access YouTube data. For this you need an API key, which you can create in the Google Cloud Console by enabling the YouTube Data API v3 for a project and then creating an API key under Credentials.
After getting the API key, use the following code to create the object:
youTubeApiKey = 'YOUR_API_KEY'  # replace with the API key you created
youtube = build('youtube', 'v3', developerKey=youTubeApiKey)
channelId = 'UCr2dD3s19bdcw4qjuUTQKiQ'
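Hard-coding the key is fine for a quick notebook, but it is easy to leak when you share the code. A minimal alternative, assuming you store the key in an environment variable we have arbitrarily named `YOUTUBE_API_KEY`, is to read it at runtime; `get_api_key` is a hypothetical helper:

```python
import os


def get_api_key(var_name="YOUTUBE_API_KEY"):
    """Read the API key from an environment variable instead of hard-coding it.

    YOUTUBE_API_KEY is an assumed variable name, not something the API requires.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key
```

With this in place, `youTubeApiKey = get_api_key()` can replace the hard-coded string before calling build().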
Here we have used the channel ID for our YouTube channel Rajni Sharma Maths Classes. You can use any channel to extract the data.
To get the Channel ID, open the YouTube channel and check its URL. For URLs of the form /channel/<ID>, the Channel ID is the last part:
Example: https://www.youtube.com/channel/UCr2dD3s19bdcw4qjuUTQKiQ
For the above YouTube URL, the Channel ID is UCr2dD3s19bdcw4qjuUTQKiQ.
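Not every channel URL has this form: older channels may use /user/<name> and newer ones /@handle. Passing such a name as `id` matches no channel, so the response contains no items and indexing `['items'][0]` raises an error. The sketch below (both function names are our own) pulls the ID out of /channel/ URLs and, for legacy usernames, asks the API through the `forUsername` parameter of `channels().list`:

```python
from urllib.parse import urlparse


def channel_id_from_url(url):
    """Return the channel ID from a /channel/<ID> URL, or None for other URL forms."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    # /user/<name> and /@handle URLs do not contain the channel ID.
    return None


def channel_id_for_username(youtube, username):
    """Look up a legacy /user/<name> channel.

    'youtube' is the object created with build(); returns None if nothing matches.
    """
    res = youtube.channels().list(part="id", forUsername=username).execute()
    items = res.get("items", [])
    return items[0]["id"] if items else None
```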
4. Getting Statistics from YouTube API
The statistics include the channel's total views, subscribers, and number of videos. We can get them with the following code:
statdata = youtube.channels().list(part='statistics', id=channelId).execute()
stats = statdata['items'][0]['statistics']
stats
This returns a dictionary of statistics, and we will extract its values one by one as shown below. Note that the API returns the counts as strings, so convert them with int() before doing arithmetic.
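Because every count is a string, a small helper that converts the whole statistics block at once can save repeated `int()` calls. This is a sketch under our own naming (`parse_channel_stats`); it returns `None` for any count that is absent instead of raising a KeyError:

```python
def parse_channel_stats(statdata):
    """Convert the string counts in a channels.list 'statistics' response to ints."""
    stats = statdata["items"][0]["statistics"]
    result = {}
    for key in ("viewCount", "subscriberCount", "videoCount"):
        # Counts arrive as strings; a missing key becomes None instead of a KeyError.
        result[key] = int(stats[key]) if key in stats else None
    # Channels can choose to hide their subscriber count from the public.
    result["hiddenSubscriberCount"] = stats.get("hiddenSubscriberCount", False)
    return result
```

Usage: `parse_channel_stats(statdata)` on the response fetched above.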
a) Total Number of Videos
videoCount = stats['videoCount']
videoCount
b) Total Number of Views
viewCount = stats['viewCount']
viewCount
c) Total Number of Subscribers
subscriberCount = stats['subscriberCount']
subscriberCount
5. Getting Snippet
Just like the statistics, the snippet contains useful information. We will first fetch it as a dictionary using the YouTube API and then extract the individual fields.
snippetdata = youtube.channels().list(part='snippet', id=channelId).execute()
snippetdata
a) Title of YouTube Channel
title = snippetdata['items'][0]['snippet']['title']
title
The title is the display name of the channel whose ID you passed in.
b) YouTube Channel’s Description
description = snippetdata['items'][0]['snippet']['description']
description
The description includes the information that the channel owner has provided.
c) YouTube Channel’s Logo
logo = snippetdata['items'][0]['snippet']['thumbnails']['default']['url']
logo
The above code fetches the logo used by the channel owner. It returns only a link to the image, not the image itself; the 'default' key selects the smallest size, while 'medium' and 'high' give larger versions.
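If you want a larger logo than the 'default' one, a helper like the hypothetical `best_thumbnail_url` below picks the biggest size present in the `thumbnails` dictionary, assuming the usual 'high', 'medium', and 'default' keys:

```python
def best_thumbnail_url(thumbnails, preferred=("high", "medium", "default")):
    """Return the URL of the largest available thumbnail, or None if none is present."""
    for size in preferred:
        if size in thumbnails:
            return thumbnails[size]["url"]
    return None
```

For example: `best_thumbnail_url(snippetdata['items'][0]['snippet']['thumbnails'])`.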
6. Getting Content Details
This is the most interesting part of the YouTube API and one of our favorites. With it, we can collect information about every video the channel has uploaded.
In a past project, we used this approach to showcase a channel's most liked, most disliked, and most commented videos, with the YouTube API supplying the data and Python doing the processing.
Step 1: Getting All the Video Details
# The channel's contentDetails holds the ID of its "uploads" playlist
contentdata = youtube.channels().list(id=channelId, part='contentDetails').execute()
playlist_id = contentdata['items'][0]['contentDetails']['relatedPlaylists']['uploads']
videos = []
next_page_token = None
# Page through the playlist 50 items at a time (the maximum per request)
while True:
    res = youtube.playlistItems().list(playlistId=playlist_id,
                                       part='snippet',
                                       maxResults=50,
                                       pageToken=next_page_token).execute()
    videos += res['items']
    next_page_token = res.get('nextPageToken')
    if next_page_token is None:
        break
print(videos)
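The loop above follows the API's general pagination pattern: request a page, keep its items, and repeat with `nextPageToken` until no token is returned. Factored into a reusable function (our own `fetch_all_pages`, a sketch that takes any page-fetching callable), it looks like this:

```python
def fetch_all_pages(fetch_page):
    """Collect 'items' from every page of a paginated YouTube API listing.

    fetch_page is any callable that takes a page token (None for the first
    page) and returns the response dictionary for that page.
    """
    items = []
    token = None
    while True:
        res = fetch_page(token)
        items += res.get("items", [])
        token = res.get("nextPageToken")
        if token is None:
            return items


# Usage with the objects from this article:
# videos = fetch_all_pages(lambda token: youtube.playlistItems().list(
#     playlistId=playlist_id, part='snippet', maxResults=50,
#     pageToken=token).execute())
```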
Step 2: Getting the Video ID for Each Video
video_ids = [v['snippet']['resourceId']['videoId'] for v in videos]
video_ids
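If you only need data for a single video, you do not have to walk the whole channel: extract the video's ID from its link and pass it directly to `youtube.videos().list(id=video_id, part='snippet,statistics')`. A minimal sketch of the ID extraction, covering the youtube.com/watch?v=... and youtu.be/... link forms (the function name is hypothetical):

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Extract the video ID from a youtube.com/watch or youtu.be link, else None."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<ID>
        return parsed.path.lstrip("/") or None
    # Regular links carry it in the "v" query parameter: /watch?v=<ID>
    return parse_qs(parsed.query).get("v", [None])[0]
```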
Step 3: Getting Statistics for Each Video
# Request statistics in batches of 40 video IDs per call
stats = []
for i in range(0, len(video_ids), 40):
    res = youtube.videos().list(id=','.join(video_ids[i:i+40]), part='statistics').execute()
    stats += res['items']
print(stats)
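The loop requests statistics in batches of 40 IDs because `videos().list` accepts at most 50 IDs per call. A small generator (a hypothetical helper, shown here with the batch size set to the 50-ID maximum) makes that batching explicit:

```python
def chunked(ids, size=50):
    """Yield comma-joined batches of at most `size` IDs for videos().list(id=...)."""
    for start in range(0, len(ids), size):
        yield ",".join(ids[start:start + size])


# Usage with the objects from this article:
# for batch in chunked(video_ids):
#     res = youtube.videos().list(id=batch, part='statistics').execute()
```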
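Step 4: Collecting All the Information in Lists:
# Titles come from the playlist items; keying them by video ID keeps them
# aligned with the statistics even if a private or deleted video was skipped.
titles_by_id = {v['snippet']['resourceId']['videoId']: v['snippet']['title'] for v in videos}

def count(s, key):
    # Counts arrive as strings and may be missing: likes can be hidden, comments
    # disabled, and dislikeCount is no longer returned publicly.
    return int(s[key]) if key in s else None

title, liked, disliked, views, url, comment = [], [], [], [], [], []
for item in stats:
    s = item['statistics']
    title.append(titles_by_id[item['id']])
    url.append("https://www.youtube.com/watch?v=" + item['id'])
    liked.append(count(s, 'likeCount'))
    disliked.append(count(s, 'dislikeCount'))
    views.append(count(s, 'viewCount'))
    comment.append(count(s, 'commentCount'))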
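With the lists in place, questions like "which video has the most likes?" from the beginning of the article take only a few lines. A sketch (`most_liked` is our own name) that ignores videos whose like count is hidden (`None`):

```python
def most_liked(titles, likes):
    """Return (title, likes) for the most-liked video, skipping hidden (None) counts."""
    pairs = [(t, n) for t, n in zip(titles, likes) if n is not None]
    if not pairs:
        return None
    return max(pairs, key=lambda pair: pair[1])
```

Usage: `most_liked(title, liked)`.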
Step 5: Creating a Dataframe for the Collected Data
This is not a necessary step, but it organizes the data better. You will need the pandas library, which you can install with pip install pandas (or conda install pandas under Anaconda).
Now use the following code to create a DataFrame with the pandas library:
import pandas as pd
data = {'title': title, 'url': url, 'liked': liked, 'disliked': disliked,
        'views': views, 'comment': comment}
df = pd.DataFrame(data)
df
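Once the data is in a DataFrame, rankings such as the most viewed or most commented videos are a single sort away. A minimal sketch using pandas' `sort_values` (the `top_videos` wrapper is hypothetical); missing counts, which pandas stores as NaN, are placed last by default:

```python
import pandas as pd


def top_videos(df: pd.DataFrame, column: str = "views", n: int = 5) -> pd.DataFrame:
    """Return the n rows with the highest value in `column`, largest first."""
    return df.sort_values(column, ascending=False).head(n)
```

Usage: `top_videos(df, 'comment', 10)` lists the ten most commented videos.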
We have now organized the collected information into a DataFrame. This marks the end of this article on extracting YouTube data with Python using the YouTube API.
The full code for this article is available on GitHub.
We hope you liked the article. Do leave a comment with your views or any questions you have.