Extracting YouTube Data With Python Using the YouTube API



YouTube makes people go viral, and YouTube itself keeps growing. It is often described as the second-largest search engine after Google.

Launched in 2005, the platform now has over 2 billion monthly active users. YouTube has helped content creators gain exposure and earn revenue from their videos.

In this article we will use the YouTube API from Python. This lets us extract data from YouTube that can be used in many kinds of projects.

Don't worry if some of these terms are unfamiliar for now; they will all be explained as we move through the article. So, let's get started.

What is an API?


API is short for Application Programming Interface. It is a software interface that gives programmers a well-defined, efficient way for clients and servers to communicate.

Developers often use APIs to build client-server applications. The client sends a request containing some input data, and the API returns the specific information from the backend. APIs can also perform other operations, not just return information.

Here we will use the YouTube Data API, which is provided officially by YouTube. It allows developers to retrieve data about channels, videos, playlists and more.
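Under the hood, every call to the YouTube Data API is an ordinary HTTPS request from the client (your script) to Google's server. The minimal sketch below only builds such a request URL with the standard library, to show what the client-server call looks like. The helper name `build_channels_url` is hypothetical, and `YOUR_API_KEY` is a placeholder; the client library used later in this article builds equivalent requests for you.

```python
from urllib.parse import urlencode

# Endpoint for the channels.list method of the YouTube Data API v3
BASE_URL = "https://www.googleapis.com/youtube/v3/channels"


def build_channels_url(channel_id: str, part: str, api_key: str) -> str:
    """Return the GET URL that requests one channel's data.

    Hypothetical helper for illustration only; google-api-python-client
    builds equivalent requests for you.
    """
    query = urlencode({"part": part, "id": channel_id, "key": api_key})
    return f"{BASE_URL}?{query}"


# YOUR_API_KEY is a placeholder, not a real key
print(build_channels_url("UCr2dD3s19bdcw4qjuUTQKiQ", "statistics", "YOUR_API_KEY"))
```

Opening such a URL with a valid key returns the channel data as JSON, which is what the Python library later hands back to us as dictionaries.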

Suppose you wanted to build an application that shows the most liked video of a channel entered by the user. You could use the API to retrieve that information in no time.

What an API can return depends on the platform and its owner. Let us now see what we can extract with the YouTube API.

What Can You Extract With the YouTube API?


There is plenty of information you can extract from the YouTube API using Python. Some of the important attributes are listed below:

Channel Statistics: Returns important statistical information about the specified channels.

No. of Videos: Get the total number of videos uploaded by the channel.

Total Views: Get the total number of views across the channel's videos. (Watch time in minutes is not part of these statistics; it is only available to channel owners through the separate YouTube Analytics API.)

Total No. of Subscribers: As the name suggests, this fetches the channel's number of subscribers.

Snippet: Lets you fetch descriptive fields from the channel's data, such as the title and description.

Logo: Get the URL of the logo used by the channel, in several sizes.

Content Details: Gives the ID of the channel's uploads playlist, which can be used to list every video the channel has uploaded and then fetch per-video statistics such as view, like and comment counts.
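Each of these attributes lives under a different `part` of the channel or video resource. The lookup table below is a hypothetical summary of where the fields used later in this article come from (not the full API schema), together with a small helper, `get_path`, that walks a nested response and returns a default instead of raising when a key is missing. Both names are our own, for illustration.

```python
from typing import Any, List

# Where each attribute used in this article lives in the API responses.
# Keys are illustrative labels; values are (resource, part, key path).
FIELD_LOCATIONS = {
    "video_count": ("channels", "statistics", ["statistics", "videoCount"]),
    "view_count": ("channels", "statistics", ["statistics", "viewCount"]),
    "subscriber_count": ("channels", "statistics", ["statistics", "subscriberCount"]),
    "title": ("channels", "snippet", ["snippet", "title"]),
    "description": ("channels", "snippet", ["snippet", "description"]),
    "logo_url": ("channels", "snippet", ["snippet", "thumbnails", "default", "url"]),
    "uploads_playlist": ("channels", "contentDetails",
                         ["contentDetails", "relatedPlaylists", "uploads"]),
    "video_likes": ("videos", "statistics", ["statistics", "likeCount"]),
}


def get_path(item: dict, path: List[str], default: Any = None) -> Any:
    """Follow a list of keys into a nested dict; return default if any key is missing."""
    current: Any = item
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# A made-up channel item containing only the statistics part
sample_item = {"statistics": {"viewCount": "1000", "videoCount": "12"}}
print(get_path(sample_item, FIELD_LOCATIONS["view_count"][2]))
print(get_path(sample_item, FIELD_LOCATIONS["title"][2]))
```

The second lookup returns None because the snippet part was not requested, which is a reminder that you only get the parts you ask for.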

There are many more things the YouTube API lets you extract. Next we will look at the Python code we used to extract the most important information.

Before that, we will set up our machine and its environment. We used Jupyter Notebook to run this code, but you can use any other Python IDE.

Working on YouTube API with Python


1. Installation of Google API Client


The Google API Client library provides the build() function that creates the API object, so we need to install it first. Below are commands for three different setups.

For Windows:

pip install google-api-python-client

For Ubuntu (ideally inside a virtual environment):

python3 -m pip install google-api-python-client
 
For Anaconda

conda install -c conda-forge google-api-python-client
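To confirm that the install worked, you can ask Python which version of the package is installed. This is a small sketch using only the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None
print(installed_version("google-api-python-client"))
```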

2. Importing Libraries


We only need the build function from the Google API Client library. Import it using the code below:

from googleapiclient.discovery import build

3. Creating Object


We will create an object to access YouTube data. To create it you will need an API key, which you can get from the Google Cloud Console: create a project, enable the YouTube Data API v3 for it, and then create an API key under Credentials.

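After getting the API key, use the following code to create the object:

from googleapiclient.discovery import build

# Replace the placeholder with your own API key (as a string)
youTubeApiKey = 'YOUR_API_KEY'
# Create a client for version 3 of the YouTube Data API
youtube = build('youtube', 'v3', developerKey=youTubeApiKey)
channelId = 'UCr2dD3s19bdcw4qjuUTQKiQ'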
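Pasting the key into a notebook makes it easy to leak when you share the code. A common alternative is to read it from an environment variable. The sketch below assumes a variable named `YOUTUBE_API_KEY` (our choice, not something the API requires), and `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(env_var: str = "YOUTUBE_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it.

    YOUTUBE_API_KEY is an assumed variable name; use whatever name you set.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage with the build() call from above:
# youtube = build('youtube', 'v3', developerKey=load_api_key())
```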

Here we have used the channel ID of our YouTube channel, Rajni Sharma Maths Classes. You can use any channel to extract the data.

To get a channel's ID, open the channel on YouTube and check its URL. When the URL has the form youtube.com/channel/..., the part after /channel/ is the channel ID:

Example: https://www.youtube.com/channel/UCr2dD3s19bdcw4qjuUTQKiQ

For the above YouTube URL, the Channel ID is UCr2dD3s19bdcw4qjuUTQKiQ 
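If you collect many channel URLs, pulling the ID out by hand gets tedious. A minimal sketch using the standard library is below; `channel_id_from_url` is a hypothetical helper, and it only handles the /channel/ URL form. URLs of the /@handle form do not contain the channel ID, so it returns None for them.

```python
from typing import Optional
from urllib.parse import urlparse


def channel_id_from_url(url: str) -> Optional[str]:
    """Return the channel ID from a /channel/<ID> URL, or None for other URL forms."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expect a path like /channel/UCxxxx (optionally followed by /videos etc.)
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    return None


print(channel_id_from_url("https://www.youtube.com/channel/UCr2dD3s19bdcw4qjuUTQKiQ"))
# UCr2dD3s19bdcw4qjuUTQKiQ
```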

4. Getting Statistics from YouTube API


The statistics part includes the channel's view count, subscriber count and video count. We can get the channel statistics with the following code:

statdata=youtube.channels().list(part='statistics',id=channelId).execute()
stats=statdata['items'][0]['statistics']
stats

This returns a dictionary of statistics (the counts are returned as strings), from which we can read each field by its key, as shown below.
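The code above indexes `['items'][0]` directly. If the channel ID is mistyped, the response typically contains no items, and that line fails with a less obvious error. A small guard is sketched below; `first_item` is a hypothetical helper, and the error message is our own wording.

```python
def first_item(response: dict) -> dict:
    """Return the first resource in a list() response, or raise if there is none.

    An unknown or mistyped channel ID typically yields a response with no
    items, so indexing ['items'][0] directly would fail with a KeyError
    or IndexError instead of a clear message.
    """
    items = response.get("items") or []
    if not items:
        raise ValueError("No items returned; check the channel ID")
    return items[0]


# Usage with the response from above:
# stats = first_item(statdata)['statistics']
```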

[Output screenshot: Statistics With YouTube API]


a) Total Number of Videos

videoCount=stats['videoCount']
videoCount

b) Total Number of Views

viewCount=stats['viewCount']
viewCount

c) Total Number of Subscribers
subscriberCount=stats['subscriberCount']
subscriberCount
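Because the counts come back as strings, arithmetic such as views per video needs a conversion first. The sketch below converts a channel `statistics` dictionary to integers and returns None for the subscriber count when the channel hides it (the API signals this with `hiddenSubscriberCount`). `parse_channel_stats` is a hypothetical helper, and the sample values are made up, not real channel data.

```python
def parse_channel_stats(stats: dict) -> dict:
    """Convert the string counts in a channel 'statistics' dict to integers.

    Returns None for the subscriber count when the channel hides it
    (hiddenSubscriberCount is True) or when the field is absent.
    """
    hidden = stats.get("hiddenSubscriberCount", False)
    subscribers = stats.get("subscriberCount")
    return {
        "videos": int(stats.get("videoCount", 0)),
        "views": int(stats.get("viewCount", 0)),
        "subscribers": None if hidden or subscribers is None else int(subscribers),
    }


# Made-up sample values, shaped like the API's statistics dictionary
sample = {"viewCount": "1500", "subscriberCount": "120",
          "hiddenSubscriberCount": False, "videoCount": "30"}
print(parse_channel_stats(sample))
# {'videos': 30, 'views': 1500, 'subscribers': 120}
```

Keep in mind that YouTube abbreviates public subscriber counts, so the number you get may be rounded.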



5. Getting Snippet


Just like the statistics, the snippet contains various important information. We will first fetch it as a dictionary using the YouTube API, and then extract the individual fields.

snippetdata=youtube.channels().list(part='snippet',id=channelId).execute()
snippetdata 

a) Title of YouTube Channel

title=snippetdata['items'][0]['snippet']['title']
title

The title is the name of the YouTube channel whose ID you provided.

b) YouTube Channel's Description

description=snippetdata['items'][0]['snippet']['description']
description

Description includes the information that the channel owner has provided.

c) YouTube Channel's Logo

logo=snippetdata['items'][0]['snippet']['thumbnails']['default']['url']
logo

The above code fetches the logo used by the channel. It returns a link to the logo image rather than the image itself. The 'default' thumbnail is 88x88 pixels; the 'medium' and 'high' keys hold larger versions.
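If you want the sharpest logo available rather than the small default one, you can pick the widest entry from the `thumbnails` dictionary. A minimal sketch follows; `largest_thumbnail_url` is a hypothetical helper, and it assumes each thumbnail entry carries a `width` field (entries without one are treated as width 0).

```python
from typing import Optional


def largest_thumbnail_url(snippet: dict) -> Optional[str]:
    """Return the URL of the widest thumbnail listed in a channel snippet."""
    thumbnails = snippet.get("thumbnails", {})
    if not thumbnails:
        return None
    # Compare entries by width; missing widths count as 0
    best = max(thumbnails.values(), key=lambda t: t.get("width", 0))
    return best.get("url")


# Usage with the response from above:
# logo = largest_thumbnail_url(snippetdata['items'][0]['snippet'])
```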


6. Getting Content Details


This is the most interesting part of using the YouTube API, and one of our favorites. With it we can retrieve information about every video uploaded by a specific channel.

We have created a project in the past that showcased the most liked videos, most disliked videos, most commented videos, etc. The YouTube API was used to extract the data, and the code was written in Python.

Step 1: Getting All the Video Details

# Every channel has an "uploads" playlist listing the videos it has uploaded
contentdata=youtube.channels().list(id=channelId,part='contentDetails').execute()
playlist_id = contentdata['items'][0]['contentDetails']['relatedPlaylists']['uploads']
videos = []
next_page_token = None

# Page through the playlist, 50 items (the maximum) per request
while True:
    res = youtube.playlistItems().list(playlistId=playlist_id,
                                       part='snippet',
                                       maxResults=50,
                                       pageToken=next_page_token).execute()
    videos += res['items']
    next_page_token = res.get('nextPageToken')

    # No nextPageToken means this was the last page
    if next_page_token is None:
        break
print(videos)
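The loop above follows `nextPageToken` until the API stops returning one. The same pattern can be factored into a reusable generator that works for any paginated list call. This is a sketch with a hypothetical name, `iter_pages`; it takes any callable that fetches one page for a given token, so it can be tested without network access.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item from a paginated API, following nextPageToken.

    fetch_page takes a page token (None for the first page) and returns
    one response dict with 'items' and, optionally, 'nextPageToken'.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        # The last page carries no nextPageToken
        if token is None:
            break


# With the client object from earlier, for example:
# videos = list(iter_pages(lambda token: youtube.playlistItems().list(
#     playlistId=playlist_id, part='snippet', maxResults=50,
#     pageToken=token).execute()))
```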


Step 2: Getting the Video ID of Each Video

video_ids = [video['snippet']['resourceId']['videoId'] for video in videos]
video_ids

Step 3: Getting Statistics for Each Video

stats = []

# Request statistics in batches of 40 IDs (videos().list accepts at most 50 per call)
for i in range(0, len(video_ids), 40):
    res = youtube.videos().list(id=','.join(video_ids[i:i+40]), part='statistics').execute()
    stats += res['items']
print(stats)
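The slicing in the loop above is a general "split a list into batches" pattern, needed because `videos().list` accepts a limited number of IDs per call (50 is the documented maximum). A small sketch of it as a reusable generator is below; `chunks` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence


def chunks(seq: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(seq), size):
        yield list(seq[start:start + size])


# Usage with the client object from earlier, for example:
# for batch in chunks(video_ids, 50):
#     res = youtube.videos().list(id=','.join(batch), part='statistics').execute()
```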

Step 4: Collecting All the Information in Lists

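YouTube stopped returning dislike counts through the public API in December 2021, so dislikeCount is no longer collected here. Like and comment counts can also be missing when a creator hides likes or turns off comments, so the code falls back to 0 in those cases.

# Look up statistics by video ID so each video is paired with its own numbers,
# even if some videos (e.g. private or deleted ones) returned no statistics
stats_by_id = {item['id']: item['statistics'] for item in stats}

title=[]
liked=[]
views=[]
url=[]
comment=[]

for video in videos:
    video_id = video['snippet']['resourceId']['videoId']
    video_stats = stats_by_id.get(video_id, {})
    title.append(video['snippet']['title'])
    url.append("https://www.youtube.com/watch?v=" + video_id)
    # Counts arrive as strings; missing counts default to 0
    liked.append(int(video_stats.get('likeCount', 0)))
    views.append(int(video_stats.get('viewCount', 0)))
    comment.append(int(video_stats.get('commentCount', 0)))


Step 5: Creating a DataFrame for the Collected Data

This step is optional, but it organizes the data in a more readable way. You will need the pandas library, which you can install with:

pip install pandas

Now use the following code to create a DataFrame with the pandas library:

import pandas as pd

# Each key becomes a column; every list holds one entry per video
data={'title':title,'url':url,'liked':liked,'views':views,'comment':comment}
df=pd.DataFrame(data)
df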
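With the lists collected, rankings such as "most liked" or "most commented" videos (like the project mentioned earlier) are a matter of sorting. The sketch below does it in plain Python on the parallel lists; `top_n` is a hypothetical helper, and the sample titles and numbers in the test are made up.

```python
from typing import List, Sequence, Tuple


def top_n(titles: Sequence[str], values: Sequence[int], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n (title, value) pairs with the largest values, highest first."""
    pairs = list(zip(titles, values))
    # Sort by the numeric value, largest first; ties keep their original order
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


# Usage with the lists built above, for example:
# top_n(title, liked)    -> most liked videos
# top_n(title, comment)  -> most commented videos
```

With pandas, df.sort_values('liked', ascending=False).head(5) gives the same ranking as a DataFrame.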


So here we have created a DataFrame to organize that information in a better way. This marks the end of this article on Extracting YouTube Data with Python Using YouTube API.

The code is available on GitHub.

We hope you liked the article. Do share your views in the comments, or ask if you have any doubts.