Instagram Scraper with Selenium: Let’s Store Profile Information and the Followers List in an Excel File

First and foremost, what do we want to achieve?

The goal is to store the data of an arbitrary Instagram profile in an Excel file. For this, we will use Python and the Selenium library. We will focus on the profile information (first sheet) and the names of the followers (second sheet):

Instagram Profile
Scraping data from Instagram

Structure

1. Login Process Automation

Before we can extract profile information, we need to complete the login steps and get access to the target profile. To do that, we import the webdriver from the Selenium library and initialize it, so that we can open the Instagram URL and reach the main page.

initializing the webdriver

Once we have reached the Instagram homepage, we need to interact with the input fields (username and password) and the Log In button to provide our credentials. That means we have to look at the HTML structure in the developer tools and build our locators:

Inspecting the elements to login

For simplicity, the handling of the cookie and notification popups is skipped for now. The complete code for this project is available on GitHub: https://github.com/LX-schlee

We have now completed the login process and can search for the specific profile we want to extract the necessary data from:

Let’s build the locator for the searchbox

The searchbox changes its internal structure after we click on it for the first time. That means we need two variables, one for each state, in order to interact with it. The profile name itself is stored in a separate variable:

interaction with the searchbox

After typing the profile name into the searchbox, we need to press Enter twice. This can be done with a simple while loop. We also need the Keys class from the Selenium package:

Press Enter two times to get access to the profile

2. Get the Data

After this step we land on the profile itself. Now we have to create variables and store the locators and elements in them. We want to focus on the posts, the followers, the following and the profile header:

information we need to store
store everything inside variables

After we have stored the necessary information in variables, we need to move on and get access to the followers list.

We have clicked on the followers and now have access to the list of followers, which is displayed inside a scrollable box. In the next step we need to scroll all the way down so that we can store all followers in a new list. For that, we use a JavaScript expression that scrolls to the bottom.
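A common way to do this, sketched here under the assumption that the followers are rendered inside one scrollable element, is to keep scrolling until the element's height stops growing. The function name scroll_to_bottom, the pause and the round limit are illustrative choices:

```python
import time


def scroll_to_bottom(driver, scroll_box, pause=1.0, max_rounds=200):
    """Scroll a scrollable element until its height stops growing."""
    last_height = driver.execute_script(
        "return arguments[0].scrollHeight;", scroll_box
    )
    for _ in range(max_rounds):
        # The JavaScript expression: jump to the current bottom of the list.
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight;", scroll_box
        )
        time.sleep(pause)  # give the next batch of followers time to load
        new_height = driver.execute_script(
            "return arguments[0].scrollHeight;", scroll_box
        )
        if new_height == last_height:
            break  # nothing new was loaded, so we reached the end
        last_height = new_height
    return last_height
```

Here scroll_box is the WebElement of the scrollable followers container, found with a locator of your own. The max_rounds limit only guards against an endless loop on very large follower lists.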

scrolling down to the bottom to capture all followers

Next, we create an empty list and append the follower names to it:
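In code, this can be sketched as a small loop over the follower elements. The elements are assumed to come from a find_elements call with a locator of your own. Skipping blanks and duplicates is an extra safeguard added here, not something the original describes:

```python
def collect_follower_names(elements):
    """Append the visible text of each follower element to a new list."""
    follower_names = []
    for element in elements:
        name = element.text.strip()
        if name and name not in follower_names:  # skip blanks and duplicates
            follower_names.append(name)
    return follower_names
```

The function only relies on the text attribute, so it works with any list of Selenium WebElement objects.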

appending the follower names inside a new list

Now we print out our new list and see that we have successfully stored the follower names.

followers list

Our next task is to store everything in one Excel file. But before we do that, we can also take a look at the data by using pandas DataFrames. We create two dataframes. The first one holds the profile information, the second one gives us an overview of the followers:
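A short sketch of the two dataframes follows. The values are placeholders standing in for the scraped data, and the column names are assumptions:

```python
import pandas as pd

# Placeholder values standing in for the scraped data.
profile_info = {
    "profile": "some_profile",
    "posts": "12",
    "followers": "340",
    "following": "56",
}
follower_names = ["follower_one", "follower_two", "follower_three"]

# df1: a single row holding the profile information.
df1 = pd.DataFrame([profile_info])
# df2: a single column listing the follower names.
df2 = pd.DataFrame({"follower": follower_names})

print(df1)
print(df2)
```

Wrapping the dictionary in a list gives df1 exactly one row, with the dictionary keys as column names.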

usage of the pandas dataframe

After printing out both dataframes (df1 + df2), we see that we now have the information we wanted to scrape.

dataframe 1 + dataframe 2

In our last step we create two sheets inside one new Excel file. The first sheet contains the profile information, the second stores the follower names. For that we use the pandas ExcelWriter, to which we pass the file name and the engine argument.
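The export could be sketched as follows. The engine value openpyxl is an assumption (it has to be installed separately), and the file name, sheet names and function name are placeholders:

```python
import pandas as pd


def save_to_excel(df1, df2, file_name="instagram_profile.xlsx"):
    """Write both dataframes into one workbook, one sheet each."""
    # engine="openpyxl" is an assumption; it requires `pip install openpyxl`.
    with pd.ExcelWriter(file_name, engine="openpyxl") as writer:
        df1.to_excel(writer, sheet_name="Profile", index=False)
        df2.to_excel(writer, sheet_name="Followers", index=False)
    return file_name
```

Using the writer as a context manager saves and closes the file automatically when the block ends.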

If you are interested in a step-by-step tutorial of this project, check out the video version on YouTube.


Python Enthusiast | IT- Consultant | Focussing on projects related to Data Science and Testautomation

Alexander Schlee
