Instagram Scraper with Selenium: Let's store profile information and the followers list in an Excel file
First and foremost, what do we want to achieve?
The goal is to store the data of an Instagram profile of our choice in an Excel file. For this, we will use Python and the Selenium library. We will focus on the profile information (first sheet) and the names of the followers (second sheet).
1. Login Process Automation
Before we can extract profile information, we need to complete the login steps and get access to the target profile. To do that, we import the webdriver from the Selenium library, initialize it, and open the Instagram URL to reach the main page.
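A minimal sketch of this first step, assuming Chrome as the browser (the article does not say which one is used). The function names create_driver and open_instagram are illustrative, not taken from the project code:

```python
INSTAGRAM_URL = "https://www.instagram.com/"


def create_driver():
    """Start a browser session. Chrome is an assumption; any browser that
    Selenium supports can be used instead."""
    # Imported here so the module can be loaded even where Selenium is missing.
    from selenium import webdriver

    return webdriver.Chrome()


def open_instagram(driver, url=INSTAGRAM_URL):
    """Navigate the given driver to the Instagram main page."""
    driver.get(url)
    return driver


# Typical use:
# driver = open_instagram(create_driver())
```

Passing the driver in as an argument keeps the navigation step separate from the browser setup, so the same driver can be reused for every later step.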
Once we have reached the Instagram homepage, we need to interact with the input fields (username and password) and the Log In button to provide our credentials. That means we have to take a look at the developer tools and the HTML structure and build our locators:
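The login step could look roughly like the sketch below. The attribute values in the locators are assumptions about Instagram's login form and should be checked in the developer tools, since the markup changes over time. The strategy strings "name" and "css selector" are the values behind Selenium's By.NAME and By.CSS_SELECTOR constants.

```python
# Locators in Selenium's (strategy, value) form.
# "name" == By.NAME, "css selector" == By.CSS_SELECTOR.
# The attribute values are assumptions about the current login form.
USERNAME_INPUT = ("name", "username")
PASSWORD_INPUT = ("name", "password")
LOGIN_BUTTON = ("css selector", "button[type='submit']")


def log_in(driver, username, password, wait_seconds=10):
    """Type the credentials into the login form and submit it."""
    # Let find_element retry for up to wait_seconds while the form renders.
    driver.implicitly_wait(wait_seconds)

    driver.find_element(*USERNAME_INPUT).send_keys(username)
    driver.find_element(*PASSWORD_INPUT).send_keys(password)
    driver.find_element(*LOGIN_BUTTON).click()


# Typical use, with credentials read from somewhere safer than the source file:
# log_in(driver, my_username, my_password)
```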
For simplicity, the cookie and notification popup handling is skipped for now. The complete code for this project can be accessed from the following GitHub link: https://github.com/LX-schlee
With the login process completed, we can search for the specific profile we want to extract the necessary data from.
The search box changes its internal structure after we click on it for the first time. That means we need two variables in order to interact with it: one for the element before the click and one for the element after it. The profile name itself is stored in a separate variable:
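One way to express the two-locator idea is sketched below. Both XPath expressions and the profile name are placeholders chosen for illustration; the real values have to be read from the page in the developer tools.

```python
# Placeholder profile name; replace it with the profile to look up.
PROFILE_NAME = "instagram"

# Hypothetical locators ("xpath" == By.XPATH):
# the element to click first, and the input that exists after the click.
SEARCH_TRIGGER = ("xpath", "//span[text()='Search']")
SEARCH_INPUT = ("xpath", "//input[@placeholder='Search']")


def type_profile_name(driver, profile_name=PROFILE_NAME):
    """Open the search box, type the profile name, return the input element."""
    driver.find_element(*SEARCH_TRIGGER).click()

    # Look up the input only after the click, because the search box
    # has a different structure once it has been activated.
    search_input = driver.find_element(*SEARCH_INPUT)
    search_input.send_keys(profile_name)
    return search_input
```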
After we have typed the profile name into the search box, we need to press Enter twice. This can be accomplished with a simple while loop. We also have to work with the Keys class from the Selenium package:
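A small sketch of that loop follows. The function name, the pause between the two key presses, and the optional enter_key parameter are assumptions added for illustration; by default the key comes from Selenium's Keys class.

```python
import time


def press_enter_twice(search_input, enter_key=None, pause_seconds=1.0):
    """Send Enter to the search input two times using a while loop."""
    if enter_key is None:
        # Default to Selenium's Enter key constant.
        from selenium.webdriver.common.keys import Keys

        enter_key = Keys.ENTER

    presses = 0
    while presses < 2:
        search_input.send_keys(enter_key)
        presses += 1
        # Give the page a moment to react before the next key press.
        time.sleep(pause_seconds)
    return presses
```

The short pause is a guess at what the page needs between the two presses; an explicit wait on the search result would be a more robust alternative.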
2. Get the Data
After this step, we land on the profile itself. Now we have to create variables that hold the locators and the elements they point to. We want to focus on the posts, the followers, the following and the profile header:
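The sketch below collects the four values into a dictionary. Every locator in it is hypothetical and only shows the shape of the approach; the actual expressions depend on the profile page's current HTML.

```python
# Hypothetical locators for the profile page
# ("css selector" == By.CSS_SELECTOR, "xpath" == By.XPATH).
PROFILE_LOCATORS = {
    "header": ("css selector", "header h2"),
    "posts": ("xpath", "//header//ul/li[1]//span"),
    "followers": ("xpath", "//a[contains(@href, '/followers')]//span"),
    "following": ("xpath", "//a[contains(@href, '/following')]//span"),
}


def read_profile_info(driver, locators=PROFILE_LOCATORS):
    """Return the visible text of each located element, keyed by field name."""
    return {
        field: driver.find_element(*locator).text.strip()
        for field, locator in locators.items()
    }
```

Keeping the values in one dictionary makes it easy to turn them into a single-row table later on.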
After we have stored the necessary information in variables, we move on and open the followers list:
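Opening the list usually means clicking the followers link on the profile page. The locator below is an assumption about how that link can be found, and the function name is illustrative.

```python
# Hypothetical locator for the followers link ("xpath" == By.XPATH).
FOLLOWERS_LINK = ("xpath", "//a[contains(@href, '/followers')]")


def open_followers_list(driver, wait_seconds=10):
    """Click the followers link so that the followers list is shown."""
    # Let find_element retry for up to wait_seconds while the page loads.
    driver.implicitly_wait(wait_seconds)

    link = driver.find_element(*FOLLOWERS_LINK)
    link.click()
    return link
```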
Next, we create an empty list and append the follower names to it:
Now we print out our new list to confirm that the follower names have been stored successfully.
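A possible version of this loop is shown below. The locator is hypothetical, and the sketch only sees the entries that are already rendered in the page when it runs; skipping empty and duplicate names is an addition for robustness, not something the article describes.

```python
# Hypothetical locator for the name elements inside the followers list
# ("xpath" == By.XPATH).
FOLLOWER_ENTRIES = ("xpath", "//div[@role='dialog']//a//span")


def collect_follower_names(driver):
    """Append the text of every follower entry to a new list."""
    follower_names = []
    for entry in driver.find_elements(*FOLLOWER_ENTRIES):
        name = entry.text.strip()
        # Skip blank entries and names that were already collected.
        if name and name not in follower_names:
            follower_names.append(name)
    return follower_names


# Typical use:
# follower_names = collect_follower_names(driver)
# print(follower_names)
```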
Our next task is to store everything in one Excel file. Before we do that, we can take a look at the data using pandas DataFrames. We create two DataFrames: the first one shows the profile information, the second one gives us an overview of the followers:
After printing out both DataFrames (df1 and df2), we can see that we now have the information we wanted to scrape.
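The two DataFrames could be built as in the following sketch. The column name follower and the shape of the input (a dictionary of profile values and a list of names) are assumptions; df1 and df2 match the names used in the text.

```python
def build_dataframes(profile_info, follower_names):
    """Return (df1, df2): one row of profile data and one row per follower."""
    # Imported here so the helper can be loaded without pandas installed.
    import pandas as pd

    # A list with a single dict becomes a one-row table whose columns
    # are the dictionary keys.
    df1 = pd.DataFrame([profile_info])

    # One column holding all follower names.
    df2 = pd.DataFrame({"follower": list(follower_names)})
    return df1, df2


# Typical use:
# df1, df2 = build_dataframes(profile_info, follower_names)
# print(df1)
# print(df2)
```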
In our last step, we create two sheets inside one new Excel file. The first sheet contains the profile information, the second stores the follower names. For that we use the pandas ExcelWriter, which takes the file name and the engine argument:
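A minimal sketch of the export, assuming the openpyxl engine is installed (the article does not name the engine it uses). The file name and the sheet names are placeholders.

```python
def save_to_excel(df1, df2, file_name="instagram_profile.xlsx"):
    """Write df1 and df2 to two sheets of one Excel file."""
    # Imported here so the helper can be loaded without pandas installed.
    import pandas as pd

    # The context manager saves and closes the file when the block ends.
    # engine="openpyxl" is an assumption; it requires the openpyxl package.
    with pd.ExcelWriter(file_name, engine="openpyxl") as writer:
        df1.to_excel(writer, sheet_name="Profile", index=False)
        df2.to_excel(writer, sheet_name="Followers", index=False)
    return file_name
```

Setting index=False leaves the DataFrame row numbers out of the sheets, so each sheet contains only the scraped columns.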
If you are interested in the step-by-step tutorial for this project, check out the video version on YouTube.