Scraping Job Postings with Python + Selenium WebDriver
As always, let's define our project goal first. We will use the Selenium library to scrape job information from the UK job site Jobsite (https://www.jobsite.co.uk/) and store everything in a CSV file. The project consists of two parts. First, we automate the search (Job Title: Software Engineer, City: Manchester, Radius: 30 Miles). Afterwards, we store the following information in one CSV file:
- Job Title
- City
- Salary
- Contract Type
- Job Description (Snippet)
The final result should look like this (extract):
The job requirements (job title, city, radius) can be changed inside the script, which then runs the search automatically. In part 1, the webdriver is initialized and we specify the job requirements (extract):
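Part 1 might look roughly like the sketch below. It is not the original script: the element IDs (keywords, location, Radius, search-button) and the idea that the radius is a dropdown are assumptions made for illustration, and have to be replaced with whatever you find when inspecting the live page. It also assumes Selenium 4 and a local Chrome installation.

```python
# Search criteria: change these to run a different search.
JOB_TITLE = "Software Engineer"
CITY = "Manchester"
RADIUS_MILES = 30


def run_search(job_title=JOB_TITLE, city=CITY, radius_miles=RADIUS_MILES,
               url="https://www.jobsite.co.uk/"):
    """Open the job site, fill in the search form and submit it."""
    # Imported here so the constants above can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    driver = webdriver.Chrome()  # assumes Chrome is installed
    driver.get(url)

    # ASSUMED element IDs, for illustration only.
    driver.find_element(By.ID, "keywords").send_keys(job_title)
    driver.find_element(By.ID, "location").send_keys(city)
    # Assumes the radius is a <select> dropdown whose option values are miles.
    Select(driver.find_element(By.ID, "Radius")).select_by_value(str(radius_miles))
    driver.find_element(By.ID, "search-button").click()
    return driver  # the caller is responsible for calling driver.quit()
```

Calling run_search() with no arguments performs the search described above; run_search(city="Leeds", radius_miles=10) would change the criteria without touching the rest of the script.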
Now we get the job results that match our requirements. To store the relevant data, we create variables and capture the necessary elements in them. Since we work with lists, we use the "find_elements" method, which comes with Selenium WebDriver and returns a collection of web elements. To store the results of multiple pages, you have to work with pagination. In our example, we want the results of three pages (each page provides 20 results, so we will store 60 job entries in total in the CSV file). To handle pagination, we loop over the pages and pass the number of pages to the range function.
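As a rough sketch of the element-collection step, the snippet below gathers one list per field with find_elements and then combines the lists into one row per job. The CSS selectors are hypothetical placeholders, not the site's real markup, and the helper names (texts, combine_columns, scrape_current_page) are made up for this example.

```python
def texts(elements):
    """Return the visible text of each web element, stripped of whitespace."""
    return [el.text.strip() for el in elements]


def combine_columns(*columns):
    """Turn parallel lists (one per field) into one row per job posting."""
    # zip() stops at the shortest list, so if a posting lacks an element
    # (for example no salary), the columns can get out of step.
    return [list(row) for row in zip(*columns)]


def scrape_current_page(driver):
    """Collect all five fields from the results page currently loaded."""
    from selenium.webdriver.common.by import By  # imported lazily

    # HYPOTHETICAL selectors: replace them after inspecting the page.
    selectors = [
        ".job-title h2",   # Job Title
        ".location span",  # City
        ".salary",         # Salary
        ".job-type span",  # Contract Type
        ".job-intro",      # Job Description (snippet)
    ]
    columns = [texts(driver.find_elements(By.CSS_SELECTOR, css)) for css in selectors]
    return combine_columns(*columns)
```

Because find_elements returns an empty list rather than raising when nothing matches, a wrong selector shows up as missing rows instead of an error, so it is worth checking that all five lists have the same length before combining them.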
Afterwards, we open a CSV file inside the loop, providing the filename, and use the write method to store our results in it:
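One way to do the writing step is sketched below. Note that it swaps the plain write method for Python's standard csv module, so that commas and quotes inside job descriptions are escaped properly. The function name append_rows and the header labels are assumptions for this example. The file is opened in append mode so that it can be called once per page inside the loop.

```python
import csv
import os

HEADER = ["Job Title", "City", "Salary", "Contract Type", "Job Description"]


def append_rows(filename, rows):
    """Append rows to a CSV file, writing the header only for a new file."""
    is_new = not os.path.exists(filename) or os.path.getsize(filename) == 0
    # newline="" lets the csv module control line endings itself.
    with open(filename, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(HEADER)
        writer.writerows(rows)
```

For example, append_rows("jobs.csv", rows) can be called after each results page; the header is written on the first call only.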
Apart from that, we also have to put the pagination part inside the for loop:
It's necessary to locate the "next" button to get to the next page and store its results. Therefore, we store the element in a variable and click on it:
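The pagination loop could be sketched as follows. The selector "a.next" is a hypothetical placeholder for the real "next" button, and the explicit wait is an addition of this sketch rather than something the original script is known to do. The page-scraping and saving steps are passed in as functions (scrape_page, save_rows) so the loop itself stays independent of the site's markup.

```python
def click_next(driver, timeout=10):
    """Wait for the 'next' button to become clickable, then click it."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # HYPOTHETICAL selector: replace with the real locator of the button.
    next_button = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "a.next"))
    )
    next_button.click()


def scrape_pages(driver, pages, scrape_page, save_rows, go_next=click_next):
    """Scrape `pages` result pages and return the number of rows saved."""
    total = 0
    for page in range(pages):   # e.g. range(3) visits three pages
        rows = scrape_page(driver)
        save_rows(rows)
        total += len(rows)
        if page < pages - 1:    # no need to click "next" after the last page
            go_next(driver)
    return total
```

With three pages of 20 results each, scrape_pages(driver, 3, ...) saves 60 rows and clicks the "next" button twice. Changing the pages argument is all it takes to collect more or fewer results.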
Once you have specified your search criteria, the information you want to extract, and the number of pages you want to visit, you can scale the amount of data you store up or down as needed.
If you are interested in the step-by-step tutorial for this project, check out the video version on YouTube: