Profile based Resume Screening Project (Data Science)

September 12, 2020

Introduction

Writing a resume is not a simple task. Many factors are involved, such as the keywords used, the heading text, the descriptions, and everything else that makes your resume stand out from the rest. People spend hours formatting and writing their resumes, and still most resumes are never even seen by the hiring authorities.

Today, many companies have their own resume screening applications that extract the data from a candidate's resume and highlight its important aspects, thus saving a lot of time. Most HR teams prefer people with more experience and deep knowledge in a particular field, so this screening software is designed to extract exactly that kind of data.

Project Description

In this project I have built a profile-based resume screening Python program capable of categorizing keywords into seven concentration areas (quality/six sigma, operations management, supply chain, project management, data analytics, healthcare systems and web development) and determining the one with the highest expertise level in an industrial and systems engineer's resume.

Project Content

The project is structured as follows:

Reading the PDF with PdfMiner and storing the extracted text.

Cleaning the data by removing punctuation, numbers, extra spaces, etc.

Calculating a score for each profile from the extracted text.

Building a DataFrame with Pandas to present the scores as a table.

Using Matplotlib to show the candidate's strongest fields as a pie chart.

Python Code

We will divide the Python code into several parts for better understanding. If you want to download the whole code, you can get it from my GitHub link Here.

Part 1) Importing libraries.

#---------------importing necessary libraries------------------#

import re
import string
from io import StringIO

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

#-------------------------------------------------------------#

Here, we import the libraries required in the process: pandas, numpy, pdfminer and matplotlib, along with string, re and StringIO from the standard library.

Part 2) PDF File opening, reading and extraction of data.

#-------------- Data extraction process -----------------------#

def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()      #in-memory buffer that collects the extracted text
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0             #0 means no limit on the number of pages
    caching = True
    pagenos = set()          #an empty set means all pages

    #the with-block closes the file even if a page fails to parse
    with open(path, 'rb') as fp:
        for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages,
                                      password=password, caching=caching,
                                      check_extractable=True):
            interpreter.process_page(page)

    text = retstr.getvalue()

    device.close()
    retstr.close()
    return text

#calling the function to open file and get Data
text = convert_pdf_to_txt("Resume.pdf")

Here we use PdfMiner to open the resume file. Just pass the path of your resume file to the convert_pdf_to_txt function and you will get the extracted text.
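If you are using the pdfminer.six fork, the same extraction can probably be written in a few lines with its high-level helper. This is only a sketch under that assumption: the function name read_resume and the early file check are my own additions for illustration, not part of the library or of the code above.

```python
import os


def read_resume(path):
    """Return the text of a PDF resume (assumes pdfminer.six is installed)."""
    # Fail early with a clear message if the file is missing
    if not os.path.isfile(path):
        raise FileNotFoundError("Resume file not found: " + path)

    # Imported here so the path check above works even without pdfminer
    from pdfminer.high_level import extract_text

    return extract_text(path)
```

Calling read_resume("Resume.pdf") should then give the same kind of string as convert_pdf_to_txt, so the rest of the code can stay unchanged.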

Part 2) Data Cleaning.

#---------------Data Cleaning Process---------------------------#

#convert data into lowercase
text  = text.lower()

#Remove Numeric digits
text = re.sub(r'\d+','',text)

# Removing  punctuation from data
text = text.translate(str.maketrans('','',string.punctuation))

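#Replace line breaks with spaces
text = text.replace('\n',' ')

#Split into a list of words (split() with no argument also drops the empty strings left by repeated spaces)
text = text.split()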

Here we clean the extracted text so that it can be used for keyword matching.
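To see what the cleaning does to a piece of text, the same steps can be wrapped in a small function and run on a sample line. The function name clean_text and the sample string are made up for this illustration.

```python
import re
import string


def clean_text(raw):
    """Lowercase, drop digits and punctuation, and split into words."""
    raw = raw.lower()
    raw = re.sub(r'\d+', '', raw)
    raw = raw.translate(str.maketrans('', '', string.punctuation))
    # split() with no argument handles newlines, tabs and repeated spaces
    return raw.split()


sample = "Six Sigma Green Belt (2019),\nSQL & Python."
print(clean_text(sample))
# ['six', 'sigma', 'green', 'belt', 'sql', 'python']
```

Note that removing punctuation also changes words such as "r&r" or "vue.js", so keywords containing punctuation need the same cleaning before they are compared with the text.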

Part 3) Creating a Dictionary of Different Profiles and keywords.

#-----------------Preparing a Dictionary containing the keywords for each profile ------------#

keyWords_Dict = {
'Quality/Six Sigma':['black belt','capability analysis','control charts','doe','dmaic',
       'fishbone','gage r&r', 'green belt','ishikawa','iso','kaizen','kpi','lean','metrics',
       'pdsa','performance improvement','process improvement','quality',
       'quality circles','quality tools','root cause','six sigma',
       'stability analysis','statistical analysis','tqm'],      
'Operations management':['automation','bottleneck','constraints','cycle time','efficiency',
     'fmea', 'machinery','maintenance','manufacture','line balancing','oee','operations',
      'operations research','optimization','overall equipment effectiveness',
      'pfmea','process','process mapping','production','resources','safety',
      'stoppage','value stream mapping','utilization'],
'Supply chain':['abc analysis','apics','customer','customs','delivery','distribution','eoq',
     'epq','fleet','forecast','inventory','logistic','materials','outsourcing','procurement',
     'reorder point','rout','safety stock','scheduling','shipping','stock','suppliers',
     'third party logistics','transport','transportation','traffic','supply chain',
     'vendor','warehouse','wip','work in progress'],
'Project management':['administration','agile','budget','cost','direction',
  'feasibility analysis','finance','kanban','leader','leadership','management',
  'pmi','pmp','problem','project','risk','schedule','scrum','stakeholders',
  'milestones','planning'],
'Data analytics':['analytics','api','aws','big data','business intelligence','clustering','code',
           'coding','data','database','data mining','data science','deep learning','hadoop',
           'hypothesis test','iot','internet','machine learning','modeling','nosql','nlp',
           'predictive','programming','python','r','sql','tableau','text mining',
           'visualization'],
'Healthcare':['adverse events','care','clinic','cphq','ergonomics','healthcare',
           'health care','health','hospital','human factors','medical','near misses',
           'patient','reporting system'],
'Web Development':['jquery','javascript','css','html','angular','react','vue.js','web']
               }

In the above dictionary I have added the keywords that are most commonly used for each profile. You can add more profiles and keywords to the dictionary as per your requirements and get results for those profiles too.
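One possible way to add profiles or keywords without introducing duplicates or mixed-case entries is a small helper like the one below. The helper add_profile, the demo_profiles dictionary and the 'Mobile Development' profile are hypothetical and only stand in for keyWords_Dict so that the example runs on its own.

```python
def add_profile(profiles, name, keywords):
    """Add or extend a profile, storing keywords in lowercase without duplicates."""
    existing = profiles.setdefault(name, [])
    for keyword in keywords:
        keyword = keyword.strip().lower()
        if keyword and keyword not in existing:
            existing.append(keyword)
    return profiles


# A tiny stand-in for keyWords_Dict so the example runs on its own
demo_profiles = {'Web Development': ['jquery', 'javascript', 'css', 'html']}

# A new (hypothetical) profile, then extra keywords for an existing one
add_profile(demo_profiles, 'Mobile Development', ['Android', 'Kotlin', 'swift', 'kotlin'])
add_profile(demo_profiles, 'Web Development', ['React', 'css'])
print(demo_profiles)
```

Keeping every keyword in lowercase matters because the resume text is converted to lowercase during cleaning.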

Part 4) Calculating Score for each profile from Candidate Resume.

#-----------Processing the data and getting useful scores for each profile -----------#


#Initialize a list to store scores (one per profile, in dictionary order)
scores = []

#text is a list of words, so rebuild it as one space-padded string;
#this lets multi-word keywords such as 'six sigma' match as whole phrases
resume_text = ' ' + ' '.join(w for w in text if w) + ' '
strip_punct = str.maketrans('', '', string.punctuation)

#Now loop over the profiles and count the keywords found in the resume
for profile in keyWords_Dict:
    score = 0
    for word in keyWords_Dict[profile]:
        #clean the keyword the same way as the resume text
        word = word.lower().translate(strip_punct)
        if ' ' + word + ' ' in resume_text:
            score += 1
    scores.append(score)

Here we calculate the score for each profile by looping over the dictionary, looking for each keyword in the extracted data and counting the matches.
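The counting can also be wrapped in a function that returns a dictionary of scores, which is easier to try out on a small example. This is a sketch, assuming keywords should match only as whole words or whole phrases; the name score_profiles and the demo data are invented for the illustration.

```python
import string


def score_profiles(words, profiles):
    """Count, per profile, how many keywords appear in the list of resume words."""
    # Space-padded text so that keywords match only as whole words or phrases
    resume_text = ' ' + ' '.join(w for w in words if w) + ' '
    strip_punct = str.maketrans('', '', string.punctuation)
    scores = {}
    for profile, keywords in profiles.items():
        count = 0
        for keyword in keywords:
            # Clean the keyword the same way as the resume text
            keyword = keyword.lower().translate(strip_punct)
            if ' ' + keyword + ' ' in resume_text:
                count += 1
        scores[profile] = count
    return scores


demo_words = ['lean', 'six', 'sigma', 'green', 'belt', 'sql', 'python']
demo_profiles = {
    'Quality/Six Sigma': ['green belt', 'lean', 'six sigma', 'kaizen'],
    'Data analytics': ['python', 'r', 'sql'],
}
print(score_profiles(demo_words, demo_profiles))
# {'Quality/Six Sigma': 3, 'Data analytics': 2}
```

The single-letter keyword 'r' is not counted here even though the letter occurs inside other words, because only whole words are matched.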

Part 5) Making Dataframe from the data Obtained using Pandas.

#---------Here comes the use of pandas, we will make data frame for our final scores ---------#

dataPresented = pd.DataFrame(scores, index=list(keyWords_Dict), columns=['Scores'])
dataPresented = dataPresented[dataPresented['Scores'] != 0] #remove profiles which have 0 scores
print(dataPresented)

Here we put the scores into a pandas DataFrame for further use, which also gives us a table view of the scores.

# Scores table
                       Scores
Quality/Six Sigma           7
Operations management      10
Supply chain                3
Project management          2
Data analytics              8
Healthcare                  2

Part 6) Making Pie Chart from gathered data.

#--------Now, let's represent the extracted data in a visual way, i.e. a PIE CHART -------------------------------------#


#finding the max scored profile to pop out its pie piece
explodeList = [0] * dataPresented.shape[0]
ind =  np.argmax(dataPresented.values)
explodeList[ind] = 0.1
explode = tuple(explodeList)


# Creating Pie Chart
pie = plt.figure(figsize=(10,10))
plt.pie(dataPresented['Scores'], labels=dataPresented.index, explode=explode,
        autopct=lambda p: '{:.1f}%'.format(p) if p > 0 else '', shadow=True, startangle=90)
plt.title('Profile Wise Resume Screening')
plt.axis('equal')

# Save pie chart as a .png file (before plt.show(), while the figure is still open)
pie.savefig('Resume_Screening_PieChart.png')
plt.show()

This is the last step of our code. Here we make a pie chart from the extracted data and also save a PNG image for our reference. Here is the final outcome as a pie chart.
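To make the explode part of the chart easier to follow, here is the same idea in plain Python, run on the scores from the table above. The function name build_explode and its offset parameter are my own naming for this sketch.

```python
def build_explode(scores, offset=0.1):
    """Return a tuple of offsets with only the highest score pulled out."""
    explode = [0] * len(scores)
    if scores:
        # index() returns the first position of the maximum score
        explode[scores.index(max(scores))] = offset
    return tuple(explode)


print(build_explode([7, 10, 3, 2, 8, 2]))
# (0, 0.1, 0, 0, 0, 0)
```

The second value belongs to Operations management, the profile with the highest score, so that slice is the one drawn slightly away from the centre.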

Conclusion

So, finally we get the output as a pie chart built from our gathered data. This information is useful for both recruiters and candidates, since candidates can see the important aspects of their resume and improve it. The whole source code is available Here.

If you have any doubts or suggestions for improvement, feel free to add them in the comment box.

Have a nice day and enjoy Coding!
