---
As I've progressed through school and begun looking into next steps, whether that be a job in industry or
going to graduate school, one thing has always bugged me: whenever I update my projects or
anything else on LinkedIn, I have to update it again, in the correct format, on my CV and resume. I started looking into ways
to interface with LinkedIn and found that the free options are mostly limited to web scraping. This was an endeavor
where I got to learn about the pain that is scraping dynamically generated webpages. After a bit of trial and error I found a reliable way of
gathering the information I needed and just had to put it into a Microsoft Word document. Luckily, there was a library
(pydocx) that made it relatively easy to code the
format I use in my handmade resume. My CV was made using this program as a way to quickly generate a starting template.
A snippet of the generated CV format
I then ran into an issue: LinkedIn started asking if I was a robot and, believe it or not, my robot didn't know how to handle it. To address this, I added a check after logging in that pauses the program until the user solves the CAPTCHA and the current URL matches the LinkedIn feed. Another problem was that the profile page doesn't show all of a section if it is too long, so I changed the code slightly to navigate straight to the detailed pages for the three sections I was scraping.
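As a rough illustration of both fixes, here is a minimal sketch rather than the project's actual code. It assumes the feed lives under `https://www.linkedin.com/feed` and that detail pages follow an `/in/<slug>/details/<section>/` layout. The names `wait_for_feed`, `detail_url`, and `FEED_URL_PREFIX` are hypothetical, and the browser is represented by any callable that returns the current URL.

```python
import time
from typing import Callable

# Assumption: after a successful login the browser lands on a URL under /feed.
FEED_URL_PREFIX = "https://www.linkedin.com/feed"


def wait_for_feed(
    get_current_url: Callable[[], str],
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the current URL until it is the feed page or the timeout passes.

    Returns True once the feed is reached (for example after the user has
    solved a CAPTCHA by hand) and False if the timeout expires first.
    """
    waited = 0.0
    while waited <= timeout_seconds:
        if get_current_url().startswith(FEED_URL_PREFIX):
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False


def detail_url(profile_slug: str, section: str) -> str:
    """Build the URL of a section's full detail page (assumed URL layout)."""
    return f"https://www.linkedin.com/in/{profile_slug}/details/{section}/"
```

With Selenium, for instance, this could be called as `wait_for_feed(lambda: driver.current_url)` right after submitting the login form, followed by `driver.get(detail_url("some-profile", "experience"))`. The slug and section name here are placeholders.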
There are some improvements I would like to make to this if I continue work on it:
- Improve the speed of the scraping
- Support the rest of the profile sections
- Use AI or fuzzy-matching to tailor a resume to a job listing
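
The fuzzy-matching idea in the last item could start out as simple as the sketch below. This is only one possible approach, using Python's standard `difflib` module, and not something the project currently does. The function names `match_score` and `rank_entries` and the 0.8 cutoff are assumptions chosen for illustration.

```python
import re
from difflib import get_close_matches


def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def match_score(entry: str, job_listing: str, cutoff: float = 0.8) -> float:
    """Fraction of job-listing words that closely match a word in the entry."""
    entry_words = list(words(entry))
    listing_words = words(job_listing)
    if not listing_words:
        return 0.0
    hits = sum(
        1
        for word in listing_words
        if get_close_matches(word, entry_words, n=1, cutoff=cutoff)
    )
    return hits / len(listing_words)


def rank_entries(entries: list[str], job_listing: str) -> list[str]:
    """Sort resume entries from most to least relevant to the listing."""
    return sorted(entries, key=lambda e: match_score(e, job_listing), reverse=True)
```

Common words such as "with" count as matches in this version, so a real implementation would probably filter stop words or weight rarer terms more heavily.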
NOTE: After a brief Google search while writing this, I see there are quite a few existing solutions, but I'm glad I did this myself as it introduced me to web scraping and helped me get more comfortable formatting Word documents through code.