{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "___\n", "
Content Copyright by Pierian Data
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Advanced Modules Exercise Solutions\n", "\n", "It's time to test your new skills. This puzzle project combines several skill sets, including unzipping files with Python and using the os module to automatically search through a large number of files.\n", "\n", "## Your Goal\n", "\n", "This is a puzzle, so we don't want to give you too much guidance; instead, we want you to figure things out on your own.\n", "\n", "There is a .zip file called 'unzip_me_for_instructions.zip'. Unzip it, open the .txt file with Python, read the instructions, and see if you can figure out what you need to do!\n", "\n", "**If you get stuck or don't know where to start, here is a [guide/hints](https://docs.google.com/document/d/1JxydUr4n4fSR0EwwuwT-aHia-yPK6r-oTBuVT2sqheo/edit?usp=sharing).**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Unzipping the File\n", "\n", "We can use the shutil library to extract the contents of the .zip file." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import shutil" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "shutil.unpack_archive('unzip_me_for_instructions.zip','','zip')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Read the instructions file\n", "\n", "Let's figure out what we need to do by opening the Instructions.txt file."
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Good work on unzipping the file!\n", "You should now see 5 folders, each with a lot of random .txt files.\n", "Within one of these text files is a telephone number formated ###-###-#### \n", "Use the Python os module and regular expressions to iterate through each file, open it, and search for a telephone number.\n", "Good luck!\n" ] } ], "source": [ "with open('extracted_content/Instructions.txt') as f:\n", "    content = f.read()\n", "    print(content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Regular Expression to Find the Phone Number\n", "\n", "There are many approaches to take here, but since we know we are looking for a phone number, it should consist of digits in the form ###-###-####. We can create a regular expression for this format and test it on a sample string. Once it's tested and working, we can figure out how to run it through all the .txt documents."
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import re" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pattern = r'\\d{3}-\\d{3}-\\d{4}'" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "test_string = \"here is a random number 1231231234 , here is phone number formatted 123-123-1234\"" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['123-123-1234']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "re.findall(pattern,test_string)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Create a function for regex\n", "\n", "Let's put this pattern inside a function that applies it to the contents of a .txt file. This way, we can apply the function to every .txt file in the extracted_content folder." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def search(file, pattern=r'\\d{3}-\\d{3}-\\d{4}'):\n", "    # Open the file in a with block so it is closed automatically\n", "    with open(file, 'r') as f:\n", "        text = f.read()\n", "\n", "    # Search once; return the match object, or '' if nothing was found\n", "    match = re.search(pattern, text)\n", "    return match if match else ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: OS Walk through the Files to Get the Phone Number\n", "\n", "Now that we have a basic function to search through the text of the files, let's perform an os.walk through the unzipped directory to find the phone number hidden somewhere in one of the text files."
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import os" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "results = []\n", "# os.path.join builds paths that work on any operating system\n", "for folder, sub_folders, files in os.walk(os.path.join(os.getcwd(), 'extracted_content')):\n", "\n", "    for f in files:\n", "        full_path = os.path.join(folder, f)\n", "\n", "        results.append(search(full_path))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "719-266-2837\n" ] } ], "source": [ "for r in results:\n", "    if r != '':\n", "        print(r.group())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "Excellent work! More information on this phone number:\n", "* https://www.npr.org/2011/12/21/144069758/callin-oates-the-hotline-you-dont-need-but-might-call-anyway\n", "* https://twitter.com/CallinOates" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }