Reading data from a directory using Python

Hey Guys,

I'm super excited to share this. I was given a task to go through log files in a directory. It seemed super easy, but there were a few challenges along the way.

The easiest way would be to go through the directory and read each file with pandas. But the files contained comment lines, and I didn't find a straightforward way to read them while skipping the comments. The way I figured out was to first read the lines and store them in a list, filtering out the comments, and later convert the list to a DataFrame.
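As an aside, pandas' read_csv does accept a comment character, which may cover simple cases like lines starting with ";". A minimal sketch (the sample data here is made up; note the caveat that any ";" appearing mid-line would also truncate that line):

```python
import io
import pandas as pd

# Hypothetical sample: two data rows plus a comment line starting with ";"
raw = "col1,col2\n1,a\n; this is a comment\n2,b\n"

# comment=";" makes read_csv ignore everything after a ";" on each line,
# so lines that start with ";" are dropped entirely
df = pd.read_csv(io.StringIO(raw), comment=";")
print(df)
```

The list-based approach below gives more control, e.g. when you also want to tag each line with its source file.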


Step 1 : Copy the files from the source to local, just for faster performance


import glob
import os
import shutil

# Placeholder paths -- replace with your actual source and local directories
source_path = 'source_path/'
target_path = 'target_path/'

# Collect the names of the files sitting in the source directory
filenames = [os.path.basename(f) for f in glob.glob(source_path + '*')]

for x in filenames:
    source = source_path + x
    target = target_path + x
    shutil.copyfile(source, target)

The step above copies the files to your local machine. It can be skipped if needed.

Step 2 : Read through all the files in the directory and append the file name to each line while reading.

import glob

import pandas as pd

path = 'local_path'  # placeholder for your local directory
all_files = glob.glob(path + "/*.txt")

s = []  # will hold one "filename,line" string per non-comment line
for i in all_files:
    with open(i, "r") as f:
        for line in f:
            # skip comment lines, which start with ";"
            if not line.startswith(";"):
                s.append(i + ',' + line.rstrip('\n'))


In the step above, the variable all_files stores all the matching file names. In my case the directory had files that were txt, ksh, etc., so the glob pattern picks out only the .txt files. While appending data to s, I prepend the value of i, which carries the file name information.

Once that step succeeds, the next step is easy: converting the list to a DataFrame.

df = pd.DataFrame(s)
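Since each entry in s is a single "filename,line" string, the resulting DataFrame has just one column. If it helps, that column can be split back into separate file and line columns. A minimal sketch with made-up data (the column names raw, file, and line are my own choices):

```python
import pandas as pd

# Hypothetical list in the same "filename,line" shape built in Step 2
s = ['logs/a.txt,first line', 'logs/a.txt,second line', 'logs/b.txt,hello']

df = pd.DataFrame(s, columns=['raw'])

# Split on the first comma only (n=1), since the log line itself
# may contain commas
df[['file', 'line']] = df['raw'].str.split(',', n=1, expand=True)
print(df[['file', 'line']])
```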


and that’s it !!


Take care !!

