Reading data from a directory using Python

Hey Guys,

I'm super excited to share this. I was given a task to go through the log files in a directory. It seems super easy, but there were a few challenges along the way.

The easiest way would be to go through the directory and read each file with pandas. But the files had comment lines, and I found no straightforward way to read them while skipping those lines. The approach I figured out was to first read the lines and store them in a list, and later convert the list to a data frame.

 

Step 1 : Copy the files from the source to local, just for faster performance

 

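import os
import glob
import shutil
import pandas as pd

# filename is the list of file names to copy; source_path and target_path
# are the source and local directories. Set all three before running.
for x in filename:
    source = os.path.join(source_path, x)
    target = os.path.join(target_path, x)
    shutil.copyfile(source, target)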

The above step copies the files to your local machine. It can be skipped if not needed.
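Here is a small sketch of the same copy step wrapped in a function, in case it helps. The function name copy_files and the rule "copy every regular file in the source directory" are my assumptions for illustration; adjust them to whatever list of files you actually need.

```python
import os
import shutil


def copy_files(source_path, target_path):
    """Copy every regular file from source_path into target_path.

    Hypothetical helper: the name and the "copy everything" rule are
    assumptions made for this sketch.
    """
    os.makedirs(target_path, exist_ok=True)  # create the local folder if missing
    copied = []
    for name in sorted(os.listdir(source_path)):
        source = os.path.join(source_path, name)
        if os.path.isfile(source):  # skip sub-directories
            shutil.copyfile(source, os.path.join(target_path, name))
            copied.append(name)
    return copied
```

Calling copy_files(source_path, target_path) returns the names of the files that were copied, which is handy for a quick sanity check before moving on.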

Step 2 : Read through all the files in the directory and add the file name to each line while reading.

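import glob
import pandas as pd

path = 'local_path'
all_files = glob.glob(path + "/*.txt")

s = []
for i in all_files:
    with open(i, "r") as f:
        for line in f:
            # skip comment lines, which start with ";"
            if not line.startswith(";"):
                # prefix the line with its file name and drop the trailing newline
                j = i + ',' + line.rstrip("\n")
                s.append(j)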

 

In the above step, the variable all_files stores the paths of all the .txt files. In my case the directory had files of several types (txt, ksh, etc.), and the "/*.txt" pattern picks up only the txt ones. While appending data to s, I am passing the value i, which holds the file name information.
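The same reading logic can be sketched as a reusable function. The name read_lines and its pattern and comment parameters are hypothetical, added only for this illustration; the defaults mirror what I used above.

```python
import glob
import os


def read_lines(path, pattern="*.txt", comment=";"):
    """Return 'file_path,line' strings for every non-comment line.

    Hypothetical helper: the name and parameters are assumptions
    made for this sketch.
    """
    rows = []
    # sorted() keeps the file order stable between runs
    for file_path in sorted(glob.glob(os.path.join(path, pattern))):
        with open(file_path, "r") as f:
            for line in f:
                if not line.startswith(comment):  # skip comment lines
                    rows.append(file_path + "," + line.rstrip("\n"))
    return rows
```

For example, s = read_lines('local_path') would give the same kind of list as the loop above, and changing pattern to "*.ksh" would read the ksh files instead.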

Once that step succeeds, the next step is easy: convert the list to a data frame.

df = pd.DataFrame(s)
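The data frame built this way has a single column holding the whole "file,line" string. If you would rather have the file name and the line in separate columns, one possible sketch is below. The sample list and the column names "file" and "line" are made up for illustration, and it assumes the file paths themselves contain no commas.

```python
import pandas as pd

# Hypothetical sample of the list built in Step 2: "file_path,line" strings.
s = [
    "local_path/a.txt,first log line",
    "local_path/b.txt,second, with a comma",
]

# Split only on the first comma so commas inside the log line are kept.
rows = [item.split(",", 1) for item in s]
df = pd.DataFrame(rows, columns=["file", "line"])
print(df)
```

Splitting on the first comma only is a deliberate choice here, since log lines often contain commas of their own.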

 

And that’s it!!

 

Take care !!

 
