Hey Guys,
Feeling super excited to share this! I was given a task to go through the log files in a directory. It seemed super easy, but there were a few challenges along the way.
The easiest way seemed to be to go through the directory and read each file with pandas. The problem was that the files contained comment lines, and I couldn't find a straightforward way to read them or to skip the commented lines. The way I figured out was to first read the lines into a list, skipping the comments, and then convert the list to a DataFrame.
Step 1: Copy the files from the source to a local directory, purely for faster reads.
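import os
import shutil

source_path = 'source_path'  # placeholder: directory the log files live in
target_path = 'local_path'   # placeholder: local directory to copy them to

filename = os.listdir(source_path)  # names of the files to copy
os.makedirs(target_path, exist_ok=True)
for x in filename:
    # os.path.join avoids missing or doubled '/' between directory and file name
    source = os.path.join(source_path, x)
    target = os.path.join(target_path, x)
    if os.path.isfile(source):  # skip subdirectories
        shutil.copyfile(source, target)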
The step above copies the files to your local machine. It is optional and can be skipped.
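Step 1 can also be wrapped in a small reusable function. This is a minimal sketch, assuming the source is an ordinary directory the script can list; the helper name `copy_to_local` is hypothetical, and it skips subdirectories rather than copying them recursively.

```python
import os
import shutil
import tempfile


def copy_to_local(source_path, target_path):
    """Copy every regular file in source_path into target_path.

    Hypothetical helper: subdirectories are skipped and target_path is
    created if it does not exist. Returns the sorted names of copied files.
    """
    os.makedirs(target_path, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(source_path)):
        source = os.path.join(source_path, name)
        if not os.path.isfile(source):
            continue  # skip subdirectories and other non-regular files
        shutil.copyfile(source, os.path.join(target_path, name))
        copied.append(name)
    return copied


if __name__ == "__main__":
    # Throwaway directories stand in for the real source and local paths
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for name in ("app.txt", "run.ksh"):
            with open(os.path.join(src, name), "w") as f:
                f.write("; comment\nstarted\n")
        os.mkdir(os.path.join(src, "archive"))
        print(copy_to_local(src, dst))  # ['app.txt', 'run.ksh']
```

Returning the list of copied names makes it easy to check that every expected log file actually arrived before moving on.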
Step 2: Read through all the files in the directory, tagging each line with its file name as it is read.
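import glob
import os

import pandas as pd

path = 'local_path'
all_files = glob.glob(os.path.join(path, "*.txt"))
s = []
for i in all_files:
    with open(i, "r") as f:
        for line in f:
            # Skip comment lines, which start with ';'
            if not line.startswith(";"):
                # Prefix the line with its file name; drop the trailing newline
                j = i + ',' + line.rstrip("\n")
                s.append(j)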
In the step above, the variable all_files stores the paths of the matching files. In my case the directory contained .txt, .ksh and other files, and the "*.txt" pattern keeps only the .txt ones. While appending each line to s, I prefix it with i, which holds the file path, so every row records which file it came from.
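The loop in Step 2 can be packaged as a function so the file pattern and comment marker are easy to change. A minimal sketch: the helper name `read_log_lines` and its `pattern` and `comment` parameters are hypothetical, and the demo writes made-up sample files to a temporary directory.

```python
import glob
import os
import tempfile


def read_log_lines(path, pattern="*.txt", comment=";"):
    """Return 'file_path,line' strings for every non-comment line.

    Hypothetical helper: `pattern` selects which files to read (so .ksh and
    other files in the same directory are ignored), and lines that start
    with `comment` are skipped. Trailing newlines are stripped.
    """
    rows = []
    for file_name in sorted(glob.glob(os.path.join(path, pattern))):
        with open(file_name, "r") as f:
            for line in f:
                if line.startswith(comment):
                    continue  # skip comment lines
                rows.append(file_name + "," + line.rstrip("\n"))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "app.txt"), "w") as f:
            f.write("; header comment\nstarted\nstopped\n")
        with open(os.path.join(d, "run.ksh"), "w") as f:
            f.write("echo ignored\n")
        for row in read_log_lines(d):
            print(row)
```

Sorting the glob result keeps the row order stable between runs, since glob itself makes no ordering guarantee.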
Once that step works, the next one is easy: convert the list to a DataFrame.
df = pd.DataFrame(s)
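pd.DataFrame(s) gives a single column (named 0) holding strings of the form file_path,line. If you want the file name in its own column, one option is to split on the first comma only, since the log line itself may contain commas. This sketch uses made-up sample rows, the column names `file` and `line` are my own choice, and it assumes the file paths contain no commas.

```python
import pandas as pd

# Made-up sample rows in the same "file_path,line" shape as the list s
s = [
    "logs/app.txt,started job",
    "logs/app.txt,value=1,2,3",
    "logs/db.txt,connection ok",
]

df = pd.DataFrame(s)
# n=1 splits only on the first comma, so commas inside the log line survive
df = df[0].str.split(",", n=1, expand=True)
df.columns = ["file", "line"]
print(df)
```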
And that’s it!
Take care!