python - Get last n lines of a file, similar to tail

ID: 20122

Viewed: 29

Tags: python, file, file-io, tail, logfile

Top 5 Answers for python - Get last n lines of a file, similar to tail

Answer 1 (95 votes)

This may be quicker than yours. It makes no assumptions about line length: it backs through the file one block at a time until it has found the right number of '\n' characters.

def tail(f, lines=20):
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = []  # blocks of size BLOCK_SIZE, in reverse order starting
                 # from the end of the file
    while lines_to_go > 0 and block_end_byte > 0:
        if (block_end_byte - BLOCK_SIZE > 0):
            # read the last block we haven't yet read
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # file too small, start from beginning
            f.seek(0, 0)
            # only read what was not read
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count('\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = ''.join(reversed(blocks))
    return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])

I don't like tricky assumptions about line length when -- as a practical matter -- you can never know things like that.

Generally, this will locate the last 20 lines on the first or second pass through the loop. If your 74-character estimate is actually accurate, you make the block size 2048 and you'll tail 20 lines almost immediately.

Also, I don't burn a lot of brain calories trying to finesse alignment with physical OS blocks. Using these high-level I/O packages, I doubt you'll see any performance consequence of trying to align on OS block boundaries. If you use lower-level I/O, then you might see a speedup.


UPDATE

For Python 3.2 and up, follow this process on bytes, since in text files (those opened without a "b" in the mode string) only seeks relative to the beginning of the file are allowed (the exception being a seek to the very end of the file with seek(0, 2)):

eg: f = open('C:/.../../apache_logs.txt', 'rb')

def tail(f, lines=20):
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = []
    while lines_to_go > 0 and block_end_byte > 0:
        if (block_end_byte - BLOCK_SIZE > 0):
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            f.seek(0, 0)
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count(b'\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-total_lines_wanted:])
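As a quick sanity check of the bytes-mode version (an in-memory io.BytesIO stands in for a real log file here; the function is the one above, condensed without comments):

```python
import io

def tail(f, lines=20):
    # bytes-mode block-reading tail from above (condensed)
    total_lines_wanted = lines
    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = []
    while lines_to_go > 0 and block_end_byte > 0:
        if block_end_byte - BLOCK_SIZE > 0:
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            f.seek(0, 0)
            blocks.append(f.read(block_end_byte))
        lines_to_go -= blocks[-1].count(b'\n')
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-total_lines_wanted:])

# 5000 numbered lines: large enough to span several 1024-byte blocks
f = io.BytesIO(b''.join(b'line %d\n' % i for i in range(5000)))
result = tail(f, 3)
```

Note the return value has no trailing newline, and a file smaller than the requested line count simply returns the whole file.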
Answer 2 (81 votes)

Assuming a Unix-like system, on Python 2 you can do:

import os

def tail(f, n, offset=0):
    stdin, stdout = os.popen2("tail -n " + str(n + offset) + " " + f)
    stdin.close()
    lines = stdout.readlines()
    stdout.close()
    return lines[:len(lines) - offset]

For Python 3 you may do:

import subprocess

def tail(f, n, offset=0):
    proc = subprocess.Popen(['tail', '-n', str(n + offset), f],
                            stdout=subprocess.PIPE)
    lines = proc.stdout.readlines()
    return lines[:len(lines) - offset]
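For example, on a Unix-like system with the tail command available (the temp file here is just for illustration):

```python
import subprocess
import tempfile

def tail(f, n, offset=0):
    # subprocess args must be strings, hence str(n + offset)
    proc = subprocess.Popen(['tail', '-n', str(n + offset), f],
                            stdout=subprocess.PIPE)
    lines = proc.stdout.readlines()
    proc.wait()
    # drop the last `offset` lines (offset=0 keeps everything)
    return lines[:len(lines) - offset]

# Write a small sample file and tail it
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(100):
        tmp.write('line %d\n' % i)

result = tail(tmp.name, 2)
```

The offset parameter skips that many lines from the very end, so tail(tmp.name, 2, offset=1) returns the two lines before the last one.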
Answer 3 (70 votes)

Here is my answer. Pure Python. Using timeit it seems pretty fast. Tailing 100 lines of a log file that has 100,000 lines:

>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10)
0.0014600753784179688
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100)
0.00899195671081543
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=1000)
0.05842900276184082
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10000)
0.5394978523254395
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100000)
5.377126932144165

Here is the code:

import os


def tail(f, lines=1, _buffer=4098):
    """Tail a file and get X lines from the end"""
    # place holder for the lines found
    lines_found = []

    # block counter will be multiplied by buffer
    # to get the block size from the end
    block_counter = -1

    # loop until we find X lines
    while len(lines_found) < lines:
        try:
            f.seek(block_counter * _buffer, os.SEEK_END)
        except IOError:  # either file is too small, or too many lines requested
            f.seek(0)
            lines_found = f.readlines()
            break

        lines_found = f.readlines()

        # we found enough lines, get out
        # Removed this line because it was redundant the while will catch
        # it, I left it for history
        # if len(lines_found) > lines:
        #     break

        # decrement the block counter to get the
        # next X bytes
        block_counter -= 1

    return lines_found[-lines:]
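A quick check with a binary temp file (on Python 3, seeking from the end requires binary mode; in text mode the seek raises an exception that falls into the IOError branch and the whole file is read instead). The function is the one above, condensed without comments:

```python
import os
import tempfile

def tail(f, lines=1, _buffer=4098):
    """Tail a file and get X lines from the end (condensed from above)."""
    lines_found = []
    block_counter = -1
    while len(lines_found) < lines:
        try:
            f.seek(block_counter * _buffer, os.SEEK_END)
        except IOError:
            # file too small, or too many lines requested
            f.seek(0)
            lines_found = f.readlines()
            break
        lines_found = f.readlines()
        block_counter -= 1
    return lines_found[-lines:]

with tempfile.TemporaryFile('w+b') as f:
    f.write(b''.join(b'line %d\n' % i for i in range(1000)))
    result = tail(f, 3)
```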
Answer 4 (61 votes)

If reading the whole file is acceptable then use a deque.

from collections import deque

deque(f, maxlen=n)
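For example, with an in-memory file standing in for a real one (iterating a file yields its lines, so the deque keeps only the last n):

```python
import io
from collections import deque

f = io.StringIO('a\nb\nc\nd\ne\n')
n = 2
# deque consumes the file iterator, discarding all but the last n lines
last_lines = list(deque(f, maxlen=n))
```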

Prior to 2.6, deques didn't have a maxlen option, but it's easy enough to implement.

import itertools

def maxque(items, size):
    items = iter(items)
    q = deque(itertools.islice(items, size))
    for item in items:
        del q[0]
        q.append(item)
    return q
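The emulation behaves like the maxlen version, e.g.:

```python
import itertools
from collections import deque

def maxque(items, size):
    # pre-2.6 emulation of deque(items, maxlen=size), from above
    items = iter(items)
    q = deque(itertools.islice(items, size))
    for item in items:
        del q[0]
        q.append(item)
    return q

result = list(maxque(range(10), 3))
```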

If it's a requirement to read the file from the end, then use a galloping (a.k.a. exponential) search.

def tail(f, n):
    assert n >= 0
    pos, lines = n + 1, []
    while len(lines) <= n:
        try:
            f.seek(-pos, 2)
        except IOError:
            f.seek(0)
            break
        finally:
            lines = list(f)
        pos *= 2
    return lines[-n:]
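A quick check with a binary temp file (binary mode is needed on Python 3 for the negative end-relative seek; the doubling of pos is the galloping step):

```python
import tempfile

def tail(f, n):
    # galloping-search tail from above
    assert n >= 0
    pos, lines = n + 1, []
    while len(lines) <= n:
        try:
            # seek pos bytes back from the end; doubles each pass
            f.seek(-pos, 2)
        except IOError:
            # seeked past the start: file is small, read it all
            f.seek(0)
            break
        finally:
            lines = list(f)
        pos *= 2
    return lines[-n:]

with tempfile.TemporaryFile('w+b') as f:
    f.write(b''.join(b'line %d\n' % i for i in range(100)))
    result = tail(f, 3)
```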
Answer 5 (56 votes)

S.Lott's answer above almost works for me, but it ends up giving me partial lines. It turns out that it corrupts data on block boundaries, because data holds the read blocks in reversed order, so when ''.join(data) is called the blocks are in the wrong order. This fixes that.

def tail(f, window=20):
    """
    Returns the last `window` lines of file `f` as a list.
    f - a byte file-like object
    """
    if window == 0:
        return []
    BUFSIZ = 1024
    f.seek(0, 2)
    bytes = f.tell()
    size = window + 1
    block = -1
    data = []
    while size > 0 and bytes > 0:
        if bytes - BUFSIZ > 0:
            # Seek back one whole BUFSIZ
            f.seek(block * BUFSIZ, 2)
            # read BUFFER
            data.insert(0, f.read(BUFSIZ))
        else:
            # file too small, start from beginning
            f.seek(0, 0)
            # only read what was not read
            data.insert(0, f.read(bytes))
        linesFound = data[0].count('\n')
        size -= linesFound
        bytes -= BUFSIZ
        block -= 1
    return ''.join(data).splitlines()[-window:]
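On Python 3 this needs a binary file plus byte literals ('\n' becomes b'\n', the joins become b''.join); a minimal adaptation, demonstrated on an in-memory file:

```python
import io

def tail(f, window=20):
    """Return the last `window` lines of binary file `f` as a list of bytes."""
    if window == 0:
        return []
    BUFSIZ = 1024
    f.seek(0, 2)
    remaining_bytes = f.tell()
    size = window + 1
    block = -1
    data = []
    while size > 0 and remaining_bytes > 0:
        if remaining_bytes - BUFSIZ > 0:
            # seek back one whole BUFSIZ and read it
            f.seek(block * BUFSIZ, 2)
            data.insert(0, f.read(BUFSIZ))
        else:
            # file too small: read whatever is left from the start
            f.seek(0, 0)
            data.insert(0, f.read(remaining_bytes))
        size -= data[0].count(b'\n')
        remaining_bytes -= BUFSIZ
        block -= 1
    return b''.join(data).splitlines()[-window:]

f = io.BytesIO(b''.join(b'line %d\n' % i for i in range(3000)))
result = tail(f, 2)
```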
