linux - How to remove the lines which appear on file B from another file A?

Tags : linux, shell, sed, diff, grep

Top 5 Answers for linux - How to remove the lines which appear on file B from another file A?

Votes: 96

If the files are sorted (they are in your example):

comm -23 file1 file2 

-2 suppresses lines that appear only in file2, and -3 suppresses lines that appear in both files, so only the lines unique to file1 are printed. If the files are not sorted, pipe them through sort first.

See the comm man page (man comm) for details.
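For unsorted input, one way to follow the advice above is to sort each file into a temporary copy before calling comm. This is only a sketch: the directory, file names, and sample contents are made up for illustration, and LC_ALL=C is an assumption used so that sort and comm agree on the ordering.

```shell
# Hypothetical sample data; the file names are placeholders.
workdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$workdir/file1"
printf '%s\n' banana date > "$workdir/file2"

# comm expects both inputs to be sorted in the same collation order.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -2 drops lines found only in file2, -3 drops lines found in both,
# so only the lines unique to file1 remain.
comm_result=$(LC_ALL=C comm -23 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$comm_result"
```

In Bash, process substitution, as in comm -23 <(sort file1) <(sort file2), avoids the temporary files.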

Votes: 83

grep -Fvxf <lines-to-remove> <all-lines>

Example:

cat <<EOF > A
b
1
a
0
01
b
1
EOF

cat <<EOF > B
0
1
EOF

grep -Fvxf B A

Output:

b
a
01
b

Explanation:

  • -F: use literal strings instead of the default BRE
  • -x: only consider matches that match the entire line
  • -v: print only the lines that do not match
  • -f file: take patterns from the given file
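To see why -x and -F matter, the sketch below reruns the example with each flag removed in turn. The a.c pattern and the extra file names (patterns, data) are made up for illustration and are not part of the original answer.

```shell
workdir=$(mktemp -d)
printf '%s\n' b 1 a 0 01 b 1 > "$workdir/A"
printf '%s\n' 0 1 > "$workdir/B"

# With -x, only lines that are exactly "0" or "1" are removed; "01" survives.
with_x=$(grep -Fvxf "$workdir/B" "$workdir/A")

# Without -x, any line containing "0" or "1" is removed, so "01" is lost too.
without_x=$(grep -Fvf "$workdir/B" "$workdir/A")

# Hypothetical pattern containing a regex metacharacter.
printf '%s\n' 'a.c' > "$workdir/patterns"
printf '%s\n' 'abc' 'a.c' > "$workdir/data"

# With -F, "a.c" is a literal string, so only the line "a.c" is removed.
with_F=$(grep -Fvxf "$workdir/patterns" "$workdir/data")

# Without -F, "." matches any character, so "abc" is removed as well.
# grep exits non-zero when it prints nothing, hence the "|| true".
without_F=$(grep -vxf "$workdir/patterns" "$workdir/data" || true)

printf 'with -x:\n%s\n' "$with_x"
printf 'without -x:\n%s\n' "$without_x"
printf 'with -F:\n%s\n' "$with_F"
printf 'without -F:\n%s\n' "$without_F"
```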

This method is slower on pre-sorted files than other methods, since it is more general. If speed matters as well, see: Fast way of finding lines in one file that are not in another?

Here's a quick Bash function that performs the removal in place:

remove-lines() (
  remove_lines="$1"
  all_lines="$2"
  tmp_file="$(mktemp)"
  grep -Fvxf "$remove_lines" "$all_lines" > "$tmp_file"
  mv "$tmp_file" "$all_lines"
)


usage:

remove-lines lines-to-remove remove-from-this-file 

See also: https://unix.stackexchange.com/questions/28158/is-there-a-tool-to-get-the-lines-in-one-file-that-are-not-in-another

Votes: 70

awk to the rescue!

This solution doesn't require sorted inputs. You have to provide fileB first.

awk 'NR==FNR{a[$0];next} !($0 in a)' fileB fileA 

returns

A
C

How does it work?

The NR==FNR{a[$0];next} idiom stores the lines of the first file as keys of an associative array, for a later "contains" test.

NR==FNR checks whether we are still scanning the first file: the global line counter (NR) equals the per-file line counter (FNR) only while the first file is being read (provided that file is not empty).

a[$0] adds the current line to the associative array as a key. The array behaves like a set, so duplicate lines are stored only once.

!($0 in a) is evaluated once we are in the following file(s). in is a "contains" test: it checks whether the current line is in the set populated from the first file, and ! negates the result. No action is written because the default action, {print}, is exactly what is wanted.
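The two passes can be traced on a small input. The sample contents of fileA and fileB below are assumptions chosen for illustration, and the array is named seen instead of a purely for readability.

```shell
workdir=$(mktemp -d)
printf '%s\n' A B C > "$workdir/fileA"
printf '%s\n' B D > "$workdir/fileB"

# Pass 1 (fileB, where NR==FNR): record each line as a key of "seen".
# Pass 2 (fileA): !($0 in seen) is true for lines absent from fileB,
# and the default action prints them.
awk_result=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' \
    "$workdir/fileB" "$workdir/fileA")
printf '%s\n' "$awk_result"
```

Here B is dropped because it was recorded during the first pass, while D is never printed because it only occurs in fileB.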

Note that this can now be used to remove blacklisted words.

$ awk '...' badwords allwords > goodwords 

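With a slight change it can clean multiple lists and write a cleaned version of each. The redirection target is parenthesized because some awk implementations parse an unparenthesized FILENAME".clean" differently.

$ awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME".clean")}' bad file1 file2 file3 ...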
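A small run of the multi-file variant might look like the sketch below. The list names and their contents are hypothetical, and the redirection target is written as (FILENAME ".clean") with parentheses, an assumption made here for portability across awk implementations.

```shell
workdir=$(mktemp -d)
printf '%s\n' bad1 bad2 > "$workdir/bad"
printf '%s\n' ok1 bad1 ok2 > "$workdir/list1"
printf '%s\n' bad2 ok3 > "$workdir/list2"

# The first file fills the set; every later file gets its own ".clean"
# output containing only the lines that are not in the set.
awk 'NR==FNR { seen[$0]; next } !($0 in seen) { print > (FILENAME ".clean") }' \
    "$workdir/bad" "$workdir/list1" "$workdir/list2"

clean1=$(cat "$workdir/list1.clean")
clean2=$(cat "$workdir/list2.clean")
printf 'list1.clean:\n%s\n' "$clean1"
printf 'list2.clean:\n%s\n' "$clean2"
```

The input lists are left untouched; only the .clean files are created next to them.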
Votes: 68

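Another way to do the same thing (also requires sorted input). Note that join compares only the first whitespace-separated field by default, so this suits files with one word per line: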

join -v 1 fileA fileB 

In Bash, if the files are not pre-sorted:

join -v 1 <(sort fileA) <(sort fileB) 
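As a rough illustration, the sketch below sorts two small files into temporary copies and then asks join for the unpaired lines of the first one. The sample contents are made up, each line is a single word so the first-field comparison covers the whole line, and LC_ALL=C is an assumption used to keep sort and join on the same ordering.

```shell
workdir=$(mktemp -d)
printf '%s\n' pear apple fig > "$workdir/fileA"
printf '%s\n' fig kiwi > "$workdir/fileB"

# join needs both inputs sorted on the join field.
LC_ALL=C sort "$workdir/fileA" > "$workdir/fileA.sorted"
LC_ALL=C sort "$workdir/fileB" > "$workdir/fileB.sorted"

# -v 1 prints the lines of the first file that have no match in the second.
join_result=$(LC_ALL=C join -v 1 "$workdir/fileA.sorted" "$workdir/fileB.sorted")
printf '%s\n' "$join_result"
```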
Votes: 50

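You can do this with GNU diff when both files are sorted the same way (diff compares lines in sequence, so with unsorted input a line that appears in both files can still be reported as different). Write the result to a temporary file rather than redirecting straight back to file-a, which the shell would truncate before diff reads it:

diff --new-line-format="" --old-line-format="%L" --unchanged-line-format="" file-a file-b > file-a.tmp
mv file-a.tmp file-a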

  • --new-line-format: applies to lines that are in file-b but not in file-a
  • --old-line-format: applies to lines that are in file-a but not in file-b
  • --unchanged-line-format: applies to lines that are in both files
  • %L: prints the line exactly as it appears, including its trailing newline

See man diff for more details.
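A worked run of the diff approach is sketched below. It assumes GNU diff (the line-format options are GNU extensions), uses made-up sample files, and writes to a temporary file before replacing file-a, since redirecting output onto an input file would empty it first.

```shell
workdir=$(mktemp -d)
printf '%s\n' A B C D > "$workdir/file-a"
printf '%s\n' B D E > "$workdir/file-b"

# diff exits with status 1 when the files differ, which is expected here;
# any other non-zero status is treated as a real failure.
diff --new-line-format="" --old-line-format="%L" --unchanged-line-format="" \
    "$workdir/file-a" "$workdir/file-b" > "$workdir/file-a.tmp" || [ "$?" -eq 1 ]

# Replace the original only after diff has finished reading it.
mv "$workdir/file-a.tmp" "$workdir/file-a"

diff_result=$(cat "$workdir/file-a")
printf '%s\n' "$diff_result"
```

Only the "old" lines (those in file-a but not in file-b) are printed, so B and D disappear and E is never added.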
