Remove Duplicate Text Lines from single or two files in Linux

How do you remove duplicate text lines, or keep only the unique lines, in Linux? This is easily done with the “sort” and “uniq” commands.

a) sort command – sort lines of text files

b) uniq command – report or omit repeated lines

For example, we have a file named “file1.txt”:

# cat file1.txt
one
two
three
one
four
three
eight
ten

Use the following syntax to get only the unique lines:

$ sort file1.txt | uniq

Output:
eight
four
one
ten
three
two
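Note that sort changes the original line order. If you want to drop duplicates while keeping the first occurrence of each line in place, a common alternative is the awk idiom below (a sketch; the printf line simply recreates the sample file):

```shell
# Recreate the sample file from above.
printf 'one\ntwo\nthree\none\nfour\nthree\neight\nten\n' > file1.txt

# Print each line only the first time it is seen, preserving input order.
awk '!seen[$0]++' file1.txt
# one
# two
# three
# four
# eight
# ten
```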

Use the following syntax to print only the lines that are not duplicated (repeated lines are removed entirely):

$ sort file1.txt | uniq -u

Output:
eight
four
ten
two

Here -u : print only the lines that are not repeated in the (sorted) input. Unlike the first example, duplicated lines such as “one” and “three” are removed entirely rather than collapsed to a single copy.
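The related uniq flags are easy to confuse, so here is a short side-by-side sketch on the same sample file (flag meanings as in the uniq man page):

```shell
# Recreate the sample file from above.
printf 'one\ntwo\nthree\none\nfour\nthree\neight\nten\n' > file1.txt

sort file1.txt | uniq      # one copy of every line (duplicates collapsed)
sort file1.txt | uniq -u   # only lines occurring exactly once: eight four ten two
sort file1.txt | uniq -d   # only lines occurring more than once: one three
sort file1.txt | uniq -c   # prefix each line with its occurrence count
```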

How to Delete Common Lines From Two Files in Linux

Our second task is to remove lines common to two files; for this we have the “comm” Linux command. comm is efficient because it does the job in a single pass, without storing the entire file in memory.

The basic syntax of this command is as follows.

comm [-1] [-2] [-3] file1 file2

-1 Suppress the output column of lines unique to file1.
-2 Suppress the output column of lines unique to file2.
-3 Suppress the output column of lines common to both file1 and file2.

“file1” Name of the first file to compare. “file2” Name of the second file to compare.

Before applying “comm”, we need to sort the input files. So, in order to get the lines unique to file1.txt, we can use a combination of “comm” and “sort” commands as follows.
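It helps to first see comm’s default three-column output: lines unique to the first file, lines unique to the second, and lines common to both. A quick sketch; the printf lines recreate the two sample files used in this article, and sorted1.txt/sorted2.txt are scratch names chosen here:

```shell
# Recreate both sample files and sort them, since comm requires sorted input.
printf 'one\ntwo\nthree\none\nfour\nthree\neight\nten\n' | sort -u > sorted1.txt
printf 'one\ntwo\nthree\nfour\nfive\nsix\nseven\neight\n' | sort -u > sorted2.txt

# Column 1: unique to the first file; column 2: unique to the second;
# column 3: common to both. Columns are separated by tabs.
comm sorted1.txt sorted2.txt
```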

For example, file1.txt contains the lines shown above, and file2.txt contains:

# cat file2.txt
one
two
three
four
five
six
seven
eight

$ comm -23 <(sort -u file1.txt) <(sort -u file2.txt) > file3.txt

Output:

$ more file3.txt
ten
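The same pattern extends to the other columns. Below is a sketch of the remaining flag combinations; comm requires sorted input, so the files are pre-sorted into scratch files (sorted1.txt/sorted2.txt are names chosen here) instead of using bash process substitution:

```shell
# Recreate the two sample files from the article.
printf 'one\ntwo\nthree\none\nfour\nthree\neight\nten\n' > file1.txt
printf 'one\ntwo\nthree\nfour\nfive\nsix\nseven\neight\n' > file2.txt

sort -u file1.txt > sorted1.txt
sort -u file2.txt > sorted2.txt

comm -13 sorted1.txt sorted2.txt   # lines unique to file2.txt: five seven six
comm -12 sorted1.txt sorted2.txt   # lines common to both files
```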