How To Remove Duplicate Text Lines in Linux

You need to sort data from a log file, but it contains too many duplicate lines. Follow this guide to learn how to remove duplicate text lines on Linux.

You can use the shell along with the following two Linux command-line utilities to sort text and remove duplicate lines:

  1. sort command: sorts lines of text files on Linux and Unix-like systems.
  2. uniq command: reports or omits repeated lines on Linux and Unix-like systems.

Removing Duplicate Lines With Sort, Uniq, and Shell Pipes

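Use the following syntax to keep a single copy of every line:

sort {file-name} | uniq
sort file.log | uniq

To print only the lines that occur exactly once (every copy of a repeated line is dropped), add -u:

sort {file-name} | uniq -u
sort file.log | uniq -u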
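The sketch below contrasts plain uniq, uniq -u, and sort -u on a small hypothetical input in which "b" appears twice. It assumes a POSIX shell; LC_ALL=C is set only so that the sort order is byte-based and predictable.

```shell
#!/bin/sh
# Byte-order sorting, so the output does not depend on the user's locale.
export LC_ALL=C

# Hypothetical sample input: "b" is repeated, "a" and "c" appear once.
input='b
a
b
c'

# Keep one copy of every line: prints a, b, c.
printf '%s\n' "$input" | sort | uniq

# Keep only lines that are never repeated: prints a, c (both "b" lines go).
printf '%s\n' "$input" | sort | uniq -u

# sort -u gives the same result as sort | uniq in a single command.
printf '%s\n' "$input" | sort -u
```

In other words, uniq -u is a filter for "lines that have no duplicates", not a general de-duplicator.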

Remove duplicate lines with uniq

Here is a test file called test.txt:

cat test.txt

Sample output of the cat command:

this is a test
crypto is the future
we hope you like this tutorial
Internet traffic is the flow of data within the entire Internet
this is a test
stock market today

Removing duplicate lines from a text file on Linux

Use the following command to print only the lines that appear once; every copy of a repeated line is removed:

$ sort test.txt | uniq -u

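Sample output (typical en_US.UTF-8 locale; in the C locale, the line beginning with uppercase "Internet" is printed first):

crypto is the future
Internet traffic is the flow of data within the entire Internet
stock market today
we hope you like this tutorial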

Where,

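-u : only print unique lines, that is, lines that are not repeated in the input. Both copies of "this is a test" are therefore removed; drop the -u to keep one copy of each line.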
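To reproduce this example exactly, here is a self-contained sketch that recreates test.txt in a temporary directory. It assumes mktemp is available (it is on GNU/Linux and the BSDs) and uses LC_ALL=C so the order is byte-based rather than locale-dependent.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Same contents as test.txt above.
cat > test.txt <<'EOF'
this is a test
crypto is the future
we hope you like this tutorial
Internet traffic is the flow of data within the entire Internet
this is a test
stock market today
EOF

# Byte order puts uppercase "Internet" first; uniq -u then drops
# both copies of "this is a test", leaving four lines.
LC_ALL=C sort test.txt | uniq -u

# Without -u, one copy of "this is a test" is kept (five lines).
LC_ALL=C sort test.txt | uniq
```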

Sort file contents on Linux

Let’s say you have a file named users.txt:

cat users.txt

Sample output:

Vivek Gite 24/10/72
Martin Lee 12/11/68
Sai Kumar  31/12/84
Marlena Summer 13/05/76
Wendy Lee  04/05/77
Sayali Gite 13/02/76
Vivek Gite 24/10/72

To sort the file, run:

sort users.txt

Next, to sort by last name (the second field), run:

sort -k 2,2 users.txt
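A minimal sketch of sorting by last name with the POSIX -k option, recreating users.txt in a temporary directory (mktemp is assumed to be available). The key 2,2 starts and ends at the second whitespace-separated field; lines whose keys tie are then ordered by comparing the whole line.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Same sample data as users.txt above (double spaces kept as-is).
cat > users.txt <<'EOF'
Vivek Gite 24/10/72
Martin Lee 12/11/68
Sai Kumar  31/12/84
Marlena Summer 13/05/76
Wendy Lee  04/05/77
Sayali Gite 13/02/76
Vivek Gite 24/10/72
EOF

# Sort on field 2 only; the three Gite lines and the two Lee lines
# tie on the key and fall back to a whole-line comparison.
LC_ALL=C sort -k 2,2 users.txt
```

With these inputs the output starts with "Sayali Gite 13/02/76" and ends with "Marlena Summer 13/05/76".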

Want to sort in reverse order? Try:

sort -r users.txt

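To eliminate duplicate entries while sorting the file, add the -u option:

sort -u users.txt
sort -k 2 -u users.txt

When -u is combined with -k, sort compares only the key, so lines whose keys match count as duplicates even if the rest of the line differs.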
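The following sketch shows why the key matters when -u is used. It recreates users.txt in a temporary directory (mktemp assumed) and counts the surviving lines with whole-line and key-only comparison.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

cat > users.txt <<'EOF'
Vivek Gite 24/10/72
Martin Lee 12/11/68
Sai Kumar  31/12/84
Marlena Summer 13/05/76
Wendy Lee  04/05/77
Sayali Gite 13/02/76
Vivek Gite 24/10/72
EOF

export LC_ALL=C

# Whole-line comparison: only the repeated "Vivek Gite" line is
# dropped, leaving 6 lines.
sort -u users.txt

# Key-only comparison on the last name: all Gite lines collapse into
# one, and so do the Lee lines, leaving 4 lines.
sort -k 2,2 -u users.txt
```

If the goal is to remove exact duplicates, plain sort -u (or a key that covers the whole record) is the safer choice.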

How to remove duplicate lines on Linux with the uniq command

Consider the following file:

cat -n telphone.txt

Sample output:

     1  99884123
     2  97993431
     3  81234000
     4  02041467
     5  77985508
     6  97993431
     7  77985509
     8  77985509

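The uniq command removes the 8th line (an adjacent repeat of line 7) and writes the result to a file called output.txt:

uniq telphone.txt output.txt

Line 6 is kept even though it repeats line 2, because uniq only compares each line with the one directly before it.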

Verify it:

cat -n output.txt
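Here is a self-contained sketch of that adjacency rule, recreating telphone.txt in a temporary directory (mktemp assumed). It also shows that sorting first brings every duplicate together, so uniq can remove all of them.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Same numbers as telphone.txt above, one per line.
printf '%s\n' 99884123 97993431 81234000 02041467 \
    77985508 97993431 77985509 77985509 > telphone.txt

# uniq only drops the adjacent repeat on line 8; output.txt has 7 lines
# and still contains 97993431 twice.
uniq telphone.txt output.txt
cat -n output.txt

# After sorting, both 97993431 lines are adjacent: 6 lines remain.
sort telphone.txt | uniq
```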

How to remove duplicate lines in a .txt file and save the result to a new file

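Use either of the following. The first keeps one copy of every line. The second keeps only the lines that are never repeated and, because of tee, also prints the result to the terminal: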

sort input_file | uniq > output_file
sort input_file | uniq -u | tee output_file
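A minimal sketch of saving the result, using a hypothetical input_file created in a temporary directory (mktemp assumed). It also covers a common pitfall: redirecting the output back into the file being read. sort's -o option, by contrast, is allowed to name one of its own input files, which makes in-place de-duplication safe.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Hypothetical input with one repeated line ("beta").
printf '%s\n' beta alpha beta gamma > input_file

# Write the de-duplicated lines to a new file.
sort input_file | uniq > output_file

# Avoid "sort input_file | uniq > input_file": the shell can truncate
# input_file before sort has read it. sort -o handles this case safely.
sort -u -o input_file input_file
```

Both files end up containing alpha, beta, and gamma, one per line.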

Conclusion

The sort command orders the lines of a text file, and uniq filters out adjacent duplicate lines, which is why its input is usually sorted first. Both commands have many more useful options; read their man pages by typing the following man commands:

man sort
man uniq
