You need to sort data from a log file, but it contains too many duplicate lines. This guide shows how to remove duplicate text lines on Linux.
You need a shell and the following two command-line utilities to sort text and remove duplicate lines:
sort command – Sort lines of text files on Linux and Unix-like systems.
uniq command – Report or omit repeated lines on Linux or Unix.
Removing Duplicate Lines With Sort, Uniq, and Shell Pipes
Use the following syntax. The input is sorted first because uniq only compares adjacent lines:
sort {file-name} | uniq -u
sort file.log | uniq -u
Remove duplicate lines with uniq
Here is a test file called test.txt:
cat test.txt
Sample output of the cat command:
this is a test
crypto is the future
we hope you like this tutorial
Internet traffic is the flow of data within the entire Internet
this is a test
stock market today
Removing duplicate lines from a text file on Linux
Use the following command to remove every line that has a duplicate, including its first copy:
sort test.txt | uniq -u
Sample output (sorted; the exact order depends on your locale):
crypto is the future
Internet traffic is the flow of data within the entire Internet
stock market today
we hope you like this tutorial
Where,
-u : print only the lines that are not repeated in the input. To keep one copy of each duplicated line instead, run uniq without -u or use sort -u.
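The difference between the uniq modes is easy to miss, so here is a minimal sketch that runs them side by side on the same six lines. The temporary file and the variable names (demo_file, only_unique, one_copy, repeated) are made up for this illustration, and LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/sh
# Sketch: compare uniq -u, plain uniq, and uniq -d on the same input.
# demo_file is a throwaway temporary file used only for this example.
export LC_ALL=C   # byte-wise ordering, so the result is the same on every system
demo_file=$(mktemp)
printf '%s\n' \
  'this is a test' \
  'crypto is the future' \
  'we hope you like this tutorial' \
  'Internet traffic is the flow of data within the entire Internet' \
  'this is a test' \
  'stock market today' > "$demo_file"

# uniq -u: print only the lines that occur exactly once
only_unique=$(sort "$demo_file" | uniq -u)

# uniq with no option: keep one copy of every line
one_copy=$(sort "$demo_file" | uniq)

# uniq -d: print one copy of each line that is repeated
repeated=$(sort "$demo_file" | uniq -d)

printf '== uniq -u ==\n%s\n' "$only_unique"
printf '== uniq ==\n%s\n' "$one_copy"
printf '== uniq -d ==\n%s\n' "$repeated"

rm -f "$demo_file"
```

With this input, "this is a test" is missing from the uniq -u result, appears once in the plain uniq result, and is the only line reported by uniq -d.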
Sort file contents on Linux
Let’s say you have a file named users.txt:
cat users.txt
Sample outputs:
Vivek Gite 24/10/72
Martin Lee 12/11/68
Sai Kumar 31/12/84
Marlena Summer 13/05/76
Wendy Lee 04/05/77
Sayali Gite 13/02/76
Vivek Gite 24/10/72
To sort the file, run:
sort users.txt
Next, to sort by last name (the second field), run:
sort -k 2 users.txt
Want to sort in reverse order? Try:
sort -r users.txt
If you want to eliminate duplicate entries while sorting the file, run one of the following (when a key is given with -k, the -u option compares only that key):
sort -k 2 -u users.txt
sort -u users.txt
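The sort variants above can be tried without creating users.txt by hand. The following is a small sketch, assuming a temporary copy of the sample data; users_file and the result variables are illustrative names, and LC_ALL=C is set only so the ordering is reproducible.

```shell
#!/bin/sh
# Sketch: sort the sample users by whole line, by last name, and without duplicates.
# users_file is a temporary stand-in for users.txt.
export LC_ALL=C   # byte-wise ordering for a predictable result
users_file=$(mktemp)
printf '%s\n' \
  'Vivek Gite 24/10/72' \
  'Martin Lee 12/11/68' \
  'Sai Kumar 31/12/84' \
  'Marlena Summer 13/05/76' \
  'Wendy Lee 04/05/77' \
  'Sayali Gite 13/02/76' \
  'Vivek Gite 24/10/72' > "$users_file"

# -k 2: the sort key starts at the second field (last name) and runs to end of line
by_last_name=$(sort -k 2 "$users_file")

# -r: reverse the ordering of the whole line
reversed=$(sort -r "$users_file")

# -u: sort and keep a single copy of each line
deduped=$(sort -u "$users_file")

printf '== by last name ==\n%s\n' "$by_last_name"
printf '== reversed ==\n%s\n' "$reversed"
printf '== duplicates removed ==\n%s\n' "$deduped"

rm -f "$users_file"
```

The repeated "Vivek Gite 24/10/72" entry shows up twice in the first two results and only once after sort -u.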
How to remove duplicate lines on Linux with uniq command
Consider the following file:
cat -n telphone.txt
Sample outputs:
1 99884123
2 97993431
3 81234000
4 02041467
5 77985508
6 97993431
7 77985509
8 77985509
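The uniq command only compares adjacent lines, so it removes the 8th line (a repeat of line 7) but keeps line 6, whose duplicate is line 2. The following command places the result in a file called output.txt:
uniq telphone.txt output.txt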
Verify it:
cat -n output.txt
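Because uniq looks only at neighbouring lines, running it on unsorted input can leave duplicates behind. Here is a short sketch using the same eight numbers; phone_file is a temporary stand-in for the sample file, and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: uniq alone versus sort | uniq on unsorted input.
# phone_file is a temporary copy of the sample data.
phone_file=$(mktemp)
printf '%s\n' 99884123 97993431 81234000 02041467 \
              77985508 97993431 77985509 77985509 > "$phone_file"

# uniq alone: only the adjacent repeat (77985509) is collapsed
adjacent_only=$(uniq "$phone_file")

# sort first: every repeat becomes adjacent, so all of them are collapsed
fully_deduped=$(sort "$phone_file" | uniq)

printf '== uniq only ==\n%s\n' "$adjacent_only"
printf '== sort | uniq ==\n%s\n' "$fully_deduped"

rm -f "$phone_file"
```

In the first result 97993431 still appears twice; after sorting it appears once, at the cost of losing the original line order.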
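How to remove duplicate lines in a .txt file and save the result to a new file
Try one of the following. The first keeps one copy of each line; the second keeps only the lines that are never repeated and also prints the result to the screen: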
sort input_file | uniq > output_file
sort input_file | uniq -u | tee output_file
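The two pipelines above write different results, which the sketch below makes visible on a tiny made-up input. It also shows sort -u -o, which is an addition not covered above: the -o option names an output file and may safely name the input file itself, whereas redirecting a pipeline with > onto its own input file truncates it first. All file and variable names here are hypothetical.

```shell
#!/bin/sh
# Sketch: save de-duplicated output to a new file, and de-duplicate in place.
# work_dir and the files inside it are temporary and exist only for this example.
export LC_ALL=C
work_dir=$(mktemp -d)
input_file="$work_dir/input_file"
output_file="$work_dir/output_file"
printf '%s\n' pear apple pear fig apple > "$input_file"

# Keep one copy of each line and write the result to a new file
sort "$input_file" | uniq > "$output_file"

# Keep only never-repeated lines; tee saves them and also prints them
sort "$input_file" | uniq -u | tee "$work_dir/unique_only"

# De-duplicate the input file itself; sort -o can overwrite its own input
sort -u -o "$input_file" "$input_file"

saved=$(cat "$output_file")
unique_only=$(cat "$work_dir/unique_only")
in_place=$(cat "$input_file")

rm -rf "$work_dir"
```

Here output_file ends up with apple, fig, and pear, while unique_only contains just fig, the only line that was never repeated.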
Conclusion
The sort command orders the lines of a text file, and uniq filters out adjacent duplicate lines. Both commands have many more useful options. I suggest you read the man pages by running the following commands:
man sort
man uniq