How to remove/delete CTRL-M (^M) characters from text files in Linux and UNIX systems


Newline


A newline (frequently called line ending, end of line (EOL), line feed, or line break) is a control character or sequence of control characters in a character encoding specification (e.g. ASCII or EBCDIC) that signifies the end of a line of text and the start of a new one.

The concepts of line feed (LF) and carriage return (CR) are closely associated and can be considered either separately or together. In the physical media of typewriters and printers, two axes of motion, "down" and "across", are needed to create a new line on the page. Although the design of a machine (typewriter or printer) must consider them separately, the abstract logic of software can combine them into one event. This is why a newline in a character encoding can be defined as CR and LF combined into one (commonly called CR+LF or CRLF).

OS designers had to choose how to represent the start of a new line in text in computer files. For various historical reasons, in the Unix/Linux world a single LF character was chosen as the newline marker; MS-DOS chose CR+LF, and Windows inherited this. Thus different platforms use different conventions.

Representation of a newline with one or two control characters, by operating system

Operating system                                 Encoding  Abbreviation  Hex    Dec    Escape sequence
Linux, Unix, FreeBSD, Unix-like OS               ASCII     LF            0A     10     \n
Microsoft Windows, MS-DOS, Symbian OS, Palm OS   ASCII     CR LF         0D 0A  13 10  \r\n
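
The byte values in the table can be checked directly. The following is a minimal sketch, assuming a POSIX shell with printf, od and tr available; the variable names are only illustrative:

```shell
# Dump the bytes of the same one-line text written with each convention.
# od -An -tx1 prints every byte as two hex digits, without the offset column;
# tr then removes the spaces and newlines that od puts between the bytes.
unix_bytes=$(printf 'hi\n'   | od -An -tx1 | tr -d ' \n')
dos_bytes=$(printf 'hi\r\n' | od -An -tx1 | tr -d ' \n')

echo "LF:   $unix_bytes"   # 68690a
echo "CRLF: $dos_bytes"    # 68690d0a
```

The only difference between the two dumps is the extra 0d byte (CR) before the final 0a (LF).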

When you copy a text file from Windows to Linux, each newline is copied as \r\n, and this sequence is saved unchanged in the file on the Linux machine. When you print such a file with, e.g., cat -v file.txt, Linux treats \n as the line break, but the carriage return (escape sequence \r) remains at the end of each line and is shown as ^M.

So after copying a file from Windows to a Linux/BSD system, a ^M (carriage return) character remains at the end of every line:

~] cat -v file.txt 
1. line number 1^M
2. line number 2^M
3. line number 3^M
4. line number 4^M
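
Besides inspecting the output of cat -v by eye, a script can test for carriage returns itself. This is a rough sketch, assuming a POSIX shell and grep; the helper name has_cr and the temporary sample file are hypothetical and only serve the illustration:

```shell
# has_cr FILE: succeed (exit status 0) if FILE contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')        # a literal CR byte, built portably
    grep -q "$cr" "$1"
}

# Build a small sample file with Windows line endings.
sample=$(mktemp)
printf '1. line number 1\r\n2. line number 2\r\n' > "$sample"

if has_cr "$sample"; then
    echo "carriage returns found"
fi

# Count the lines that contain a CR.
cr_lines=$(grep -c "$(printf '\r')" "$sample")
echo "$cr_lines"             # 2

rm -f "$sample"
```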

I found a lot of instructions on how to remove all ^M (carriage return) characters from a file, but only a few of them really worked.

Introduction to ASCII


ASCII is an acronym for American Standard Code for Information Interchange. The ASCII table contains letters, numbers, control characters, and other symbols, and each character is assigned a unique 7-bit code.
Here we need only the ASCII representation of the carriage return character.

Decimal  Octal  Hex  Binary     Value  Description      Caret notation  Escape sequence in C
013      015    0D   0000 1101  CR     carriage return  ^M              \r
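
The caret notation in the table can be checked with a little shell arithmetic: the code of a control character is the code of the letter after the caret with bit 0x40 (decimal 64) flipped. A minimal sketch, assuming a POSIX shell whose printf accepts the "'M" form to print the numeric code of a character:

```shell
# Numeric code of the letter M in ASCII.
code_m=$(printf '%d' "'M")      # 77

# Flipping bit 0x40 (decimal 64) yields the code of CTRL-M.
cr_code=$(( code_m ^ 64 ))      # 13

printf 'M = %d, ^M = %d (hex %02X, octal %03o)\n' "$code_m" "$cr_code" "$cr_code" "$cr_code"
# prints: M = 77, ^M = 13 (hex 0D, octal 015)
```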

How to remove CTRL+M (^M) from a file

sed solution

An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or sequence of characters that may be difficult or impossible to represent directly.

sed escape sequences (GNU sed extensions)

Escape sequence  Description
\r               Produces or matches a carriage return (ASCII 13).
\cX              Produces or matches CONTROL-X, where X is any character.
\dXXX            Produces or matches a character whose decimal ASCII value is XXX.
\oXXX            Produces or matches a character whose octal ASCII value is XXX.
\xHH             Produces or matches a character whose hexadecimal ASCII value is HH.

The following examples all have the same result: they delete every ^M (carriage return) from the file.

~] sed -i 's/\r//g' file.txt
~] sed -i 's/\cM//g' file.txt
~] sed -i 's/\d013//g' file.txt
~] sed -i 's/\o015//g' file.txt
~] sed -i 's/\x0D//g' file.txt
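
To confirm that the removal worked, the remaining carriage returns can be counted afterwards. This is a sketch under stated assumptions: GNU sed (for -i without a suffix and for the \r escape), and a throwaway sample file created with mktemp instead of a real file.txt:

```shell
# Create a sample file with Windows (CRLF) line endings.
sample=$(mktemp)
printf '1. line number 1\r\n2. line number 2\r\n' > "$sample"

# Delete every carriage return in place (assumes GNU sed).
sed -i 's/\r//g' "$sample"

# Count the CR bytes still in the file: tr -cd keeps only the CRs, wc -c counts them.
remaining=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
cleaned=$(cat "$sample")

echo "carriage returns left: $remaining"   # carriage returns left: 0
rm -f "$sample"
```

If carriage returns in the middle of a line must be preserved, the pattern can be anchored instead: 's/\r$//' removes a CR only when it is the last character of the line.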

How to add a carriage return (CTRL+M or ^M) to a file


The opposite problem is how to add a carriage return to the end of every line:

sed solution:

~] sed -i 's/$/\x0D/g' file.txt
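
One thing to watch for: the command above appends a CR to every line, so a line that already ends in CR would end up with two. A possible variant, again assuming GNU sed and using a hypothetical temporary sample file, replaces any run of trailing CRs with exactly one:

```shell
# Sample file with mixed endings on purpose: the first line ends in LF, the second in CRLF.
sample=$(mktemp)
printf 'line 1\nline 2\r\n' > "$sample"

# Replace zero or more trailing CRs with a single CR (assumes GNU sed).
sed -i 's/\r*$/\r/' "$sample"

# Count CR and LF bytes; every line should now end in exactly one CR followed by LF.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < "$sample" | wc -c | tr -d ' ')

echo "CR: $cr_count, LF: $lf_count"   # CR: 2, LF: 2
rm -f "$sample"
```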
