Linux Commandline Editing - SOLVED
« on: 14 March, 2009, 03:19:27 pm »
On a Linux system, I want to modify the output of a mysql query (redirected to a file) and put the result in another file, and do it automatically.

This should be possible, in that the starting point and target file format are clear.  What I would start with is this:

w1   p1   f1   r1   t1   m1
1.25   1.08   0.66   0.00   0.76   0.92
w   p   f   r   t   m
1.37   1.44   1.29   0.43   1.19   1.36
w2   p2   f2   r2   t2   m2
1.25   1.58   1.25   0.33   1.20   1.39

and in each instance of this query being run, the numbers will be different (though always between 0 and 4 to two decimal places), but the labels will be the same.  I want to:

a) strip out the labels, leaving just the numbers.
b) place commas, rather than tabs or spaces, between the numbers
c) add 'Pre' to the first row of numbers, 'Norm' to the second row and 'Post' to the last row

to end up with:

Pre,1.25,1.08,0.66,0.00,0.76,0.92
Norm,1.37,1.44,1.29,0.43,1.19,1.36
Post,1.25,1.58,1.25,0.33,1.20,1.39

I'm not familiar with the command line editing commands required to accomplish this task.  I presume it would require three passes through the file.

Any help and pointers gratefully received.

Re: Linux Commandline Editing
« Reply #1 on: 14 March, 2009, 03:51:11 pm »
a) strip out the labels

Easier to grab only lines starting with a digit:-

cat foo.txt | grep "^[0-9]"

b) Strip out label lines and replace tabs or spaces with commas:-

cat foo.txt | grep "^[0-9]" | sed -e 's/[ \t][ \t]*/,/g'

This replaces each run of one or more spaces or tabs with a single comma. The g flag applies the substitution to every match on the line rather than just the first.
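One caveat on the sed expression above: \t inside the brackets is understood by GNU sed (the usual one on Linux), but as far as I know other seds may treat it as a literal backslash and t. A minimal sketch of a more portable form, assuming only a POSIX shell, puts a real tab into a variable first. The names tab, line and csv are just for illustration.

```shell
# Hold a literal tab character in a variable (portable across shells).
tab=$(printf '\t')

# One data line from the question, with tabs and spaces deliberately mixed.
line=$(printf '1.25\t1.08   0.66 \t 0.00   0.76   0.92')

# Each run of spaces and/or tabs collapses to a single comma.
csv=$(printf '%s\n' "$line" | sed -e "s/[ $tab][ $tab]*/,/g")
echo "$csv"
```

The double quotes around the sed script are what let the shell substitute $tab before sed sees it.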

c) The third is trickier.

Put the following in a file called labels.txt

Pre
Norm
Post

Then do:-

cat foo.txt | grep "^[0-9]" | paste labels.txt - | sed -e 's/[ \t][ \t]*/,/g'
Pre,1.25,1.08,0.66,0.00,0.76,0.92
Norm,1.37,1.44,1.29,0.43,1.19,1.36
Post,1.25,1.58,1.25,0.33,1.20,1.39
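Since the aim was to land the result in another file automatically, the whole thing could be wrapped up roughly as below. This is only a sketch: it builds the sample input itself in a scratch directory so it can be run safely, result.csv is a made-up output name, and [[:space:]] is swapped in for [ \t] so it does not depend on GNU sed.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample query output copied from the question.
cat > foo.txt <<'EOF'
w1   p1   f1   r1   t1   m1
1.25   1.08   0.66   0.00   0.76   0.92
w   p   f   r   t   m
1.37   1.44   1.29   0.43   1.19   1.36
w2   p2   f2   r2   t2   m2
1.25   1.58   1.25   0.33   1.20   1.39
EOF

# The three row labels, one per line.
printf 'Pre\nNorm\nPost\n' > labels.txt

# Keep the numeric lines, glue a label on the front of each,
# then turn every run of whitespace into a comma.
grep '^[0-9]' foo.txt \
  | paste labels.txt - \
  | sed -e 's/[[:space:]][[:space:]]*/,/g' > result.csv

cat result.csv
```

In real use you would drop the two file-creating steps and point grep at whatever file the mysql output was redirected to.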

Re: Linux Commandline Editing
« Reply #2 on: 14 March, 2009, 04:04:06 pm »
Some awk code:

awk 'BEGIN{h[2]="Pre";h[4]="Norm";h[6]="Post"} NR==2||NR==4||NR==6 {gsub(" +", ",");printf("%s,%s\n",h[NR],$0)}' inputfile.txt

Re: Linux Commandline Editing
« Reply #3 on: 14 March, 2009, 05:27:56 pm »
That is fantastic, Greenbank, and all, ultimately in one line!  I knew sed would be involved, but was struggling with how to even start! :thumbsup:

Thanks, too, philip - don't know awk at all, but it looks interesting!

Re: Linux Commandline Editing
« Reply #4 on: 14 March, 2009, 05:38:15 pm »
Quote: "Thanks, too, philip - don't know awk at all, but it looks interesting!"
awk processes files one line at a time (each line is referred to as a record). The optional BEGIN{} block is executed before reading any records; in this case it sets up elements 2, 4 and 6 of an array called h. The next bit is a pattern that determines which lines to process; NR is the record number or line number so it matches lines 2, 4 and 6. The final {} block is executed when a line is processed; in this case it first does a gsub to change each occurrence of one or more spaces into a comma, then it prints the appropriate element of the h array and the modified line.
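Spread over several lines with comments, the same idea might look like the sketch below. Two things differ from the one-liner and are my assumptions rather than part of it: the pattern is written as (NR in h), which matches exactly the line numbers that have a label, and the gsub also treats tabs as separators in case the mysql output is tab-delimited. The sample input is written to a temporary file so the snippet runs on its own; input and out are illustrative names.

```shell
# Sample input from the question, written to a scratch file.
input=$(mktemp)
cat > "$input" <<'EOF'
w1   p1   f1   r1   t1   m1
1.25   1.08   0.66   0.00   0.76   0.92
w   p   f   r   t   m
1.37   1.44   1.29   0.43   1.19   1.36
w2   p2   f2   r2   t2   m2
1.25   1.58   1.25   0.33   1.20   1.39
EOF

out=$(awk '
BEGIN {
    # Runs once, before any input: labels keyed by record (line) number.
    h[2] = "Pre"; h[4] = "Norm"; h[6] = "Post"
}
(NR in h) {
    # Runs only for records 2, 4 and 6.
    # Replace every run of spaces or tabs in the record with one comma.
    gsub(/[ \t]+/, ",")
    # Print the label for this record, a comma, then the modified record.
    printf("%s,%s\n", h[NR], $0)
}' "$input")

echo "$out"
```

Redirecting the awk output to a file instead of capturing it in a variable would give the target file directly.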