sed command to delete text until match is found for each line of a csv

I have a csv file and I am trying to delete all characters from the beginning of each line up to the first occurrence of "2015". I want to do this for every line in the csv file.

My csv file structure is as follows:

Field1 , Field2 , Field3 , Field4
sometext1 , 2015-07-15 , sometext2, sometext3
sometext1 , 2015-07-14 , sometext2, sometext3
sometext1 , 2015-07-13 , sometext2, sometext3

I cannot use the cut command, or sed on the first occurrence of a comma, because the text in Field1 sometimes contains commas too, which complicates parsing. I figured that if I search for the first occurrence of the text 2015 on each line and replace all the preceding characters with nothing, that should work.

FYI, I want to do this for the FIRST occurrence of 2015 only. Another column contains a text field with 2015 in it, and I don't want any text before that one to be affected.

For example, if my original line is:

sometext1,#015,2015-07-10,sometext2,2015,sometext3

I want it to return:

2015-07-10,sometext2,2015,sometext3

Does anyone know the sed command to do this?

Any help will be appreciated!

Thanks

Here is a way to do it with sed assuming "#####" never occurs in a line:

sed -e 's/2015/#####&/'|sed -e 's/.*#####//'

For example:

> echo sometext1,#015,2015-07-10,sometext2,2015,sometext3\
  |sed -e 's/2015/#####&/'|sed -e 's/.*#####//'
2015-07-10,sometext2,2015,sometext3

The first sed command prefixes "#####" to the first occurrence of 2015, and the second sed command removes everything from the beginning of the line to the end of the "#####" prefix.

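The basic reason for using this two-stage method is that sed's regular expression matcher has only greedy wildcards, which always pick the longest match; it does not support lazy matching, which picks the shortest match. A single pattern such as ".*2015" would therefore match through the last 2015 on the line, not the first.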
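
To see the greedy behaviour concretely, here is a small sketch that runs the naive one-step substitution on the example line from the question. The variable names line and naive are only for illustration.

```shell
line='sometext1,#015,2015-07-10,sometext2,2015,sometext3'

# Naive attempt: ".*" is greedy, so it runs up to the LAST "2015" on the line.
naive=$(printf '%s\n' "$line" | sed 's/.*2015/2015/')
printf '%s\n' "$naive"
```

This prints 2015,sometext3 rather than the wanted 2015-07-10,sometext2,2015,sometext3, because the match swallows the first 2015 along with everything up to the second one.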

If "#####" may occur in a line, a more unlikely string could be substituted for it, such as "7z#dNjm_wG8a3!esu@Rhv=".
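
As a rough sketch, both substitutions can also run in a single sed process, with the marker kept in a shell variable so it is easy to change. The function name strip_before_2015 and the variable name marker are made up for illustration, and the marker is assumed to contain no characters that are special to sed (such as /, &, \, ., * or [).

```shell
# Marker assumed never to occur in the data.
marker='7z#dNjm_wG8a3!esu@Rhv='

strip_before_2015() {
  # 1st expression: put the marker in front of the first "2015".
  # 2nd expression: delete everything up to and including the marker.
  sed -e "s/2015/${marker}&/" -e "s/.*${marker}//"
}

printf '%s\n' 'sometext1,#015,2015-07-10,sometext2,2015,sometext3' | strip_before_2015
```

Lines that contain no 2015 pass through unchanged, since neither expression matches them.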

To do this with sed without Perl-style non-greedy operators, you need to mark the first instance with something you know won't be in the line, as Tris describes. However, that solution requires knowledge of what won't be in the file. Fortunately, you can guarantee that a newline won't be in the line because that's what terminated the line. Thus you can do something like:

sed 's/2015/\n&/;s/.*\n//' input.txt > output.txt

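NOTE: this won't modify the header row (it contains no "2015"), so you would have to treat it specially. Also, \n in the replacement text works in GNU sed; other sed implementations may need a backslash followed by a literal newline instead.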
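
One possible way to treat the header is sketched below. It assumes the header's first field contains no comma (unlike the data rows) and that it should simply be dropped along with the comma that follows it; the function name trim_csv is hypothetical. The newline in the replacement is written as a backslash followed by a literal line break, which should also work in sed implementations that do not accept \n there.

```shell
trim_csv() {
  # Line 1 (header): drop the first field and its comma.
  # Lines 2..end: put a newline before the first "2015",
  # then delete everything up to and including that newline.
  sed -e '1s/^[^,]*,[[:space:]]*//' \
      -e '2,$s/2015/\
&/' \
      -e '2,$s/.*\n//'
}

printf '%s\n' \
  'Field1 , Field2 , Field3 , Field4' \
  'sometext1 , 2015-07-15 , sometext2, sometext3' | trim_csv
```

With the sample rows from the question, the header becomes "Field2 , Field3 , Field4" and the data row becomes "2015-07-15 , sometext2, sometext3". To process a file, redirect it into the function, for example trim_csv < input.txt > output.txt.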
