Sed or awk: find a string in the last 100 characters of a line or delete the line

First question, so hopefully I've formed it well.

I'm looking to match a string, namely "lang":"en", in the last 100 characters of a line, and if there's no match, delete the line.

I have tried using sed:

sed '/"lang":"en"/!d' file > output

But unfortunately many lines contain that string more than once, and I only care about its final occurrence.

I'm still learning sed and don't know anything about awk, and most of my searches turned up "first/last instance in a file" rather than "in a line", so any help in learning the best way to do this would be great. Thanks.

This should work with any POSIX awk:

awk 'match(substr($0,length-99),/"lang":"en"/)' file

You can also do it with a plain string search instead of a regular expression, but the string is more annoying to type:

awk 'index(substr($0,length-99),"\"lang\":\"en\"")' file

Both extract the last 100 characters of each line and, if the test pattern is found in that substring, print the line (print is the default action, so each program consists only of a condition).
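A minimal sketch to try both awk one-liners side by side, assuming a POSIX sh; the three sample lines are hypothetical, built with 120 filler characters so that an early "lang":"en" falls outside the last 100 characters. The short line also exercises the case where length-99 is zero or negative.

```shell
#!/bin/sh
# Hypothetical JSON-like sample lines (not real data).
pad=$(printf '%0120d' 0)   # 120 filler characters ("000...0")

input=$(
  # "lang":"en" only near the start, followed by >100 characters: dropped
  printf '{"lang":"en","text":"%s","lang":"fr"}\n' "$pad"
  # short line: length-99 <= 0; gawk, mawk, BWK awk and busybox awk then
  # start at position 1, so the whole line is searched
  printf '{"lang":"en"}\n'
  # "lang":"en" at the end of a long line: kept
  printf '{"lang":"fr","text":"%s","lang":"en"}\n' "$pad"
)

# Regex version and plain-string version of the same test
match_out=$(printf '%s\n' "$input" | awk 'match(substr($0,length-99),/"lang":"en"/)')
index_out=$(printf '%s\n' "$input" | awk 'index(substr($0,length-99),"\"lang\":\"en\"")')

printf '%s\n' "$match_out"
```

Only the short line and the line ending in "lang":"en" should be printed, and both versions should agree.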

For a simple regex-based solution:

grep -E '"lang":"en".{0,89}$' file

I subtracted the length of "lang":"en" (11 characters) from the maximum, assuming you mean the string must be found entirely within the last 100 characters.
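The arithmetic behind .{0,89} can be checked directly. This is a small sketch, assuming a POSIX sh and grep -E; the two test lines are made up: one has exactly 89 characters after the string (so the string ends the last 100 characters' window) and one has 90.

```shell
#!/bin/sh
needle='"lang":"en"'
n=${#needle}                # length of the search string: 11
max_after=$((100 - n))      # characters allowed after it: 89
echo "needle length: $n, allowed trailing characters: $max_after"

# Rebuild the ERE used with grep -E from those numbers
pattern="$needle.{0,$max_after}\$"
echo "$pattern"

# Hypothetical lines: 89 filler characters after the needle pass, 90 do not
ok="x$needle$(printf '%089d' 0)"
too_far="x$needle$(printf '%090d' 0)"
matched=$(printf '%s\n%s\n' "$ok" "$too_far" | grep -E "$pattern")
printf '%s\n' "$matched"
```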

This looks like you are processing JSON data, so perhaps you can come up with a better, structure-based rule and use jq instead:

jq 'select(.path["to"]["lang"] == "en")' file

to find "en" in the structure { "path": { ... "to": { ... "lang": "en" ... } } }. This is also robust against newlines inside the JSON, spacing variations such as "lang": "en", and so on.

sed '/"lang":"en".\{0,89\}$/!d' file > output

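This allows up to 89 further characters between the match and the end of the line, so only lines with an occurrence of "lang":"en" lying entirely within the last 100 characters are kept.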
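A quick sketch comparing this sed command with the grep -E version and with the question's original attempt, assuming a POSIX sh with sed and grep; the two sample lines are hypothetical. The question's command keeps a line as soon as "lang":"en" appears anywhere, which is why it was not enough.

```shell
#!/bin/sh
pad=$(printf '%0120d' 0)   # 120 filler characters
input=$(
  printf '{"lang":"en","text":"%s","lang":"fr"}\n' "$pad"   # match only near the start
  printf '{"lang":"fr","text":"%s","lang":"en"}\n' "$pad"   # match at the end
)

# The question's attempt: keeps every line containing the string anywhere
naive_out=$(printf '%s\n' "$input" | sed '/"lang":"en"/!d')
# BRE interval \{0,89\} (sed) versus ERE {0,89} (grep -E)
sed_out=$(printf '%s\n' "$input" | sed '/"lang":"en".\{0,89\}$/!d')
grep_out=$(printf '%s\n' "$input" | grep -E '"lang":"en".{0,89}$')

printf '%s\n' "$sed_out"
```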
