
grep first n lines only

I'm having trouble grepping the right date from a letter stored as a document. The goal is to grep the date on which the document was created, not any other date further down in the text.

Usually the document holds information about the company, my address, customer number, bill number, ... and the date on which it was created.

There may also be a greeting and/or body text, which may contain dates again.

Often the date at the beginning of the document is written in a different form, as follows.

  1. December 1999 instead of 3.12.1999, for example.

If I grep the date using the pattern

'(([0-9][0-9]{,1}\.)\s+('Januar'|'Februar'|'März'|'April'|'Mai'|'Juni'|'Juli'|'August'|'September'|'Oktober'|'November'|'Dezember')\s+([1-9][0-9][0-9][0-9]{1,}))'

I sometimes get the wrong date as the creation date. The reason is that dates are written differently across the documents. Example 1 is what I usually get, and it works fine because the creation date matches the pattern I search with. Example 2 is the problem: I get a date, but it is NOT the creation date, which would be the first date in the document. Instead I get another date from the body text that happens to match the pattern.

Example 1

Example 2

I could use a different pattern, '(([0-9][0-9]{,1}\.)([0-9][0-9]{,1}\.)([1-9][0-9][0-9][0-9]{1,}))', to grep the correct date in example 2, but then I would have the same issue with example 1.

My idea was to search in the first n lines only: if the pattern matches there, take that date; otherwise use a different pattern. I can't figure out how to make pdfgrep look at the first n lines only, which would give me the chance to fall back to a different pattern.

Does anybody have an idea how to fix it?

Cheers, bdream

With GNU grep:

-m NUM : Stop reading a file after NUM matching lines.
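To illustrate the idea of stopping at the first match, here is a rough Python sketch; it is not grep itself. It assumes the letter text is already available as a list of lines, and it joins the two date formats from the question into one alternation so that whichever date appears first wins. The function name `first_date` and the sample text are made up for illustration, and the regular expressions are adapted from the question's patterns rather than copied.

```python
import re

# German month names, taken from the pattern in the question.
MONTHS = ("Januar|Februar|März|April|Mai|Juni|Juli|August|"
          "September|Oktober|November|Dezember")

# Either "3. Dezember 1999" or "3.12.1999"; both alternatives are
# adapted from the two patterns in the question (assumption: 4-digit year).
DATE_RE = re.compile(
    r"\b\d{1,2}\.\s+(?:" + MONTHS + r")\s+[1-9]\d{3}\b"
    r"|\b\d{1,2}\.\d{1,2}\.[1-9]\d{3}\b"
)


def first_date(lines):
    """Return the first date found, scanning the lines top to bottom."""
    for line in lines:
        match = DATE_RE.search(line)
        if match:
            return match.group(0)  # stop at the first matching line
    return None


# Hypothetical letter text, invented for illustration only.
sample = [
    "Muster GmbH",
    "Kundennummer 12345        3.12.1999",
    "Ihre Bestellung vom 1. November 1999 ist eingegangen.",
]
print(first_date(sample))
```

Because both formats are tried on every line, the date nearest the top of the document is returned regardless of how it is written.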

As an alternative to GNU grep, learn to use GNU gawk, which is specifically designed for such tasks.

Consider also learning python or GNU guile (then read SICP).
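In Python, the approach described in the question (look only at the first n lines and fall back to a second pattern if the first one does not match) could look roughly like the sketch below. It assumes the PDF text has already been extracted to plain lines by some other tool. The names `creation_date`, `LONG_DATE` and `SHORT_DATE`, the default `n=10`, and the sample letter are all hypothetical choices for illustration.

```python
import re
from itertools import islice

# German month names, taken from the pattern in the question.
MONTHS = ("Januar|Februar|März|April|Mai|Juni|Juli|August|"
          "September|Oktober|November|Dezember")

# Adapted from the question's two patterns (assumption: 4-digit year).
LONG_DATE = re.compile(r"\b\d{1,2}\.\s+(?:" + MONTHS + r")\s+[1-9]\d{3}\b")
SHORT_DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.[1-9]\d{3}\b")


def creation_date(lines, n=10, patterns=(LONG_DATE, SHORT_DATE)):
    """Search only the first n lines, trying each pattern in turn."""
    head = list(islice(lines, n))  # ignore everything after line n
    for pattern in patterns:
        for line in head:
            match = pattern.search(line)
            if match:
                return match.group(0)
    return None  # no date in the first n lines


# Hypothetical letter text, invented for illustration only.
letter = [
    "Muster GmbH",
    "Rechnung Nr. 42           3.12.1999",
    "",
    "Sehr geehrter Kunde,",
    "Ihre Bestellung vom 1. November 1999 wurde versandt.",
]
print(creation_date(letter, n=3))
```

The choice of n matters: with `n=3` the body text is never inspected, whereas a larger n would let the long-format date in the last line win because that pattern is tried first.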
