How to collect data based on starting character on line?

I'm trying to find a more time-efficient way to grep/search for lines that begin with a specific character or set of characters. I have a 50GB file of data sorted with the command LC_ALL='C' sort -u data.txt > data_sorted.txt. Say I want to find all lines that begin with horse; I currently run LC_ALL='C' grep -i -E "^horse.*" data_sorted.txt.

The issue I'm facing with this command is that grep doesn't automatically see (and jump to) the lines that begin with horse; instead it seems to work through the file from the top (0-9, A-Z, or whatever it does). Is there an alternative way of collating or searching the data that jumps straight to the first character of the search query, to speed things up?

This is kind of hard to explain; apologies for any confusion.

One possible approach is to use look(1). While it is normally used to search the system word-list dictionary, you can specify a different file, and it does a binary search for lines that begin with a given prefix.

So you might try:

look horse data_sorted.txt

(Some versions of look might require the -b option to do a binary search; consult your local man page.)

If you want to do a case-insensitive search like in your grep command, the file has to be sorted in a case-insensitive way (sort -f) and look needs the -f option.
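
To illustrate why a binary search avoids scanning the whole file, here is a minimal Python sketch of the idea. It is not how look is implemented, and the function name lines_with_prefix is made up for this example. It assumes the file was sorted bytewise (as with LC_ALL='C' sort), that lines end in "\n", and that the match is case-sensitive.

```python
import os


def lines_with_prefix(path, prefix):
    """Yield lines (bytes, newline stripped) that start with `prefix`.

    Hypothetical helper: assumes `path` is sorted bytewise, e.g. with
    LC_ALL=C sort. Only O(log file_size) seeks are needed to find the start.
    """
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Binary search for the smallest offset whose *following* line
        # is not smaller than the prefix (or does not exist).
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()          # discard the (possibly partial) line at mid
            line = f.readline()   # first complete line after mid
            if line and line.rstrip(b"\n") < prefix:
                lo = mid + 1      # still before the matching block
            else:
                hi = mid          # at or past the matching block (or EOF)

        f.seek(lo)
        if lo > 0:
            f.readline()          # the line at lo sorts before the prefix
        for raw in f:
            line = raw.rstrip(b"\n")
            if line.startswith(prefix):
                yield line
            elif line > prefix:
                break             # past the block of matching lines
            # otherwise the line sorts before the prefix: skip it


if __name__ == "__main__":
    # File name taken from the question; adjust as needed.
    for match in lines_with_prefix("data_sorted.txt", b"horse"):
        print(match.decode("utf-8", errors="replace"))
```

Because the file is sorted, all lines sharing a prefix form one contiguous block, so once the first match is located the rest can be read sequentially and the loop can stop at the first line that sorts after the prefix.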

