
How to search only lines with certain length in text file

I have a large delimited txt file with two columns and about 17 million lines. I imported it into the database, but by mistake one column in the table was smaller than the text in the file, i.e. varchar(4000) instead of varchar(7000).

About 48 thousand records with longer text have been cut to 4,000 characters.

How would I replace these without re-importing the file?

I am thinking I could filter from the txt file only the lines with a certain length, remove them, and then try to insert or update the longer lines.

But how do I select all lines with a certain length in a text file? Or which program can do that?

I am using a MySQL DB, and emEditor for editing large text files.

Thanks.

Depending a bit on how this is connected to other infrastructure, my guess is that the easiest and safest way of handling it is simply to drop and re-import the table...

If for some reason that is not an option, I would write a script that goes through the text file and either updates the large text field unconditionally or first checks whether it is long enough to make an update necessary (i.e. > 4000 chars).
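As a rough sketch of the "check the length first" variant, the Python below walks the file and yields only the rows whose text column exceeds the limit. It assumes a tab delimiter, UTF-8 encoding, one record per line, and that the first column is a key you can match against the table; none of that is stated in the question, and the names `long_rows` and `long_rows_from_file` are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

LIMIT = 4000  # size of the too-small column, taken from the question


def long_rows(lines: Iterable[str], limit: int = LIMIT,
              delimiter: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (key, text) for every line whose text column is longer than `limit`.

    Assumption: each line is "<key><delimiter><text>" with no embedded newlines.
    """
    for line in lines:
        # Split on the first delimiter only, so delimiters inside the text survive.
        key, sep, text = line.rstrip("\r\n").partition(delimiter)
        if not sep:
            continue  # no delimiter found: skip the malformed line
        if len(text) > limit:  # len() counts characters, not bytes
            yield key, text


def long_rows_from_file(path: str, limit: int = LIMIT,
                        delimiter: str = "\t") -> Iterator[Tuple[str, str]]:
    """Stream a large file line by line instead of loading it into memory."""
    with open(path, encoding="utf-8", newline="") as handle:
        yield from long_rows(handle, limit, delimiter)
```

Each yielded pair could then be passed to a parameterized UPDATE through whatever MySQL client library you use. Because the file is read lazily, 17 million lines should not need to fit in memory at once.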

If the table may have changed since the data was imported, it is important to check what will be overwritten and that the record really is the one you want to update (depending on how the table is indexed).

Hope this gives you some starting points.

You have my sympathy, been there, done that...

You could also build a query that returns every record where that column contains 4000 bytes; those are possibly the ones that were cut when you imported the file. With that set of records in hand, you could try to find them in the file, provided you have a line reference in the database table, of course. If there are too many, a simple script could do the trick.
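A minimal sketch of that idea in Python follows. The table and column names in the query string (`my_table`, `id_col`, `text_col`) are placeholders, not names from the question, and the sketch assumes the first file column is a unique key that was also imported. In MySQL, `CHAR_LENGTH()` counts characters while `LENGTH()` counts bytes; since a varchar(4000) limit is expressed in characters, the character count is probably the safer test, but treat that as an assumption to verify against your data.

```python
from typing import Dict, Iterable, Tuple

LIMIT = 4000  # size of the too-small column, taken from the question

# Hypothetical names: replace my_table, id_col and text_col with your own.
FIND_TRUNCATED_SQL = (
    "SELECT id_col, text_col FROM my_table "
    "WHERE CHAR_LENGTH(text_col) = 4000"
)


def full_text_for_truncated(db_rows: Iterable[Tuple[str, str]],
                            file_rows: Iterable[Tuple[str, str]],
                            limit: int = LIMIT) -> Dict[str, str]:
    """Map each truncated record's key to the full text found in the file.

    db_rows:   (key, stored_text) pairs returned by the query above.
    file_rows: (key, full_text) pairs parsed from the original file.
    A file row is accepted only when its key matches, it is longer than
    `limit`, and its first `limit` characters equal the stored value, so a
    record that was edited after the import is left alone.
    """
    stored_by_key = dict(db_rows)
    fixes: Dict[str, str] = {}
    for key, text in file_rows:
        stored = stored_by_key.get(key)
        if stored is not None and len(text) > limit and text[:limit] == stored:
            fixes[key] = text
    return fixes
```

Records whose original text was exactly 4000 characters long will also come back from the query, but they are skipped here because their file text is not longer than the limit.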
