
saving entire file in VIM

I have a very large CSV file, over 2.5GB, that gives the error message "Column delimiter not found" on a specific line (82,449) when I import it into SQL Server 2005.

The issue is double quotes within the text of that column. In this instance, it's a note field in which someone wrote "Transferred money to ""MIKE"", Thnks".

Because the file is so large, I can't open it in Notepad++ to make the change, which led me to VIM.

I am very new to VIM. I reviewed the tutorial document, which taught me how to change the file: 82449G to go to the line (the count is typed without the comma), l to move over to the spot, and x to delete the double quotes.

When I save the file using :saveas c:\\Test VIM\\Test.csv , the result seems to be only a portion of the file. The original file is 2.6GB and the new saved one is 1.1GB. The original file has 9,389,222 rows and the new saved one has 3,751,878. I tried using the G command to get to the bottom of the file before saving, which increased the size quite a bit but still didn't save the whole file. Before using G, the saved file was only 230 MB.

Any ideas as to why I'm not saving the entire file?

You really need to use a "stream editor", something like sed on Linux, which lets you pipe your text through it without trying to keep the entire file in memory. In sed I'd do something like:

sed 's/""MIKE""/"MIKE"/' < source_file_to_read > cleaned_file_to_write

There is a sed for Windows.

As a second choice, you could use a programming language like Perl, Python or Ruby to process the file line by line: copy each line to the output as you go, change the line in question when you reach it, and keep writing until the whole file has been processed.
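A minimal Python sketch of that line-by-line approach might look like the following. The function name `fix_line` and the file names in the usage note are made up for illustration, and the `encoding` default is an assumption about the CSV. The file is read and written in binary mode so that line endings pass through unchanged and only one line is held in memory at a time.

```python
def fix_line(src_path, dst_path, line_no, old, new, encoding="utf-8"):
    """Copy src_path to dst_path, replacing `old` with `new` on one line only.

    line_no is 1-based. Returns the number of lines copied.
    `encoding` is only used to turn `old` and `new` into bytes (an assumption
    about the file; adjust it if the CSV uses another encoding).
    """
    old_bytes = old.encode(encoding)
    new_bytes = new.encode(encoding)
    count = 0
    # Binary mode: lines are split on b"\n" and written back byte for byte,
    # so \r\n line endings survive untouched.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for count, line in enumerate(src, start=1):
            if count == line_no:
                line = line.replace(old_bytes, new_bytes)
            dst.write(line)
    return count
```

For the file in the question, a call could look like `fix_line("source.csv", "cleaned.csv", 82449, '""MIKE""', '"MIKE"')`, where both file names are placeholders. Restricting the change to a single line number avoids touching correctly doubled quotes elsewhere in the file.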

VIM might be able to load the file if your machine has enough free RAM, but it'll be a slow process. If it does, you can search from normal mode using:

:/""MIKE""/

and manually remove a doubled-quote, or have VIM make the change automatically using:

:%s/""MIKE""/"MIKE"/g

In either case, write, then close, the file using:

:wq

In VIM, normal mode is the default state of the editor, and you can get back to it using your ESC key.

You can also split the file into smaller, more manageable chunks, edit the chunk that contains the problem line, and then combine them again. Here's a bash script that splits the file into roughly equal parts:

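#!/bin/bash

fspec=the_big_file.csv
num_files=10 # how many mini-files you want

# Count the lines, then round up so that at most num_files parts are produced.
total_lines=$(wc -l < "${fspec}")
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Writes part.aa, part.ab, ... (--lines is the GNU split long option)
split --lines="${lines_per_file}" "${fspec}" part.
echo "Total Lines = ${total_lines}"
echo "Lines per file = ${lines_per_file}"
wc -l part.*

# After editing, the parts can be recombined in order, e.g.
# (combined.csv is a placeholder name):
# cat part.* > combined.csv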

I just tested it on a 1GB file with 61151570 lines, and each resulting file was almost 100 MB.

Edit:

I just realized you are on Windows, so the above may not apply. You can use a utility like simple text splitter, a Windows program that does the same thing.
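If installing a separate utility is not an option, the same split-and-recombine idea can be sketched in Python, which runs on Windows as well. This is only an illustration under stated assumptions: `split_file`, `join_files`, and the `part.` prefix with numeric suffixes are made-up names, not part of any existing tool.

```python
import shutil

def split_file(path, lines_per_file, prefix="part."):
    """Split `path` into chunk files of at most `lines_per_file` lines each.

    Chunks are named prefix0000, prefix0001, ... and returned in order.
    Lines are streamed one at a time, so memory use stays small.
    """
    parts = []
    out = None
    try:
        with open(path, "rb") as src:
            for i, line in enumerate(src):
                if i % lines_per_file == 0:
                    # Start a new chunk file every lines_per_file lines.
                    if out is not None:
                        out.close()
                    part_path = f"{prefix}{len(parts):04d}"
                    out = open(part_path, "wb")
                    parts.append(part_path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts

def join_files(parts, dst_path):
    """Concatenate the chunk files, in the given order, into dst_path."""
    with open(dst_path, "wb") as dst:
        for part_path in parts:
            with open(part_path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

With 9,389,222 rows and ten parts, `lines_per_file` would be 938,923 using the same round-up rule as the bash script. Line 82,449 then falls in the first chunk, which is the only one that needs editing before `join_files` puts everything back together.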

If you're able to open the file without errors like E342: Out of memory!, you should be able to save the complete file, too. At the very least there should be an error on :w; a partial save without an error is a severe loss of data and should be reported as a bug, either on the vim_dev mailing list or at http://code.google.com/p/vim/issues/list
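To check whether a save was complete, it is enough to compare byte sizes and line counts of the original and the saved copy, as the question did by hand. A small Python sketch of that check follows; the names `count_lines` and `summarize` are hypothetical, and the 1 MiB block size is an arbitrary choice.

```python
import os

def count_lines(path, block_size=1024 * 1024):
    """Count newline bytes by reading the file in fixed-size blocks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += block.count(b"\n")
    return total

def summarize(path):
    """Return (size_in_bytes, line_count) for the file at `path`."""
    return os.path.getsize(path), count_lines(path)
```

After deleting two quote characters, the saved copy should have the same line count as the original and be only a few bytes smaller. A difference of millions of lines, as in the question, points to a truncated load or save rather than to the edit itself.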

Which exact version of Vim are you using? Using GVIM 7.3.600 (32-bit) on Windows 7/x64, I wasn't able to open a 1.9 GB file without running out of memory. I was able to successfully open, edit, and save (fully!) a 3.9 GB file with the 64-bit version 7.3.000 from here. If you're not using that native 64-bit version yet, give it a try.
