
Perform character operations with shell commands

I need to convert a CSV file exported from Excel 2011 for Mac into a format that a CMS can import. The solution shouldn't depend on the CMS, but for reference, the target is the Drupal Feeds module.

Currently I do this by running the following commands in Vim:

:%s/\r/\r/g
:w ++enc=utf8

These commands basically do the following:

  1. Convert carriage returns to proper line breaks.
    • As Excel exports the file, each line break shows up in Vim as ^M (a carriage return character).
    • The Vim command :%s/\r/\r/g replaces every one of them with a line break that the CMS recognizes.
  2. Convert the character encoding to UTF-8.
    • As exported, the character set is "extended ASCII" or something similar.

Ideally, this process will run automatically when the file is uploaded as part of the import. That means PHP will trigger it, in case that affects the solution. At this point I'm more comfortable handling it with a shell script or something similar, but PHP solutions are of course welcome if I can figure out how to hook them into Drupal 7 Feeds.

Some untested code:

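#!/usr/bin/env php
<?php

$replacements = array(
    // Adjust destination char to your liking
    "\r\n" => "\n",
    "\r"   => "\n",
);

// No risk to split chars: input is single byte
$pending = '';
while (($chunk = fread(STDIN, 10240)) !== false && $chunk !== '') {
    $chunk = $pending . $chunk;
    $pending = '';

    // Hold back a trailing \r: its \n may arrive with the next chunk
    if (substr($chunk, -1) === "\r") {
        $pending = "\r";
        $chunk = (string) substr($chunk, 0, -1);
    }

    // Normalize line feeds (strtr tries the longer "\r\n" key first)
    $chunk = strtr($chunk, $replacements);

    // Convert to UTF-8 (adjust source encoding to your needs)
    $out = iconv('CP1252', 'UTF-8', $chunk);
    if ($out === false) {
        fwrite(STDERR, "Encoding conversion failed\n");
        exit(1);
    }
    fwrite(STDOUT, $out);
}

// A \r held back at the very end of the input is still a line break
if ($pending !== '') {
    fwrite(STDOUT, "\n");
}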

Usage (after saving the script as fix-csv and making it executable with chmod +x fix-csv):

./fix-csv < input.csv > output.csv
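Before handing the converted file to Feeds, you could run a quick sanity check on the two properties the conversion is meant to guarantee. The check confirms that no carriage returns remain and that the bytes decode as UTF-8. This is a minimal sketch. The function name check_csv is hypothetical, and the sketch relies on iconv exiting with a nonzero status when it meets invalid input.

```shell
# check_csv (hypothetical name): returns 0 only if the file has no CR bytes
# and is valid UTF-8; otherwise prints the reason to stderr and returns 1.
check_csv() {
    # Keep only CR bytes; any output means line endings were not normalized.
    if [ -n "$(tr -dc '\r' < "$1")" ]; then
        echo "$1: still contains carriage returns" >&2
        return 1
    fi
    # A UTF-8 -> UTF-8 conversion fails on byte sequences that are not valid UTF-8.
    if ! iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        echo "$1: not valid UTF-8" >&2
        return 1
    fi
    return 0
}
```

Usage: check_csv output.csv && echo "ready to import"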

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.
