Converting numbers in a text (.csv) file from one locale format to another?

Question

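I have a .csv file in which the numbers are formatted according to the da_DK locale (i.e. a comma is used instead of a period as the decimal separator, among other things), so it looks something like this: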

"5000","0,00","5,25", ....

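I would like to use a command-line application to convert all numbers in the file in one go, so that the output is in the "C" (or POSIX) locale (i.e. a dot/period is used as the decimal separator):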

"5000","0.00","5.25", ....

...while keeping the decimal places (i.e. "0,00" should be converted to "0.00", not to "0") and keeping all other data/formatting unchanged.

I know there is numfmt, which should allow something like this:

$ LC_ALL=en_DK.utf8 numfmt --from=iec --grouping 22123,11
22.123,11

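...however, numfmt can only convert between units, not between locales (once LC_ALL is specified, the input number must conform to it as well, just like the output).

Ultimately I would like something that is CSV-agnostic: that is, something that can parse through a text file, find all substrings that match the number format of the given input locale (i.e. from a string like "5000","0,00","5,25","hello".... the program would infer three locale-specific number strings: 5000, 0,00 and 5,25), convert and replace those substrings, and leave everything else unchanged. As an alternative, I would also like to know about a CSV-aware approach (i.e. parse all fields line by line, then check whether the content of each field matches a locale-specific number string).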

Answer 1

Update: this converts number.number to numbernumber (dropping the thousands separator) and number,number to number.number, for any text:

sed -e 's/\([0-9]\+\)\.\([0-9]\+\)/\1\2/g' -e 's/\([0-9]\+\),\([0-9]\+\)/\1.\2/g'

Orig string: "AO900-020","Hello","World","5000","0,00","5,25","stk","","1","0,00","Test 2","42.234,12","","","0,00","","","","5,25"
Conv string: "AO900-020","Hello","World","5000","0.00","5.25","stk","","1","0.00","Test 2","42234.12","","","0.00","","","","5.25"

(Same example input/output as in the OP's Perl answer.)

Note: this will go very badly if your CSV contains any unquoted fields.
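The two-pass command above removes only one grouping dot per number (1.234.567,89 would come out as 1234.567.89), and it rewrites any dotted or comma-separated digits it finds, quoted or not. A more conservative sketch is below. It assumes that every field is double-quoted and that a number fills the whole field; the function name dk_to_posix is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert da_DK-style numbers to POSIX style, but only
# inside double-quoted fields that contain nothing except the number.
dk_to_posix() {
  # 1) loop: drop one thousands separator per pass until none are left
  # 2) turn the decimal comma of a fully numeric quoted field into a dot
  sed -E \
    -e ':strip' \
    -e 's/"([0-9]+)\.([0-9]{3}(\.[0-9]{3})*(,[0-9]+)?")/"\1\2/' \
    -e 't strip' \
    -e 's/"([0-9]+),([0-9]+)"/"\1.\2"/g'
}

printf '%s\n' '"5000","0,00","42.234,12"' | dk_to_posix
# prints: "5000","0.00","42234.12"
```

To run it on a file, something like dk_to_posix < input.csv > output.csv should do (both file names are placeholders). Unquoted fields are left untouched rather than mangled, and a quoted value such as "1.5" is not treated as a number because the group after the dot is not three digits long.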

Answer 2

OK, I did find a way to do this in Perl, and it is not exactly trivial. An example (CSV-agnostic) script that converts a test string is pasted below. It ultimately prints:

Orig string: "AO900-020","Hello","World","5000","0,00","5,25","stk","","1","0,00","Test 2","42.234,12","","","0,00","","","","5,25"
Conv string: "AO900-020","Hello","World","5000","0.00","5.25","stk","","1","0.00","Test 2","42234.12","","","0.00","","","","5.25"

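...which is basically what I wanted to achieve; but there may be edge cases here where this is undesirable. It may be better to use a tool such as csvfix or csvtool, or to use a Perl CSV library directly in code.

Here is the code anyway: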

#!/usr/bin/env perl
use warnings;
use strict;
use locale;
use POSIX qw(setlocale locale_h LC_ALL);
use utf8;
use Number::Format qw(:subs); # sudo perl -MCPAN -e 'install Number::Format'
use Data::Dumper;
use Scalar::Util::Numeric qw(isint); # sudo perl -MCPAN -e 'install Scalar::Util::Numeric'

my $old_locale;

# query and save the old locale
$old_locale = setlocale(LC_ALL);

# list of (installed) locales: bash$ locale -a
setlocale(LC_ALL, "POSIX");

# localeconv() returns "a reference to a hash of locale-dependent info"
# dereference here:
#%posixlocalesettings = %{localeconv()};
#print Dumper(\%posixlocalesettings);

# or without dereference:
my $posixlocalesettings = localeconv();
# the $posixlocalesettings has only 'decimal_point' => '.';
# force also thousands_sep to '', else it will be comma later on, and grouping will be made regardless
$posixlocalesettings->{'thousands_sep'} = '';
print Dumper($posixlocalesettings);

#~ my $posixNumFormatter = new Number::Format %args;    
# thankfully, Number::Format seems to accept as argument same kind of hash that localeconv() returns:
my $posixNumFormatter = Number::Format->new(%{$posixlocalesettings});
print Dumper($posixNumFormatter);


setlocale(LC_ALL, "en_DK.utf8");
my $dklocalesettings = localeconv();
print Dumper($dklocalesettings);

# Get some of locale's numeric formatting parameters
my ($thousands_sep, $decimal_point, $grouping) =
#        @{localeconv()}{'thousands_sep', 'decimal_point', 'grouping'};
        @{$dklocalesettings}{'thousands_sep', 'decimal_point', 'grouping'};

# grouping and mon_grouping are packed lists
# of small integers (characters) telling the
# grouping (thousand_seps and mon_thousand_seps
# being the group dividers) of numbers and
# monetary quantities.  The integers’ meanings:
# 255 means no more grouping, 0 means repeat
# the previous grouping, 1-254 means use that
# as the current grouping.  Grouping goes from
# right to left (low to high digits).  In the
# below we cheat slightly by never using anything
# else than the first grouping (whatever that is).
my @grouping = unpack("C*", $grouping);
print "en_DK.utf8: thousands_sep $thousands_sep; decimal_point $decimal_point; grouping " .join(", ", @grouping). "\n";

my $inputCSVString = '"AO900-020","Hello","World","5000","0,00","5,25","stk","","1","0,00","Test 2","42.234,12","","","0,00","","","","5,25"';

# Character set modifiers
# /d, /u , /a , and /l , available starting in 5.14, are called the character set modifiers;
#  /l sets the character set to that of whatever Locale is in effect at the time of the execution of the pattern match.

while ($inputCSVString =~ m/[[:digit:]]+/gl) { # doesn't take locale in account
  print "A Found '$&'.  Next attempt at character " . (pos($inputCSVString)+1) . "\n";
}

print "----------\n";


#~ while ($inputCSVString =~ m/(\d{$grouping[0]}($|$thousands_sep))+/gl) {
#~ while ($inputCSVString =~ m/(\d)(\d{$grouping[0]}($|$thousands_sep))+/gl) {
# match a string that starts with digit, and contains only digits, thousands separators and decimal points
# note - it will NOT match negative numbers
while ($inputCSVString =~ m/\d[\d$thousands_sep$decimal_point]+/gl) {
  my $numstrmatch = $&;
  my $unnumstr = unformat_number($numstrmatch); # should unformat according to current locale ()
  my $posixnumstr = $posixNumFormatter->format_number($unnumstr);
  print "B Found '$numstrmatch' (unf: '$unnumstr', form: '$posixnumstr').  Next attempt at character " . (pos($inputCSVString)+1) . "\n";
}

sub convertNumStr{
  my $numstrmatch = $_[0];
  my $unnumstr = unformat_number($numstrmatch);
  # if an integer, return as is so it doesn't change trailing zeroes, if the number is a label
  # ($decimal_point is the en_DK decimal separator read from localeconv() above)
  if ( (isint $unnumstr) && ( $numstrmatch !~ m/\Q$decimal_point\E/) ) { return $numstrmatch; }
  #~ print "--- $unnumstr\n";
  # find the length of the string after the decimal point - the precision
  my $precision_strlen = length( substr( $numstrmatch, index($numstrmatch, $decimal_point)+1 ) );
  # must manually spec precision and trailing zeroes here:
  my $posixnumstr = $posixNumFormatter->format_number($unnumstr, $precision_strlen, 1);
  return $posixnumstr;
}

# e modifier to evaluate perl Code
(my $replaceString = $inputCSVString) =~ s/(\d[\d$thousands_sep$decimal_point]+)/"".convertNumStr($1).""/gle;

print "Orig string: " . $inputCSVString . "\n";
print "Conv string: " . $replaceString . "\n";
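
For the CSV-aware alternative mentioned before the script (go through the fields one by one and convert only those that are entirely a locale-formatted number), a minimal sketch in shell with awk might look like the following. It assumes that every field is double-quoted, that no field contains the sequence "," and that records do not span lines; the function name dk_fields_to_posix is hypothetical. It is not a substitute for a real CSV parser such as csvtool or a Perl CSV library.

```shell
#!/bin/sh
# Hypothetical helper: field-by-field conversion of da_DK-style numbers.
# Assumes fully quoted fields separated by "," (quote, comma, quote).
dk_fields_to_posix() {
  awk '
  BEGIN { FS = "\",\""; OFS = "\",\"" }
  $0 == "" { print; next }            # pass empty lines through unchanged
  {
    sub(/^"/, ""); sub(/"$/, "")      # strip the outer quotes, fields are re-split
    for (i = 1; i <= NF; i++) {
      # digits, optional groups of ".ddd", optional ",decimals"
      if ($i ~ /^[0-9]+(\.[0-9][0-9][0-9])*(,[0-9]+)?$/) {
        gsub(/\./, "", $i)            # drop thousands separators
        sub(/,/, ".", $i)             # decimal comma becomes a dot
      }
    }
    print "\"" $0 "\""                # restore the outer quotes
  }'
}
```

Because each field is tested as a whole, labels such as "AO900-020" or "Test 2" are never touched, and the number of decimal places is kept since the field is edited as text rather than parsed as a number.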

Answer 1
0 2016-09-27 20:10:06

Answer 2
0 2016-10-06 13:09:30