Reading in exotic characters in C++

I'm working on a program in C++. Part of it reads the columns of a .csv file into vectors. Column two contains characters. Most of them are read correctly, but some (e.g. umlauts like ä, ö, ü or exotic characters like ·) are not. When they are printed, a stand-in character (a question mark) appears.

What am I doing wrong?

The program:

// Bad but (somewhat) fast encryption. //

#include <iostream>  // For printing to command line (e.g. cout).
#include <fstream>  // For file io.
#include <string>  // Working with strings.
#include <vector>  // For the class "vector".
#include <map>  // For the class "map" (a dictionary).
#include <locale>  // For std::locale.


// A function to convert a string to a float.
// TODO: Why is this necessary, and why can't I just use stof()?
// But stof( "17.40" ) gives just 17.
float stringToFloat(std::string x){
  float result = 0.0;  // Initialize "result" as a float with value 0.0.
  int len = x.length();  // Extract the length of the passed string.
  int dotPosition = 0;  // Position of the decimal point.

  // Iterate over all characters in the string.
  for (int i = 0; i < len; i++){
    if (x[i] == '.' or x[i] == ','){  // Find the decimal delimiter.
      dotPosition = len - i  - 1;  // Remember how many digits follow it.
    } else {  // Haven't reached decimal point yet.
      result = result * 10.0 + (x[i]-'0');  // Multiply according to position.
    }
  }

  // Handle everything behind the decimal point.
  while (dotPosition--){  // Count the post decimal point digits down.
        result /= 10.0;  // Divide according to position.
  }

  return result;  // Return the produced float.
}


// A new struct containing the three important columns.
struct histogramLine {
  std::string place;
  std::string character;
  std::string abundance;
};


// Function for reading a single line of the histogram file.
std::istream& readLine( std::istream& inputStream, histogramLine& x ){
  getline( inputStream, x.place, '\t');
  getline( inputStream, x.character, '\t');
  getline( inputStream, x.abundance, '\n');
  return inputStream;
} 

// Function for creating a cipher_table.csv.
void newTable( std::string charset, std::string file ){

  std::vector<std::string> places;
  std::vector<char> characters;
  std::vector<float> abundances;
  int lineIndex = 0;

  // Read the file.
  std::ifstream csvRead( charset );

  if ( !csvRead.is_open() ){  // For whatever reason the file wasn't opened.
    std::cout << "File " << charset << " couldn't be opened.";
  } else {  // Otherwise …
    for ( histogramLine line; readLine( csvRead, line ); lineIndex++){
      if( lineIndex > 0 ){
        places.push_back(line.place);
        characters.push_back(line.character[0]);
        abundances.push_back(stringToFloat(line.abundance));
      }
    }
  }

}

// Main function.
int main(int argc, char **argv){  // Must have 2 or 0 arguments. Why?

  std::locale::global( std::locale( "de_DE.utf8" ) );

  /* TODO: Understand, why I need to pass argv[1] to a string instead  of
     using it directly. */
  std::string command = argv[1];  // The command given to the program.

  if ( command == "table" ){  // Creation of a new cipher table.
    if ( argc > 3 ){  // Enough arguments provided.
      newTable( argv[2], argv[3] );  // Call function for creating a new table.
    } else {
      std::cout << "You must provide an alphabet histogram"
                 << " and a file to output.\n";
    }
  }

  return(0);

}
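
On the TODO above stringToFloat: a likely explanation (an assumption, since the post does not confirm it) is the call to std::locale::global with "de_DE.utf8". Installing a named global locale also sets the C locale, and std::stof is specified in terms of the C conversion functions, which then expect a decimal comma and stop at the '.' in "17.40". A minimal sketch of a locale-independent parser follows; parseFloatClassic is a hypothetical name, not part of the original program.

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical replacement for stringToFloat: parses text such as "17.40"
// with the classic "C" locale, regardless of the global locale in effect.
float parseFloatClassic(std::string text) {
  // Accept a decimal comma as well, as the original helper does.
  std::replace(text.begin(), text.end(), ',', '.');

  std::istringstream in(text);
  in.imbue(std::locale::classic());  // '.' is the decimal separator here.

  float value = 0.0f;
  in >> value;  // value stays 0 if nothing could be parsed.
  return value;
}
```

Calling parseFloatClassic(line.abundance) in place of stringToFloat(line.abundance) would be one way to use it. Unlike the hand-written loop, this sketch also copes with a leading sign.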

The file being read:

place   letter  abundance
1.  E   17.40
2.  N   9.78
3.  I   7.55
4.  S   7.27
5.  R   7.00
6.  A   6.51
7.  T   6.15
8.  D   5.08
9.  H   4.76
10. U   4.35
11. L   3.44
12. C   3.06
13. G   3.01
14. M   2.53
15. O   2.51
16. B   1.89
17. W   1.89
18. F   1.66
19. K   1.21
20. Z   1.13
21. P   0.79
22. V   0.67
23. ẞ   0.31
24. J   0.27
25. Y   0.04
26. X   0.03
27. Q   0.02
28. e   17.40
29. n   9.78
30. i   7.55
31. s   7.27
32. r   7.00
33. a   6.51
34. t   6.15
35. d   5.08
36. h   4.76
37. u   4.35
38. l   3.44
39. c   3.06
40. g   3.01
41. m   2.53
42. o   2.51
43. b   1.89
44. w   1.89
45. f   1.66
46. k   1.21
47. z   1.13
48. p   0.79
49. v   0.67
50. ß   0.31
51. j   0.27
52. y   0.04
53. x   0.03
54. q   0.02
55. 1   3
56. 2   14
57. 3   15
58. 4   14
59. 5   20
60. 6   9
61. 7   12
62. 8   3
63. 9   11
64. 0   14
65. !   3
66. "   13
67. §   7
68. $   3
69. %   8
70. &   18
71. /   6
72. (   20
73. )   7
74. =   18
75. ?   10
76. *   3
77. +   7
78. #   7
79. '   9
80. ,   17
81. ;   13
82. .   12
83. -   16
84. _   9
85.     5
86. Ä   6.51
87. Ö   2.51
88. Ü   4.35
89. ä   6.51
90. ö   2.51
91. ü   4.34
92. :   7
93. <   11
94. >   5
95. {   17
96. }   9
97. ^   12
98. ·   13

Thanks, Dan M. Declaring characters as a string instead of a vector did the trick. I also removed the index and just passed the entire string.
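
Why this helps can be sketched as follows, assuming the file is UTF-8 encoded (the locale name "de_DE.utf8" suggests it, but the post does not say so). In UTF-8, ä, ö, ü, ß and · each take two bytes and ẞ takes three, so line.character[0] keeps only the first byte of the sequence, which is not a printable character on its own. The helper names below are hypothetical, and a vector of strings is only one reading of the fix described above.

```cpp
#include <string>
#include <vector>

// What characters.push_back(line.character[0]) keeps: a single byte.
char firstByteOnly(const std::string& field) {
  return field.empty() ? '\0' : field[0];
}

// Hypothetical helper: store the complete field, so that every byte of a
// multi-byte UTF-8 character is preserved.
void storeCharacter(std::vector<std::string>& characters,
                    const std::string& field) {
  characters.push_back(field);
}
```

For ASCII letters such as E or n both approaches keep the same data, which matches the observation that most rows were read correctly.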

I thought I would use char to make it more "memory efficient".
