Reading in exotic characters in C++
I'm working on a program in C++. Part of it reads the columns of a .csv file into vectors. Column two contains characters. Most of them are read correctly, but some (e.g. umlauts like ä, ö, ü or exotic characters like ·) are not. When they are printed, a stand-in character (a question mark) appears.
What am I doing wrong?
The program:
// Bad but (somewhat) fast encryption. //
#include <iostream> // For printing to command line (e.g. cout).
#include <fstream> // For file io.
#include <string> // Working with strings.
#include <vector> // For the class "vector".
#include <map> // For the class "map" (a dictionary).
#include <locale> // For std::locale.
// A function to convert a string to a float.
// TODO: Why is this necessary? Why can't I just use stof()?
// stof( "17.40" ) gives just 17.
float stringToFloat(std::string x){
    float result = 0.0; // Initialize "result" as a float with value 0.0.
    int len = x.length(); // Extract the length of the passed string.
    int dotPosition = 0; // Number of digits after the decimal point.
    // Iterate over all characters in the string.
    for (int i = 0; i < len; i++){
        if (x[i] == '.' or x[i] == ','){ // Found the decimal delimiter.
            dotPosition = len - i - 1; // Record how many digits follow it.
        } else { // An ordinary digit.
            result = result * 10.0 + (x[i]-'0'); // Shift left and add the digit.
        }
    }
    // Handle everything behind the decimal point.
    while (dotPosition--){ // Count the post decimal point digits down.
        result /= 10.0; // Divide according to position.
    }
    return result; // Return the produced float.
}
// A new struct containing the three important columns.
struct histogramLine {
    std::string place;
    std::string character;
    std::string abundance;
};
// Function for reading a single line of the histogram file.
std::istream& readLine( std::istream& inputStream, histogramLine& x ){
    getline( inputStream, x.place, '\t');
    getline( inputStream, x.character, '\t');
    getline( inputStream, x.abundance, '\n');
    return inputStream;
}
// Function for creating a cipher_table.csv.
void newTable( std::string charset, std::string file ){
    std::vector<std::string> places;
    std::vector<char> characters;
    std::vector<float> abundances;
    int lineIndex = 0;
    // Read the file.
    std::ifstream csvRead( charset );
    if ( !csvRead.is_open() ){ // For whatever reason the file wasn't opened.
        std::cout << "File " << charset << " couldn't be opened.";
    } else { // Otherwise …
        for ( histogramLine line; readLine( csvRead, line ); lineIndex++){
            if( lineIndex > 0 ){
                places.push_back(line.place);
                characters.push_back(line.character[0]);
                abundances.push_back(stringToFloat(line.abundance));
            }
        }
    }
}
// Main function.
int main(int argc, char **argv){ // Must have 2 or 0 arguments. Why?
    std::locale::global( std::locale( "de_DE.utf8" ) );
    /* TODO: Understand why I need to pass argv[1] to a string instead of
    using it directly. */
    std::string command = argv[1]; // The command given to the program.
    if ( command == "table" ){ // Creation of a new cipher table.
        if ( argc > 3 ){ // Enough arguments provided.
            newTable( argv[2], argv[3] ); // Call function for creating a new table.
        } else {
            std::cout << "You must provide an alphabet histogram"
                      << " and a file to output.\n";
        }
    }
    return(0);
}
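A side note on the stof() TODO in the program. A likely explanation (I have not run this on the poster's machine, so treat it as an assumption) is the std::locale::global call in main: installing a named global locale also switches the C locale, and std::stof parses according to that locale. In de_DE the decimal separator is a comma, so parsing "17.40" stops at the dot and yields 17. A minimal sketch of a locale-independent parse follows; parseFloatClassic is a hypothetical helper name, not something from the original program.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original program): parse a float
// with the classic "C" locale, so '.' is always the decimal separator,
// whatever global locale the program has installed.
float parseFloatClassic(const std::string& text) {
    std::istringstream stream(text);
    stream.imbue(std::locale::classic()); // Override the global locale for this stream.
    float value = 0.0f;
    stream >> value; // value stays 0 if no number can be read.
    return value;
}
```

Calling parseFloatClassic(line.abundance) where the program calls stringToFloat would be one way to use it. Unlike stringToFloat, it does not accept a comma as the decimal delimiter.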
The file being read:
place letter abundance
1. E 17.40
2. N 9.78
3. I 7.55
4. S 7.27
5. R 7.00
6. A 6.51
7. T 6.15
8. D 5.08
9. H 4.76
10. U 4.35
11. L 3.44
12. C 3.06
13. G 3.01
14. M 2.53
15. O 2.51
16. B 1.89
17. W 1.89
18. F 1.66
19. K 1.21
20. Z 1.13
21. P 0.79
22. V 0.67
23. ẞ 0.31
24. J 0.27
25. Y 0.04
26. X 0.03
27. Q 0.02
28. e 17.40
29. n 9.78
30. i 7.55
31. s 7.27
32. r 7.00
33. a 6.51
34. t 6.15
35. d 5.08
36. h 4.76
37. u 4.35
38. l 3.44
39. c 3.06
40. g 3.01
41. m 2.53
42. o 2.51
43. b 1.89
44. w 1.89
45. f 1.66
46. k 1.21
47. z 1.13
48. p 0.79
49. v 0.67
50. ß 0.31
51. j 0.27
52. y 0.04
53. x 0.03
54. q 0.02
55. 1 3
56. 2 14
57. 3 15
58. 4 14
59. 5 20
60. 6 9
61. 7 12
62. 8 3
63. 9 11
64. 0 14
65. ! 3
66. " 13
67. § 7
68. $ 3
69. % 8
70. & 18
71. / 6
72. ( 20
73. ) 7
74. = 18
75. ? 10
76. * 3
77. + 7
78. # 7
79. ' 9
80. , 17
81. ; 13
82. . 12
83. - 16
84. _ 9
85. 5
86. Ä 6.51
87. Ö 2.51
88. Ü 4.35
89. ä 6.51
90. ö 2.51
91. ü 4.34
92. : 7
93. < 11
94. > 5
95. { 17
96. } 9
97. ^ 12
98. · 13
Thanks, Dan M. Declaring characters as a string instead of a vector did the trick. I also removed the index and just passed the entire string.
I thought using char would make it more "memory efficient".
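For anyone wondering why the index was the problem: assuming the file is UTF-8 encoded (the de_DE.utf8 locale suggests so, but that is an assumption), characters such as ä, ö, ü, ß, § and · take two bytes each and ẞ takes three. line.character[0] keeps only the first of those bytes, which is not a valid character on its own, so the terminal prints a question mark. A rough sketch of keeping the whole character as a string is below; utf8SequenceLength and firstUtf8Character are hypothetical helper names, and the code does not validate malformed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: how many bytes the UTF-8 sequence starting with
// this lead byte occupies.
std::size_t utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;          // 0xxxxxxx: plain ASCII.
    if ((lead >> 5) == 0x06) return 2;  // 110xxxxx: two-byte sequence.
    if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx: three-byte sequence.
    if ((lead >> 3) == 0x1E) return 4;  // 11110xxx: four-byte sequence.
    return 1; // Continuation or invalid byte: treat it as a single byte.
}

// Hypothetical helper: return the first complete UTF-8 character of a
// field as a string instead of truncating it to one char.
std::string firstUtf8Character(const std::string& field) {
    if (field.empty()) return std::string();
    std::size_t length = utf8SequenceLength(static_cast<unsigned char>(field[0]));
    return field.substr(0, length); // substr clamps if the field is too short.
}
```

With this, characters could be a std::vector<std::string> filled via characters.push_back(firstUtf8Character(line.character)). Pushing line.character itself, as described above, works just as well when the column holds exactly one character per line.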