pandas DataFrame issue with special characters
I am struggling with the following issue with a pandas DataFrame (Python 2.7.12, pandas 0.18.1):
import pandas as pd

df = pd.read_csv(file_name, encoding='utf-16', header=0, index_col=False, error_bad_lines=False,
                 names=['Package_Name','Crash_Report_Date_And_Time','Crash_Report_Millis_Since_Epoch','Device', 'Android_OS_Version','App_Version_Name','App_Version_Code','Exception_Class_Name','Exception_Message','Throwing_File_Name','Throwing_Class_Name','Throwing_Method_Name','Throwing_Line_Number','Native_Crash_Library','Native_Crash_Location','Crash_Link'])
I debugged the code and found that the following data is not being inserted properly into the DataFrame.
There are some special characters in the Exception_Message field that cause pandas to move the rest of the data on that row to the next row.
Somehow pandas is not reading the file properly.
The following is the output for both rows; 137 and 138 are the row numbers:
Package_Name Crash_Report_Date_And_Time \
137 com.vcast.mediamanager 2016-09-05 14:54:13
138 NaN Class.java
Crash_Report_Millis_Since_Epoch Device Android_OS_Version \
137 1473087253130 victara 22
138 java.lang.Class classForName -2
App_Version_Name App_Version_Code \
137 14.3.34 1.503050e+09
138 NaN NaN
Exception_Class_Name \
137 java.lang.ClassNotFoundException
138 https://play.google.com/apps/publish?dev_acc=0...
Exception_Message Throwing_File_Name Throwing_Class_Name \
137 Invalid name: com.strumsoft.appen NaN NaN
138 NaN NaN NaN
Throwing_Method_Name Throwing_Line_Number Native_Crash_Library \
137 NaN NaN NaN
138 NaN NaN NaN
Native_Crash_Location Crash_Link account_id
137 NaN NaN NONE
138                   NaN        NaN       NONE
Row 138 is created erroneously with some data from row 137: the Exception_Message field in row 137 contains a value that breaks that row onto the next row, which is wrong.
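To confirm which character is splitting the row, one option is to scan the raw text before pandas sees it. The sketch below assumes (the question does not confirm this) that the special character is a bare carriage return, i.e. a `\r` not followed by `\n`; this is a common cause of the symptom, because pandas' default C parser ends a record on `\n`, `\r\n`, or a lone `\r`. The function name `find_bare_carriage_returns` and the sample string are hypothetical; the code is plain Python 3 with no pandas dependency.

```python
def find_bare_carriage_returns(text):
    """Return (line_number, column) pairs for every bare '\\r' in ``text``.

    Lines are split on '\\n' only and numbered from 1 (the header is line 1);
    columns are 0-based offsets within that line.
    """
    hits = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        # A single trailing '\r' is just a Windows "\r\n" ending, so ignore it.
        body = line[:-1] if line.endswith("\r") else line
        col = body.find("\r")
        while col != -1:
            hits.append((line_number, col))
            col = body.find("\r", col + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical sample: the "message" field of data line 2 contains a bare '\r'.
    sample = "id,message,code\r\n1,bad\rvalue,42\r\n2,ok,7\r\n"
    print(find_bare_carriage_returns(sample))
```

Against the real file, the text should be read without newline translation, e.g. `open(file_name, encoding='utf-16', newline='')` in Python 3, so that any `\r` characters reach the function unchanged.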
I tried different encodings, but nothing helped. Can anyone please help?
So the solution was really simple. In pandas 0.18 I had to specify lineterminator='\n':
df = pd.read_csv(file_name, lineterminator='\n', encoding='utf-16', delimiter=',', header=0, index_col=False, error_bad_lines=False, ...
This simple flag fixed my issue.
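A minimal sketch of why the flag can help, assuming the special character in Exception_Message is a bare carriage return (`\r`); the answer itself does not say which character it was. The in-memory `RAW` string and its column names are hypothetical stand-ins for the crash-report file. `lineterminator` is a real `read_csv` parameter, documented as valid only with the default C engine.

```python
import io

import pandas as pd

# Hypothetical sample: '\n' separates records, and the "message" field
# contains a bare carriage return, like the Exception_Message in row 137.
RAW = "id,message,code\n1,bad\rvalue,42\n2,ok,7\n"

# Default C parser: the bare '\r' ends the record early, so the rest of
# the first data row ("value,42") becomes a spurious extra row.
broken = pd.read_csv(io.StringIO(RAW))

# lineterminator="\n": only '\n' ends a record, so the '\r' stays inside
# the "message" value and the row count is correct.
fixed = pd.read_csv(io.StringIO(RAW), lineterminator="\n")

print(len(broken), len(fixed))
print(repr(fixed.loc[0, "message"]))
```

One trade-off to keep in mind: if a file uses Windows `\r\n` line endings instead, this setting can leave a stray `\r` at the end of each line's last field, which would then need to be stripped after loading.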