
pandas dataframe issue with special characters

I am struggling with the following issue with a pandas DataFrame (Python 2.7.12, pandas 0.18.1).

df = pd.read_csv(file_name, encoding='utf-16', header=0, index_col=False, error_bad_lines=False,
                     names=['Package_Name','Crash_Report_Date_And_Time','Crash_Report_Millis_Since_Epoch','Device', 'Android_OS_Version','App_Version_Name','App_Version_Code','Exception_Class_Name','Exception_Message','Throwing_File_Name','Throwing_Class_Name','Throwing_Method_Name','Throwing_Line_Number','Native_Crash_Library','Native_Crash_Location','Crash_Link'])

I debugged the code and found that the following data is not loaded into the DataFrame properly.

There are some special characters in the Exception_Message field that make pandas move the rest of the data in that row to the next row.

Somehow pandas is not reading the file properly.

Below is the output for both rows; 137 and 138 are the row numbers.

               Package_Name Crash_Report_Date_And_Time  \
137  com.vcast.mediamanager        2016-09-05 14:54:13   
138                     NaN                 Class.java   

    Crash_Report_Millis_Since_Epoch        Device  Android_OS_Version  \
137                   1473087253130       victara                  22   
138                 java.lang.Class  classForName                  -2   

    App_Version_Name  App_Version_Code  \
137          14.3.34      1.503050e+09   
138              NaN               NaN   

                                  Exception_Class_Name  \
137                   java.lang.ClassNotFoundException   
138  https://play.google.com/apps/publish?dev_acc=0...   

                     Exception_Message Throwing_File_Name Throwing_Class_Name  \
137  Invalid name: com.strumsoft.appen                NaN                 NaN   
138                                NaN                NaN                 NaN   

    Throwing_Method_Name  Throwing_Line_Number Native_Crash_Library  \
137                  NaN                   NaN                  NaN   
138                  NaN                   NaN                  NaN   

    Native_Crash_Location Crash_Link account_id  
137                   NaN        NaN       NONE  
138                   NaN        NaN       NONE  

Row 138 is created erroneously from part of the data in row 137. The Exception_Message field in row 137 contains some value that breaks that row and pushes the remaining fields onto the next row, which is wrong.

I tried different encodings, but nothing helped. Can anyone please help?

So the solution was really simple. In pandas 0.18 I had to specify lineterminator='\n':

df = pd.read_csv(file_name,lineterminator='\n', encoding='utf-16', delimiter=',', header=0, index_col=False, error_bad_lines=False,...

This simple flag fixed my issue.
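For anyone who wants to reproduce the effect, here is a minimal sketch. It assumes the special character was a bare carriage return ("\r") inside the field, which the original post does not confirm. The sample data and column names are made up, and it is written for Python 3 with a recent pandas rather than the 2.7 / 0.18.1 setup above.

```python
import io

import pandas as pd

# Hypothetical sample data: the "message" field of the first data row
# contains a bare carriage return, standing in for the unknown character.
raw = "id,message,link\n1,Invalid name\rcom.example,http://a\n2,ok,http://b\n"

# Default settings: the C parser also treats "\r" as a line ending,
# so the first data row is split in two.
df_default = pd.read_csv(io.StringIO(raw))

# With lineterminator='\n', only "\n" ends a row and "\r" stays in the field.
df_fixed = pd.read_csv(io.StringIO(raw), lineterminator="\n")

print(len(df_default), len(df_fixed))
print(repr(df_fixed.loc[0, "message"]))
```

Under these assumptions the default call produces one row more than the fixed call, which matches the split between rows 137 and 138 shown above. If the real file breaks on a different character, this flag may not be the right fix.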
