How to erase the last line from a CSV file
I've been importing CSVs using pandas, but I keep getting a random extra line every time I try to use them, and it causes errors in my code. How do I completely erase this line?
The code I used to import it was:
import itertools
import copy
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import csv
df3=pd.read_csv(r"U:\\user\edge_list_4.csv")
print(df3)
df4=pd.read_csv(r"U:\\user\nodes_fixed_2.csv")
df4.dropna()
print(df4)
g=nx.Graph()
for i, elrow in df3.iterrows():
    g.add_edge(elrow[0], elrow[1], **elrow[2:].to_dict())
# Add node attributes
for i, nlrow in df4.iterrows():
    # g.node[nlrow['id']] = nlrow[1:].to_dict() # deprecated after NX 1.11
    nx.set_node_attributes(g, {nlrow['ID']: nlrow[1:].to_dict()})
# Node list example
print(nlrow)
# Preview first 5 edges
list(g.edges(data=True))[0:5]
# Preview first 10 nodes
list(g.nodes(data=True))[0:10]
print('# of edges: {}'.format(g.number_of_edges()))
print('# of nodes: {}'.format(g.number_of_nodes()))
# Define node positions data structure (dict) for plotting
for node in g.nodes(data=True):
    print(node)
print("")
node_positions = {node[0]: (node[1]['X'], -node[1]['Y']) for node in g.nodes(data=True)}
My table is a simple ID, X, Y table. I've tried using dropna(), but it didn't remove the line. I've also tried editing the file in Notepad++ and importing it as a txt file, but the line still keeps appearing. Is there a specific way I should edit the CSV file in Excel, or is there code I can use?
('rep1', {'X': 1, 'Y': 1811})
('rep2', {'X': 2, 'Y': 1811})
('rep3', {'X': 3, 'Y': 1135})
('rep4', {'X': 4, 'Y': 420})
('rep5', {'X': 5, 'Y': 885})
('rep6', {'X': 6, 'Y': 1010})
('rep7', {'X': 7, 'Y': 1010})
('rep8', {'X': 8, 'Y': 1135})
('rep9', {'X': 9, 'Y': 1135})
('rep10', {'X': 10, 'Y': 885})
('rep1 ', {})
The output is only meant to go up to rep10.
KeyError: 'X'
Try using the error_bad_lines option when reading the CSV file. Hopefully that works. (error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; newer versions use on_bad_lines="skip" instead.)
df_csv = pd.read_csv("FILENAME.csv", error_bad_lines=False)
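For recent pandas versions, the same idea might look like the sketch below. It assumes pandas 1.3 or later and uses made-up in-memory data in place of the real file. Note that this option only skips rows with too many fields; a row with too few fields is kept and padded with NaN, so it may not remove the stray row described in the question.

```python
import io

import pandas as pd

# Made-up sample data: the last row has one field too many.
csv_text = "ID,X,Y\nrep1,1,1811\nrep2,2,1811\nrep3,3,1135,extra\n"

# on_bad_lines="skip" drops rows that have more fields than the header.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df)
```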
If you always want to ignore the last line, try skipfooter:
df_csv = pd.read_csv("FILENAME.csv", skipfooter=1, engine="python")
From the pandas documentation for skipfooter: "Number of lines at bottom of file to skip (Unsupported with engine='c')."
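As a rough illustration of skipfooter, the sketch below reads a small in-memory CSV whose last row is a stray entry like the one in the question. The sample data is made up, and engine="python" is passed explicitly because the C engine does not support skipfooter.

```python
import io

import pandas as pd

# Made-up sample data: the final row is a stray entry with no X or Y.
csv_text = "ID,X,Y\nrep1,1,1811\nrep2,2,1811\nrep3,3,1135\nrep1 ,,"

# skipfooter=1 ignores the last line of the file.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")
print(df)
```

This only helps when the unwanted row is always the last one in the file.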
Basically, you get this error because some lines in the CSV have missing data.
Generally, the best way to address this problem is to read the file in a way that tolerates missing values. For this, your code should filter out lines with missing values.
for line in lines:  # lines: the rows read from the file (hypothetical name)
    if 'X' not in line:
        continue  # skip the line
Skipping a fixed line is not a perfect solution: it encodes knowledge of the data format that shouldn't be stored in code. Instead of reading an arbitrary .csv file, your code will only read one particular kind of file.
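One way to filter by content rather than by position is DataFrame.dropna. The sketch below assumes the stray row has empty X and Y fields, which may not match the real file, and uses made-up in-memory data. Note that dropna() returns a new DataFrame, so the result has to be assigned; the question's code calls df4.dropna() without keeping the result.

```python
import io

import pandas as pd

# Made-up sample data: the last row has an ID but no X or Y.
csv_text = "ID,X,Y\nrep1,1,1811\nrep2,2,1811\nrep1 ,,\n"
nodes = pd.read_csv(io.StringIO(csv_text))

# Keep only rows where both X and Y are present, and assign the result back.
nodes = nodes.dropna(subset=["X", "Y"])
print(nodes)
```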
You could try selecting the valid elements of the column this way:
drop[bool(drop.<column_name>[1]) == True]
I apply the bool cast to the 2nd element of the tuple, because an empty dict cast to bool is False.
However, it would be better, as akhetos said, to show us more of your code and also your source CSV file.
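The bool-cast idea can be sketched in plain Python on (name, attributes) tuples shaped like the ones printed in the question. The variable names and sample values here are illustrative, not taken from the asker's files.

```python
# Made-up node list in the same shape as g.nodes(data=True).
nodes = [
    ("rep1", {"X": 1, "Y": 1811}),
    ("rep2", {"X": 2, "Y": 1811}),
    ("rep1 ", {}),  # stray node with no attributes
]

# An empty dict is falsy, so bool(attrs) filters out nodes without attributes.
valid_nodes = [(name, attrs) for name, attrs in nodes if bool(attrs)]

# Build the plotting positions only from nodes that have coordinates.
node_positions = {name: (attrs["X"], -attrs["Y"]) for name, attrs in valid_nodes}
print(node_positions)
```

With this filter in place, the dict comprehension no longer hits a node without an 'X' key.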
Please read about skipfooter: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv
df_csv = pd.read_csv("FILENAME.csv", skipfooter=1, engine="python")