How to erase the last line from a CSV file

I've been importing CSV files with pandas, but every time I use one I get a stray extra line, and it causes errors in my code. How do I completely erase this line?

The code I used to import it was:

import itertools
import copy
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import csv

df3 = pd.read_csv(r"U:\\user\edge_list_4.csv")
print(df3)

df4 = pd.read_csv(r"U:\\user\nodes_fixed_2.csv")
df4 = df4.dropna()  # dropna() returns a new DataFrame, so assign it back
print(df4)


g = nx.Graph()

# First two columns are the edge endpoints; any remaining columns become edge attributes
for i, elrow in df3.iterrows():
    g.add_edge(elrow.iloc[0], elrow.iloc[1], **elrow.iloc[2:].to_dict())


# Add node attributes
for i, nlrow in df4.iterrows():
    # g.node[nlrow['id']] = nlrow[1:].to_dict()  # deprecated after NX 1.11
    nx.set_node_attributes(g, {nlrow['ID']: nlrow.iloc[1:].to_dict()})

# Node list example
print(nlrow)

# Preview first 5 edges
print(list(g.edges(data=True))[0:5])

# Preview first 10 nodes
print(list(g.nodes(data=True))[0:10])

print('# of edges: {}'.format(g.number_of_edges()))
print('# of nodes: {}'.format(g.number_of_nodes()))

# Define node positions data structure (dict) for plotting
for node in g.nodes(data=True):
    print(node)
    print("")
node_positions = {node[0]: (node[1]['X'], -node[1]['Y'])
                  for node in g.nodes(data=True)}

My table is a simple ID, X, Y table. I've tried using:

dropna()

but it didn't remove the line. I've also tried editing the file in Notepad++ and importing it as a .txt file, but the line still appears. Is there a specific way I should edit the CSV file in Excel, or is there code I can use?

('rep1', {'X': 1, 'Y': 1811})

('rep2', {'X': 2, 'Y': 1811})

('rep3', {'X': 3, 'Y': 1135})

('rep4', {'X': 4, 'Y': 420})

('rep5', {'X': 5, 'Y': 885})

('rep6', {'X': 6, 'Y': 1010})

('rep7', {'X': 7, 'Y': 1010})

('rep8', {'X': 8, 'Y': 1135})

('rep9', {'X': 9, 'Y': 1135})

('rep10', {'X': 10, 'Y': 885})

('rep1 ', {})

The output is only supposed to go up to rep10; the extra ('rep1 ', {}) entry at the end is the problem line.

KeyError: 'X'

Try using the error_bad_lines option when reading the CSV file. Hopefully that works. (In pandas 1.3 and later this option is deprecated in favor of on_bad_lines="skip", and it was removed in pandas 2.0.)

df_csv = pd.read_csv("FILENAME.csv", error_bad_lines=False)
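A minimal sketch of the newer equivalent, assuming pandas 1.3 or later and a hypothetical in-memory CSV (csv_text) in place of a real file. The row with an extra field is a "bad line" and gets skipped. Note that a row with too few fields is not treated as a bad line: pandas pads it with NaN, so this option alone does not drop a short trailing row.

```python
import io

import pandas as pd

# Hypothetical CSV contents: "rep2" has one field too many, "rep4" has too few.
csv_text = """ID,X,Y
rep1,1,1811
rep2,2,1811,999
rep3,3,1135
rep4
"""

# on_bad_lines="skip" (pandas >= 1.3) silently drops rows with too many fields.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df)  # rep2 is gone; rep4 remains with NaN in X and Y
```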

If you always want to ignore the last line, try skipfooter:

df_csv = pd.read_csv("FILENAME.csv", skipfooter=1, engine="python")

From the pandas documentation for skipfooter: "Number of lines at bottom of file to skip (Unsupported with engine='c')."
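A small sketch of skipfooter on a hypothetical in-memory file (csv_text) whose last line is a stray, incomplete row. Passing engine="python" explicitly matters: the default C engine does not support skipfooter, so pandas would otherwise fall back to the Python engine with a ParserWarning.

```python
import io

import pandas as pd

# Hypothetical file contents: the final line is the unwanted row.
csv_text = """ID,X,Y
rep1,1,1811
rep2,2,1811
rep10,10,885
rep1,,
"""

# skipfooter=1 drops exactly one line from the bottom of the input.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")
print(df)
```

Be aware that skipfooter counts raw lines, so if the file ends with an extra blank line, that blank line may be the one skipped instead of the stray row.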

Basically, you get a parsing error because some CSV lines are missing data.

Generally, the best way to address this problem is to read the file in a way that tolerates missing values. To do that, your code should filter out lines with missing values:

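for node_id, attrs in g.nodes(data=True):
    if 'X' not in attrs:
        continue  # skip the line: this node has no 'X' attribute, e.g. ('rep1 ', {})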

Skipping a fixed line is not a perfect solution: it hard-codes knowledge about the data format into your code. Instead of reading an arbitrary .csv file, your code would then only read one particular kind of file.
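The filter-by-content idea from this answer can be sketched with pandas directly, assuming a hypothetical nodes file (csv_text) whose stray row has an ID but no X/Y values. Unlike skipfooter, this drops incomplete rows wherever they occur in the file.

```python
import io

import pandas as pd

# Hypothetical nodes file: the last row has an ID but no coordinates.
csv_text = """ID,X,Y
rep1,1,1811
rep2,2,1811
rep10,10,885
rep1,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# dropna() returns a new DataFrame, so assign the result back;
# subset= limits the check to the coordinate columns.
df = df.dropna(subset=["X", "Y"])
print(df)
```

Because the missing values force X and Y to float during parsing, you may want df.astype({"X": int, "Y": int}) afterwards if integer coordinates matter.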

You could try to select the valid elements of the column this way (where drop is your DataFrame and <column_name> is the column holding the (ID, attributes) tuples): drop[drop["<column_name>"].map(lambda item: bool(item[1]))]. I apply bool to the 2nd element of each tuple because an empty dict cast to bool is False.
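A runnable sketch of that boolean filter, assuming the values are stored as (ID, attributes) tuples in a pandas Series; the Series contents below are hypothetical, modeled on the printed output in the question.

```python
import pandas as pd

# Hypothetical Series of (node_id, attributes) tuples like those printed above.
nodes = pd.Series([
    ("rep9", {"X": 9, "Y": 1135}),
    ("rep10", {"X": 10, "Y": 885}),
    ("rep1 ", {}),
])

# bool({}) is False, so the mask keeps only tuples with a non-empty attribute dict.
mask = nodes.map(lambda item: bool(item[1]))
valid = nodes[mask]
print(valid.tolist())
```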

However, it would be better, as akhetos said, to show us more of your code and also your source CSV file.

Please read about skipfooter: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv

df_csv = pd.read_csv("FILENAME.csv", skipfooter=1, engine="python")

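Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.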

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM