ValueError while writing from one CSV file to another using pandas
I am writing code that goes through many CSV files in a folder (using a for loop) and removes bad data from each file: rows that have more values than there are columns, or sometimes fewer. After removing them, I rearrange the columns and then write the useful data into a new CSV file.
In the code below, the outer for loop cycles through the different files in the folder. You can treat the `df=pd.read_csv` line as the beginning and assume correct indentation.
```python
import pandas as pd
import os

for filename in os.listdir("csv files copy"):
    filenames = os.path.join("csv files copy", filename)
    print(filename)
    df = pd.read_csv(filenames, error_bad_lines=False)
    for row in df:
        df.columns = ["id","FirstName","LastName","UserName","Phone","IsContact","RestrictionReason","Status","IsScam","Date"]
        df = df.drop(labels="Status", axis=1)
        df = df.reindex(columns=['id', 'Phone', 'FirstName', 'LastName', 'UserName',"IsContact","IsScam","Date","RestrictionReason"])
        df.to_csv(filenames, index=False)
```
When I run it, I receive this error:
ValueError: Length mismatch: Expected axis has 9 elements, new values have 10 elements
These are the header and the first 4 rows of the dataframe I am using:
id Phone FirstName LastName UserName IsContact IsScam Date RestrictionReason Status
Forex Pips Fire Free NaN Goldenboy NaN Goldenboyys False False 5/7/2022 8:34:07 AM NaN NaN
Forex Pips Fire Free NaN Abu 3odeh NaN oudah12 False False 5/7/2022 8:38:03 AM NaN NaN
Forex Pips Fire Free NaN Rahman Azar Rahman_Azar False False 5/7/2022 8:41:22 AM NaN NaN
Forex Pips Fire Free NaN HUDLE NaN Hudle1051 False False 5/7/2022 8:41:11 AM NaN NaN
And below is the header of the destination CSV file that the above data needs to be written into:
id Phone FirstName LastName UserName IsContact IsScam Date RestrictionReason
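A note on the stated goal of dropping rows with too many *or* too few fields: with `error_bad_lines=False` (or `on_bad_lines="skip"` in pandas 1.3 and later), pandas skips lines that have too many fields, but lines with too few fields are kept and padded with NaN. A minimal sketch, assuming the files are small enough to read as text, that keeps only rows whose field count matches the header using the standard `csv` module; `keep_well_formed_rows` and the sample data are hypothetical:

```python
import csv
import io

import pandas as pd


def keep_well_formed_rows(text):
    """Hypothetical helper: return (header, rows), keeping only rows whose
    field count equals the header's (too long and too short rows are dropped)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if len(row) == len(header)]
    return header, rows


# Hypothetical sample; for a real file, read it with open(path, newline="").read().
sample = (
    "id,FirstName,LastName\n"
    "1,Ann,Lee\n"
    "2,Bob\n"           # too few fields: dropped
    "3,Cy,Do,extra\n"   # too many fields: dropped
    "4,Di,Ma\n"
)
header, rows = keep_well_formed_rows(sample)
df = pd.DataFrame(rows, columns=header)
print(rows)      # [['1', 'Ann', 'Lee'], ['4', 'Di', 'Ma']]
print(df.shape)  # (2, 3)
```

All values come back as strings here; pass the result through `pd.to_numeric` or similar if typed columns are needed.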
You need to remove the inner `for` loop, as follows:
```python
import os

import pandas as pd

for filename in os.listdir("csv files copy"):
    filenames = os.path.join("csv files copy", filename)
    print(filename)
    # on_bad_lines="skip" replaces error_bad_lines=False (deprecated in pandas 1.3, removed in 2.0)
    df = pd.read_csv(filenames, on_bad_lines="skip")
    # Each step below now runs exactly once per file
    df.columns = ["id", "FirstName", "LastName", "UserName", "Phone", "IsContact", "RestrictionReason", "Status", "IsScam", "Date"]
    df = df.drop(labels="Status", axis=1)
    df = df.reindex(columns=["id", "Phone", "FirstName", "LastName", "UserName", "IsContact", "IsScam", "Date", "RestrictionReason"])
    df.to_csv(filenames, index=False)
```
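To check the fixed per-file logic without a folder of files, here is a sketch that runs the same steps on an in-memory CSV. The function name `clean_frame` and the sample data are hypothetical, and `on_bad_lines` assumes pandas 1.3 or later:

```python
import io

import pandas as pd

# Column names assigned to the raw files (source order, includes "Status").
SOURCE_COLUMNS = ["id", "FirstName", "LastName", "UserName", "Phone",
                  "IsContact", "RestrictionReason", "Status", "IsScam", "Date"]
# Column order of the destination header.
TARGET_COLUMNS = ["id", "Phone", "FirstName", "LastName", "UserName",
                  "IsContact", "IsScam", "Date", "RestrictionReason"]


def clean_frame(csv_source):
    """Hypothetical helper: read one CSV, relabel, drop Status and reorder.
    Each step runs exactly once, so the column count never goes stale."""
    df = pd.read_csv(csv_source, on_bad_lines="skip")  # skips rows with too many fields
    df.columns = SOURCE_COLUMNS
    df = df.drop(labels="Status", axis=1)
    return df.reindex(columns=TARGET_COLUMNS)


raw = (
    "a,b,c,d,e,f,g,h,i,j\n"
    "1,Ann,Lee,ann1,555,False,,,False,5/7/2022\n"
    "2,Bob,Ng,bob2,556,False,,,False,5/7/2022,EXTRA\n"  # 11 fields: skipped
)
cleaned = clean_frame(io.StringIO(raw))
print(list(cleaned.columns))
print(len(cleaned))  # 1
```

Inside the folder loop, the body would then reduce to `clean_frame(filenames).to_csv(filenames, index=False)`.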
This inner loop was causing the error, and it is not needed. The first time through the loop, it correctly removes the `Status` column and saves the CSV file. The second time through the loop (on the same dataframe), it runs `df.columns = ...` again, but now there is no `Status` column: the dataframe has 9 columns while the list assigns 10 names, hence "Expected axis has 9 elements, new values have 10 elements".
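The failure on the second pass can be reproduced with a one-row placeholder frame standing in for a raw file (the data is hypothetical; the loop body mirrors the question's):

```python
import pandas as pd

# The 10 names the question assigns, including "Status".
names = ["id", "FirstName", "LastName", "UserName", "Phone",
         "IsContact", "RestrictionReason", "Status", "IsScam", "Date"]
# Hypothetical one-row frame with 10 placeholder columns, as if read from a raw file.
df = pd.DataFrame([list(range(10))], columns=[f"col{i}" for i in range(10)])

passes_completed = 0
failure = ""
try:
    for row in df:                  # one pass per column label of the original frame
        df.columns = names          # pass 2: df now has 9 columns, names has 10
        df = df.drop(labels="Status", axis=1)
        passes_completed += 1
except ValueError as exc:
    failure = str(exc)

print(passes_completed)  # 1
print(failure)           # Length mismatch: Expected axis has 9 elements, new values have 10 elements
```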
The code `for row in df:` actually iterates over the column names of the dataframe, e.g. `id`, then `FirstName`, and so on, so its body runs once per column rather than once per row.
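A small illustration of that iteration behaviour, with a hypothetical two-column frame, contrasted with `itertuples()`, which is one of pandas' explicit row iterators:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "FirstName": ["Ann", "Bob"]})

# Iterating a DataFrame directly yields its column labels, not its rows.
labels = [item for item in df]
print(labels)  # ['id', 'FirstName']

# To walk the rows, use an explicit row iterator such as itertuples().
first_names = [row.FirstName for row in df.itertuples(index=False)]
print(first_names)  # ['Ann', 'Bob']
```

For the cleanup in the question no row loop is needed at all, since `drop`, `reindex` and `to_csv` already operate on the whole frame.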
Because you give only 9 columns in this line, you missed the `'Status'` column:

```python
df = df.reindex(columns=['id', 'Phone', 'FirstName', 'LastName', 'UserName', 'IsContact', 'IsScam', 'Date', 'RestrictionReason'])
df.to_csv(filenames, index=False)
```
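For reference, `DataFrame.reindex(columns=...)` selects and orders columns by label: a label that is not in the frame becomes an all-NaN column instead of raising, so the list passed there decides which columns are written out rather than triggering a length error. A sketch with a hypothetical frame that has already had `Status` dropped:

```python
import pandas as pd

# Hypothetical frame after "Status" was dropped: 9 columns in source order.
df = pd.DataFrame(
    [[1, "Ann", "Lee", "ann1", "555", False, None, False, "5/7/2022"]],
    columns=["id", "FirstName", "LastName", "UserName", "Phone",
             "IsContact", "RestrictionReason", "IsScam", "Date"],
)

# Destination header order (9 columns).
target = ["id", "Phone", "FirstName", "LastName", "UserName",
          "IsContact", "IsScam", "Date", "RestrictionReason"]
reordered = df.reindex(columns=target)
print(list(reordered.columns) == target)  # True

# Requesting a label that no longer exists adds an all-NaN column; no error is raised.
with_status = df.reindex(columns=target + ["Status"])
print(with_status["Status"].isna().all())  # True
```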
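Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.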