
Import csv: remove filename from column names in first row

I am using Python 3.5. I have several CSV files.

The CSV files are named according to a fixed structure. They have a fixed prefix (always the same) plus a varying filename part:

099_2019_01_01_filename1.csv
099_2019_01_01_filename2.csv

My original CSV files look like this:

filename1-Streetname filename1-ZIPCODE
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Street1 2012932
Street2 3023923

filename2-Name filename2-Phone
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Name1 2012932
Name2 3023923

I am processing these files with the following code. I read the CSV files from a source folder and write them to a destination folder, skipping the TEXT rows because I do not need them:

import csv
import os

# Row indices to drop (the six TEXT rows below the header)
skiprows = (1,2,3,4,5,6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder,file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        # Keep the header (row 0) and every row not listed in skiprows
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
        with open(os.path.join(destinationfolder,file), 'w', newline='') as fp_out:
            writer = csv.writer(fp_out)
            writer.writerows(rows)

This code works and gives:

filename1-Streetname filename1-ZIPCODE
Street1 2012932
Street2 3023923

filename2-Name filename2-Phone
Name1 2012932
Name2 3023923

The first row contains the header. The header names always contain the filename (without the 099_2019_01_01_ prefix) followed by a "-". The .csv extension is not part of it. I want to remove this "filename-" part for each CSV file.

The core part now is to get the first row and perform a replace on this row only. I need to cut off the prefix and the .csv from the file name and then perform a general replace. The first replace could be something like this:

  1. Either start with a function that cuts off the first n characters, as the length is fixed, or
  2. According to this solution, just use string.removeprefix('099_2019_01_01_')

As I have Python 3.5, I cannot use removeprefix, so I simply try to replace it:

string.replace("099_2019_01_01_","")

Then I need to remove the .csv, which is easy:

string.replace(".csv","")

Putting this together, I get (string.replace("099_2019_01_01_","")).replace(".csv","") . (In addition, the "-" at the end needs to be removed too; see the code below.) I am not sure if this works.
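A minimal sketch of the name clean-up described above, assuming the prefix is always exactly 099_2019_01_01_ and the extension is .csv. The helper name header_tag and the constant PREFIX are hypothetical, introduced only for illustration. Note that str.replace removes every occurrence of the substring, whereas checking startswith and slicing only removes a leading match; both approaches run on Python 3.5.

```python
import os

PREFIX = "099_2019_01_01_"  # fixed prefix taken from the question


def header_tag(filename, prefix=PREFIX):
    """Return the 'filename-' tag expected in the header (hypothetical helper)."""
    stem, _ext = os.path.splitext(filename)  # drops the ".csv" extension
    if stem.startswith(prefix):
        stem = stem[len(prefix):]  # cut off only a leading prefix
    return stem + "-"


print(header_tag("099_2019_01_01_filename1.csv"))  # filename1-
```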

My main problem with this CSV import code is that I do not know how to manipulate only the first row when reading/writing the CSV. I want the replacement to happen only in the first row. I tried something like this:

import csv
import os

skiprows = (1,2,3,4,5,6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder,file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
        with open(os.path.join(destinationfolder,file), 'w', newline='') as fp_out:
            writer = csv.writer(fp_out)
            rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")
            writer.writerows(rows)

This gives an error, as the idea with rows[0] does not work. How can I do this?
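For context, a short sketch of why that line fails, using an in-memory sample instead of a real file (the sample content is made up to match the layout shown above). Each row produced by csv.reader is a list of strings, and a list has no replace method; in addition, str.replace returns a new string rather than changing anything in place, so the result has to be assigned back.

```python
import csv
import io

# Hypothetical two-line sample in the same ';'-delimited layout
sample = io.StringIO("filename1-Streetname;filename1-ZIPCODE\nStreet1;2012932\n")
rows = list(csv.reader(sample, delimiter=';'))

print(type(rows[0]))  # <class 'list'>
try:
    rows[0].replace("filename1-", "")
except AttributeError as exc:
    print("fails:", exc)

# Applying replace to each string in the row and assigning the result back works
rows[0] = [name.replace("filename1-", "") for name in rows[0]]
print(rows[0])  # ['Streetname', 'ZIPCODE']
```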

(I am not sure whether I should include this replacement in the code above or put it into a second script that runs after the first one. With a second script, I assume I would have to read and write the CSV files again, opening, changing and saving every file a second time, so I think it would be most efficient to do it in this code. However, if that is not possible, I would also be fine with a stand-alone script that just does the replacement, assuming the CSV files have the header in row 0 followed by the data.)
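If a separate pass is preferred, one possible sketch is to read only the header with next(), rewrite it, and stream the remaining rows through unchanged. The function name strip_header_tag and its parameters are hypothetical; the default delimiter of "," assumes the files were written by csv.writer with its default settings, as in the code above.

```python
import csv


def strip_header_tag(src_path, dst_path, tag, delimiter=","):
    """Copy a CSV file, removing `tag` from the header cells only (hypothetical helper)."""
    with open(src_path, newline="") as fp_in, \
            open(dst_path, "w", newline="") as fp_out:
        reader = csv.reader(fp_in, delimiter=delimiter)
        writer = csv.writer(fp_out, delimiter=delimiter)
        header = next(reader, None)  # None if the file is empty
        if header is None:
            return
        writer.writerow([name.replace(tag, "") for name in header])
        writer.writerows(reader)  # remaining rows are copied as they are
```

Because the data rows are never held in memory together, this variant also suits larger files, at the cost of reading and writing each file a second time.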

Please note that I want to do this with the csv module and not use pandas.

EDIT: In the end, the CSV files should look like this:

Streetname ZIPCode
Street1 9999
Street2 9848

Name Phone
Name1 23421
Name2 23232

Try replacing this:

rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")

with this in your code:

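# Part of the file name that appears in the header, e.g. "filename1"
x = file.replace('099_2019_01_01_', '').replace('.csv', '')
# rows[0] is a list of strings, so apply replace to each header cell
rows[0] = [i.replace(x + '-', '') for i in rows[0]]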
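Put together with the loop from the question, the whole thing might look like the sketch below. Wrapping it in a function named clean_folder, the constants PREFIX and SKIPROWS, and the check for the .csv extension are additions for illustration and are not part of the original code; the folder arguments stand in for sourcefolder and destinationfolder.

```python
import csv
import os

PREFIX = "099_2019_01_01_"    # fixed prefix from the question
SKIPROWS = (1, 2, 3, 4, 5, 6)  # the six TEXT rows below the header


def clean_folder(sourcefolder, destinationfolder):
    """Copy every CSV file, dropping SKIPROWS and the 'filename-' header tag."""
    for file in os.listdir(sourcefolder):
        if not file.endswith(".csv"):
            continue  # assumption: ignore anything that is not a CSV file
        tag = file.replace(PREFIX, "").replace(".csv", "") + "-"
        with open(os.path.join(sourcefolder, file), newline="") as fp_in:
            reader = csv.reader(fp_in, delimiter=";")
            rows = [row for i, row in enumerate(reader) if i not in SKIPROWS]
        if rows:
            # Only the header row is touched
            rows[0] = [name.replace(tag, "") for name in rows[0]]
        with open(os.path.join(destinationfolder, file), "w", newline="") as fp_out:
            csv.writer(fp_out).writerows(rows)
```

As in the question, the input is read with ';' as the delimiter while the output uses the csv.writer default (a comma); pass delimiter=';' to csv.writer if the output should keep the original separator.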

