
Rename a column header in csv using python pandas

I have some giant CSV files (around 23 GB each) in which I want to do the following with the column headers:

If there is a column named SFID:
    Rename column "Id" to "IgnoreId"
    Rename column "SFID" to "Id"
Otherwise, do nothing.

All the Google search results I see are about how to import the CSV into a dataframe, rename the column, and export it back to a CSV.

To me this feels like a giant waste of time and memory, because we are effectively working only with the very first row of the CSV file (the header row). I don't know whether it is necessary to load the whole CSV as a dataframe and export it to a new CSV (or export it to the same CSV, effectively overwriting it).

Because the CSVs are huge, I have to load them in small chunks and perform the operation chunk by chunk, which takes time and memory. Again, this feels like a waste of memory because, apart from the headers, we are not really doing anything with the remaining chunks.

Is there a way to load just the header of a CSV file, change it, and save it back into the same CSV file?

I am open to using something other than pandas as well. The only real constraint is that the CSV files are too big to just double-click and open.

Write the header row first, then copy the data rows using shutil.copyfileobj.

shutil.copyfileobj took 38 seconds for a 0.5 GB file, whereas fileinput took 125 seconds for the same file.

Using shutil.copyfileobj

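import shutil
import pandas as pd

df = pd.read_csv(filename, nrows=0)  # read only the header row
if 'SFID' in df.columns:
    # rename columns
    df.rename(columns={"Id": "IgnoreId", "SFID": "Id"}, inplace=True)
    # construct new header row
    header_row = ','.join(df.columns) + "\n"
    # modify header in csv file: f1 and f2 are two handles on the same file
    # caution: the new header is longer than the old one, so f2 writes slightly
    # ahead of where f1's data came from; try this on a copy of the file first
    with open(filename, "r+") as f1, open(filename, "r+") as f2:
        f1.readline()  # move the read pointer past the old header row
        f2.write(header_row)  # overwrite from the start of the file
        shutil.copyfileobj(f1, f2)  # copy the data rows behind the new header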
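
The in-place copy above reads and writes the same file through two handles. If that feels fragile, one more defensive variant is to stream into a temporary file and then swap it in. The following is only a minimal sketch under stated assumptions: the helper name `rewrite_header` and the `.tmp` suffix are hypothetical, the file is assumed to be UTF-8, and there must be enough free disk space for a second copy of the CSV.

```python
import os
import shutil


def rewrite_header(filename, new_header):
    """Replace the first line of `filename` with `new_header` (hypothetical helper)."""
    tmp_name = filename + ".tmp"  # assumed temporary name next to the original
    with open(filename, "rb") as src, open(tmp_name, "wb") as dst:
        old_header = src.readline()  # consume the old header row
        # reuse the file's own line ending so the header matches the data rows
        newline = b"\r\n" if old_header.endswith(b"\r\n") else b"\n"
        dst.write(new_header.encode("utf-8") + newline)  # UTF-8 is an assumption
        shutil.copyfileobj(src, dst)  # stream the data rows unchanged
    os.replace(tmp_name, filename)  # swap the rewritten file into place
```

It would be called as `rewrite_header(filename, ','.join(df.columns))`. The data rows are still copied once, as in the version above, but the original file stays intact until the final `os.replace`.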

Using fileinput

import fileinput
import pandas as pd

df = pd.read_csv(filename, nrows=0)  # read only the header row
if 'SFID' in df.columns:
    # rename columns
    df.rename(columns={"Id": "IgnoreId", "SFID": "Id"}, inplace=True)
    # construct new header row (print() adds the newline)
    header_row = ','.join(df.columns)
    # modify header in csv file; with inplace=True, print() writes to the file
    with fileinput.input(filename, inplace=True) as f:
        for line in f:
            if f.isfirstline():
                print(header_row)
            else:
                print(line, end='')

For a huge file, a simple command-line solution with the stream editor sed might be faster than a Python script:

sed -e '1 {/SFID/ {s/Id/IgnoreId/; s/SFID/Id/}}' -i myfile.csv

This changes Id to IgnoreId and SFID to Id in the first line if that line contains SFID. If other column headers also contain the string Id (e.g. ImportantId), you'll have to refine the regexes in the s commands accordingly.
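
If refining the regexes gets awkward, the substring problem can also be sidestepped in Python by parsing the header line into whole column names and renaming only exact matches. This is a rough sketch, not part of the answer above: the function name `rename_header_line` is hypothetical, and it assumes a comma-delimited header that the standard `csv` module can parse.

```python
import csv
import io


def rename_header_line(header_line):
    """Apply the SFID rule to one header line, matching whole column names only."""
    names = next(csv.reader([header_line]))  # parse the header as one CSV record
    if "SFID" not in names:
        return header_line.rstrip("\r\n")  # no SFID column: header stays as it is
    mapping = {"Id": "IgnoreId", "SFID": "Id"}
    renamed = [mapping.get(name, name) for name in names]  # exact-name lookup
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(renamed)  # re-quote if needed
    return out.getvalue().rstrip("\n")
```

Because the lookup is by full column name, a header such as ImportantId is left alone. The returned string has no trailing newline, so it could serve as the header text in either of the Python approaches shown earlier.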


