How to read a csv, append new data, and write to a new csv with pandas

I have not used pandas before, and it looks like I need some initial help. I could not find this specific example anywhere.

I have a csv file, say file1.csv, as follows:

ID     value1     value2
1       100        200
2       101        201

I need to read one line at a time from file1.csv, append two new columns of data, and then write everything to a new file called file2.csv. file2.csv is supposed to look like the following:

ID     value1     value2     value3     value4
1       100        200        10         20
2       101        201        11         21

Can anyone guide me or give a short example showing how to do this (reading file1, appending the new data (the value3 and value4 columns), and writing it to file2)?

ADDENDUM: I need to read one line at a time from file1 and write one line at a time to file2.

The following will load file1.csv, add the columns 'value3' and 'value4', and output the resulting dataframe as a csv.

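import pandas as pd

df = pd.read_csv('file1.csv')
# Each list needs one value per row of file1.csv.
df['value3'] = [10, 11]
df['value4'] = [20, 21]
# By default to_csv also writes the row index as an unnamed first column.
df.to_csv('file2.csv')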

Contents of file1.csv:

ID,value1,value2
1,100,200
2,101,201

Contents of file2.csv:

,ID,value1,value2,value3,value4
0,1,100,200,10,20
1,2,101,201,11,21
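The output above carries an extra leading index column that the file2.csv layout in the question does not have. As a minimal sketch, assuming the same two-row file1.csv and the same literal values for the new columns, passing index=False to to_csv should produce the requested layout. The temporary directory and the src and dst names are only there to make the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Work in a temporary directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file1.csv")
dst = os.path.join(workdir, "file2.csv")

# Recreate the question's file1.csv.
with open(src, "w", newline="") as f:
    f.write("ID,value1,value2\n1,100,200\n2,101,201\n")

df = pd.read_csv(src)

# Illustrative values taken from the question; one entry per row of df.
df["value3"] = [10, 11]
df["value4"] = [20, 21]

# index=False leaves the row index out of the written file.
df.to_csv(dst, index=False)

with open(dst) as f:
    result = f.read()
print(result)
```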

Use read_csv and to_csv. Use the index keyword argument of to_csv to keep or remove the index.

In [117]: df = pd.read_csv('eg.csv')

In [118]: df
Out[118]:
   col 1  col 2  col 3
0      4      5      6
1      7      8      9

In [119]: df['new col'] = 'data'

In [120]: df
Out[120]:
   col 1  col 2  col 3 new col
0      4      5      6    data
1      7      8      9    data

In [121]: df.to_csv('eg.new.csv')

In [122]: new_df = pd.read_csv('eg.new.csv')      # includes the index

In [123]: new_df
Out[123]:
   Unnamed: 0  col 1  col 2  col 3 new col
0           0      4      5      6    data
1           1      7      8      9    data

In [124]: df.to_csv('eg.new.csv', index=False)    # excludes index

In [125]: new_df = pd.read_csv('eg.new.csv')

In [126]: new_df
Out[126]:
   col 1  col 2  col 3 new col
0      4      5      6    data
1      7      8      9    data
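If the index should stay in the file, one option is to tell read_csv which column holds it when reading the file back, which avoids the "Unnamed: 0" column. This is a small sketch, not part of the answer above: it rebuilds a frame like the eg.csv example by hand and uses an in-memory buffer instead of a file on disk.

```python
import io

import pandas as pd

# Rebuild a frame like the one in the session above (values are illustrative).
df = pd.DataFrame({"col 1": [4, 7], "col 2": [5, 8], "col 3": [6, 9]})
df["new col"] = "data"

# With no path argument, to_csv returns the CSV text, index included.
csv_text = df.to_csv()

# Default read: the saved index comes back as an ordinary column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells read_csv to use the first column as the index instead.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(restored.columns))
```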

Though there are typically better solutions, like using Dask, changing the dtypes, or using categorical variables, one alternative is to simply process the file in chunks.

import pandas as pd

# Read one line at a time. Change chunksize to process more lines at a time.
reader = pd.read_csv('test.csv', chunksize=1)
write_header = True  # Needed to get header for first chunk

for chunk in reader:
    # Do some stuff
    chunk['val3'] = chunk.val1**2
    chunk['val4'] = chunk.val2*4

    # Save the file to a csv, appending each new chunk you process. mode='a' means append.
    chunk.to_csv('final.csv', mode='a', header=write_header, index=False)
    write_header = False  # Update so later chunks don't write header

Sample Data: test.csv

val1,val2
1,2
3,4
5,6
7,8
9,10
11,12
13,14
15,16

Output: final.csv

val1,val2,val3,val4
1,2,1,8
3,4,9,16
5,6,25,24
7,8,49,32
9,10,81,40
11,12,121,48
13,14,169,56
15,16,225,64
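One thing to watch with mode='a' on every chunk: if final.csv already exists from an earlier run, the new rows (and a second header) are appended to the old contents. A possible variation, sketched here with a hypothetical helper named process_in_chunks and a shortened copy of the sample data, writes the first chunk with mode='w' and appends only the later ones.

```python
import os
import tempfile

import pandas as pd


def process_in_chunks(src, dst, chunksize=1):
    """Hypothetical helper: stream src to dst, adding val3 and val4."""
    for i, chunk in enumerate(pd.read_csv(src, chunksize=chunksize)):
        chunk["val3"] = chunk["val1"] ** 2
        chunk["val4"] = chunk["val2"] * 4
        first = i == 0
        # The first chunk overwrites any old file and writes the header;
        # later chunks append without a header.
        chunk.to_csv(dst, mode="w" if first else "a", header=first, index=False)


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.csv")
dst = os.path.join(workdir, "final.csv")
with open(src, "w", newline="") as f:
    f.write("val1,val2\n1,2\n3,4\n5,6\n")

process_in_chunks(src, dst)
process_in_chunks(src, dst, chunksize=2)  # a second run replaces the first output

with open(dst) as f:
    lines = f.read().splitlines()
print(lines)
```

Running the helper twice leaves a single header and one output row per input row, because the second run starts the file over.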

It looks like the following code snippet solves my problem. Thanks to @aydow and @Arda Arslan for the inspiration.

The following piece of code creates file2 with the header names only; the rest is empty.

import pandas as pd

column_names = ['ID', 'value1', 'value2', 'value3', 'value4']
# An empty DataFrame with only column names writes just the header row.
df = pd.DataFrame(columns=column_names)
df.to_csv("file2.csv", index=False)

And the following piece of code reads one line at a time from file1 and appends it to file2.

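# chunksize=1 yields one single-row DataFrame per line of file1.csv.
for df in pd.read_csv('file1.csv', chunksize=1):
    df['value3'] = 11
    df['value4'] = 22
    # The header is already in file2.csv, so append the data row only.
    df.to_csv("file2.csv", header=False, index=False, mode='a')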

Changing the value of the chunksize parameter changes the number of rows read and written at a time. Improvement comments are more than welcome if you think this can be done more elegantly.
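One possible tidy-up, offered as a sketch rather than the definitive way to do it: to_csv also accepts an already open file object, so the output file can be opened once and the header written with the first chunk, which removes the separate header-only step. The temporary paths and the constant values 11 and 22 below are assumptions carried over from the snippet above.

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file1.csv")
dst = os.path.join(workdir, "file2.csv")

# Recreate the question's file1.csv.
with open(src, "w", newline="") as f:
    f.write("ID,value1,value2\n1,100,200\n2,101,201\n")

# Open the output once; newline="" lets pandas control the line endings.
with open(dst, "w", newline="") as out:
    for i, chunk in enumerate(pd.read_csv(src, chunksize=1)):
        chunk["value3"] = 11
        chunk["value4"] = 22
        # Write the header only with the first chunk.
        chunk.to_csv(out, header=(i == 0), index=False)

with open(dst) as f:
    lines = f.read().splitlines()
print(lines)
```

Opening with mode "w" also means a rerun starts from an empty file instead of appending to the previous output.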
