How to read a CSV, append new data, and write to a new CSV with pandas
I have not used pandas before and it looks like I need some initial help. I could not find this specific example anywhere.
I have a CSV file, say file1.csv, as follows:
ID value1 value2
1 100 200
2 101 201
I need to read one line at a time from file1.csv, append two new columns of data, and then write everything to a new file called file2.csv. file2.csv is supposed to look like this:
ID value1 value2 value3 value4
1 100 200 10 20
2 101 201 11 21
Can anyone guide me or give a short example showing how to do this (reading file1, appending the new data (the value3 and value4 columns), and writing it to file2)?
ADDENDUM: I need to read one line at a time from file1 and write one line at a time to file2.
The following will load file1.csv, add the columns 'value3' and 'value4', and output the resulting DataFrame as a CSV.
import pandas as pd
df = pd.read_csv('file1.csv')
df['value3'] = [10, 11]
df['value4'] = [20, 21]
df.to_csv('file2.csv')
Contents of file1.csv:
ID,value1,value2
1,100,200
2,101,201
Contents of file2.csv:
,ID,value1,value2,value3,value4
0,1,100,200,10,20
1,2,101,201,11,21
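The file2.csv above has an extra leading column holding the row index (0, 1), which the question's desired output does not have. A minimal sketch that matches the question exactly by passing index=False, assuming a comma-separated file1.csv; it writes file1.csv itself first so it runs standalone, and the value3/value4 lists are simply the question's sample values hard-coded.

```python
import pandas as pd

# Create the sample input from the question so the sketch runs standalone.
pd.DataFrame({"ID": [1, 2], "value1": [100, 101], "value2": [200, 201]}).to_csv(
    "file1.csv", index=False
)

df = pd.read_csv("file1.csv")
df["value3"] = [10, 11]  # one value per row; the list length must equal len(df)
df["value4"] = [20, 21]

# index=False drops the 0, 1, ... row labels, so no extra leading column is written.
df.to_csv("file2.csv", index=False)

with open("file2.csv") as f:
    print(f.read())
```

The printed file starts directly with the ID column, as in the question.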
Use read_csv and to_csv. Use the index keyword argument of to_csv to keep or remove the index.
In [117]: df = pd.read_csv('eg.csv')
In [118]: df
Out[118]:
col 1 col 2 col 3
0 4 5 6
1 7 8 9
In [119]: df['new col'] = 'data'
In [120]: df
Out[120]:
col 1 col 2 col 3 new col
0 4 5 6 data
1 7 8 9 data
In [121]: df.to_csv('eg.new.csv')
In [122]: new_df = pd.read_csv('eg.new.csv') # includes the index
In [123]: new_df
Out[123]:
Unnamed: 0 col 1 col 2 col 3 new col
0 0 4 5 6 data
1 1 7 8 9 data
In [124]: df.to_csv('eg.new.csv', index=False) # excludes index
In [125]: new_df = pd.read_csv('eg.new.csv')
In [126]: new_df
Out[126]:
col 1 col 2 col 3 new col
0 4 5 6 data
1 7 8 9 data
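If the index was written (the to_csv default), it can also be restored on the way back in rather than showing up as an "Unnamed: 0" column. A small sketch using read_csv's index_col parameter on the same data as the session above:

```python
import pandas as pd

# Same data as the session above.
df = pd.DataFrame({"col 1": [4, 7], "col 2": [5, 8], "col 3": [6, 9]})
df["new col"] = "data"

# Written WITH the index (the default) ...
df.to_csv("eg.new.csv")

# ... then index_col=0 turns the first CSV column back into the index
# instead of producing an "Unnamed: 0" data column.
new_df = pd.read_csv("eg.new.csv", index_col=0)
print(new_df)
print(list(new_df.columns))
```

So index=False on write and index_col=0 on read are two ways to avoid the stray column.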
Though there are typically better solutions for large files, such as using Dask, changing the dtypes, or using categorical variables, one alternative is to simply process the file in chunks.
import pandas as pd
# Read one line at a time. Change chunksize to process more lines at a time.
reader = pd.read_csv('test.csv', chunksize=1)
write_header = True  # Needed to get the header for the first chunk
for chunk in reader:
    # Do some stuff
    chunk['val3'] = chunk.val1**2
    chunk['val4'] = chunk.val2*4
    # Save the chunk to a CSV, appending each new chunk you process. mode='a' means append,
    # so delete any final.csv left over from a previous run before starting.
    chunk.to_csv('final.csv', mode='a', header=write_header, index=False)
    write_header = False  # Update so later chunks don't write the header
Example input, test.csv:
val1,val2
1,2
3,4
5,6
7,8
9,10
11,12
13,14
15,16
Resulting final.csv:
val1,val2,val3,val4
1,2,1,8
3,4,9,16
5,6,25,24
7,8,49,32
9,10,81,40
11,12,121,48
13,14,169,56
15,16,225,64
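Because mode='a' appends, running the loop above a second time adds a second copy of the data (and header) to an existing final.csv. One way around that, sketched here as an alternative rather than the answer's own code, is to open the output once in "w" mode and pass the open file handle to to_csv for every chunk. The sketch writes a shortened test.csv first so it runs standalone.

```python
import pandas as pd

# Sample input matching the first rows of test.csv shown above.
pd.DataFrame({"val1": [1, 3, 5], "val2": [2, 4, 6]}).to_csv("test.csv", index=False)

# "w" truncates any final.csv from an earlier run; every chunk is then
# written to the same open handle, so no append mode is needed.
with open("final.csv", "w", newline="") as out:
    for i, chunk in enumerate(pd.read_csv("test.csv", chunksize=1)):
        chunk["val3"] = chunk["val1"] ** 2
        chunk["val4"] = chunk["val2"] * 4
        # Only the first chunk writes the header row.
        chunk.to_csv(out, header=(i == 0), index=False)
```

Using enumerate also removes the need for a separate write_header flag.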
It looks like the following code snippet solves my problem. Thanks to @aydow and @Arda Arslan for the inspiration.
The following piece of code creates file2 with the header row only; the rest is empty.
import pandas as pd
column_names = ['ID', 'value1', 'value2', 'value3', 'value4']
# One empty list per column, so the DataFrame has headers but no rows
raw_data = {name: [] for name in column_names}
df = pd.DataFrame(raw_data, columns=column_names)
# Writes only the header row
df.to_csv("file2.csv", index=False)
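The header-only file can also be produced without building the dictionary of empty lists: a DataFrame constructed with only column labels has no rows, so to_csv writes just the header. A compact sketch of that equivalent:

```python
import pandas as pd

column_names = ["ID", "value1", "value2", "value3", "value4"]

# An empty DataFrame with only column labels writes just the header row.
pd.DataFrame(columns=column_names).to_csv("file2.csv", index=False)

with open("file2.csv") as f:
    print(f.read())  # ID,value1,value2,value3,value4
```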
And the following piece of code reads one line at a time from file1 and appends it to file2.
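import pandas as pd
# chunksize=1 yields one-row DataFrames, one per data line of file1.csv
for df in pd.read_csv('file1.csv', chunksize=1):
    df['value3'] = 11
    df['value4'] = 22
    # Append the row to file2.csv without repeating the header
    df.to_csv("file2.csv", header=False, index=False, mode='a')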
Changing the value of the chunksize parameter changes the number of rows read and written at a time. Suggestions for improvement are welcome if you think this can be done more elegantly.
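To see what chunksize controls, a small sketch: with a five-row file and chunksize=2, read_csv yields DataFrames of at most two rows, and the last one holds whatever remains. The file name rows5.csv is hypothetical and created by the sketch itself.

```python
import pandas as pd

# Five-row sample input (hypothetical file name).
pd.DataFrame({"ID": range(1, 6), "value1": range(100, 105)}).to_csv(
    "rows5.csv", index=False
)

# chunksize=2 yields DataFrames of at most 2 rows; the last one holds the remainder.
sizes = [len(chunk) for chunk in pd.read_csv("rows5.csv", chunksize=2)]
print(sizes)  # [2, 2, 1]
```

Larger chunks mean fewer to_csv calls per file, at the cost of holding more rows in memory at once.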