How to prevent pandas' itertuples method from adding extra decimals to records from a CSV file? - Python
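I have a Python function that reads a CSV file and returns each row of the file as a tuple.
I am using Python's pandas library for this.
The problem is that, in the tuples pandas returns, records that look like integers get an extra decimal appended. For example, 1001 becomes 1001.0.
Sample CSV file: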
key1, key2
a, '1001'
b, '2002'
The code looks like this:
import pandas as pd
file_content_df = pd.read_csv(path_to_csv_file)
# index=False drops the row index, so each tuple has exactly two fields
for each_row in file_content_df.itertuples(index=False):
    row_item1, row_item2 = each_row
    print(row_item1)  # Prints 'a'
    print(row_item2)  # Prints 1001.0 (Desired result is 1001)
Is there a way to control this behavior?
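First, you can check the dtypes to see whether column key2 is int, float, or object. Since each_row[0] is the index, you can then access the second item with each_row[1] and the third item with each_row[2]: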
print(df)
  key1  key2
0    a  1001
1    b  2002
print(df.dtypes)
key1    object
key2     int64
dtype: object
for each_row in df.itertuples():
    print(each_row)
    print(each_row[1])
    print(each_row[2])
    print('******')
Pandas(Index=0, key1='a', key2=1001)
a
1001
******
Pandas(Index=1, key1='b', key2=2002)
b
2002
******
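The question does not show why key2 ended up as a float, so the following is only a sketch of one common cause: when an integer-looking column contains a missing value, read_csv infers float64 and 1001 is stored as 1001.0. The CSV text below is hypothetical (it is not the asker's file), and the nullable "Int64" dtype is assumed to be available in the installed pandas version.

```python
import io

import pandas as pd

# Hypothetical data: an integer-looking column with one missing value.
csv_text = "key1,key2\na,1001\nb,\nc,3003\n"

# Default inference: the missing value forces float64, so 1001 becomes 1001.0.
inferred = pd.read_csv(io.StringIO(csv_text))
inferred_dtype = str(inferred["key2"].dtype)

# Requesting the nullable integer dtype keeps the values as integers
# and represents the gap as a missing value instead.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"key2": "Int64"})
nullable_dtype = str(nullable["key2"].dtype)

rows = [tuple(row) for row in nullable.itertuples(index=False)]
first_value = rows[0][1]    # integer 1001, not 1001.0
missing_value = rows[1][1]  # the missing entry

print(inferred_dtype, nullable_dtype, first_value)
```

If the column has no missing values and is still float, checking df.dtypes as described above is the first step before choosing a conversion.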
If the dtype of column key2 is object and df looks like this:
print(df)
  key1    key2
0    a  '1001'
1    b  '2002'
print(df.dtypes)
key1    object
key2    object
dtype: object
# remove the ' characters and cast to integer
df['key2'] = df['key2'].str.strip("'").astype(int)
print(df.dtypes)
key1    object
key2     int32
dtype: object
for each_row in df.itertuples():
    print(each_row)
    print(each_row[1])
    print(each_row[2])
    print('******')
Pandas(Index=0, key1='a', key2=1001)
a
1001
******
Pandas(Index=1, key1='b', key2=2002)
b
2002
******
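Putting the steps together, here is a minimal end-to-end sketch using the sample file from the question, inlined as a string so it runs on its own. The use of skipinitialspace=True (to drop the space after each comma), dtype=str (to stop pandas from guessing types), and name=None (to get plain tuples) are choices made for this illustration, not part of the original answer.

```python
import io

import pandas as pd

# The sample CSV from the question, inlined for a self-contained example.
csv_text = "key1, key2\na, '1001'\nb, '2002'\n"

# Read every column as text and skip the space that follows each comma.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, dtype=str)

# Remove the single quotes, then cast explicitly to a 64-bit integer.
df["key2"] = df["key2"].str.strip("'").astype("int64")

# index=False omits the row index; name=None yields plain tuples.
rows = list(df.itertuples(index=False, name=None))
for key1, key2 in rows:
    print(key1, key2)
# prints:
# a 1001
# b 2002
```

Casting to "int64" explicitly, rather than int, avoids the platform-dependent result (int32 versus int64) seen in the dtypes output above.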