Pandas: using loc for assignment in a MultiIndex DataFrame
I have initialized a DataFrame like this:
df = pd.DataFrame(columns=["stockname","timestamp","price","volume"])
df.timestamp = pd.to_datetime(df.timestamp, format = "%Y-%m-%d %H:%M:%S:%f")
df.set_index(['stockname', 'timestamp'], inplace = True)
Now I receive a stream of data from some source, but for the purposes of this example let me write it like this:
filehandle = open("datasource")
for line in filehandle:
    line = line.rstrip()
    data = line.split(",")
    stockname = data[4]
    price = float(data[3])
    timestamp = pd.to_datetime(data[0], format = "%Y-%m-%d %H:%M:%S:%f")
    volume = int(data[6])
    # This is the line that raises the error
    df.loc[stockname, timestamp] = [price, volume]
filehandle.close()
print(df)
But this gives the error:
ValueError: cannot set using a multi-index selection indexer with a different length than the value
Specify the names of the columns you are assigning data to, i.e.:
df = pd.DataFrame(columns=["a","b","c","d"])
df.set_index(['a', 'b'], inplace = True)
df.loc[('3','4'),['c','d']] = [4,5]
df.loc[('4','4'),['c','d']] = [3,1]
       c    d
a b
3 4  4.0  5.0
4 4  3.0  1.0
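Applied to the loop from the question, the same idea might look like the sketch below. The sample lines are hypothetical stand-ins for the real data source, and the field positions (0, 3, 4, 6) are taken from the question. Wrapping the two index labels in a tuple makes it explicit that both belong to the row index, and the column list says which columns receive the values.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S:%f"

# Empty frame indexed by (stockname, timestamp), as in the question.
df = pd.DataFrame(columns=["stockname", "timestamp", "price", "volume"])
df["timestamp"] = pd.to_datetime(df["timestamp"], format=FMT)
df = df.set_index(["stockname", "timestamp"])

# Hypothetical sample lines standing in for the real data source.
lines = [
    "2017-12-08 15:29:58:740657,245.0,426001,248.65,AAPL,190342,2075673,249.35,244.2",
    "2017-12-08 16:29:58:740657,245.0,426001,249.10,GOOGL,190342,2075700,249.35,244.2",
]

for line in lines:
    data = line.rstrip().split(",")
    timestamp = pd.to_datetime(data[0], format=FMT)
    stockname = data[4]
    price = float(data[3])
    volume = int(data[6])
    # Row key as a tuple, target columns named explicitly.
    df.loc[(stockname, timestamp), ["price", "volume"]] = [price, volume]

print(df)
```

Growing a DataFrame one row at a time is convenient for a small stream, but it tends to get slow as the frame grows, so collecting rows first and building the frame once may be preferable for larger inputs.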
Also, if you have a comma-separated file, you can use read_csv instead, i.e.:
import io
import pandas as pd
st = '''2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2
2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
2017-12-08 18:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
'''
# Pass the file path instead of io.StringIO(st) when reading from a real source
df = pd.read_csv(io.StringIO(st),header=None)
# Now set the index and select what you want
df.set_index([0,4])[[1,7]]
                                      1       7
0                          4
2017-12-08 15:29:58.740657 APPL   245.0  249.35
2017-12-08 16:29:58.740657 GOOGL  245.0  249.35
2017-12-08 18:29:58.740657 GOOGL  245.0  249.35
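To get closer to the frame described in the question, a sketch along these lines could pick the four fields the question reads (positions 0, 3, 4 and 6), name them, and parse the timestamp explicitly. The column names come from the question; treating field 3 as the price and field 6 as the volume is the question's own mapping, not something the data itself confirms.

```python
import io
import pandas as pd

st = '''2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2
2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
2017-12-08 18:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
'''

# Read everything first; the file has no header row.
raw = pd.read_csv(io.StringIO(st), header=None)

# Keep only the fields used in the question and give them names.
df = raw[[0, 4, 3, 6]].copy()
df.columns = ["timestamp", "stockname", "price", "volume"]

# The timestamps use ':' before the microseconds, so pass the format explicitly.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S:%f")

df = df.set_index(["stockname", "timestamp"]).sort_index()
print(df)
```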
I think what you are looking for is:
df.loc[a,b,:] = [c,d]
Here is an example with your DataFrame:
for i in range(3):
    for j in range(3):
        df.loc[(str(i),str(j)),:] = [i,j]
Output:
     c  d
a b
0 0  0  0
  1  0  1
  2  0  2
1 0  1  0
  1  1  1
  2  1  2
2 0  2  0
  1  2  1
  2  2  2
You may want to use df.at[index, column_name] = value to avoid this error.
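As a rough illustration, with a MultiIndex the index argument to df.at is the full tuple of labels, and each call sets exactly one cell. The frame and values below are made up for the example.

```python
import pandas as pd

ts1 = pd.Timestamp("2017-12-08 15:29:58.740657")
ts2 = pd.Timestamp("2017-12-08 16:29:58.740657")

# Hypothetical two-row frame with a (stockname, timestamp) index.
index = pd.MultiIndex.from_tuples(
    [("AAPL", ts1), ("GOOGL", ts2)], names=["stockname", "timestamp"]
)
df = pd.DataFrame({"price": [248.65, 249.10], "volume": [100, 200]}, index=index)

# One scalar per call: the row key is a tuple, the column is a single label.
df.at[("AAPL", ts1), "price"] = 250.0
df.at[("AAPL", ts1), "volume"] = 150

print(df)
```

Because df.at works on a single cell, assigning price and volume takes two calls, whereas the df.loc form with a column list sets both at once.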