How do I replace a slice of a dataframe column with values from another dataframe column slice?
I have two dataframes with several columns, including a timestamp column. I would like to copy the first 1000 timestamps from the second dataframe to the first one.
import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df1.timestamp.iloc[:1000] = df2.timestamp.iloc[:1000]
I tried various things, such as adding .copy() to the right-hand side, using .loc[:1000, 'timestamp'] instead of the columnname.iloc syntax, and converting the column series into a numpy array first. I keep getting errors ranging from "too many indexers" to a directive to use .loc[rowindexing, columnindexing] (which doesn't fix the issue), among other error messages.
Use Index.get_loc to get the position of a column by name, so that it can be passed to DataFrame.iloc:
s = df2.iloc[:1000, df2.columns.get_loc('timestamp')]
df1.iloc[:1000, df1.columns.get_loc('timestamp')] = s
Or use DataFrame.loc with a label slice. This works only if both DataFrames have at least 1000 rows, and because .loc includes the end label, index[999] is the last of the first 1000 rows:
df1.loc[:df1.index[999], 'timestamp'] = df2.loc[:df2.index[999], 'timestamp']
I think your solution failed because the DataFrames have different lengths.
Sample:
df1 = pd.DataFrame({ "timestamp" : [2000, 2001, 2002, 2003, 1990, 1991,
1992, 1993, 1994, 2010, 2011, 2012]})
df2 = pd.DataFrame({
'A':list('abcdef'),
'timestamp':[4,5,4,5,5,4],
})
s = df2.iloc[:1000, df2.columns.get_loc('timestamp')]
df1.iloc[:1000, df1.columns.get_loc('timestamp')] = s
print (df1)
timestamp
0 4.0
1 5.0
2 4.0
3 5.0
4 5.0
5 4.0
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
df1 = pd.DataFrame({ "timestamp" : [2000, 2001, 2002, 2003, 1990, 1991,
1992, 1993, 1994, 2010, 2011, 2012]})
df2 = pd.DataFrame({
'A':list('abcdef'),
'timestamp':[4,5,4,5,5,4],
})
s = df1.iloc[:1000, df1.columns.get_loc('timestamp')]
df2.iloc[:1000, df2.columns.get_loc('timestamp')] = s
print (df2)
A timestamp
0 a 2000
1 b 2001
2 c 2002
3 d 2003
4 e 1990
5 f 1991
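The NaN values in the first sample come from index alignment: when a Series is assigned, pandas matches rows by index label, and labels missing from the right-hand side become NaN. If a purely positional copy is wanted instead, one option is to assign the underlying NumPy array and cap the row count at the length of the shorter frame. This is only a minimal sketch; the variable names n, col1 and col2 are hypothetical and not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({"timestamp": [2000, 2001, 2002, 2003, 1990, 1991,
                                  1992, 1993, 1994, 2010, 2011, 2012]})
df2 = pd.DataFrame({"A": list("abcdef"),
                    "timestamp": [4, 5, 4, 5, 5, 4]})

# n is a hypothetical helper: copy at most 1000 rows, but never more
# rows than either frame actually has.
n = min(1000, len(df1), len(df2))

col1 = df1.columns.get_loc("timestamp")
col2 = df2.columns.get_loc("timestamp")

# to_numpy() drops the index, so values are copied by position
# and no label alignment takes place.
df1.iloc[:n, col1] = df2.iloc[:n, col2].to_numpy()

print(df1)
```

With this approach the rows beyond the first n keep their original values instead of becoming NaN.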
Given df1, df2:
df1 = pd.DataFrame({'timestamp': range(0,2000)})
df2 = -df1
Using .loc:
df1.loc[:999,'timestamp'] = df2.loc[:999,'timestamp']
df1.loc[997:1002,'timestamp']
997 -997
998 -998
999 -999
1000 1000
1001 1001
1002 1002
Name: timestamp, dtype: int64
Or using iloc (optionally converting loc -> iloc using get_loc):
df1.iloc[:1000,0] = df2.iloc[:1000,0]
df1.loc[997:1002,'timestamp']
997 -997
998 -998
999 -999
1000 1000
1001 1001
1002 1002
Name: timestamp, dtype: int64
Note that the slicing behavior of iloc and loc is different: .loc includes the right endpoint, whereas .iloc doesn't include it (like range).
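The difference in endpoints can be checked on a small frame. The following is just an illustrative sketch with a default integer index; the names df, by_label and by_position are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"timestamp": range(10)})

# .loc slices by label and includes the end label.
by_label = df.loc[:3, "timestamp"]

# .iloc slices by position and excludes the end position, like range().
by_position = df.iloc[:3, 0]

print(len(by_label))     # 4 rows: labels 0, 1, 2, 3
print(len(by_position))  # 3 rows: positions 0, 1, 2
```

This is why the first 1000 rows are written as .loc[:999] but .iloc[:1000] in the answers above.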