
How do I replace a slice of a dataframe column with values from another dataframe column slice?

I have two dataframes with several columns, including a timestamp column. I would like to copy the first 1000 timestamps from the second dataframe to the first one.

import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df1.timestamp.iloc[:1000] = df2.timestamp.iloc[:1000]

I tried various things: adding .copy() to the right-hand side, using .loc[:1000, 'timestamp'] instead of the columnname.iloc syntax, and converting the column series into a numpy array first. I keep getting errors ranging from "too many indexers" to a directive to use .loc[rowindexing, columnindexing] (which doesn't fix the issue), along with other error messages.

Use Index.get_loc to get the integer position of a column from its name, so that it can be passed to DataFrame.iloc:

s = df2.iloc[:1000, df2.columns.get_loc('timestamp')]  
df1.iloc[:1000, df1.columns.get_loc('timestamp')] = s
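
To make the get_loc step concrete, here is a minimal sketch on made-up data (the frames, the extra columns A and B, and n = 3 are assumptions for illustration, not the asker's files). It shows that get_loc returns an integer column position, which can differ between the two frames:

```python
import pandas as pd

# Illustrative frames; 'timestamp' sits at a different position in each.
df1 = pd.DataFrame({"A": list("abcdef"), "timestamp": [10, 11, 12, 13, 14, 15]})
df2 = pd.DataFrame({"timestamp": [100, 101, 102, 103, 104, 105], "B": range(6)})

n = 3  # stand-in for 1000 in the question

col1 = df1.columns.get_loc("timestamp")  # integer position of the column in df1
col2 = df2.columns.get_loc("timestamp")  # integer position of the column in df2

# Purely positional selection on both sides; rows 0..n-1 are overwritten.
df1.iloc[:n, col1] = df2.iloc[:n, col2]

print(col1, col2)
print(df1["timestamp"].tolist())
```

Looking the position up by name avoids hard-coding a column number that would break if the column order changed.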

Alternatively, use DataFrame.loc with a label slice. Label slices include the end label, so index[999] selects the first 1000 rows; this works only if both DataFrames have at least 1000 rows:

df1.loc[:df1.index[999], 'timestamp'] = df2.loc[:df2.index[999], 'timestamp']

I think your solution failed because the DataFrames have different lengths.

Sample:

df1 = pd.DataFrame({ "timestamp" : [2000, 2001, 2002, 2003, 1990, 1991,
                                    1992, 1993, 1994, 2010, 2011, 2012]})
df2 = pd.DataFrame({
        'A':list('abcdef'),
         'timestamp':[4,5,4,5,5,4],
})

s = df2.iloc[:1000, df2.columns.get_loc('timestamp')]  
df1.iloc[:1000, df1.columns.get_loc('timestamp')] = s
print (df1)
    timestamp
0         4.0
1         5.0
2         4.0
3         5.0
4         5.0
5         4.0
6         NaN
7         NaN
8         NaN
9         NaN
10        NaN
11        NaN

df1 = pd.DataFrame({ "timestamp" : [2000, 2001, 2002, 2003, 1990, 1991,
                                    1992, 1993, 1994, 2010, 2011, 2012]})
df2 = pd.DataFrame({
        'A':list('abcdef'),
         'timestamp':[4,5,4,5,5,4],
})

s = df1.iloc[:1000, df1.columns.get_loc('timestamp')]  
df2.iloc[:1000, df2.columns.get_loc('timestamp')] = s

print (df2)
   A  timestamp
0  a       2000
1  b       2001
2  c       2002
3  d       2003
4  e       1990
5  f       1991
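
The NaN rows in the first sample appear because the assigned Series is aligned on its index, and rows 6 to 11 of df1 have no match in the shorter df2. As a possible variation (not part of the answer above), the slice can be capped at the shorter length and the values passed as a plain array with to_numpy(), so that no alignment takes place. The helper name copy_head is hypothetical:

```python
import pandas as pd

def copy_head(dst, src, column, n=1000):
    """Hypothetical helper: copy the first n values of src[column] into
    dst[column] by position, capped at the length of the shorter frame."""
    n = min(n, len(dst), len(src))
    dst_col = dst.columns.get_loc(column)
    src_col = src.columns.get_loc(column)
    # to_numpy() drops the index, so values are written strictly by position.
    dst.iloc[:n, dst_col] = src.iloc[:n, src_col].to_numpy()
    return n

df1 = pd.DataFrame({"timestamp": [2000, 2001, 2002, 2003, 1990, 1991,
                                  1992, 1993, 1994, 2010, 2011, 2012]})
df2 = pd.DataFrame({"A": list("abcdef"),
                    "timestamp": [4, 5, 4, 5, 5, 4]})

copied = copy_head(df1, df2, "timestamp")
print(copied)
print(df1["timestamp"].tolist())
```

With this approach the rows beyond the shorter frame keep their original values instead of becoming NaN.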

Given df1, df2:

df1 = pd.DataFrame({'timestamp': range(0,2000)})
df2 = -df1

Using .loc:

df1.loc[:999,'timestamp'] = df2.loc[:999,'timestamp']
df1.loc[997:1002,'timestamp']

997     -997
998     -998
999     -999
1000    1000
1001    1001
1002    1002
Name: timestamp, dtype: int64

Or using .iloc (optionally converting the column label to a position with get_loc):

df1.iloc[:1000,0] = df2.iloc[:1000,0]
df1.loc[997:1002,'timestamp']

997     -997
998     -998
999     -999
1000    1000
1001    1001
1002    1002
Name: timestamp, dtype: int64

Note that slicing behaves differently for .iloc and .loc:
.loc includes the right endpoint, while .iloc excludes it (like range).
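
A quick way to see the difference is to compare slice lengths on a frame with the default integer index (the frame below is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"timestamp": range(2000)})

by_label = df.loc[:999, "timestamp"]        # labels 0..999, end label included
by_position = df.iloc[:1000, 0]             # positions 0..999, end excluded
by_label_1000 = df.loc[:1000, "timestamp"]  # labels 0..1000, one row more

print(len(by_label), len(by_position), len(by_label_1000))
```

So with a default index, .loc[:999] and .iloc[:1000] select the same 1000 rows, while .loc[:1000] selects 1001.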
