Apply a function that operates on pandas DataFrames to a subset of the rows
Apply the same function to a subset of rows in a pandas DataFrame
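I'm using the following code to filter the dataframe by the type and value columns, and then remove any entries where tdiff > 0 and < 3 (tdiff is the number of hours since the earliest Timestamp within the same type/value combination).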
import pandas as pd

d = {'Timestamp': ['2020-09-02 07:00:00', '2020-09-02 07:10:00', '2020-09-02 07:30:00', '2020-09-02 08:00:00', '2020-09-02 10:00:00', '2020-09-02 11:10:00', '2020-09-02 11:30:00'],
     'type': ['A', 'A', 'B', 'A', 'A', 'A', 'B'], 'value': [1, 2, 3, 1, 1, 2, 3]}
df = pd.DataFrame(data=d)

parts = []  # DataFrame.append was removed in pandas 2.0, so collect pieces and concat once
unique_type = pd.unique(df['type']).astype(str)
for i in range(0, len(unique_type)):
    # all rows of one type
    df1 = df[df.type == unique_type[i]]
    unique_val = pd.unique(df1['value']).astype(int)
    for j in range(0, len(unique_val)):
        # rows of one (type, value) pair; .copy() avoids SettingWithCopyWarning
        df2 = df1[df1.value == unique_val[j]].copy()
        trange = pd.to_datetime(df2.Timestamp)
        # hours elapsed since the earliest timestamp in this group
        tdiff = (trange - trange.min()).dt.total_seconds() / 3600
        df2['tdiff'] = tdiff  # .round(1)
        parts.append(df2)
df3 = pd.concat(parts, ignore_index=True)
# drop rows that fall strictly between 0 and 3 hours after the group's first timestamp
df4 = df3[~((df3.tdiff > 0) & (df3.tdiff < 3))]
print(df)
print(df4.sort_values(by=['Timestamp']))
This works, but I'd like to get rid of the for loops and use more efficient code.
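The nested loops visit each (type, value) combination that occurs in the data, one subset at a time. As a sketch (assuming the question's sample data), the snippet below checks that `df.groupby(['type', 'value'])` yields exactly the same combinations, which is why a single grouped operation can stand in for both loops:

```python
import pandas as pd

# Same sample data as in the question
d = {'Timestamp': ['2020-09-02 07:00:00', '2020-09-02 07:10:00', '2020-09-02 07:30:00', '2020-09-02 08:00:00',
                   '2020-09-02 10:00:00', '2020-09-02 11:10:00', '2020-09-02 11:30:00'],
     'type': ['A', 'A', 'B', 'A', 'A', 'A', 'B'], 'value': [1, 2, 3, 1, 1, 2, 3]}
df = pd.DataFrame(data=d)

# (type, value) pairs visited by the nested loops, in first-appearance order
loop_pairs = []
for t in pd.unique(df['type']):
    sub = df[df['type'] == t]
    for v in pd.unique(sub['value']):
        loop_pairs.append((t, int(v)))

# Group keys produced by groupby (sorted by default)
group_pairs = [(t, int(v)) for (t, v), _ in df.groupby(['type', 'value'])]

print(sorted(loop_pairs) == group_pairs)
```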
You can use groupby and transform to apply min:
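df['Timestamp'] = pd.to_datetime(df['Timestamp'])
# earliest Timestamp within each (type, value) group, aligned back to every row
df['tmin'] = df.groupby(['type', 'value'])['Timestamp'].transform('min')
# hours elapsed since that group minimum
df['tdiff'] = (df['Timestamp'] - df['tmin']).dt.total_seconds() / 3600
# keep rows that are not strictly between 0 and 3 hours
df[~((df.tdiff > 0) & (df.tdiff < 3))]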
Output:
    Timestamp            type      value  tmin                   tdiff
--  -------------------  ------  -------  -------------------  -------
 0  2020-09-02 07:00:00  A             1  2020-09-02 07:00:00        0
 1  2020-09-02 07:10:00  A             2  2020-09-02 07:10:00        0
 2  2020-09-02 07:30:00  B             3  2020-09-02 07:30:00        0
 4  2020-09-02 10:00:00  A             1  2020-09-02 07:00:00        3
 5  2020-09-02 11:10:00  A             2  2020-09-02 07:10:00        4
 6  2020-09-02 11:30:00  B             3  2020-09-02 07:30:00        4
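A slightly more compact variant of the same groupby/transform idea skips the helper `tmin` column and expresses the 0 < tdiff < 3 test with `Series.between`. This is a sketch, assuming pandas 1.3 or newer (needed for the string form `inclusive='neither'`) and the question's sample data:

```python
import pandas as pd

d = {'Timestamp': ['2020-09-02 07:00:00', '2020-09-02 07:10:00', '2020-09-02 07:30:00', '2020-09-02 08:00:00',
                   '2020-09-02 10:00:00', '2020-09-02 11:10:00', '2020-09-02 11:30:00'],
     'type': ['A', 'A', 'B', 'A', 'A', 'A', 'B'], 'value': [1, 2, 3, 1, 1, 2, 3]}
df = pd.DataFrame(data=d)
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# Earliest Timestamp of each (type, value) group, broadcast to every row
first = df.groupby(['type', 'value'])['Timestamp'].transform('min')
# Hours elapsed since that group minimum
hours = (df['Timestamp'] - first).dt.total_seconds() / 3600

# Keep rows NOT strictly inside (0, 3); inclusive='neither' excludes both endpoints
result = df[~hours.between(0, 3, inclusive='neither')]
print(result)
```

Because both endpoints are excluded, a row exactly 3 hours after its group's first timestamp (index 4 here) is kept, matching the `> 0` / `< 3` condition in the question.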
Try this simple code, using a lambda function.
Data:
import numpy as np
import pandas as pd

d = {'type': ['A', 'A', 'B', 'C'], 'col1': [1, 2, 3, 9]}
df = pd.DataFrame(data=d)
df:
  type  col1
0    A     1
1    A     2
2    B     3
3    C     9
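# list of (type, sum of col1 for that type) tuples, one per unique type
f = pd.Series(pd.Series(df['type'].unique()).apply(lambda type_: (type_, df[df['type'] == type_].col1.sum()))).to_list()
# map each row's type to its group sum
df['New-column'] = df.type.replace(dict(f))
# set sums of 9 or more to NaN
df.loc[df['New-column'] >= 9, 'New-column'] = np.nan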
df:
  type  col1  New-column
0    A     1         3.0
1    A     2         3.0
2    B     3         3.0
3    C     9         NaN
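The same per-type sum can also be computed without the lambda and the `replace` lookup. The sketch below swaps in `groupby(...).transform('sum')` together with `Series.mask`; this is an alternative technique, not the answer's own code, and it assumes the same toy data:

```python
import pandas as pd

d = {'type': ['A', 'A', 'B', 'C'], 'col1': [1, 2, 3, 9]}
df = pd.DataFrame(data=d)

# Sum of col1 per type, broadcast back to every row of that type
sums = df.groupby('type')['col1'].transform('sum')
# mask() replaces values where the condition is True with NaN (its default)
df['New-column'] = sums.mask(sums >= 9)
print(df)
```

Since `mask` inserts NaN, the integer sums are upcast to float, which matches the `3.0` / `NaN` column shown above.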
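Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.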