How can I filter multiple dataframes with multiple conditions (pandas)?
I have this dataframe:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
1 | 1 | 0 | 1 | 100 |
1 | 1 | 0 | 2 | 200 |
1 | 2 | 1 | 1 | 300 |
1 | 2 | 1 | 2 | 400 |
2 | 1 | 1 | 1 | 500 |
2 | 1 | 1 | 2 | 600 |
2 | 2 | 1 | 1 | 700 |
2 | 2 | 1 | 2 | 800 |
3 | 1 | 1 | 1 | 900 |
3 | 1 | 1 | 2 | 1000 |
4 | 1 | 0 | 1 | 1100 |
4 | 1 | 0 | 2 | 1200 |
4 | 2 | 0 | 1 | 1300 |
4 | 2 | 0 | 2 | 1400 |
4 | 3 | 0 | 1 | 1500 |
4 | 3 | 0 | 2 | 1600 |
5 | 1 | 0 | 1 | 1700 |
5 | 1 | 0 | 2 | 1800 |
5 | 2 | 0 | 1 | 1900 |
5 | 2 | 0 | 2 | 2000 |
5 | 3 | 0 | 1 | 2100 |
5 | 3 | 0 | 2 | 2200 |
5 | 4 | 1 | 1 | 2300 |
5 | 4 | 1 | 2 | 2400 |
6 | 1 | 0 | 1 | 2500 |
6 | 1 | 0 | 2 | 2600 |
6 | 2 | 0 | 1 | 2700 |
6 | 2 | 0 | 2 | 2800 |
7 | 1 | 1 | 1 | 400 |
7 | 1 | 1 | 2 | 400 |
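To experiment with this data, the table above can be rebuilt as a DataFrame. This is only a sketch: the names `rows` and `df` are illustrative choices, and the values are copied from the table in the same order.

```python
import pandas as pd

# One tuple per table row: (Company, Version, Disp Version, complement, Value)
rows = [
    (1, 1, 0, 1, 100), (1, 1, 0, 2, 200), (1, 2, 1, 1, 300), (1, 2, 1, 2, 400),
    (2, 1, 1, 1, 500), (2, 1, 1, 2, 600), (2, 2, 1, 1, 700), (2, 2, 1, 2, 800),
    (3, 1, 1, 1, 900), (3, 1, 1, 2, 1000),
    (4, 1, 0, 1, 1100), (4, 1, 0, 2, 1200), (4, 2, 0, 1, 1300), (4, 2, 0, 2, 1400),
    (4, 3, 0, 1, 1500), (4, 3, 0, 2, 1600),
    (5, 1, 0, 1, 1700), (5, 1, 0, 2, 1800), (5, 2, 0, 1, 1900), (5, 2, 0, 2, 2000),
    (5, 3, 0, 1, 2100), (5, 3, 0, 2, 2200), (5, 4, 1, 1, 2300), (5, 4, 1, 2, 2400),
    (6, 1, 0, 1, 2500), (6, 1, 0, 2, 2600), (6, 2, 0, 1, 2700), (6, 2, 0, 2, 2800),
    (7, 1, 1, 1, 400), (7, 1, 1, 2, 400),
]
df = pd.DataFrame(
    rows,
    columns=["Company", "Version", "Disp Version", "complement", "Value"],
)
print(df.head())
```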
I want my dataframe to be filtered with some conditions:
Furthermore, each 'Company' value must have both the values 1 and 2 in the 'complement' column.
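One possible way to check the 'complement' requirement is to count, per company, how many of the required values occur. The sketch below uses a small hypothetical frame in which company 9 is made up and lacks the value 2; the names `required`, `companies_ok` and `df_ok` are illustrative.

```python
import pandas as pd

# Hypothetical data: company 1 has complement 1 and 2, company 9 only has 1.
df = pd.DataFrame({
    "Company": [1, 1, 9],
    "complement": [1, 2, 1],
    "Value": [100, 200, 300],
})

required = [1, 2]

# Count how many distinct required values each company actually has.
n_required = (
    df[df["complement"].isin(required)]
    .groupby("Company")["complement"]
    .nunique()
)

# Keep only the companies that have every required value.
companies_ok = n_required[n_required == len(required)].index
df_ok = df[df["Company"].isin(companies_ok)]
print(df_ok)
```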
Examples:
For the first condition I need a dataframe like this:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
1 | 2 | 1 | 1 | 300 |
1 | 2 | 1 | 2 | 400 |
5 | 4 | 1 | 1 | 2300 |
5 | 4 | 1 | 2 | 2400 |
For the second condition I need a dataframe like this:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
2 | 2 | 1 | 1 | 700 |
2 | 2 | 1 | 2 | 800 |
3 | 1 | 1 | 1 | 900 |
3 | 1 | 1 | 2 | 1000 |
7 | 1 | 1 | 1 | 400 |
7 | 1 | 1 | 2 | 400 |
For the third condition I need a dataframe like this:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
4 | 3 | 0 | 1 | 1500 |
4 | 3 | 0 | 2 | 1600 |
6 | 2 | 0 | 1 | 2700 |
6 | 2 | 0 | 2 | 2800 |
I need this output (which is the three dataframes combined):
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
1 | 2 | 1 | 1 | 300 |
1 | 2 | 1 | 2 | 400 |
2 | 2 | 1 | 1 | 700 |
2 | 2 | 1 | 2 | 800 |
3 | 1 | 1 | 1 | 900 |
3 | 1 | 1 | 2 | 1000 |
4 | 3 | 0 | 1 | 1500 |
4 | 3 | 0 | 2 | 1600 |
5 | 4 | 1 | 1 | 2300 |
5 | 4 | 1 | 2 | 2400 |
6 | 2 | 0 | 1 | 2700 |
6 | 2 | 0 | 2 | 2800 |
7 | 1 | 1 | 1 | 400 |
7 | 1 | 1 | 2 | 400 |
To filter a dataframe with multiple conditions, you can use the pandas.DataFrame.query function, which filters the dataframe using a boolean expression, or the equivalent boolean indexing. Here's an example that uses boolean indexing to filter the dataframe based on the first condition you mentioned:
import pandas as pd

# Lowest and highest 'Disp Version' per company, aligned with the rows of df
disp_min = df.groupby('Company')['Disp Version'].transform('min')
disp_max = df.groupby('Company')['Disp Version'].transform('max')
# Companies that have both 0 and 1: keep only their rows with Disp Version 1
df1 = df[(disp_min == 0) & (disp_max == 1) & (df['Disp Version'] == 1)]
This creates a new dataframe df1 containing the rows shown in the first example table. Judging from that table, the first condition covers the companies whose Disp Version column contains both 0 and 1. The two transform calls compute the lowest and highest Disp Version of each company and broadcast the result back to every row, so a company that has both values ends up with a minimum of 0 and a maximum of 1. The last line keeps the rows of those companies where Disp Version equals 1.
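For comparison, here is a minimal sketch of what a filter looks like with DataFrame.query itself. It uses a small hypothetical subset of the data, and `min_value` is an illustrative variable that is not part of the question.

```python
import pandas as pd

# Small hypothetical subset of the question's data.
df = pd.DataFrame({
    "Company": [1, 1, 1, 1, 4, 4],
    "Version": [1, 1, 2, 2, 1, 1],
    "Disp Version": [0, 0, 1, 1, 0, 0],
    "complement": [1, 2, 1, 2, 1, 2],
    "Value": [100, 200, 300, 400, 1100, 1200],
})

# Column names containing spaces must be wrapped in backticks inside query().
disp_one = df.query("`Disp Version` == 1")

# Conditions combine with `and` / `or`; @name refers to a local Python variable.
min_value = 400
subset = df.query("`Disp Version` == 1 and Value >= @min_value")

print(disp_one)
print(subset)
```

Note that query evaluates one expression per row, so group-level conditions such as "the highest Version of each company" still need a groupby step beforehand.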
To filter the data based on the second and third conditions, you can use similar code. Judging from the example tables, these conditions cover the companies whose Disp Version is always 1 and always 0, respectively, and in both cases only the rows with the company's highest Version are kept:
# Highest 'Version' per company, aligned with the rows of df
max_version = df.groupby('Company')['Version'].transform('max')
# Number of distinct 'Disp Version' values per company (1 means all rows agree)
n_disp = df.groupby('Company')['Disp Version'].transform('nunique')
latest_uniform = (n_disp == 1) & (df['Version'] == max_version)
df2 = df[latest_uniform & (df['Disp Version'] == 1)]
df3 = df[latest_uniform & (df['Disp Version'] == 0)]
To get the final dataframe that includes all the rows that meet the conditions, you can concatenate the three dataframes using the pandas.concat function and then sort by the index to restore the original row order:
result = pd.concat([df1, df2, df3]).sort_index()
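The three conditions can also be written as boolean masks over the original dataframe and combined in a single step. The sketch below assumes the conditions are the ones suggested by the example tables: companies with both Disp Version values keep their Disp Version 1 rows, while companies with only 1 or only 0 keep the rows of their highest Version. The sample data is rebuilt compactly, and the names `groups`, `cond1`, `cond2` and `cond3` are illustrative.

```python
import pandas as pd

# (Company, Version, Disp Version) triples from the question; each one
# appears twice in the table, once per complement value (1 and 2).
groups = [
    (1, 1, 0), (1, 2, 1), (2, 1, 1), (2, 2, 1), (3, 1, 1),
    (4, 1, 0), (4, 2, 0), (4, 3, 0),
    (5, 1, 0), (5, 2, 0), (5, 3, 0), (5, 4, 1),
    (6, 1, 0), (6, 2, 0), (7, 1, 1),
]
rows = [(c, v, d, k) for (c, v, d) in groups for k in (1, 2)]
df = pd.DataFrame(rows, columns=["Company", "Version", "Disp Version", "complement"])
# Values run 100, 200, ..., 2800, followed by 400 and 400 for company 7.
df["Value"] = [100 * (i + 1) for i in range(28)] + [400, 400]

# Per-company statistics broadcast back to every row.
disp = df.groupby("Company")["Disp Version"]
disp_min = disp.transform("min")
disp_max = disp.transform("max")
is_max_version = df["Version"] == df.groupby("Company")["Version"].transform("max")

# Assumed condition 1: company has both 0 and 1, keep rows with Disp Version 1.
cond1 = (disp_min == 0) & (disp_max == 1) & (df["Disp Version"] == 1)
# Assumed condition 2: company has only 1, keep rows of its highest Version.
cond2 = (disp_min == 1) & is_max_version
# Assumed condition 3: company has only 0, keep rows of its highest Version.
cond3 = (disp_max == 0) & is_max_version

result = df[cond1 | cond2 | cond3]
print(result)
```

Because the masks index the original dataframe directly, the rows come out in their original order and no concatenation or re-sorting is needed.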
1. Get the maximum version number for each company
maxv = df.groupby('Company')['Version'].max().rename('Max Version')
2. Join with the original dataframe
merged_df = pd.merge(df, maxv.reset_index(), on='Company')
3. Get the companies which satisfy the condition (have both 1 and 0 in 'Disp Version')
idx = df.groupby('Company')['Disp Version'].agg(lambda s: 0 < s.sum() < s.count())
valids = idx[idx.astype(bool)]
4. Final result (the comparison needs its own parentheses because & binds more tightly than ==)
result = df.loc[df['Company'].isin(valids.index) & (df['Disp Version'] == 1)]
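The joined dataframe from step 2 can be used to keep only the rows at each company's highest version. This is a minimal sketch on a hypothetical subset of the data (companies 1 and 6 from the question); the column name `Max Version` and the variable `latest` are illustrative.

```python
import pandas as pd

# Rows for companies 1 and 6, copied from the question's table.
df = pd.DataFrame(
    [
        (1, 1, 0, 1, 100), (1, 1, 0, 2, 200), (1, 2, 1, 1, 300), (1, 2, 1, 2, 400),
        (6, 1, 0, 1, 2500), (6, 1, 0, 2, 2600), (6, 2, 0, 1, 2700), (6, 2, 0, 2, 2800),
    ],
    columns=["Company", "Version", "Disp Version", "complement", "Value"],
)

# Highest version per company, as a two-column frame (Company, Max Version).
maxv = df.groupby("Company")["Version"].max().rename("Max Version").reset_index()

# A left join keeps every row of df, in order, and adds the company's maximum.
merged_df = df.merge(maxv, on="Company", how="left")

# Keep the rows whose version is the company's highest one.
latest = merged_df[merged_df["Version"] == merged_df["Max Version"]]
latest = latest.drop(columns="Max Version")
print(latest)
```

For these two companies the rows kept are the same ones that appear in the combined output table in the question.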