
How can I filter multiple dataframes with multiple conditions (pandas)?

I have this dataframe:

Company  Version  Disp Version  complement  Value
1        1        0             1           100
1        1        0             2           200
1        2        1             1           300
1        2        1             2           400
2        1        1             1           500
2        1        1             2           600
2        2        1             1           700
2        2        1             2           800
3        1        1             1           900
3        1        1             2           1000
4        1        0             1           1100
4        1        0             2           1200
4        2        0             1           1300
4        2        0             2           1400
4        3        0             1           1500
4        3        0             2           1600
5        1        0             1           1700
5        1        0             2           1800
5        2        0             1           1900
5        2        0             2           2000
5        3        0             1           2100
5        3        0             2           2200
5        4        1             1           2300
5        4        1             2           2400
6        1        0             1           2500
6        1        0             2           2600
6        2        0             1           2700
6        2        0             2           2800
7        1        1             1           400
7        1        1             2           400
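
For anyone who wants to reproduce the problem, here is one possible way to build this dataframe. It is a sketch rather than the asker's own code; the variable names `versions` and `rows` are made up for illustration, and the data itself is copied from the table above.

```python
import pandas as pd

# (Company, Version, Disp Version) triples from the table above; each one
# appears twice in the data, once per 'complement' value (1 and 2).
versions = [
    (1, 1, 0), (1, 2, 1),
    (2, 1, 1), (2, 2, 1),
    (3, 1, 1),
    (4, 1, 0), (4, 2, 0), (4, 3, 0),
    (5, 1, 0), (5, 2, 0), (5, 3, 0), (5, 4, 1),
    (6, 1, 0), (6, 2, 0),
    (7, 1, 1),
]
rows = [(c, v, d, comp) for (c, v, d) in versions for comp in (1, 2)]
df = pd.DataFrame(rows, columns=["Company", "Version", "Disp Version", "complement"])

# 'Value' runs 100, 200, ..., 2800 for companies 1 to 6;
# both rows of company 7 hold 400.
df["Value"] = [100 * (i + 1) for i in range(28)] + [400, 400]

print(df.head())
```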

I want my dataframe to be filtered with some conditions:

  1. If a 'Company' value has rows with 'Disp Version' equal to both 1 and 0, get the rows that have 'Disp Version' equal to 1 and the maximum value of the 'Version' column;
  2. If a 'Company' value only has rows with 'Disp Version' equal to 1, get the rows with the maximum value of the 'Version' column;
  3. If a 'Company' value only has rows with 'Disp Version' equal to 0, get the rows with the maximum value of the 'Version' column.

Furthermore, for each 'Company' value, the result needs to contain both values 1 and 2 of the 'complement' column.
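
The three conditions amount to sorting each company into one of three groups, depending on which 'Disp Version' values it has. A minimal sketch of that classification, together with the 'complement' check, might look like the following. It uses a reduced copy of the data (companies 1, 2 and 4 only), and the names `kind` and `has_both_complements` are illustrative, not part of the question.

```python
import pandas as pd

# Reduced copy of the question's data: only the columns needed here.
df = pd.DataFrame({
    "Company":      [1, 1, 1, 1, 2, 2, 4, 4],
    "Disp Version": [0, 0, 1, 1, 1, 1, 0, 0],
    "complement":   [1, 2, 1, 2, 1, 2, 1, 2],
})

disp = df.groupby("Company")["Disp Version"]

# Start by labelling every company "both", then overwrite the companies
# whose 'Disp Version' never varies (assumes the column holds only 0 and 1).
kind = pd.Series("both", index=disp.min().index)
kind.loc[disp.max() == 0] = "only 0"
kind.loc[disp.min() == 1] = "only 1"
print(kind.to_dict())

# Check that every company has both complement values, 1 and 2.
has_both_complements = df.groupby("Company")["complement"].agg(
    lambda s: {1, 2} <= set(s.tolist())
)
print(has_both_complements.all())
```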

Examples:

For the first condition I need a dataframe like this:

Company  Version  Disp Version  complement  Value
1        2        1             1           300
1        2        1             2           400
5        4        1             1           2300
5        4        1             2           2400

For the second condition I need a dataframe like this:

Company  Version  Disp Version  complement  Value
2        2        1             1           700
2        2        1             2           800
3        1        1             1           900
3        1        1             2           1000
7        1        1             1           400
7        1        1             2           400

For the third condition I need a dataframe like this:

Company  Version  Disp Version  complement  Value
4        3        0             1           1500
4        3        0             2           1600
6        2        0             1           2700
6        2        0             2           2800

I need this output (which is the 3 dataframes together):

Company  Version  Disp Version  complement  Value
1        2        1             1           300
1        2        1             2           400
2        2        1             1           700
2        2        1             2           800
3        1        1             1           900
3        1        1             2           1000
4        3        0             1           1500
4        3        0             2           1600
5        4        1             1           2300
5        4        1             2           2400
6        2        0             1           2700
6        2        0             2           2800
7        1        1             1           400
7        1        1             2           400
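
One way to read the three rules together is: a company that has any row with 'Disp Version' equal to 1 keeps only those rows, and every company then keeps its highest 'Version'. The sketch below follows that reading, which is an assumption about the intent rather than something the question states outright. The helper name `pick_rows` is hypothetical, and the demo uses only companies 1, 3 and 6 from the data.

```python
import pandas as pd


def pick_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the three rules in one pass."""
    # Does the company have at least one row with 'Disp Version' == 1?
    has_one = df.groupby("Company")["Disp Version"].transform("max") == 1
    # Rules 1 and 2: such companies keep only their 'Disp Version' == 1 rows.
    # Rule 3: companies with only 0s keep all of their rows as candidates.
    candidates = df[~has_one | (df["Disp Version"] == 1)]
    # Among the candidates, keep the highest 'Version' per company.
    top = candidates.groupby("Company")["Version"].transform("max")
    return candidates[candidates["Version"] == top]


# Rows of companies 1, 3 and 6, copied from the question's data.
df = pd.DataFrame({
    "Company":      [1, 1, 1, 1, 3, 3, 6, 6, 6, 6],
    "Version":      [1, 1, 2, 2, 1, 1, 1, 1, 2, 2],
    "Disp Version": [0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    "complement":   [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "Value":        [100, 200, 300, 400, 900, 1000, 2500, 2600, 2700, 2800],
})

print(pick_rows(df))
```

Because the helper relies on boolean masks, the original index and row order are preserved, so no sorting is needed afterwards.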

To filter a dataframe with multiple conditions, you can use boolean indexing, which keeps the rows for which a boolean expression is True (pandas.DataFrame.query offers an equivalent string-based syntax). Here's an example of how you can filter the dataframe based on the first condition you mentioned:

import pandas as pd

# Keep only companies whose 'Disp Version' column contains both 0 and 1
df1 = df[df.groupby('Company')['Disp Version'].transform('nunique') == 2]
# Within each company, keep the rows with the highest 'Version'
df1 = df1[df1['Version'] == df1.groupby('Company')['Version'].transform('max')]
df1 = df1[df1['Disp Version'] == 1]
df1 = df1[['Company', 'Version', 'Disp Version', 'complement', 'Value']]

This will create a new dataframe df1 that contains the rows that meet the first condition. The first line keeps only the companies whose Disp Version column contains both 0 and 1; filtering on Disp Version being 1 or 0 alone would keep every row. The second line compares each row's Version with the maximum Version of its company, computed with groupby and transform, and keeps only the rows at that maximum. The third line filters the data again to only include rows with Disp Version equal to 1. Finally, the fourth line selects the relevant columns and assigns them to the df1 dataframe. Note that a company whose highest Version has Disp Version equal to 0 contributes no rows to df1.
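
For comparison, a row filter written with boolean indexing can also be expressed with pandas.DataFrame.query. The sketch below uses a small made-up dataframe (not the full data from the question) to show the two forms side by side; a column name that contains a space, such as Disp Version, has to be wrapped in backticks inside the query string.

```python
import pandas as pd

# Small illustrative dataframe; the values are made up for this comparison.
df = pd.DataFrame({
    "Company":      [1, 1, 5, 5],
    "Version":      [1, 2, 3, 4],
    "Disp Version": [0, 1, 0, 1],
})

# Boolean indexing: build a mask and index the dataframe with it.
by_mask = df[(df["Disp Version"] == 1) & (df["Version"] >= 2)]

# The same filter as a query string; backticks quote the column name
# that contains a space.
by_query = df.query("`Disp Version` == 1 and Version >= 2")

print(by_mask.equals(by_query))
```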

To filter the data based on the second and third conditions, you can use similar code, this time keeping the companies whose Disp Version values are all 1 or all 0:

disp = df.groupby('Company')['Disp Version']

# Companies whose 'Disp Version' is always 1 (the minimum is 1)
df2 = df[disp.transform('min') == 1]
df2 = df2[df2['Version'] == df2.groupby('Company')['Version'].transform('max')]
df2 = df2[['Company', 'Version', 'Disp Version', 'complement', 'Value']]

# Companies whose 'Disp Version' is always 0 (the maximum is 0)
df3 = df[disp.transform('max') == 0]
df3 = df3[df3['Version'] == df3.groupby('Company')['Version'].transform('max')]
df3 = df3[['Company', 'Version', 'Disp Version', 'complement', 'Value']]

To get the final dataframe that includes all the rows that meet the conditions, you can concatenate the three dataframes using the pandas.concat function and sort the index so that the rows are ordered by company:

result = pd.concat([df1, df2, df3]).sort_index()

    # 1. Get the maximum version number for each company
    maxv = df.groupby('Company')['Version'].max().rename('Max Version')

    # 2. Join with the original dataframe
    merged_df = pd.merge(df, maxv, on='Company')

    # 3. Get the companies that satisfy the condition (have both 1 and 0)
    idx = df.groupby('Company')['Disp Version'].apply(lambda s: 0 < s.sum() < s.count())
    valids = idx.loc[idx]

    # 4. Final result for the first condition: rows at the maximum version with 'Disp Version' equal to 1
    merged_df.loc[merged_df['Company'].isin(valids.index) & (merged_df['Disp Version'] == 1) & (merged_df['Version'] == merged_df['Max Version'])]
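
When a membership test is combined with a comparison, as in the final step above, each comparison should get its own parentheses, because Python's `&` operator binds more tightly than `==`. The following sketch illustrates the rule on a small made-up dataframe; `valid_companies` is a hypothetical stand-in for the index of valid companies.

```python
import pandas as pd

# Small illustrative dataframe; the values are made up.
df = pd.DataFrame({"Company": [1, 1, 2, 2], "Disp Version": [0, 1, 1, 1]})
valid_companies = [1]  # hypothetical list of companies that passed the check

# Parenthesise each comparison so that '&' combines two boolean masks.
mask = df["Company"].isin(valid_companies) & (df["Disp Version"] == 1)
print(df.loc[mask])

# Plain-Python illustration of the precedence rule:
# 5 & 4 == 4 is parsed as (5 & 4) == 4, not as 5 & (4 == 4).
print(5 & 4 == 4, 5 & (4 == 4))
```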

