
Filtering out data

I am trying to filter values out of a pandas DataFrame and then generate a column from those values. To clarify, here is an example:

print (temp.head())

Index  Work-Assigned  Location
  A         R            NL
  B         df           MB
  A                      NL
  C         SL           NL
  D         RC           MB
  A         RC           AB

What I want to do is pull all the R and SL values out of this DataFrame and create another DataFrame containing just those values and the index. Something like this:

print (result.head())

Index    R/SL
  A       R
  B      
  C       SL
  D

Since certain values in the Index column are repeated, I tried pivoting the data with Work-Assigned as the values, but that didn't work.

I believe the following gives the requested output:

import pandas as pd

# your original dataframe
df = pd.DataFrame({'Index': {0: 'A', 1: 'B', 2: 'A', 3: 'C', 4: 'D', 5: 'A'}, 'Location': {0: 'NL', 1: 'MB', 2: 'NL', 3: 'NL', 4: 'MB', 5: 'AB'}, 'Work-Assigned': {0: 'R', 1: 'df', 2: '', 3: 'SL', 4: 'RC', 5: 'RC'}}).set_index('Index').reindex(['Work-Assigned', 'Location'], axis=1)


df
Out[5]: 
      Work-Assigned Location
Index                       
A                 R       NL
B                df       MB
A                         NL
C                SL       NL
D                RC       MB
A                RC       AB

def some_filtering(df_, filter_values=('R', 'SL')):
    # use a regex to build a Series holding the matched value where a cell equals one of `filter_values`, and NaN elsewhere
    s_filter = df_['Work-Assigned'].str.extract('^({})$'.format('|'.join(filter_values)), expand=False)

    # if nothing was found then return a blank string; otherwise return the unique value found
    if s_filter.dropna().empty:
        val = ['']
    else:
        val = pd.unique(s_filter.dropna())

    # return a DataFrame containing the unique value found (could be blank) at the present index value passed to .groupby
    return pd.DataFrame(data=val, index=pd.unique(df_.index), columns=['/'.join(filter_values)])


df.groupby(level='Index', group_keys=False).apply(some_filtering)
Out[7]: 
  R/SL
A    R
B     
C   SL
D     
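
For comparison, the same table can probably be built without a custom function. The following is only a sketch of an alternative, not the approach above: it masks unwanted values with `where` and takes the first non-null value per group. The names `wanted`, `kept` and `result` are made up for illustration, and unlike `some_filtering` it keeps only the first match when a group contains several.

```python
import pandas as pd

# Sample data from the question, with 'Index' as the (repeated) index
df = pd.DataFrame(
    {'Work-Assigned': ['R', 'df', '', 'SL', 'RC', 'RC'],
     'Location': ['NL', 'MB', 'NL', 'NL', 'MB', 'AB']},
    index=pd.Index(['A', 'B', 'A', 'C', 'D', 'A'], name='Index'),
)

wanted = ['R', 'SL']  # hypothetical name for the values to keep
col = df['Work-Assigned']

# Keep cells that exactly equal a wanted value; everything else becomes NaN
kept = col.where(col.isin(wanted))

# first() takes the first non-null value in each group; groups with no match are filled with ''
result = kept.groupby(level='Index').first().fillna('').to_frame('/'.join(wanted))
print(result)
```

Because `isin` compares whole cell values, 'RC' is not mistaken for 'R', which is the same effect the `^...$` anchors have in the regex version.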

IIUC, you want to group by Index and collect the values into a set. Then check the set for the values 'R' or 'SL'.

Assuming your DataFrame is named df, you could do the following:

Group by 'Index' and apply the set constructor to the 'Work-Assigned' column. This condenses all distinct values for each Index into one row.

df2 = pd.DataFrame(df.groupby('Index')['Work-Assigned'].apply(set)).reset_index()
print(df2)
#  Index Work-Assigned
#0     A  {nan, R, RC}
#1     B          {df}
#2     C          {SL}
#3     D          {RC}

Next, check the intersection of each row's set with the values you want to search for. If the intersection is empty, return an empty string (or np.nan if you prefer). Otherwise, pick the first value (see note 1).

my_values = {'R', 'SL'}
df2['Work-Assigned'] = df2['Work-Assigned'].apply(
    lambda x: '' if not my_values.intersection(x) else list(my_values.intersection(x))[0]
)
print(df2)
#  Index Work-Assigned
#0     A             R
#1     B              
#2     C            SL
#3     D              
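
The intersection step can be checked in isolation with plain Python sets. This is just a small sketch; `group_a` and `group_b` are hand-written stand-ins for the sets that the groupby step produces for Index A and Index B.

```python
import numpy as np

my_values = {'R', 'SL'}

# Hand-written stand-ins for two of the per-Index sets shown above
group_a = {np.nan, 'R', 'RC'}
group_b = {'df'}

hit = my_values.intersection(group_a)
miss = my_values.intersection(group_b)

print(hit)   # {'R'}
print(miss)  # set()

# An empty set is falsy, which is what the `if not ...` test in the lambda relies on
print('' if not miss else list(miss)[0])
```

Note that 'RC' does not intersect with 'R': set membership compares whole values, not substrings.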

Notes

1. If multiple values exist for one Index (in your case, both R and SL), you will get one of them arbitrarily, because sets are unordered. If that is a problem, please update your question to say how you would like to handle that case.
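
If an arbitrary pick is not acceptable, one possible way to make it deterministic is to check the wanted values in a fixed priority order. This is only a sketch under assumptions: the helper `pick_by_priority` and its `priority` parameter are hypothetical, and the sample data is altered from the question so that Index A contains both R and SL.

```python
import pandas as pd

# Illustrative data (not the question's): Index 'A' holds both 'R' and 'SL'
df = pd.DataFrame({
    'Index': ['A', 'B', 'A', 'C', 'D', 'A'],
    'Work-Assigned': ['R', 'df', 'SL', 'SL', 'RC', 'RC'],
})

def pick_by_priority(values, priority=('R', 'SL')):
    """Return the first entry of `priority` found in `values`, or '' if none is present."""
    present = set(values)
    for candidate in priority:
        if candidate in present:
            return candidate
    return ''

# One row per Index; ties are resolved by the order of `priority`
result = df.groupby('Index')['Work-Assigned'].apply(pick_by_priority).reset_index()
print(result)
```

With this ordering, Index A resolves to R every time; swapping the tuple to `('SL', 'R')` would prefer SL instead.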
