How can I create new columns with values by comparing two other columns?

Question

I am working on a project that audits employees who have computer accounts. I want to print one data frame with the two new columns in it. This is different from the Comparing Columns in Dataframes question because I am working with strings. I will also need to do some fuzzy logic, but that is further down the line.

The data I receive is in Excel sheets. It comes from two sources that I don't control, so I format each one as [First Name, Last Name] and print it to the console to check that the data I am working with is correct. I convert the .xls files to .csv, format the information, and can output the two lists of names in a single dataframe with two columns, but I have not been able to put the values I want in the last two columns. I have tried query (which returned True/False, not the names), diff, and regex. I assume that I am just using the tools incorrectly.

    import pandas as pd

    nd = {'col1': ["Abraham Hansen","Demetrius McMahon","Hilary Emerson",
                   "Amelia H. Hayden","Abraham Oliver"],
          'col2': ["Abraham Hansen","Abe Oliver","Hillary Emerson",
                   "DJ McMahon","Amelia H. Hayden"]}
    info = pd.DataFrame(data=nd)

    for row in info:
        if info.col1.value not in info.col2:
            info["Need Account"] = info.col1.value

        if info.col2.value not in info.col1:
            info["Delete Account"] = info.col2.value

    print(info)

What I would like is a new dataframe with 2 columns, Need Account and Delete Account, filled in with the appropriate values based on the other columns in the dataframe. In this case, I am getting an error that 'Series' has no attribute 'value'. Here is an example of my expected output:

    df_out: 
    Need Account       Delete Account
    Demetrius McMahon  Abe Oliver
    Abraham Oliver     Hillary Emerson
    Hilary Emerson     DJ McMahon

From this list I can see whose nickname showed up and pare the list down from there.

Answer 1

I'm taking a chance without seeing your expected output, going by what you are attempting in your code. Let me know if this is what you are looking for.

    import pandas as pd

    nd = {'col1': ["Abraham Hansen","Demetrius McMahon","Hilary Emerson","Amelia H. Hayden","Abraham Oliver"],
          'col2': ["Abraham Hansen","Abe Oliver","Hillary Emerson","DJ McMahon","Amelia H. Hayden"],
          'Need Account': "",     # empty placeholder column, filled in below
          'Delete Account': ""    # empty placeholder column, filled in below
         }
    info = pd.DataFrame(data=nd)

    print(info)

                    col1              col2 Need Account Delete Account
    0     Abraham Hansen    Abraham Hansen
    1  Demetrius McMahon        Abe Oliver
    2     Hilary Emerson   Hillary Emerson
    3   Amelia H. Hayden        DJ McMahon
    4     Abraham Oliver  Amelia H. Hayden

Don't use loops; use vectorized operations. The assignments below compare col1 and col2 row by row:

    info.loc[info['col1'] != info['col2'], 'Need Account'] = info['col1']
    info.loc[info['col2'] != info['col1'], 'Delete Account'] = info['col2']

    print(info)

                    col1              col2       Need Account    Delete Account
    0     Abraham Hansen    Abraham Hansen
    1  Demetrius McMahon        Abe Oliver  Demetrius McMahon        Abe Oliver
    2     Hilary Emerson   Hillary Emerson     Hilary Emerson   Hillary Emerson
    3   Amelia H. Hayden        DJ McMahon   Amelia H. Hayden        DJ McMahon
    4     Abraham Oliver  Amelia H. Hayden     Abraham Oliver  Amelia H. Hayden
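
For context on the error in the question, here is a small sketch of what the original loop was actually doing. It only illustrates standard pandas behavior on the sample data; the variable names are made up for the illustration.

```python
import pandas as pd

info = pd.DataFrame({
    "col1": ["Abraham Hansen", "Demetrius McMahon", "Hilary Emerson",
             "Amelia H. Hayden", "Abraham Oliver"],
    "col2": ["Abraham Hansen", "Abe Oliver", "Hillary Emerson",
             "DJ McMahon", "Amelia H. Hayden"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = list(info)

# `in` on a Series tests the index labels (0..4 here), not the stored values.
name_in_index = "Abraham Hansen" in info["col2"]

# To test the values, convert to an array first (or use Series.isin).
name_in_values = "Abraham Hansen" in info["col2"].to_numpy()

# A Series has `.values` and `.to_numpy()`; there is no `.value` attribute,
# which is where the AttributeError in the question comes from.
has_value_attr = hasattr(info["col1"], "value")

print(labels, name_in_index, name_in_values, has_value_attr)
```

So even with `.value` corrected, `not in info.col2` would not check the names, which is one more reason to prefer a vectorized membership test.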

Answer 2

You want to use isin and np.where to conditionally assign the new values:

    import numpy as np

    info['Need Account'] = np.where(~info['col1'].isin(info['col2']), info['col1'], np.nan)
    info['Delete Account'] = np.where(~info['col2'].isin(info['col1']), info['col2'], np.nan)

                    col1              col2       Need Account   Delete Account
    0     Abraham Hansen    Abraham Hansen                NaN              NaN
    1  Demetrius McMahon        Abe Oliver  Demetrius McMahon       Abe Oliver
    2     Hilary Emerson   Hillary Emerson     Hilary Emerson  Hillary Emerson
    3   Amelia H. Hayden        DJ McMahon                NaN       DJ McMahon
    4     Abraham Oliver  Amelia H. Hayden     Abraham Oliver              NaN

Or, if you want a new dataframe as you stated in your question:

    need = np.where(~info['col1'].isin(info['col2']), info['col1'], np.nan)
    delete = np.where(~info['col2'].isin(info['col1']), info['col2'], np.nan)

    newdf = pd.DataFrame({'Need Account':need,
                          'Delete Account':delete})

            Need Account   Delete Account
    0                NaN              NaN
    1  Demetrius McMahon       Abe Oliver
    2     Hilary Emerson  Hillary Emerson
    3                NaN       DJ McMahon
    4     Abraham Oliver              NaN
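
If the NaN rows are not wanted, one possible variation (a sketch, not part of the answer above) is to filter each column with the same isin mask and then line the two results up from row 0. The name `df_out` is borrowed from the question; everything else is standard pandas.

```python
import pandas as pd

info = pd.DataFrame({
    "col1": ["Abraham Hansen", "Demetrius McMahon", "Hilary Emerson",
             "Amelia H. Hayden", "Abraham Oliver"],
    "col2": ["Abraham Hansen", "Abe Oliver", "Hillary Emerson",
             "DJ McMahon", "Amelia H. Hayden"],
})

# Names in col1 that never appear in col2, and the reverse.
need = info.loc[~info["col1"].isin(info["col2"]), "col1"]
delete = info.loc[~info["col2"].isin(info["col1"]), "col2"]

# Reset the index so both lists start at row 0; concat pads the shorter
# one with NaN if the two lists have different lengths.
df_out = pd.concat(
    [need.reset_index(drop=True).rename("Need Account"),
     delete.reset_index(drop=True).rename("Delete Account")],
    axis=1,
)

print(df_out)
```

The rows come out in input order, so the pairing across the two columns is positional only and says nothing about which names belong together.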

Answer 3

IIUC, there doesn't seem to be much 'structure' to maintain from your input dataframe, so you could use sets to compare group membership directly.

    import pandas as pd

    nd = {'col1': ["Abraham Hansen","Demetrius McMahon","Hilary Emerson","Amelia H. Hayden","Abraham Oliver"],
          'col2': ["Abraham Hansen","Abe Oliver","Hillary Emerson","DJ McMahon","Amelia H. Hayden"]}
    df = pd.DataFrame(data=nd)

    col1 = set(df['col1'])
    col2 = set(df['col2'])

    # Set difference: names present in one column but not the other
    need = col1 - col2
    delete = col2 - col1

    print('need = ', need)
    print('delete =  ', delete)

yields (sets are unordered, so the order of names may vary)

    need =  {'Hilary Emerson', 'Demetrius McMahon', 'Abraham Oliver'}
    delete =   {'Hillary Emerson', 'DJ McMahon', 'Abe Oliver'}

You could then place these in a new dataframe:

    data = {'need':list(need), 'delete':list(delete)}
    new_df = pd.DataFrame.from_dict(data, orient='index').transpose()

(Edited to account for the possibility that need and delete are of unequal length.)
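
The question also mentions a later fuzzy step. As a rough sketch of how the two lists might be pre-screened, the standard library's `difflib.get_close_matches` can suggest near-identical spellings. The `cutoff=0.8` threshold and the `likely_same` name are assumptions chosen for illustration, not something from the question.

```python
from difflib import get_close_matches

need = ["Demetrius McMahon", "Hilary Emerson", "Abraham Oliver"]
delete = ["Abe Oliver", "Hillary Emerson", "DJ McMahon"]

# For each name that needs an account, look for at most one similar name
# among the accounts flagged for deletion. The cutoff (0..1) is an
# arbitrary assumption; higher values demand closer spellings.
likely_same = {
    name: get_close_matches(name, delete, n=1, cutoff=0.8)
    for name in need
}

for name, matches in likely_same.items():
    print(name, "->", matches)
```

At this cutoff only the Hilary/Hillary spelling variant is paired; nicknames such as "Abe" or "DJ" score lower and would still need a lower cutoff or a manual review.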

3 answers

Answer 1: 0, 2019-07-03 19:42:36

Answer 2: 0, accepted, 2019-07-03 19:54:23

Answer 3: 0, 2019-07-03 19:57:47
