
Pandas.DataFrame - create a new column based on whether a value in another column has already occurred

I'm an amateur user with some VBA experience, but I'm trying to switch to Python because my beautiful new MBP runs VBA miserably. I'm trying to create a df column based on whether the value in another column has already occurred. If it has, the new column's value on that row is 0; if not, it is 1.

For example, I want to create column C in the example below. How do I do it quickly?

         A       B  C  (to create column C)
    0  001     USA  1
    1  002  Canada  1
    3  003   China  1
    4  004   India  1
    5  005      UK  1
    6  006   Japan  1
    7  007     USA  0
    8  008      UK  0

You can check for duplicates in the 'B' column and set the duplicates to 0, then set the remaining (non-duplicate) rows to 1, like this:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                       'B': ['USA', 'Canada', 'China', 'India', 'UK', 'Japan', 'USA', 'UK']})
    # Rows whose 'B' value has already appeared earlier get 0
    df.loc[df['B'].duplicated(), 'C'] = 0
    # The remaining rows (first occurrences) are still NaN; fill them with 1
    df['C'] = df['C'].fillna(1).astype(int)
    print(df)

Output:

       A       B  C
    0  1     USA  1
    1  2  Canada  1
    2  3   China  1
    3  4   India  1
    4  5      UK  1
    5  6   Japan  1
    6  7     USA  0
    7  8      UK  0
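The same column can likely be built in a single step by negating the boolean mask, which avoids the intermediate NaN values. This is only a minimal sketch, assuming sample data like that in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": ["USA", "Canada", "China", "India", "UK", "Japan", "USA", "UK"],
})

# duplicated() is True for every repeat after the first occurrence;
# negating it and casting to int gives 1 for first occurrences, 0 for repeats.
df["C"] = (~df["B"].duplicated()).astype(int)
print(df["C"].tolist())  # [1, 1, 1, 1, 1, 1, 0, 0]
```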

After creating your dataframe:

    import pandas as pd

    data = [["001", "USA"], ["002", "Canada"], ["003", "China"], ["004", "India"],
            ["005", "UK"], ["006", "Japan"], ["007", "USA"], ["008", "UK"]]

    # Create a dataframe
    df = pd.DataFrame(data, columns=["A", "B"])

You can apply a function to each value of one of the columns (in your case, column B) and use the function's output as the value of the new column.

 df["C"] = df.B.apply(lambda x: 1 if df.B.value_counts()[x] == 1 else 0)

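This checks whether the value in column B appears anywhere else in the column, returning 1 if it is unique and 0 if it is duplicated. Note that every occurrence of a repeated value gets 0, including the first, so the first USA and UK rows differ from the column C requested in the question.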

The dataframe looks like this:

         A       B  C
    0  001     USA  0
    1  002  Canada  1
    2  003   China  1
    3  004   India  1
    4  005      UK  0
    5  006   Japan  1
    6  007     USA  0
    7  008      UK  0
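For comparison, here is a vectorized sketch of the two possible readings of "duplicate", using the `keep` parameter of `duplicated()`. The column names `C_unique` and `C_first` are made up for illustration and do not come from the answers above:

```python
import pandas as pd

data = [["001", "USA"], ["002", "Canada"], ["003", "China"], ["004", "India"],
        ["005", "UK"], ["006", "Japan"], ["007", "USA"], ["008", "UK"]]
df = pd.DataFrame(data, columns=["A", "B"])

# keep=False flags every occurrence of a repeated value, so negating it
# yields 1 only for values that appear exactly once in the whole column.
df["C_unique"] = (~df["B"].duplicated(keep=False)).astype(int)

# The default keep="first" leaves the first occurrence unflagged, which
# matches the "has this value occurred already?" column from the question.
df["C_first"] = (~df["B"].duplicated()).astype(int)

print(df)
```

`C_unique` should reproduce the table above (0 for both USA rows and both UK rows), while `C_first` marks only the later repeats with 0.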

If you want the values to be recalculated, you need to run the command again each time after you add a row:

    df["C"] = df.B.apply(lambda x: 1 if df.B.value_counts()[x] == 1 else 0)
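One way to keep the column in sync is to wrap the append and the recalculation in a small helper. The function name `add_row_and_refresh` and the two-row sample frame are hypothetical, and the sketch computes the counts once instead of once per row:

```python
import pandas as pd


def add_row_and_refresh(df, a, b):
    """Hypothetical helper: append one row, then recompute column C."""
    new_row = pd.DataFrame({"A": [a], "B": [b]})
    out = pd.concat([df[["A", "B"]], new_row], ignore_index=True)
    # Count each value once, then look the count up for every row.
    counts = out["B"].value_counts()
    out["C"] = out["B"].map(counts).eq(1).astype(int)
    return out


df = pd.DataFrame({"A": ["001", "002"], "B": ["USA", "Canada"]})
df = add_row_and_refresh(df, "003", "USA")
print(df["C"].tolist())  # [0, 1, 0]
```

Because "USA" now appears twice, both of its rows become 0, consistent with the uniqueness check used in this answer.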


