Pandas.DataFrame - create a new column based on whether the value in another column has already occurred
I'm an amateur user with some VBA experience, but I'm trying to switch to Python because my beautiful new MBP runs VBA miserably. I'm trying to create a df column based on whether the value in another column has already occurred. If it has, the new column value on that row is 0; if not, it is 1.
For example: I want to create column C in the example below. How do I do it quickly?
         A       B  C (to create column C)
    0  001     USA  1
    1  002  Canada  1
    3  003   China  1
    4  004   India  1
    5  005      UK  1
    6  006   Japan  1
    7  007     USA  0
    8  008      UK  0
You can check for duplicates in the 'B' column and set them to 0. By default, duplicated() flags only the repeats after the first occurrence, so the first time a value appears it is not treated as a duplicate. Then set the remaining rows to 1, like this:
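    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                       'B': ['USA', 'Canada', 'China', 'India', 'UK', 'Jpan', 'USA', 'UK']})

    # Rows whose 'B' value was already seen earlier get C = 0 (other rows are NaN for now)
    df.loc[df['B'].duplicated(), 'C'] = 0
    # Fill the remaining rows with 1 and convert the column from float to int
    df['C'] = df['C'].fillna(1).astype(int)
    print(df)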
Output:
       A       B  C
    0  1     USA  1
    1  2  Canada  1
    2  3   China  1
    3  4   India  1
    4  5      UK  1
    5  6    Jpan  1
    6  7     USA  0
    7  8      UK  0
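The two assignments above can also be collapsed into a single expression. This is only a minimal sketch, assuming the same column names as the question; it relies on `duplicated()` defaulting to `keep='first'`, so the first occurrence of each value is not flagged.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'A': ['001', '002', '003', '004', '005', '006', '007', '008'],
    'B': ['USA', 'Canada', 'China', 'India', 'UK', 'Japan', 'USA', 'UK'],
})

# duplicated() is True for every repeat after the first occurrence,
# so negating it gives True for first occurrences; astype(int) maps True/False to 1/0.
df['C'] = (~df['B'].duplicated()).astype(int)
print(df['C'].tolist())  # [1, 1, 1, 1, 1, 1, 0, 0]
```

This avoids the intermediate NaN values, so no `fillna` step is needed.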
After creating your dataframe:
    import pandas

    data = [["001", "USA"], ["002", "Canada"], ["003", "China"], ["004", "India"],
            ["005", "UK"], ["006", "Japan"], ["007", "USA"], ["008", "UK"]]

    # Create a dataframe
    df = pandas.DataFrame(data, columns=["A", "B"])
You can apply a function to each value of one of the columns (in your case, the B column) and use the function's output as the value of the new column.
    df["C"] = df.B.apply(lambda x: 1 if df.B.value_counts()[x] == 1 else 0)
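This checks whether the value in the B column appears anywhere else in the column, and returns 1 if it is unique and 0 if it is duplicated. Note that every occurrence of a repeated value gets 0, including the first one, so rows 0 and 4 differ from the column C shown in the question.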
The dataframe looks like this:
         A       B  C
    0  001     USA  0
    1  002  Canada  1
    2  003   China  1
    3  004   India  1
    4  005      UK  0
    5  006   Japan  1
    6  007     USA  0
    7  008      UK  0
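The `apply` call above recomputes `value_counts()` for every row, which can get slow on larger frames. As a rough sketch (not the answerer's code), the same result can be obtained by counting once and looking the count up per row, or by using `duplicated(keep=False)`, which flags every member of a repeated group. The variable names below are illustrative only.

```python
import pandas as pd

data = [["001", "USA"], ["002", "Canada"], ["003", "China"], ["004", "India"],
        ["005", "UK"], ["006", "Japan"], ["007", "USA"], ["008", "UK"]]
df = pd.DataFrame(data, columns=["A", "B"])

# Count each distinct value once, then map the count back onto every row
counts = df["B"].value_counts()
df["C"] = df["B"].map(counts).eq(1).astype(int)

# Alternative with the same result: keep=False marks all occurrences of a repeated value
c_alt = (~df["B"].duplicated(keep=False)).astype(int)

print(df["C"].tolist())  # [0, 1, 1, 1, 0, 1, 0, 0]
```

Both forms make a single pass over the column instead of one count per row.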
If you want the values to stay up to date, you need to re-run the command

    df["C"] = df.B.apply(lambda x: 1 if df.B.value_counts()[x] == 1 else 0)

each time after you add a row.
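One way to avoid forgetting the recalculation is to wrap the append and the recompute in a single function. The helper below, `add_row_and_flag`, is a hypothetical name introduced only for illustration; it applies the same rule as the command above (1 if the value occurs exactly once, otherwise 0) after appending the row with `pd.concat`.

```python
import pandas as pd

def add_row_and_flag(df, a, b):
    """Hypothetical helper: append one row, then recompute column C for all rows."""
    new_row = pd.DataFrame([{"A": a, "B": b}])
    out = pd.concat([df[["A", "B"]], new_row], ignore_index=True)
    # Same rule as the command above: 1 if the value is unique in B, else 0
    out["C"] = out["B"].map(out["B"].value_counts()).eq(1).astype(int)
    return out

df = pd.DataFrame([["001", "USA"], ["002", "Canada"]], columns=["A", "B"])
df["C"] = 1  # both values are unique so far

df = add_row_and_flag(df, "003", "Canada")
print(df["C"].tolist())  # [1, 0, 0]
```

Note that adding a row can change the flag of earlier rows, which is why the whole column is recomputed rather than just the new entry.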