Iterate dataframe in pandas looking for a string and generate new column

I have the following dataframe:

      import pandas as pd
      df = pd.DataFrame({'Id_Sensor': [1, 2, 3, 4],'Old_Column': ['P55X', 'MEC8901', 'P58Y', 'M59X']})

      print(df)

        Id_Sensor   Old_Column
           1           P55X
           2           MEC8901
           3           P58Y
           4           M59X

I need to create a new column on this dataframe. If the first letter is P, the column should receive 'computer_typeA'. If the first three letters are MEC, the column should receive 'computer_typeB'. Otherwise it should receive 'computer_other'.

I tried to do the following:

        # This code segment is incorrect
        for i in range(0, len(df)):

            if df['Old_Column'].iloc[i][:1] == 'P':
                df['New_Column'].iloc[i] == 'computer_typeA'

            elif df['Old_Column'].iloc[i][:3] == 'MEC':
                df['New_Column'].iloc[i] == 'computer_typeB'

            else:
                df['New_Column'].iloc[i] == 'computer_other'

The result is incorrect:

      print(df)
        Id_Sensor   Old_Column  New_Column
           1           P55X       NaN
           2          MEC8901     NaN
           3           P58Y       NaN
           4           M59X       NaN

I would like the result to look like this:

        Id_Sensor   Old_Column       New_Column
           1           P55X       computer_typeA
           2          MEC8901     computer_typeB
           3           P58Y       computer_typeA
           4           M59X       computer_other

You can use numpy.select for the conditional logic, passing the fallback label as default:

import numpy as np

cond1 = df.Old_Column.str.startswith('P')
cond2 = df.Old_Column.str.startswith('MEC')
condlist = [cond1, cond2]
choicelist = ['computer_typeA', 'computer_typeB']
# Rows that match neither condition receive the default value.
df['New Column'] = np.select(condlist, choicelist, default='computer_other')

   Id_Sensor Old_Column      New Column
0          1       P55X  computer_typeA
1          2    MEC8901  computer_typeB
2          3       P58Y  computer_typeA
3          4       M59X  computer_other

This simple code should do the job:

df["New_Column"] = "computer_other"

df.loc[df.Old_Column.apply(lambda x: x[0] == "P"), "New_Column"] = "computer_typeA"

df.loc[df.Old_Column.apply(lambda x: x[:3] == "MEC"), "New_Column"] = "computer_typeB"

Note: New_Column is initialized to computer_other so that only the rows matching one of the two prefixes need to be overwritten.
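The same idea can probably be written without apply and a lambda, using the vectorized str.startswith accessor. This is only a sketch that rebuilds the example dataframe from the question; the mask names is_type_a and is_type_b are made up for illustration, and na=False is an assumption about how missing values should be treated (as "no match").

```python
import pandas as pd

df = pd.DataFrame({
    "Id_Sensor": [1, 2, 3, 4],
    "Old_Column": ["P55X", "MEC8901", "P58Y", "M59X"],
})

# Start with the fallback label, then overwrite rows that match a prefix.
df["New_Column"] = "computer_other"

# na=False makes missing values count as "no match" in the boolean mask.
is_type_a = df["Old_Column"].str.startswith("P", na=False)
is_type_b = df["Old_Column"].str.startswith("MEC", na=False)

df.loc[is_type_a, "New_Column"] = "computer_typeA"
df.loc[is_type_b, "New_Column"] = "computer_typeB"

print(df)
```

Unlike x[0] == "P", startswith does not raise on an empty string, which may matter if the real data is less clean than the example.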

Hope this helps.
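As for why the loop in the question leaves the column unchanged: as far as I can tell, the lines inside the branches use == (comparison) rather than = (assignment), so each one computes True or False and throws the result away. Even with =, writing through df['New_Column'].iloc[i] is chained indexing, which pandas does not guarantee will modify the original dataframe. A minimal sketch of a working loop, assuming the example data from the question and writing through df.loc on the frame itself:

```python
import pandas as pd

df = pd.DataFrame({
    "Id_Sensor": [1, 2, 3, 4],
    "Old_Column": ["P55X", "MEC8901", "P58Y", "M59X"],
})

# Create the column up front so the loop has something to write into.
df["New_Column"] = "computer_other"

for i in range(len(df)):
    code = df["Old_Column"].iloc[i]
    row_label = df.index[i]  # index label of the i-th row

    # Use `=` to assign, and index the dataframe once via df.loc
    # instead of chaining df[...].iloc[...].
    if code[:1] == "P":
        df.loc[row_label, "New_Column"] = "computer_typeA"
    elif code[:3] == "MEC":
        df.loc[row_label, "New_Column"] = "computer_typeB"
    else:
        df.loc[row_label, "New_Column"] = "computer_other"

print(df)
```

This is mainly useful for understanding the bug. For anything beyond a handful of rows, the vectorized approaches in the answers above are likely to be faster and shorter.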
