How to add a new column in pandas dataframe based on conditions satisfied in another column?
My dataframe looks like the following:
id state level
0 1 [p, t] [dsd]
1 3 [t, t] [dsds, dsd]
2 4 [l, l] [jgddf, vdv]
3 6 [u, c] [cxxc, jgddf]
What I am trying to do is check whether the level column contains part or all of a given string in its list, and add a new column based on that. This is how I am trying to accomplish it (including how I create the dataframe and sort, filter, and merge the elements in each row):
import numpy as np
import pandas as pd
something = [[1, "p", "dsd"], [3, "t", "dsd"], [6, "u", "jgddf"], [1, "p", "dsd"], [4, "l", "jgddf"], [1, "t", "dsd"],
[3, "t", "dsds"], [6, "c", "cxxc"], [1, "p", "dsd"], [4, "l", "vdv"]]
test = pd.DataFrame(something)
test = test.drop_duplicates()
test.columns = ['id', 'state', 'level']
test = test.sort_values(by=['id'], ascending=True)
test_unique = test["id"].unique()
df_aslist = test.groupby(['id']).aggregate(lambda x: list(x)).reset_index()
#making it a set to remove duplicates
df_aslist['level'] = df_aslist['level'].apply(lambda x: list(set(x)))
print(df_aslist)
conditions = [(df_aslist["level"].str.contains("ds") & df_aslist["level"].str.contains("sd")),
(df_aslist["level"].str.contains("cx") & df_aslist["level"].str.contains("vd"))]
values = ["term 1", "term 2"]
df_aslist["label"] = np.select(conditions, values)
print(df_aslist)
Output:
id state level label
0 1 [p, t] [tere] 0
1 3 [t, t] [dsds, dsd] 0
2 4 [l, l] [vdv, jgddf] 0
3 6 [u, c] [cxxc, jgddf] 0
Ideally it should show the following, where the rows that didn't match any condition disappear and the rest remain with their new labels.
id state level label
1 3 [t, t] [dsds, dsd] term 1
2 4 [l, l] [vdv, jgddf] term 2
3 6 [u, c] [cxxc, jgddf] term 2
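The all-zero labels come from calling .str.contains on cells that hold Python lists rather than strings. With the default regex=True, pandas cannot search a list, so it returns a missing value (NaN) for those cells instead of True/False. A missing value never counts as a match, so np.select falls back to its default of 0, which is consistent with the output above. A minimal sketch reproducing this on a small illustrative frame (not the question's data):

```python
import pandas as pd

# Illustrative frame: each "level" cell holds a Python list,
# as after the groupby/aggregate step in the question.
df = pd.DataFrame({"level": [["dsds", "dsd"], ["cxxc", "jgddf"]]})

# .str.contains (regex=True by default) can only search strings;
# for list-valued cells pandas yields NaN instead of True/False.
mask = df["level"].str.contains("ds")
print(mask)
```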
Try the astype() method:
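df_aslist[['state','level']]=df_aslist[['state','level']].astype(str)
#the line above converts the lists in these columns to strings such as "['dsds', 'dsd']"
#'|' (or) for term 2: the desired output labels rows containing only "vd" or only "cx" as term 2
conditions=[(df_aslist["level"].str.contains("ds") & df_aslist["level"].str.contains("sd")),
(df_aslist["level"].str.contains("cx") | df_aslist["level"].str.contains("vd"))
]
values = ["term 1", "term 2"]
#an explicit string default keeps the label column all-string, so it can be filtered on '0'
df_aslist["label"] = np.select(conditions, values, default="0")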
Finally, filter your dataframe:
df_aslist=df_aslist.query("label!='0'")
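Putting these steps together, here is a self-contained sketch run on the values from the question's printed output (including [tere] for id 1, as printed). It assumes the desired output is authoritative and therefore uses | rather than & for term 2: ids 4 and 6 each contain only one of "vd" and "cx", so with & they would keep the label '0' and be filtered out. It also passes default="0" so that the label column holds only strings.

```python
import numpy as np
import pandas as pd

# Rebuild the grouped frame from the question's printed output
# (list-valued "state" and "level" cells).
df_aslist = pd.DataFrame({
    "id": [1, 3, 4, 6],
    "state": [["p", "t"], ["t", "t"], ["l", "l"], ["u", "c"]],
    "level": [["tere"], ["dsds", "dsd"], ["vdv", "jgddf"], ["cxxc", "jgddf"]],
})

# Convert each list to its string form, e.g. "['dsds', 'dsd']",
# so the .str accessor can search it.
df_aslist[["state", "level"]] = df_aslist[["state", "level"]].astype(str)

level = df_aslist["level"]
conditions = [
    level.str.contains("ds") & level.str.contains("sd"),  # term 1: both substrings
    level.str.contains("cx") | level.str.contains("vd"),  # term 2: either substring (assumption)
]
values = ["term 1", "term 2"]

# np.select picks the first matching condition; "0" marks rows matching none.
df_aslist["label"] = np.select(conditions, values, default="0")

# Drop the rows that matched neither condition.
df_aslist = df_aslist.query("label != '0'")
print(df_aslist)
```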
If you print df_aslist, you will get your desired output.
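If you would rather not round-trip through strings, a possible alternative (a swapped-in technique, not the approach of this answer) is to test the substrings against the list elements directly with Series.apply. This keeps the list columns intact and avoids searching the brackets and quotes that str() adds. The helper names contains_all and contains_any are hypothetical, and term 2 is again treated as "either substring" to match the desired output.

```python
import numpy as np
import pandas as pd

def contains_all(items, substrings):
    """True if every substring occurs in at least one element of items."""
    return all(any(sub in item for item in items) for sub in substrings)

def contains_any(items, substrings):
    """True if any substring occurs in any element of items."""
    return any(sub in item for item in items for sub in substrings)

# Same grouped data as the question's printed output, with the lists left as lists.
df_aslist = pd.DataFrame({
    "id": [1, 3, 4, 6],
    "state": [["p", "t"], ["t", "t"], ["l", "l"], ["u", "c"]],
    "level": [["tere"], ["dsds", "dsd"], ["vdv", "jgddf"], ["cxxc", "jgddf"]],
})

# Series.apply forwards the keyword argument to the helper for each cell.
conditions = [
    df_aslist["level"].apply(contains_all, substrings=["ds", "sd"]),
    df_aslist["level"].apply(contains_any, substrings=["cx", "vd"]),
]
df_aslist["label"] = np.select(conditions, ["term 1", "term 2"], default="")

# Keep only labelled rows; no conversion back to lists is needed.
result = df_aslist[df_aslist["label"] != ""]
print(result)
```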
Note: If you want the lists back, use pd.eval():
df_aslist[['state','level']]=df_aslist[['state','level']].apply(pd.eval)
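Another option for turning the string form back into lists (a different technique from the pd.eval call above) is the standard library's ast.literal_eval, which safely parses a Python literal and can be applied cell by cell with Series.map. A minimal sketch on a small illustrative frame that mimics the output of astype(str):

```python
import ast

import pandas as pd

# Cells holding the string form of a list, as produced by astype(str).
df = pd.DataFrame({
    "state": ["['t', 't']", "['l', 'l']"],
    "level": ["['dsds', 'dsd']", "['vdv', 'jgddf']"],
})

# Parse each cell independently back into a Python list.
for col in ["state", "level"]:
    df[col] = df[col].map(ast.literal_eval)

print(df)
```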