Modify row values in new column based on string value with loop in Python
I'd like to recode row values in a different column based on a string match in pandas using a loop. I found a way to do it by creating an entirely new column each time, but that doesn't work when I need to modify selected rows from multiple columns at different points in the analysis.
Here is the solution I used, with an example dataframe:
import seaborn as sns

iris = sns.load_dataset('iris')
iris.head()
iris.species.value_counts()
pattern = ['setosa', 'virginica']
iris['new_column'] = 0
lis = []
for index, row in iris.iterrows():
    # print(row['species'])
    # substring test: is any pattern contained in this row's species?
    if any(ele in row.species for ele in pattern):
        lis.append('matched')
    else:
        lis.append("notmatched")
iris['new_column'] = lis
I know there may be other ways, such as list comprehensions or lambda/apply methods in pandas, but I'm requesting a solution using loops. (I don't have the full dataset here, but there are some complications with it and I believe a loop may be the most flexible.)
Any suggestions on how to use a loop and string match to modify rows in a different column? Thank you, and let me know if I can make this question better!
One of the simplest loop solutions is to iterate over each value of the column iris['species'] and append to the list lis, testing each value against pattern with in:
pattern = ['setosa', 'virginica']
lis = []
for val in iris['species']:
    if val in pattern:
        lis.append('matched')
    else:
        lis.append("notmatched")
iris['new_column'] = lis
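Since the question is about modifying only selected rows of a column that already exists, here is a minimal sketch of one way a loop could do that with DataFrame.at. The small frame and the column name label are made up for illustration and stand in for the real data; rows that do not match are left untouched.

```python
import pandas as pd

# Small stand-in for the real data (hypothetical values, not the iris dataset).
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa"],
    "label": ["a", "b", "c", "d"],  # existing column to be partly overwritten
})

pattern = ["setosa", "virginica"]

# Walk the index labels and overwrite only the rows whose species matches.
for idx in df.index:
    if df.at[idx, "species"] in pattern:
        df.at[idx, "label"] = "matched"

print(df["label"].tolist())
```

Because the loop writes into the existing column one cell at a time, it can be rerun later in the analysis with a different pattern list or a different target column without rebuilding the column from scratch.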
A pandas solution is also possible with numpy.where and Series.isin:
import numpy as np
iris['new_column'] = np.where(iris['species'].isin(pattern), 'matched', 'notmatched')
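As a self-contained illustration of the same idea, the sketch below builds the boolean mask once with Series.isin, uses it with numpy.where to fill a new column, and then reuses it with DataFrame.loc to overwrite only the matching rows of an existing column. The frame and the column name group are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; "group" is an existing column to be partly recoded.
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica"],
    "group": ["x", "y", "z"],
})
pattern = ["setosa", "virginica"]

# True where the species is one of the listed values.
mask = df["species"].isin(pattern)

# New column: one value for matches, another for everything else.
df["new_column"] = np.where(mask, "matched", "notmatched")

# Existing column: change only the matching rows, keep the rest as they are.
df.loc[mask, "group"] = "matched"

print(df)
```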
I ended up finding an answer through a few different threads.
Here's how I did it:
import re
import seaborn as sns

iris = sns.load_dataset('iris')
iris.head()
print(iris.species.value_counts())
pattern = ['setosa', 'virginica']
# start from a string default so the column holds strings throughout
iris['new_column'] = 'no match'
for index, row in iris.iterrows():
    # re.match only matches at the start of the string
    match = re.match('|'.join(pattern), row.species)
    if match:
        iris.loc[index, "new_column"] = match.group(0)
    else:
        iris.loc[index, "new_column"] = 'no match'
print(iris.new_column.value_counts())
I imagine there's a more efficient way to do this, and I also have to specify the column, which isn't ideal. Feel free to comment!
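On the efficiency and hard-coded column points, one possible approach is to wrap the logic in a function that takes the column names as arguments and uses Series.str.extract instead of a row loop. This is only a sketch: recode_by_prefix is a hypothetical helper, not a pandas function, and unlike the loop above it escapes each pattern with re.escape, so the patterns are treated as literal text rather than as regular expressions. The leading ^ mimics re.match, which only matches at the start of the string.

```python
import re

import pandas as pd


def recode_by_prefix(df, source_col, target_col, patterns, default="no match"):
    """Hypothetical helper: write the matched prefix of source_col into target_col."""
    # Anchor at the start of the string and capture whichever pattern matched.
    regex = "^(" + "|".join(re.escape(p) for p in patterns) + ")"
    extracted = df[source_col].str.extract(regex, expand=False)
    # Rows without a match come back as missing values; replace them with the default.
    df[target_col] = extracted.fillna(default)
    return df


# Hypothetical stand-in data instead of the full iris dataset.
df = pd.DataFrame({"species": ["setosa", "versicolor", "virginica"]})
recode_by_prefix(df, "species", "new_column", ["setosa", "virginica"])
print(df["new_column"].tolist())
```

The same function can then be called at different points in the analysis with other source and target columns.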