
Python Pandas - Add a new column with value based on first and last name in multiple columns

Although I'm still a beginner myself, I'm trying to explain some Pandas fundamentals to colleagues who usually manipulate CSV files with Excel.

I've hit a wall trying to find a "good" solution to a problem I'd like to use as an example.

I have a CSV file like this:

"Id","First","Last"
"109","Karl","Evans"
"113","Louise","Hudson"
"106","Catherine","Johnson"

and I'm importing it into Python like this:

import pandas
df = pandas.read_csv('C:\\example.csv')
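
To follow along without a file on disk, the sample rows can be loaded from an in-memory string instead. This is only a sketch: `csv_text` is a name introduced here for illustration, and `io.StringIO` stands in for the path `C:\\example.csv` used above.

```python
import io

import pandas as pd

# The sample rows from the question, held in a string instead of a file.
csv_text = '''"Id","First","Last"
"109","Karl","Evans"
"113","Louise","Hudson"
"106","Catherine","Johnson"
'''

# read_csv accepts any file-like object, so StringIO replaces the file path.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```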

I want to add a new column to df called "StartsWithJOrK".

It should say "Yay!" for anyone whose lowercased first name OR lowercased last name starts with a "j" or a "k". It should say "BooHiss" for anyone for whom neither lowercased name starts with a "j" or a "k".

(It's a rather overwrought example, but I feel like it packs in a lot of things I either don't know how to do or don't know how to combine "pythonically.")

What's the most pythonic, fewest-lines-of-code way to do this?

Not the easiest introduction to Pandas...

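# Default every row to 'BooHiss', then overwrite the rows that match.
df['StartsWithJorK'] = 'BooHiss'
starting_letters = ['j', 'k']
# A row matches if the lowercased first letter of either name is 'j' or 'k'.
df.loc[(df.First.str[0].str.lower().isin(starting_letters)) |
        df.Last.str[0].str.lower().isin(starting_letters), 'StartsWithJorK'] = 'Yay!'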

>>> df
     Id       First     Last StartsWithJorK
0   109        Karl    Evans           Yay!
1   113      Louise   Hudson        BooHiss
2   106   Catherine  Johnson           Yay!

df.First.str[0] finds the first character of each first name.

.str.lower() converts this series of letters to lower case.

.isin(starting_letters) checks whether each lowercase letter is in our list of starting letters, i.e. 'j' and 'k'.

.loc is for label and boolean based indexing; here it sets the column StartsWithJorK to Yay! for each row where the condition holds.
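
The chain described above can be unpacked into its intermediate steps. The following is a sketch that rebuilds the sample data inline; the names `first_initial`, `lowered`, `first_hit`, `last_hit` and `either_hit` are introduced here purely for illustration.

```python
import pandas as pd

# Sample data matching the CSV in the question.
df = pd.DataFrame({
    'Id': [109, 113, 106],
    'First': ['Karl', 'Louise', 'Catherine'],
    'Last': ['Evans', 'Hudson', 'Johnson'],
})
starting_letters = ['j', 'k']

# Step 1: first character of each first name.
first_initial = df.First.str[0]
# Step 2: lowercase it so the comparison ignores case.
lowered = first_initial.str.lower()
# Step 3: boolean Series, True where the letter is 'j' or 'k'.
first_hit = lowered.isin(starting_letters)
# Same three steps for the last name, chained in one expression.
last_hit = df.Last.str[0].str.lower().isin(starting_letters)

# Combine the two boolean Series element-wise with | (logical OR).
either_hit = first_hit | last_hit

# Default value first, then overwrite only the matching rows via .loc.
df['StartsWithJorK'] = 'BooHiss'
df.loc[either_hit, 'StartsWithJorK'] = 'Yay!'
print(df)
```

Naming the intermediate Series makes it easier to print each one and see where a given row passes or fails.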

If you don't mind importing numpy too, you can do

import numpy as np
import pandas as pd

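# str.match anchors the pattern at the start of each string.
mask = df['Last'].str.match('[JjKk]') | df['First'].str.match('[JjKk]')
# Pick 'Yay!' where mask is True and 'BooHiss' everywhere else.
df['StartsWithJOrK'] = np.where(mask, 'Yay!', 'BooHiss')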

Output:

    Id      First     Last StartsWithJOrK
0  109       Karl    Evans           Yay!
1  113     Louise   Hudson        BooHiss
2  106  Catherine  Johnson           Yay!
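
If numpy is not wanted, the same labelling can probably be done with pandas alone by mapping the boolean mask onto the two strings. This is offered as an alternative sketch, not as part of the answer above; the `labels` dict is a name introduced here, and the sample data is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Sample data matching the CSV in the question.
df = pd.DataFrame({
    'Id': [109, 113, 106],
    'First': ['Karl', 'Louise', 'Catherine'],
    'Last': ['Evans', 'Hudson', 'Johnson'],
})

# Same mask as above: True where either name starts with J/j/K/k.
mask = df['Last'].str.match('[JjKk]') | df['First'].str.match('[JjKk]')

# Translate each boolean into its label with Series.map instead of np.where.
labels = {True: 'Yay!', False: 'BooHiss'}
df['StartsWithJOrK'] = mask.map(labels)
print(df)
```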

There are other ways of creating the above mask. Here is one:

mask = (df[['First', 'Last']]
            .apply(lambda x: x.str.match('[JjKk]'), axis=1)
            .any(axis=1))

Or, taking a cue from the use of .str.lower() in @Alexander's answer:

mask = (df[['First', 'Last']]
            .apply(lambda x: x.str.lower().str.match('[jk]'), axis=1)
            .any(axis=1))
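
With axis=1, apply calls the lambda once per row. As a sketch, the same mask can also be built column by column (the default axis=0), so the string methods run once per column rather than once per row. The names `row_mask` and `col_mask` are introduced here for illustration, and the sample data is rebuilt inline.

```python
import pandas as pd

# Sample data matching the CSV in the question.
df = pd.DataFrame({
    'Id': [109, 113, 106],
    'First': ['Karl', 'Louise', 'Catherine'],
    'Last': ['Evans', 'Hudson', 'Johnson'],
})

# Row-wise version from the answer: the lambda receives one row at a time.
row_mask = (df[['First', 'Last']]
            .apply(lambda x: x.str.lower().str.match('[jk]'), axis=1)
            .any(axis=1))

# Column-wise version: the lambda receives one whole column at a time.
col_mask = (df[['First', 'Last']]
            .apply(lambda col: col.str.lower().str.match('[jk]'))
            .any(axis=1))

# Both approaches should flag the same rows.
print((row_mask == col_mask).all())
```

On a three-row frame the difference is negligible; the column-wise form mainly matters as the number of rows grows.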

