Python Pandas - Adding a new column based on first and last names in multiple columns

Question

Although I'm still a beginner myself, I'm trying to explain some Pandas fundamentals to colleagues who usually manipulate CSV files with Excel.

I've hit a wall trying to find a "good" solution to a problem I'd like to use as an example.

I have a CSV file like this:

"Id","First","Last"
"109","Karl","Evans"
"113","Louise","Hudson"
"106","Catherine","Johnson"

and I'm importing it into Python like this:

import pandas
df = pandas.read_csv('C:\\example.csv')

I want to add a new column to df called "StartsWithJOrK".

It should say "Yay!" for anyone whose lowercased first name OR lowercased last name starts with a "j" or a "k". It should say "BooHiss" for anyone for whom neither lowercased name starts with a "j" or a "k".

(It's a rather overwrought example, but I feel like it packs in a lot of things I either don't know how to do or don't know how to combine "pythonically.")

What's the most pythonic, fewest-lines-of-code way to do this?

Answer 1

Not the easiest introduction to Pandas...

df['StartsWithJorK'] = 'BooHiss'
starting_letters = ['j', 'k']
df.loc[(df.First.str[0].str.lower().isin(starting_letters)) | 
        df.Last.str[0].str.lower().isin(starting_letters), 'StartsWithJorK'] = 'Yay!'

>>> df
     Id       First     Last StartsWithJorK
0   109        Karl    Evans           Yay!
1   113      Louise   Hudson        BooHiss
2   106   Catherine  Johnson           Yay!

df.First.str[0] finds the first character of each first name.

.str.lower() converts this series of letters to lower case.

.isin(starting_letters) checks whether each lower-case letter is in our list of starting letters, i.e. 'j' and 'k'.

.loc does label-based and boolean-based indexing; here it sets the StartsWithJorK column to 'Yay!' for every row where the condition is true.
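To see what each step of that chain produces, the intermediate results can be inspected one at a time. This is only a sketch: it rebuilds the question's sample data from an in-memory string (an assumption, so that it runs without C:\\example.csv), and the names csv_text, first_initial, last_initial and mask are illustrative, not part of the answer above.

```python
import io

import pandas as pd

# Sample data from the question, read from a string instead of a file
# (assumption: done only so the example is self-contained).
csv_text = '''"Id","First","Last"
"109","Karl","Evans"
"113","Louise","Hudson"
"106","Catherine","Johnson"
'''
df = pd.read_csv(io.StringIO(csv_text))

starting_letters = ['j', 'k']

# Step 1 and 2: first character of each name, lower-cased.
first_initial = df['First'].str[0].str.lower()
last_initial = df['Last'].str[0].str.lower()

# Step 3: one boolean per row, True if either initial is 'j' or 'k'.
mask = first_initial.isin(starting_letters) | last_initial.isin(starting_letters)

# Step 4: default value everywhere, then overwrite the matching rows.
df['StartsWithJorK'] = 'BooHiss'
df.loc[mask, 'StartsWithJorK'] = 'Yay!'

print(first_initial.tolist())
print(mask.tolist())
print(df)
```

Building the mask as a separate variable first can make the .loc line easier to read aloud to someone coming from Excel: "for the rows where mask is true, set this column".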

Answer 2

If you don't mind importing numpy too, you can do:

import numpy as np
import pandas as pd

mask = df['Last'].str.match('[JjKk]') | df['First'].str.match('[JjKk]')
df['StartsWithJOrK'] = np.where(mask, 'Yay!', 'BooHiss')

Output:

    Id      First     Last StartsWithJOrK
0  109       Karl    Evans           Yay!
1  113     Louise   Hudson        BooHiss
2  106  Catherine  Johnson           Yay!
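For readers who want to reproduce this, here is a self-contained sketch of the same idea. It relies on two facts: Series.str.match only matches at the start of each string, which is why the one-character pattern '[JjKk]' tests just the first letter, and np.where(mask, a, b) returns a where the mask is True and b elsewhere. The inline sample data and the names csv_text, labels and mask_ci are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, read from a string
# (assumption: done only so the example is self-contained).
csv_text = '''"Id","First","Last"
"109","Karl","Evans"
"113","Louise","Hudson"
"106","Catherine","Johnson"
'''
df = pd.read_csv(io.StringIO(csv_text))

# str.match anchors at the start of each string, so this checks
# only the first character of each name.
mask = df['Last'].str.match('[JjKk]') | df['First'].str.match('[JjKk]')

# Element-wise choice: 'Yay!' where mask is True, 'BooHiss' otherwise.
labels = np.where(mask, 'Yay!', 'BooHiss')
df['StartsWithJOrK'] = labels

# Possible variant: let pandas ignore case instead of listing both cases.
mask_ci = (df['Last'].str.match('[jk]', case=False)
           | df['First'].str.match('[jk]', case=False))

print(df)
```

Unlike the two-step "set a default, then overwrite with .loc" approach, np.where fills both outcomes in a single assignment.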

There are other ways of creating the above mask. Here is one:

mask = (df[['First', 'Last']]
            .apply(lambda x: x.str.match('[JjKk]'), axis=1)
            .any(axis=1))

Or, taking a cue from the use of .str.lower() in @Alexander's answer:

mask = (df[['First', 'Last']]
            .apply(lambda x: x.str.lower().str.match('[jk]'), axis=1)
            .any(axis=1))
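One reason to build the mask with apply over a list of columns is that it extends to any number of name columns. The sketch below wraps that idea in a small helper; starts_with_any is a hypothetical name introduced here, not a pandas function, and the DataFrame is the question's sample data typed in by hand.

```python
import pandas as pd


def starts_with_any(frame, columns, letters):
    """Return a boolean Series that is True for rows where any of the
    given columns starts with one of `letters` (case-insensitive).

    Hypothetical helper for illustration; not part of pandas.
    """
    wanted = [ch.lower() for ch in letters]
    # Column-wise apply: each `col` is one whole column (a Series).
    initials = frame[columns].apply(lambda col: col.str[0].str.lower())
    # DataFrame.isin checks every cell; any(axis=1) collapses each row.
    return initials.isin(wanted).any(axis=1)


df = pd.DataFrame({
    'Id': [109, 113, 106],
    'First': ['Karl', 'Louise', 'Catherine'],
    'Last': ['Evans', 'Hudson', 'Johnson'],
})

mask = starts_with_any(df, ['First', 'Last'], 'jk')
print(mask.tolist())
```

The resulting mask can be passed to np.where or .loc exactly as in the answers above.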

Answer 1
2 2016-08-24 15:33:57

Answer 2
2 2016-08-24 15:37:43
