Create two new pandas columns based on a partial string match

Question

I have a DataFrame of construction job titles and names arranged in random order (but a person's name is always in the cell immediately to the right of their title), like so:

   contact_1_title contact_1_name contact_2_title contact_2_name contact_3_title contact_3_name      contact_4_title contact_4_name
0  owner_architect            joe    other_string   other_string    other_string   other_string         other_string   other_string
1     other_string   other_string       architect           jack    other_string   other_string         other_string   other_string
2     other_string   other_string    other_string   other_string    other_string   other_string  self_cert_architect           mary
3     other_string   other_string    other_string   other_string           owner           phil         other_string   other_string
4       contractor          sarah    other_string   other_string    other_string   other_string         other_string   other_string
5     other_string   other_string       expeditor           kate    other_string   other_string         other_string   other_string

I want to pull every title that contains the word "architect" into a new column of its own, and pull the name in the cell immediately to its right into a second new column. My desired output:

        arch_title_col arch_name_col
0      owner_architect           joe
1            architect          jack
2  self_cert_architect          mary

I'm at a loss as to how to go about this. I tried working with itertuples(), but I didn't get very far.

Answer 1

What you need is pd.wide_to_long, but I couldn't get the syntax right for the way your columns are named. So here it is done manually:

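import pandas as pd

# Stack all *_title columns into one Series and all *_name columns into another
title = pd.concat([df[col] for col in df.filter(like='title')], axis=0)
name = pd.concat([df[col] for col in df.filter(like='name')], axis=0)
# Put them side by side: the k-th stacked title lines up with the k-th stacked name
df = pd.concat([title, name], axis=1)
df.columns = ['title', 'name']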

Now that each title/name pair sits on its own row, it's a simple check:

out = df[df.title.str.contains('architect')]
print(out)

Output:

                 title  name
0      owner_architect   joe
1            architect  jack
2  self_cert_architect  mary
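For anyone who wants to run the whole thing end to end, here is a self-contained sketch of the same stack-then-filter approach. It rebuilds the question's sample frame by hand and renames the result to the `arch_title_col` / `arch_name_col` headers the question asked for. The `ignore_index=True`, the `long_df` and `filler` names, and the final `reset_index(drop=True)` are choices made for this sketch, not part of the answer above; they just give the result a clean 0..n index.

```python
import pandas as pd

# The question's sample data, rebuilt by hand
filler = "other_string"
df = pd.DataFrame({
    "contact_1_title": ["owner_architect", filler, filler, filler, "contractor", filler],
    "contact_1_name": ["joe", filler, filler, filler, "sarah", filler],
    "contact_2_title": [filler, "architect", filler, filler, filler, "expeditor"],
    "contact_2_name": [filler, "jack", filler, filler, filler, "kate"],
    "contact_3_title": [filler, filler, filler, "owner", filler, filler],
    "contact_3_name": [filler, filler, filler, "phil", filler, filler],
    "contact_4_title": [filler, filler, "self_cert_architect", filler, filler, filler],
    "contact_4_name": [filler, filler, "mary", filler, filler, filler],
})

# Stack the title columns into one Series and the name columns into another.
# Column order is preserved, so the k-th title lines up with the k-th name.
titles = pd.concat([df[c] for c in df.filter(like="title")], ignore_index=True)
names = pd.concat([df[c] for c in df.filter(like="name")], ignore_index=True)
long_df = pd.DataFrame({"title": titles, "name": names})

# Keep only rows whose title contains "architect", then rename to the requested headers
mask = long_df["title"].str.contains("architect")
out = (
    long_df[mask]
    .rename(columns={"title": "arch_title_col", "name": "arch_name_col"})
    .reset_index(drop=True)
)
print(out)
```

The rows come out in the order of the original contact columns (contact_1 matches first, then contact_2, and so on), which here reproduces the three rows of the desired output.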

I promise you that 99% of the time, iter... is not what you want, and there is a far better pandas-specific way to do whatever you're trying to do.
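The answer points to pd.wide_to_long but falls back to manual stacking because of the column naming. Here is one way to make wide_to_long fit, offered as a sketch on a trimmed copy of the sample data. wide_to_long expects column names shaped like `<stub><sep><suffix>` (for example `title_1`), so the sketch first rewrites `contact_1_title` to `title_1` with a regex. The regex, the `contact` label for the new index level, and the use of `reset_index()` to supply a row id are assumptions of this sketch, not part of the original answer.

```python
import pandas as pd

filler = "other_string"
# A trimmed version of the question's sample data (two contacts, three rows)
df = pd.DataFrame({
    "contact_1_title": ["owner_architect", filler, "contractor"],
    "contact_1_name": ["joe", filler, "sarah"],
    "contact_2_title": [filler, "architect", filler],
    "contact_2_name": [filler, "jack", filler],
})

# wide_to_long wants "<stub>_<number>", so turn contact_1_title into title_1
renamed = df.copy()
renamed.columns = df.columns.str.replace(r"contact_(\d+)_(\w+)", r"\2_\1", regex=True)

# reset_index() adds a unique "index" column to act as the row id (i);
# the number taken from each column name becomes the "contact" level (j)
long_df = pd.wide_to_long(
    renamed.reset_index(), stubnames=["title", "name"], i="index", j="contact", sep="_"
).sort_index()

# Same filter as before, then rename to the headers the question asked for
arch = long_df[long_df["title"].str.contains("architect")]
out = arch.rename(columns={"title": "arch_title_col", "name": "arch_name_col"}).reset_index(drop=True)
print(out)
```

One difference from the manual version is that wide_to_long records the original row and contact slot of each match in the index. Calling `reset_index()` without `drop=True` would keep them as columns if you need to trace a name back to where it came from.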

Answer 1
0 2022-07-18 23:37:41
