Merge 2 columns from a single Dataframe in Pandas

I want to merge 2 columns of the same DataFrame, subject to some specific conditions.

Consider the following DataFrame:

Number-first  Number-second
1             NaN
2             4C
3A            5
NaN           6
NaN           7
NaN           NaN
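
For anyone who wants to reproduce the problem, the table above could be built roughly as follows. This is only a sketch: it assumes the values are stored as strings and the missing entries as np.nan, which the question does not state explicitly.

```python
import numpy as np
import pandas as pd

# Sample data from the table above.
# Assumption: values are strings and missing entries are np.nan.
df = pd.DataFrame({
    "Number-first": ["1", "2", "3A", np.nan, np.nan, np.nan],
    "Number-second": [np.nan, "4C", "5", "6", "7", np.nan],
})

print(df)
```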

The conditions are:

  1. If the Number-first column has an alphanumeric value and the Number-second column has a NaN value or a '' (empty string) -> the Result column should contain only the value from Number-first
  2. If the Number-first column has a NaN or '' (empty string) value and the Number-second column has an alphanumeric value -> the Result column should contain only the value from Number-second
  3. If the values in both columns are alphanumeric, the Result column should contain the values from Number-first and Number-second, separated by a '-'
  4. If both columns have NaN or empty string values, the Result column should contain a '' (empty string) value

The following would be the output for the above DataFrame:

Number-first  Number-second  Result
1             NaN            1
2             4C             2 - 4C
3A            5              3A - 5
NaN           6              6
NaN           7              7
NaN           NaN            NaN

I have been unsuccessful using the .select method with the above conditions.

Thanks in advance for the help!

Below is the code snippet of the conditions, which doesn't seem to work for me:

conditions = [
    df['Number-first'].str.isalnum(),
    df['Number-second'].str.isalnum(), 
    df['Number-first'].str.isalnum() & df['Number-second'].str.isalnum() ]
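
Assuming the .select method mentioned above is numpy's np.select, two things may explain why these conditions misbehave. np.select uses the first condition that matches, so the combined "both columns" test has to come before the single-column tests. Also, str.isalnum() returns NaN for missing values rather than False. A minimal sketch of a reordered version follows; the sample frame (including the '' entries) and the names first, second, first_ok and second_ok are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Illustrative sample; includes '' entries to exercise the empty-string rules.
df = pd.DataFrame({
    "Number-first": ["1", "2", "3A", np.nan, "", np.nan],
    "Number-second": [np.nan, "4C", "5", "6", "7", ""],
})

# Replace NaN with '' first, so that str.isalnum() yields plain booleans
# ('' is not alphanumeric, so it maps to False).
first = df["Number-first"].fillna("")
second = df["Number-second"].fillna("")
first_ok = first.str.isalnum()
second_ok = second.str.isalnum()

# np.select picks the first matching condition, so the most specific
# test (both columns usable) must be listed first.
conditions = [
    first_ok & second_ok,
    first_ok,
    second_ok,
]
choices = [
    first + " - " + second,
    first,
    second,
]

# Rows matching no condition (both columns unusable) get ''.
df["Result"] = np.select(conditions, choices, default="")
print(df)
```

With default="" the last rule (both columns missing or empty) yields an empty string, as condition 4 asks.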

You can use the combine function to do this with a custom function like so:

import pandas as pd
import numpy as np

def custom_combine(v1, v2):
    # both values missing: nothing to combine
    if pd.isna(v1) and pd.isna(v2):
        return np.nan
    # only one value present: return it unchanged
    elif pd.isna(v1):
        return v2
    elif pd.isna(v2):
        return v1
    # both present: join them with ' - '
    else:
        return f'{v1} - {v2}'

df['Result'] = (
    # mask non-alphanumeric values (e.g. '') as NaN
    df.where(df.apply(lambda s: s.str.isalnum()))
    .pipe(lambda d:
        d['Number-first'].combine(d['Number-second'], custom_combine)
    )
)

print(df)
  Number-first Number-second  Result
0            1           NaN       1
1            2            4C  2 - 4C
2           3A             5  3A - 5
3          NaN             6       6
4          NaN             7       7
5          NaN           NaN     NaN

Alternatively, you can take advantage of pandas' vectorized string methods:

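import pandas as pd
import numpy as np

df['Result'] = (
    # mask non-alphanumeric values (e.g. '') as NaN
    df.where(df.apply(lambda s: s.str.isalnum()))
    .pipe(lambda d:
        # join the two columns; missing values contribute ''
        d['Number-first'].str.cat(d['Number-second'], sep='-', na_rep='')
    )
    # drop the separator left over when one side was missing
    .str.strip('-')
    # rows where both sides were missing become NaN
    .replace('', np.nan)
)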

print(df)
  Number-first Number-second Result
0            1           NaN      1
1            2            4C   2-4C
2           3A             5   3A-5
3          NaN             6      6
4          NaN             7      7
5          NaN           NaN    NaN
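
Both approaches above return NaN when neither column has a usable value, while condition 4 in the question asks for an empty string. A possible variant that produces '' instead is sketched below. It assumes the same sample data as the question, and the intermediate name cleaned is purely illustrative.

```python
import numpy as np
import pandas as pd

# Sample data as in the question (assumed to be strings and np.nan).
df = pd.DataFrame({
    "Number-first": ["1", "2", "3A", np.nan, np.nan, np.nan],
    "Number-second": [np.nan, "4C", "5", "6", "7", np.nan],
})

# Map NaN to '', then blank out anything that is not alphanumeric.
cleaned = df[["Number-first", "Number-second"]].fillna("")
cleaned = cleaned.where(cleaned.apply(lambda s: s.str.isalnum()), "")

df["Result"] = (
    cleaned["Number-first"]
    .str.cat(cleaned["Number-second"], sep=" - ")
    # remove the separator characters left over when one side was ''
    .str.strip(" -")
)
print(df)
```

Because alphanumeric values cannot start or end with a space or a hyphen, stripping those characters only removes leftover separators.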
