
Erroneous column concatenation in Python

I have a data frame in which, whenever a record's first column is empty, I have to fill it with the concatenation of the other two columns.

 Cuenta CeCo   GLAccount   CeCoCeBe
  123 A           123         A
  234 S           234         S
  NaN             345         B
  NaN             987         A


for x in df1["Cuenta CeCo"].isna():
    if x:
        df1["Cuenta CeCo"]=df1["GLAccount"].apply(str)+" "+df1["CeCoCeBe"]
    else :
        df1["Cuenta CeCo"]

Types:

df1["Cuenta CeCo"].dtype  # dtype('O')
df1["GLAccount"].dtype    # dtype('float64')
df1["CeCoCeBe"].dtype     # dtype('O')

Expected output:

Cuenta CeCo   GLAccount   CeCoCeBe
  123 A           123         A
  234 S           234         S
  345 B           345         B
  987 A           987         A

However, the concatenation seems to do something strange and gives me other numbers and letters:

 Cuenta CeCo   
  251 O
  471 B
  791 R
  341 O

Could someone help me understand why this happens and how to fix it so I get my expected output?

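Iterating over dataframes is typically bad practice and not what you intend. In your loop, x takes each True/False value of df1["Cuenta CeCo"].isna(), but the assignment inside the loop replaces the entire column whenever a NaN is found, so rows that already had a value are overwritten as well. Iterating over a DataFrame itself is even less intuitive, because it yields the column labels. Try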

for x in df:
    print(x)

and you will see it print the column headings.
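To see the overwrite concretely, here is a minimal reproduction of the question's loop on a small frame. The data is hypothetical: the first row deliberately holds a Cuenta CeCo value that does not match its other two columns, and GLAccount is float64 as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 already has a value that does NOT match its
# GLAccount/CeCoCeBe, so an overwrite is easy to spot.
df1 = pd.DataFrame({
    "Cuenta CeCo": ["111 X", np.nan],
    "GLAccount": [123.0, 345.0],  # float64, like in the question
    "CeCoCeBe": ["A", "B"],
})

# The question's loop with indentation repaired (the no-op else is omitted).
for x in df1["Cuenta CeCo"].isna():
    if x:
        # This assigns the WHOLE column, not just the current row.
        df1["Cuenta CeCo"] = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]

print(df1["Cuenta CeCo"].tolist())  # ['123.0 A', '345.0 B']
```

Two things go wrong here: row 0 loses its original "111 X", and str() of a float64 keeps the trailing ".0".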

As for what you're trying to do, try this:

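cols = ['Cuenta CeCo', 'GLAccount', 'CeCoCeBe']
# True for the rows whose first column is empty (NaN)
mask = df[cols[0]].isna()
# Fill only those rows with "<GLAccount> <CeCoCeBe>"
df.loc[mask, cols[0]] = df.loc[mask, cols[1]].map(str) + " " + df.loc[mask, cols[2]]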

This generates a mask (here a Series of True and False values, True where the first column is NaN). Passing it to .loc selects only the empty rows; for those rows we convert the second column to strings, concatenate it with the third, and assign the result back through the same mask, so the rows that already had a value are left untouched.
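A self-contained version of the mask approach, assuming (as in the question) that GLAccount is float64 but holds whole numbers. Plain .map(str) would turn 345.0 into "345.0", so this sketch converts through pandas' nullable "Int64" dtype first; that conversion is an assumption about the data and raises if an account number has a fractional part.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],  # float64, as in the question
    "CeCoCeBe": ["A", "S", "B", "A"],
})

cols = ["Cuenta CeCo", "GLAccount", "CeCoCeBe"]
# Boolean Series: True only where the first column is empty
mask = df[cols[0]].isna()

# Whole-number floats -> "345" instead of "345.0"
account = df.loc[mask, cols[1]].astype("Int64").astype(str)

# Assign only the masked rows; the other rows keep their original value
df.loc[mask, cols[0]] = account + " " + df.loc[mask, cols[2]]

print(df)
```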

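import pandas as pd
import numpy as np

df = pd.DataFrame([
        ['123 A', 123, 'A'],
        ['234 S', 234, 'S'],
        [np.nan, 345, 'B'],   # np.nan (the np.NaN alias no longer exists in NumPy 2.0)
        [np.nan, 987, 'A']
    ], columns = ['Cuenta CeCo', 'GLAccount', 'CeCoCeBe']
)

def f(r):
    # Keep the existing value when there is one
    if pd.notna(r['Cuenta CeCo']):
        return r['Cuenta CeCo']
    # Otherwise build "<GLAccount> <CeCoCeBe>" for this row
    else:
        return f"{r['GLAccount']} {r['CeCoCeBe']}"

# axis=1 passes each row to f as a Series
df['Cuenta CeCo'] = df.apply(f, axis=1)
print(df)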

prints:

  Cuenta CeCo  GLAccount CeCoCeBe
0       123 A        123        A
1       234 S        234        S
2       345 B        345        B
3       987 A        987        A
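For comparison, the same result can be obtained without calling a Python function per row. This is a minimal sketch using Series.fillna with a Series argument, which fills each missing value from the entry with the same index label in the other Series; it rebuilds the example frame above, where GLAccount is an integer column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
        ["123 A", 123, "A"],
        ["234 S", 234, "S"],
        [np.nan, 345, "B"],
        [np.nan, 987, "A"],
    ], columns=["Cuenta CeCo", "GLAccount", "CeCoCeBe"]
)

# Candidate value for every row: "<GLAccount> <CeCoCeBe>"
combined = df["GLAccount"].astype(str) + " " + df["CeCoCeBe"]

# fillna with a Series fills NaNs only, taking values from matching index labels
df["Cuenta CeCo"] = df["Cuenta CeCo"].fillna(combined)

print(df)
```

The vectorized version builds the candidate strings for every row, which is cheap, and lets fillna decide which ones are actually used.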

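Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.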
