Erroneous column concatenation in Python
I have a data frame in which, whenever a record in the first column is empty, I have to fill it by concatenating the other two columns.
Cuenta CeCo | GLAccount | CeCoCeBe
---|---|---
123 A | 123 | A
234 S | 234 | S
NaN | 345 | B
NaN | 987 | A
for x in df1["Cuenta CeCo"].isna():
    if x:
        df1["Cuenta CeCo"] = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]
    else:
        df1["Cuenta CeCo"]
Types:
df1["Cuenta CeCo"] = dtype('O')
df1["GLAccount"] = dtype('float64')
df1["CeCoCeBe"] = dtype('O')
Expected output:
Cuenta CeCo | GLAccount | CeCoCeBe
---|---|---
123 A | 123 | A
234 S | 234 | S
345 B | 345 | B
987 A | 987 | A
However, when concatenating it seems to do something strange and gives me other numbers and letters:
Cuenta CeCo
251 O
471 B
791 R
341 O
Could someone help me understand why this happens and how to fix it so that I get my expected output?
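Iterating over DataFrames is typically bad practice and rarely what you intend. In your loop, `x` is just a single True/False value from `df1["Cuenta CeCo"].isna()`, and whenever it is True you assign to the entire `Cuenta CeCo` column, so every row is overwritten, not only the empty ones. Also note that iterating directly over a DataFrame goes over its columns. Try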
for x in df:
    print(x)
and you will see it print the column headings.
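To see why the loop in the question produces unexpected values, here is a minimal reproduction on the question's sample data, assuming `GLAccount` is stored as `float64` as the listed dtypes say. As soon as one `x` is True, the assignment replaces the whole column, so existing values such as `123 A` are overwritten as well, and the float account numbers are rendered with a trailing `.0`.

```python
import numpy as np
import pandas as pd

# Sample data from the question; GLAccount is float64 as in the listed dtypes
df1 = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],
    "CeCoCeBe": ["A", "S", "B", "A"],
})

# The loop from the question: x is a single True/False value per row
for x in df1["Cuenta CeCo"].isna():
    if x:
        # Assigning a whole Series replaces EVERY row of the column
        df1["Cuenta CeCo"] = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]

print(df1["Cuenta CeCo"].tolist())
# ['123.0 A', '234.0 S', '345.0 B', '987.0 A']
```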
As for what you're trying to do, try this:
cols = ['Cuenta CeCo', 'GLAccount', 'CeCoCeBe']
mask = df[cols[0]].isna()
df.loc[mask, cols[0]] = df.loc[mask, cols[1]].map(str) + " " + df.loc[mask, cols[2]]
This builds a boolean mask (a Series of True and False, one value per row) that is True wherever the first column is NaN. Using it with `.loc[mask, ...]` selects only those rows: we convert the second column to strings, concatenate it with the third column, and assign the result back to the same rows, leaving all other rows untouched.
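A self-contained version of the mask approach is sketched below on the question's sample data. It assumes `GLAccount` is `float64` (as in the question) and that the account numbers are whole numbers. With `.map(str)` the filled rows would come out as `345.0 B`, so this sketch formats the number with `"{:.0f}".format` instead to match the expected `345 B`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],  # float64, as in the question
    "CeCoCeBe": ["A", "S", "B", "A"],
})

cols = ["Cuenta CeCo", "GLAccount", "CeCoCeBe"]

# Boolean mask: True only where the first column is missing
mask = df1[cols[0]].isna()

# Format the float account number without the trailing ".0"
# (assumes the account numbers are whole numbers)
account = df1.loc[mask, cols[1]].map("{:.0f}".format)

# Fill only the masked rows; rows with a value keep it
df1.loc[mask, cols[0]] = account + " " + df1.loc[mask, cols[2]]

print(df1)
```

If `GLAccount` is already an integer column, the original `.map(str)` gives `345` directly and the formatting step is unnecessary.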
import pandas as pd
import numpy as np
df = pd.DataFrame([
    ['123 A', 123, 'A'],
    ['234 S', 234, 'S'],
    [np.nan, 345, 'B'],
    [np.nan, 987, 'A']
], columns=['Cuenta CeCo', 'GLAccount', 'CeCoCeBe'])

def f(r):
    # Keep an existing value; otherwise build "GLAccount CeCoCeBe"
    if pd.notna(r['Cuenta CeCo']):
        return r['Cuenta CeCo']
    else:
        return f"{r['GLAccount']} {r['CeCoCeBe']}"

# axis=1 calls f once per row
df['Cuenta CeCo'] = df.apply(f, axis=1)
df
prints:
index | Cuenta CeCo | GLAccount | CeCoCeBe
---|---|---|---
0 | 123 A | 123 | A
1 | 234 S | 234 | S
2 | 345 B | 345 | B
3 | 987 A | 987 | A
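For comparison, the same result can be obtained without a row-wise `apply` by building the concatenated string for every row and letting `Series.fillna` fill only the missing entries. This is a swapped-in vectorized alternative, not either answer's code. It uses the integer sample data above; with a `float64` `GLAccount` column, `astype(str)` would again produce `345.0`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["123 A", 123, "A"],
        ["234 S", 234, "S"],
        [np.nan, 345, "B"],
        [np.nan, 987, "A"],
    ],
    columns=["Cuenta CeCo", "GLAccount", "CeCoCeBe"],
)

# Build "GLAccount CeCoCeBe" for every row in one vectorized step
combined = df["GLAccount"].astype(str) + " " + df["CeCoCeBe"]

# fillna with a Series fills each NaN from the same index label in `combined`
df["Cuenta CeCo"] = df["Cuenta CeCo"].fillna(combined)

print(df)
```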
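Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site or the original URL. For any questions, contact: yoyou2525@163.com.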