Pandas: Create new column based on existing, return existing if conditionals don't match
I have a dataset that contains a column with categorical values. I need to standardize the column because some values are coded incorrectly. For example, '1.0' and '3.0' should be '01' and '03', respectively. When a value is already correct, however, I just need to return the value from the column I'm cleaning. I'd like to include the cleaned data in a new column.
I am relatively new to Python and Pandas; I usually work in R. I've tried various techniques I found on Stack, but I keep running into an issue when attempting to return the values from the original column if they are correct.
Any assistance would be much appreciated! Here's some sample data:
import pandas as pd
d = {'col1':['01','03','1.0','10.0','7.0','3.0']}
df = pd.DataFrame(data=d)
This returns:
col1
0 01
1 03
2 1.0
3 10.0
4 7.0
5 3.0
And I'm hoping to get:
col1 col2
0 01 01
1 03 03
2 1.0 01
3 10.0 10
4 7.0 07
5 3.0 03
You can convert the column to float, then to int, and finally add leading zeros. The two variants below give the same result: col2 pads with str.format, col3 with str.zfill.
df['col2'] = (df['col1']
              .astype(float).astype(int)
              .apply('{:0>2}'.format))
df['col3'] = (df['col1']
              .astype(float).astype(int).astype(str)
              .str.zfill(2))
print(df)
col1 col2 col3
0 01 01 01
1 03 03 03
2 1.0 01 01
3 10.0 10 10
4 7.0 07 07
5 3.0 03 03
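The question also asks that values which are already correct be returned unchanged. One possible variation is to convert only the rows that look miscoded and copy the rest through as-is. This is a minimal sketch, assuming that "miscoded" means the string contains a decimal point; the name needs_fix is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['01', '03', '1.0', '10.0', '7.0', '3.0']})

# Assumption: a value is miscoded when it contains a literal decimal point.
needs_fix = df['col1'].str.contains('.', regex=False)

# Start from the original values, then overwrite only the miscoded rows.
df['col2'] = df['col1']
df.loc[needs_fix, 'col2'] = (df.loc[needs_fix, 'col1']
                             .astype(float).astype(int).astype(str)
                             .str.zfill(2))

print(df['col2'].tolist())
# ['01', '03', '01', '10', '07', '03']
```

Unlike converting the whole column, this leaves any value without a decimal point exactly as it was, which may matter if the column can also hold codes that are not numeric.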
This is the style format approach, where you format each column individually. It affects only how the values are displayed, not the stored data.
Code:
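df['col2'] = df['col1']
df = df.astype(float)  # both columns become floats
# Keep the Styler under its own name so df is still a DataFrame
styled = df.style.format({'col1': "{:.1f}", 'col2': "{:,.0f}"})
styled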
Output:
col1 col2
0 1.0 1
1 3.0 3
2 1.0 1
3 10.0 10
4 7.0 7
5 3.0 3
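If the zero-padded strings from the question are needed as real data rather than as a display format, a similar format specification can be applied to the values themselves. The following is a sketch rather than part of the answer above; it assumes every value in col1 parses as a number.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['01', '03', '1.0', '10.0', '7.0', '3.0']})

# '{:02.0f}' means: no decimals, zero-padded to a minimum width of 2.
df['col2'] = df['col1'].astype(float).map('{:02.0f}'.format)

print(df['col2'].tolist())
# ['01', '03', '01', '10', '07', '03']
```

Here col2 holds ordinary Python strings, so the padding survives exporting or merging, which a Styler's formatting would not.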