
Create column based on row data when column doesn't exist or column is NaN in pandas

I have a dataframe built from OSM data. For my area it contains every column except colour, but in other areas that column may exist. I want to create the column with calculated colours if it is missing, and, when the column exists but some rows have no colour value yet, replace those NaN values with a colour code.

TL;DR: How do I create a column if needed, and otherwise map only the NaN values?

I already tried this:

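import random

def setColor(_):
    # Return a random hex colour such as '#3FA2C7'; the argument is ignored
    r = lambda: random.randint(0, 255)
    return '#%02X%02X%02X' % (r(), r(), r())

# Fill only the rows where 'colour' is NaN
lines.loc[lines['colour'].isnull(), 'colour'] = lines["colour"].map(setColor)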

However, this fails with a KeyError if the colour column doesn't exist initially.
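A small sketch of why the attempt fails and how to test for the column safely. The tiny `df` below is a made-up stand-in for the OSM `lines` frame: selecting a missing column raises `KeyError`, while `"colour" in df` (which checks column labels) and `DataFrame.get` let you detect the absence without an exception.

```python
import pandas as pd

# Hypothetical frame standing in for `lines` when the area has no colour data
df = pd.DataFrame({"name": ["a", "b"]})

# Selecting a missing column raises KeyError, which is what breaks the .loc line
try:
    df["colour"]
    raised = False
except KeyError:
    raised = True

# `in` on a DataFrame tests the column labels, so this is a safe existence check
has_colour = "colour" in df

# DataFrame.get returns the default (None unless given) instead of raising
colour_or_none = df.get("colour")
```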

I could run lines["colour"] = np.nan first, but while that works when the column is absent, it doesn't work when the column already partially exists, because it would overwrite the existing values. So I wonder if there is a better way.
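One way to keep the `np.nan` idea without overwriting anything is to assign it only when the column is absent, then fill just the NaN rows. This is a minimal sketch; `random_colour` and `fill_missing_colours` are hypothetical names. The cast to object dtype is an extra precaution, assuming an existing colour column might be read in as all-NaN `float64`, where storing strings triggers dtype warnings in recent pandas.

```python
import random

import numpy as np
import pandas as pd


def random_colour():
    # Hypothetical helper: a random '#RRGGBB' string, like setColor in the question
    return "#%02X%02X%02X" % tuple(random.randint(0, 255) for _ in range(3))


def fill_missing_colours(df):
    # Create the column only when it is absent, so existing colours are never overwritten
    if "colour" not in df:
        df["colour"] = np.nan
    # Object dtype lets string colours go into a column that may be all-NaN float64
    df["colour"] = df["colour"].astype("object")
    # Rows still lacking a colour (every row, if the column was just created)
    missing = df["colour"].isna()
    if missing.any():
        # One fresh colour per missing row; rows that already have a colour are untouched
        df.loc[missing, "colour"] = [random_colour() for _ in range(int(missing.sum()))]
    return df
```

Calling `fill_missing_colours(lines)` then works the same way whether the area's data had no colour column, a partially filled one, or a complete one.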

It's not fully clear what you want, but maybe this is close.

Given df1 (no colour column) and df2 (a colour column with one blank entry):

import pandas as pd
import numpy as np
import random

df1 = pd.DataFrame({'Col_01': ['x', 'y', 'z']})
df2 = pd.DataFrame({'Col_01': ['x', 'y', 'z'], 'colour': ['#D30000', '#C21807', '']})

print("df1:\n", df1)
print("df2:\n", df2)

Console output:

df1:
   Col_01
0      x
1      y
2      z
df2:
   Col_01   colour
0      x  #D30000
1      y  #C21807
2      z

With a slight change to your function (removing the argument) and looping through all dataframes:

def setColor(): # change: remove the "_" here
    r = lambda: random.randint(0, 255)
    return '#%02X%02X%02X' % (r(),r(),r())

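for df in [df1, df2]:
    if "colour" not in df:
        # Column is missing: generate one colour per row
        df["colour"] = df.apply(lambda x: setColor(), axis=1)
    else:
        # Column exists: treat empty strings and NaN as missing, and give each
        # missing row its own colour instead of one shared colour
        missing = df["colour"].isna() | (df["colour"] == '')
        df["colour"] = np.where(missing, [setColor() for _ in range(len(df))], df["colour"])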

print("df1:\n", df1)
print("df2:\n", df2)

Console output:

df1:
   Col_01   colour
0      x  #C0ACB3
1      y  #1FA09E
2      z  #4A35FF
df2:
   Col_01   colour
0      x  #D30000
1      y  #C21807
2      z  #D97652

It's probably self-explanatory, but the loop first checks whether the colour column exists. If not, it adds the column and creates a hex code for each row. Otherwise, it uses np.where() to generate a hex code for blank rows and keeps the existing hex code wherever there is one.
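As an alternative to the if/else, the same result can likely be reached with `DataFrame.get` and `Series.fillna`. This is a sketch, not the answer's method; `random_colours` and `with_colour` are hypothetical names. `get` supplies an all-NaN column when colour is absent, empty strings are masked to NaN, and `fillna` given a Series fills each gap with the value at the same index label, so every missing row gets its own colour.

```python
import random

import numpy as np
import pandas as pd


def random_colours(index):
    # Hypothetical helper: one random '#RRGGBB' string per label in `index`
    return pd.Series(
        ["#%02X%02X%02X" % tuple(random.randint(0, 255) for _ in range(3)) for _ in index],
        index=index,
    )


def with_colour(df):
    # DataFrame.get returns the default (an all-NaN object Series) when 'colour' is absent
    default = pd.Series(np.nan, index=df.index, dtype="object")
    current = df.get("colour", default).astype("object")
    # Treat empty strings as missing, just like NaN
    current = current.mask(current == "")
    # fillna with a Series fills each gap with the value at the same index label
    df["colour"] = current.fillna(random_colours(df.index))
    return df
```

The trade-off is that a random colour is generated for every row, even rows that keep their existing value; for OSM-sized frames that cost is probably negligible.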
