
Best way to fix pandas DataFrame with some cell values in wrong columns

I have a pandas DataFrame in which some cell values are in the wrong columns. It looks like this:

City         River     Price   Car
New York     Hudson    low     Volkswagen
Los Angeles  Yukon     high    Tesla
Las Vegas    low       Hudson  Volkswagen
low          New York  Tesla   Yukon

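I have lists of all possible values for the City, River, Price, and Car columns.

What is the best way to fix this dataset in pandas so that every value ends up in its correct column?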
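To make the problem reproducible, here is a minimal sketch that rebuilds the sample table and flags the cells that do not belong in their current column. The names `df`, `allowed`, and `misplaced` are illustrative, and the value lists are an assumption: they contain only the values visible in the sample (in English), not the real full lists.

```python
import pandas as pd

# Sample data mirroring the table in the question
df = pd.DataFrame({
    'City':  ['New York', 'Los Angeles', 'Las Vegas', 'low'],
    'River': ['Hudson', 'Yukon', 'low', 'New York'],
    'Price': ['low', 'high', 'Hudson', 'Tesla'],
    'Car':   ['Volkswagen', 'Tesla', 'Volkswagen', 'Yukon'],
})

# Assumed value lists (hypothetical: limited to what the sample shows)
allowed = {
    'City':  ['New York', 'Los Angeles', 'Las Vegas'],
    'River': ['Hudson', 'Yukon'],
    'Price': ['low', 'high'],
    'Car':   ['Volkswagen', 'Tesla'],
}

# True where a cell holds a value that is not valid for its own column
misplaced = pd.DataFrame(
    {col: ~df[col].isin(values) for col, values in allowed.items()}
)

print(misplaced.sum())             # number of misplaced cells per column
print(df[misplaced.any(axis=1)])   # rows that need repair
```

A mask like this shows how much of the dataset is affected before any repair is attempted.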

Update:

This is the code I came up with, and it works for me:

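import numpy as np
import pandas as pd

x = pd.DataFrame({'name': ['andrew', 'karl', 'jhon', 'jack', 'bob', 'high'],
                  'education': ['high', 'middle', 'lviv', 'high', 'elementary', 'kyiv'],
                  'city': ['lviv', 'kyiv', 'elementary', 'kharkiv', 'kyiv', 'mike']})

# Work on a copy so the original frame stays untouched
ydata = x.copy()

# All valid values for each column
fdict = {}
fdict['name'] = ['andrew', 'karl', 'jhon', 'jack', 'bob', 'mike']
fdict['education'] = ['high', 'middle', 'elementary']
fdict['city'] = ['lviv', 'kyiv', 'kharkiv']

# masks[col][category] is True where column `col` holds a value that belongs to `category`
# (named `masks` rather than `filter` to avoid shadowing the built-in)
masks = {key: {} for key in fdict}
for key1 in fdict:
    for key2 in fdict:
        # isin() matches whole values; a substring regex would also match longer strings
        masks[key2][key1] = ydata[key2].isin(fdict[key1])

# indata[col][category] holds the values found in `col` that belong in `category`
indata = {key: {} for key in fdict}
for key1 in fdict:
    for key2 in fdict:
        if key1 != key2:
            indata[key1][key2] = ydata.loc[masks[key1][key2], key1]

# Blank out every cell whose value does not belong in its current column
for key in fdict:
    ydata.loc[~masks[key][key], key] = np.nan

# Write the saved values into their correct columns (rows align on the index)
for key1 in fdict:
    for key2 in fdict:
        if key1 != key2:
            ydata.loc[masks[key1][key2], key2] = indata[key1][key2]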
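The same idea can be sketched more compactly with `DataFrame.where` and a row-wise back-fill. This is only an illustration, not the code from the update: `realign` is a hypothetical helper, and it assumes the value lists do not overlap and that each row contains at most one value per target column (if there are several, the leftmost one wins).

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['andrew', 'karl', 'jhon', 'jack', 'bob', 'high'],
    'education': ['high', 'middle', 'lviv', 'high', 'elementary', 'kyiv'],
    'city': ['lviv', 'kyiv', 'elementary', 'kharkiv', 'kyiv', 'mike'],
})
allowed = {
    'name': ['andrew', 'karl', 'jhon', 'jack', 'bob', 'mike'],
    'education': ['high', 'middle', 'elementary'],
    'city': ['lviv', 'kyiv', 'kharkiv'],
}

def realign(frame, allowed_values):
    """Hypothetical helper: move each value into the column whose list contains it."""
    fixed = {}
    for col, values in allowed_values.items():
        # Keep only the cells that are valid for `col`, wherever they currently sit
        candidates = frame.where(frame.isin(values))
        # Back-fill along each row so the first column holds the leftmost match
        fixed[col] = candidates.bfill(axis=1).iloc[:, 0]
    return pd.DataFrame(fixed)

fixed = realign(df, allowed)
print(fixed)
```

Values that appear in none of the lists are dropped, and a row with no valid value for a column gets NaN there.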

Another solution:

# Replace every cell with a {correct_column: value} dict,
# based on which list in fdict contains the value
for col in ydata.columns:
    ydata[col] = ydata[col].apply(lambda x: {key: x for key in fdict if x in fdict[key]})
ydata_list = ydata.to_dict('split')['data']

# dataset reconstruction: merge the per-cell dicts of each row into one dict
update_ydata_list = []
for y in ydata_list:
    temp_dict = {}
    for d in y:
        temp_dict.update(d)
    update_ydata_list.append(temp_dict)

# final result
result = pd.DataFrame(update_ydata_list)

You can test it on the full dataset.
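One way to test a repair on a larger dataset is to check two things: every remaining value is valid for its column, and no row gained or lost values. The sketch below assumes the original and repaired frames share the same row order; `check_fixed` and `row_values` are hypothetical helpers, and the three-row frames are only sample data.

```python
import pandas as pd

def row_values(frame):
    # Sorted non-missing values of each row, ignoring which column they sit in
    return [sorted(v for v in row if pd.notna(v))
            for row in frame.itertuples(index=False)]

def check_fixed(original, fixed, allowed_values):
    """Hypothetical helper: return a list of problems found in `fixed`."""
    problems = []
    for col, values in allowed_values.items():
        # A non-missing cell must be one of the allowed values for its column
        bad = fixed[col].notna() & ~fixed[col].isin(values)
        if bad.any():
            problems.append(f"{col}: {int(bad.sum())} invalid value(s)")
    # Each row should still contain exactly the values it started with
    changed = sum(b != a for b, a in zip(row_values(original), row_values(fixed)))
    if changed:
        problems.append(f"{changed} row(s) gained or lost values")
    return problems

allowed = {
    'name': ['andrew', 'jhon', 'mike'],
    'education': ['high', 'elementary'],
    'city': ['lviv', 'kyiv'],
}
original = pd.DataFrame({'name': ['andrew', 'jhon', 'high'],
                         'education': ['high', 'lviv', 'kyiv'],
                         'city': ['lviv', 'elementary', 'mike']})
fixed = pd.DataFrame({'name': ['andrew', 'jhon', 'mike'],
                      'education': ['high', 'elementary', 'high'],
                      'city': ['lviv', 'lviv', 'kyiv']})

print(check_fixed(original, fixed, allowed))     # problems in the repaired frame
print(check_fixed(original, original, allowed))  # problems in the unrepaired frame
```

The second check will also flag rows where a value was dropped because it appears in none of the lists, which is usually worth inspecting by hand.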
