
How to avoid empty sets in pandas's concat and to_csv functions?

I have a dictionary that I want to store as a CSV file through pandas:

df = pd.concat([pd.Series(node_dict[k], name=k) for k in HEADERS], axis=1)
df.to_csv(os.path.join(abspath, outputfile), sep='\t', index=False)

The keys correspond to the columns in the CSV or pandas frame, and each value is a list of sets. Each set holds the values for one row. Say I have two columns:

   names                     companies                      
{'john', 'smith', 'mary'}   {'ms', 'fb'} 
 set()                      {'ms', 'fb', 'tw', 'g', 'lk'}
 ...                         ...

Some rows' values are empty, as indicated by the set() printed in the file. I hope there is a way to modify this line:

[pd.Series(node_dict[k], name=k) for k in HEADERS]

so that it writes an empty string '' to the file instead of the string 'set()'.

Sample of the dict:

node_dict['names'] = [{'john', 'smith', 'mary'}, {}]
node_dict['companies'] = [{'ms', 'fb'}, {'ms', 'fb', 'tw', 'g', 'lk'} ]

Of course, the actual lists in the dictionary are much longer.

I think you can do something like:

node_dict = {k: [x if x else "invisible" for x in v] for k,v in node_dict.items()}

before building the Series with [pd.Series(node_dict[k], name=k) for k in HEADERS].
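A minimal sketch of that idea, assuming the goal is an empty cell rather than the literal word "invisible": substitute '' for every empty entry before building the Series. The sample data below is hypothetical (it mirrors the question, with set() standing in for the empty entries), and the result is written to an in-memory buffer instead of a file so it can be inspected.

```python
import io
import pandas as pd

# Hypothetical sample data mirroring the question; set() and {} are both falsy.
HEADERS = ['names', 'companies']
node_dict = {
    'names': [{'john', 'smith', 'mary'}, set()],
    'companies': [{'ms', 'fb'}, {'ms', 'fb', 'tw', 'g', 'lk'}],
}

# Replace every empty (falsy) entry with '' so nothing is printed for it.
cleaned = {k: [x if x else '' for x in v] for k, v in node_dict.items()}

df = pd.concat([pd.Series(cleaned[k], name=k) for k in HEADERS], axis=1)

# Write to an in-memory buffer (an assumption for this sketch) instead of a file.
buf = io.StringIO()
df.to_csv(buf, sep='\t', index=False)
csv_text = buf.getvalue()
print(csv_text)
```

The truthiness test `x if x else ''` catches both set() and {}, so it does not matter which kind of empty container the real data holds.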

You can just drop all the {}. Convert each list to a string, remove the {}, and evaluate the result back into a list. Done.

df = pd.concat([pd.Series(eval(str(node_dict[k]).replace('{}',' ')), name=k) for k in HEADERS], axis=1)
df
                 names            companies
0  {john, mary, smith}             {fb, ms}
1                  NaN  {g, ms, lk, fb, tw}

This works even with the trailing comma left in the list. df.to_csv() automatically writes NaN as an empty string, because na_rep defaults to '': DataFrame.to_csv(path, sep: str = ',', na_rep: str = ''...)
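The na_rep behaviour can also be seen without the eval round-trip. The sketch below is a different technique from the one above, not the answerer's method: it swaps empty entries for None, which pandas treats as missing, and compares the default output with an explicit placeholder. The sample data and the 'NA' placeholder are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the question.
HEADERS = ['names', 'companies']
node_dict = {
    'names': [{'john', 'smith', 'mary'}, set()],
    'companies': [{'ms', 'fb'}, {'ms', 'fb', 'tw', 'g', 'lk'}],
}

# Swap empty entries for None, which pandas treats as a missing value.
df = pd.concat(
    [pd.Series([x if x else None for x in node_dict[k]], name=k) for k in HEADERS],
    axis=1,
)

# With no path given, to_csv returns the output as a string.
default_text = df.to_csv(sep='\t', index=False)              # na_rep defaults to ''
marked_text = df.to_csv(sep='\t', index=False, na_rep='NA')  # explicit placeholder
print(default_text)
print(marked_text)
```

Unlike the string-replace approach, this keeps every list at its original length, so an empty entry in the middle of a list stays on its own row.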
