Python Pandas change column values to NULL and return to its original values
I am writing a Python script that automatically changes column values to NULL before the files are sent via e-mail.
My goal is to temporarily change some column values because they contain confidential data. Here is what the data looks like:
svc_no  last_name  first_name  acc_no   some_column
12345   Parker     Peter       1111111  some_value
11111   Stark      Tony        2222222  some_value
22222   Rogers     Steve       3333333  some_value
I have multiple Excel files that I will be sending to someone who will do some processing on them. Before I send them via e-mail, I need to change some column values to NULL because they are confidential.
My desired output looks like this:
svc_no  last_name  first_name  acc_no  some_column
12345   NULL       NULL        NULL    some_value
11111   NULL       NULL        NULL    some_value
22222   NULL       NULL        NULL    some_value
Here is what I did:
I iterate over all the files and get the directory path so that I can back up all the Excel files, which I plan to use later as a reference for restoring the original column values. I used the os, shutil and glob libraries.
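import os
import shutil

# Back up the Excel files into a "source" folder next to this script
path = os.path.dirname(os.path.abspath(__file__))
new_path = os.path.join(path, 'source')
files = []
if not os.path.exists(new_path):
    os.makedirs(new_path)
for file in files:
    if file not in new_path:
        shutil.copy(file, new_path)
# continues in the next step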
This code creates a folder in the same directory as the script and copies all the Excel files into that newly created directory, new_path.
Now I load each Excel file into a DataFrame and change the column values to NULL using .loc:
df = pd.read_excel(file)
df.loc[df['l_name'].notnull(), 'last_name'] = 'NULL'
I also tried inserting a column that contains NULL values and copying it to the desired column using iloc, but nothing happened either. It did not even create the column.
df.insert(loc=5, column='empty_column', value='NULL')
df.iloc[:,1] = df.iloc[:,5]
My problem is that this doesn't change the last_name column values to NULL. Is there another way to do this?
I have already used .iloc and .loc in some of my projects and they work there, so I am confused about why they are not doing anything here.
Any help will be highly appreciated.
I really don't see the issue here. You seem to be overcomplicating things. Would this not suffice:
df
   svc_no last_name first_name   acc_no some_column
0   12345    Parker      Peter  1111111  some_value
1   11111     Stark       Tony  2222222  some_value
2   22222    Rogers      Steve  3333333  some_value
Create a confidential version:
confidential_columns = ['last_name', 'first_name', 'acc_no']
confidential_df = df.copy()
confidential_df[confidential_columns] = 'NULL'
You get this:
confidential_df
   svc_no last_name first_name acc_no some_column
0   12345      NULL       NULL   NULL  some_value
1   11111      NULL       NULL   NULL  some_value
2   22222      NULL       NULL   NULL  some_value
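The same masking step can be wrapped in a small helper so it is easy to apply to each file. This is only a sketch: the function name mask_columns and its placeholder parameter are made up for illustration, and the sample frame is rebuilt from the table in the question.

```python
import pandas as pd


def mask_columns(df, columns, placeholder="NULL"):
    """Return a copy of df with the given columns overwritten by placeholder.

    Hypothetical helper, not part of pandas.
    """
    masked = df.copy()
    # Assigning a scalar to a list of columns fills every row of each column
    masked[columns] = placeholder
    return masked


# Sample data taken from the question
df = pd.DataFrame({
    "svc_no": [12345, 11111, 22222],
    "last_name": ["Parker", "Stark", "Rogers"],
    "first_name": ["Peter", "Tony", "Steve"],
    "acc_no": [1111111, 2222222, 3333333],
    "some_column": ["some_value"] * 3,
})

confidential_df = mask_columns(df, ["last_name", "first_name", "acc_no"])
```

Because the helper works on a copy, df still holds the original values, so nothing has to be "undone" on your side after the masked version is sent.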
Then decide which one to write based on some condition:
confidential = True

def write():
    # The context manager saves and closes the workbook on exit
    with pd.ExcelWriter('output.xlsx') as writer:
        if confidential:
            confidential_df.to_excel(writer, sheet_name='report')
        else:
            df.to_excel(writer, sheet_name='report')

write()
I'm not going to deal with path/file/directory management when it comes time to write, because that seems to be outside the scope of your issue.
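If the processed files come back with the masked columns still set to NULL, one possible way to put the original values back is to match rows on a key. The sketch below assumes that svc_no is unique per row and is left unmasked; restore_columns is a hypothetical helper name, and the "processed" frame is invented sample data standing in for what the recipient returns.

```python
import pandas as pd


def restore_columns(processed, original, key, columns):
    """Put the original values of `columns` back, matching rows on `key`.

    Hypothetical helper; assumes `key` is unique in `original`.
    """
    # Drop the masked columns, then pull the real ones back in by key
    stripped = processed.drop(columns=columns)
    restored = stripped.merge(original[[key] + columns], on=key, how="left")
    # Keep the column order of the processed file
    return restored[list(processed.columns)]


confidential_columns = ["last_name", "first_name", "acc_no"]

# Original data, as kept in the backup
original = pd.DataFrame({
    "svc_no": [12345, 11111, 22222],
    "last_name": ["Parker", "Stark", "Rogers"],
    "first_name": ["Peter", "Tony", "Steve"],
    "acc_no": [1111111, 2222222, 3333333],
    "some_column": ["some_value"] * 3,
})

# Invented example of a returned file: rows reordered, some_column changed
processed = pd.DataFrame({
    "svc_no": [22222, 12345, 11111],
    "last_name": ["NULL"] * 3,
    "first_name": ["NULL"] * 3,
    "acc_no": ["NULL"] * 3,
    "some_column": ["c", "a", "b"],
})

restored = restore_columns(processed, original, "svc_no", confidential_columns)
```

Matching on a key rather than on row position means the restore still works if the recipient sorts or filters the rows, as long as svc_no itself is untouched.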